* [PATCH 0/5] New implementation of SRA @ 2009-04-28 10:10 Martin Jambor 2009-04-28 10:10 ` [PATCH 4/5] Fix indirect inlining fallout with new intra-SRA Martin Jambor ` (4 more replies) 0 siblings, 5 replies; 25+ messages in thread From: Martin Jambor @ 2009-04-28 10:10 UTC (permalink / raw) To: GCC Patches; +Cc: Richard Guenther, Jan Hubicka Hi, this patch set contains my new implementation of intraprocedural Scalar Replacement of Aggregates (SRA) that I would like to commit to trunk first. It is essentially a part of the merge of the pretty-ipa branch because a (only slightly different) variant of this implementation was on that branch for a number of months. Nevertheless, this patch set does not contain the interprocedural variant, IPA-SRA, which I posted here earlier. I will send that as a follow-up patch later (hopefully once this is in). Unlike the previous SRA, this one is not based on decomposing aggregate types and references but on get_ref_base_and_extent() and on classifying accesses into aggregates on the basis of their offsets and sizes. It creates only scalar replacements, and only for those components that are actually accessed individually in a function. The main advantages of this new implementation are that: - It is able to scalarize unions and aggregates that contain unions (PR 32964). Moreover, there are potentially other scalarizable accesses caught by get_ref_base_and_extent() that are not caught by simple reference analysis; an example is an access to a one-element array that is not at the end of a structure. - It is simpler. In fact, simplicity was the main objective of this new implementation. On the other hand, whenever performance or the avoidance of bogus warnings required it (the two were usually tightly connected), the necessary computations and data structures were added, even though some of them are not super simple. Still, with comments stripped off, the new implementation has 2246 lines of code whereas the old one has 3375. I also believe it is easier to grasp how the new one works, though my view is obviously skewed. Hopefully, it contains fewer bugs and makes those that are there easier to hunt down. Avoiding all sorts of trickery also makes reasoning about SRA's effects on the code easier. Its behavior on the branch shows that it does not bring about any non-noise compile time or run time regressions. My isolated and so far rather sloppy benchmarks of this particular version suggest the same thing. Nevertheless, I am about to test it thoroughly using one of our automated testers. If there are indeed no problems, I will propose that these patches be committed to trunk. Because I believe that the final version will be very similar to this one, I'd like to invite others, especially middle-end maintainers, to review it. I have bootstrapped and tested the patches on x86_64-linux-gnu without any problems. I intend to do the same on i686, and I'd like to do it on hppa-linux-gnu too, but trunk does not bootstrap there as it is (probably one of the post expand-from-SSA issues). I'll keep trying. Thank you very much in advance, Martin ^ permalink raw reply [flat|nested] 25+ messages in thread
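For illustration, a minimal example (hypothetical C, not taken from the patch set) of the kind of code the new pass can scalarize and the old type-based decomposition could not (cf. PR 32964); assuming 32-bit float and unsigned int, both member accesses have offset 0 and size 32:

    /* u.f and u.i are only ever accessed individually, so the new SRA
       can replace them with scalar temporaries even though U is a union.  */
    union U { float f; unsigned int i; };

    unsigned int
    bits_of (float x)
    {
      union U u;
      u.f = x;
      return u.i;
    }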
* [PATCH 4/5] Fix indirect inlining fallout with new intra-SRA 2009-04-28 10:10 [PATCH 0/5] New implementation of SRA Martin Jambor @ 2009-04-28 10:10 ` Martin Jambor 2009-04-28 12:15 ` Richard Guenther 2009-04-28 10:10 ` [PATCH 2/5] Make tree-complex.c:extract_component() handle V_C_Es Martin Jambor ` (3 subsequent siblings) 4 siblings, 1 reply; 25+ messages in thread From: Martin Jambor @ 2009-04-28 10:10 UTC (permalink / raw) To: GCC Patches; +Cc: Richard Guenther, Jan Hubicka [-- Attachment #1: fix_iinln.diff --] [-- Type: text/plain, Size: 1880 bytes --] The new intra-SRA produces an extra copy assignment and that breaks ipa-prop.c pattern matching. The following patch fixes that. Thanks, Martin 2009-04-27 Martin Jambor <mjambor@suse.cz> * ipa-prop.c (get_ssa_def_if_simple_copy): New function. (determine_cst_member_ptr): Call get_ssa_def_if_simple_copy to skip simple copies. Index: mine/gcc/ipa-prop.c =================================================================== --- mine.orig/gcc/ipa-prop.c +++ mine/gcc/ipa-prop.c @@ -456,6 +456,22 @@ fill_member_ptr_cst_jump_function (struc jfunc->value.member_cst.delta = delta; } +/* If RHS is an SSA_NAMe and it is defined by a simple copy assign statement, + return the rhs of its defining statement. */ + +static inline tree +get_ssa_def_if_simple_copy (tree rhs) +{ + if (TREE_CODE (rhs) == SSA_NAME && !SSA_NAME_IS_DEFAULT_DEF (rhs)) + { + gimple def_stmt = SSA_NAME_DEF_STMT (rhs); + + if (is_gimple_assign (def_stmt) && gimple_num_ops (def_stmt) == 2) + rhs = gimple_assign_rhs1 (def_stmt); + } + return rhs; +} + /* Traverse statements from CALL backwards, scanning whether the argument ARG which is a member pointer is filled in with constant values. If it is, fill the jump function JFUNC in appropriately. METHOD_FIELD and DELTA_FIELD are @@ -495,6 +511,7 @@ determine_cst_member_ptr (gimple call, t fld = TREE_OPERAND (lhs, 1); if (!method && fld == method_field) { + rhs = get_ssa_def_if_simple_copy (rhs); if (TREE_CODE (rhs) == ADDR_EXPR && TREE_CODE (TREE_OPERAND (rhs, 0)) == FUNCTION_DECL && TREE_CODE (TREE_TYPE (TREE_OPERAND (rhs, 0))) == METHOD_TYPE) @@ -512,6 +529,7 @@ determine_cst_member_ptr (gimple call, t if (!delta && fld == delta_field) { + rhs = get_ssa_def_if_simple_copy (rhs); if (TREE_CODE (rhs) == INTEGER_CST) { delta = rhs; ^ permalink raw reply [flat|nested] 25+ messages in thread
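To make the breakage concrete, here is a hedged sketch (hypothetical GIMPLE, not from the thread) of what determine_cst_member_ptr() sees. It used to find the constant directly on the right hand side of the store into the member pointer field; the new SRA can route the value through a scalar replacement first:

    /* before the new SRA: */      /* after the new SRA: */
    f.__pfn = &foo;                SR.1_2 = &foo;
    f.__delta = 0;                 f.__pfn = SR.1_2;
                                   f.__delta = 0;

get_ssa_def_if_simple_copy() lets the pattern matching look from SR.1_2 back to the &foo it was copied from.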
* Re: [PATCH 4/5] Fix indirect inlining fallout with new intra-SRA 2009-04-28 10:10 ` [PATCH 4/5] Fix indirect inlining fallout with new intra-SRA Martin Jambor @ 2009-04-28 12:15 ` Richard Guenther 2009-04-29 12:39 ` Martin Jambor 0 siblings, 1 reply; 25+ messages in thread From: Richard Guenther @ 2009-04-28 12:15 UTC (permalink / raw) To: Martin Jambor; +Cc: GCC Patches, Richard Guenther, Jan Hubicka On Tue, Apr 28, 2009 at 12:04 PM, Martin Jambor <mjambor@suse.cz> wrote: > The new intra-SRA produces an extra copy assignment and that breaks > ipa-prop.c pattern matching. The following patch fixes that. > > Thanks, > > Martin > > > 2009-04-27 Martin Jambor <mjambor@suse.cz> > > * ipa-prop.c (get_ssa_def_if_simple_copy): New function. > (determine_cst_member_ptr): Call get_ssa_def_if_simple_copy to skip > simple copies. > > > Index: mine/gcc/ipa-prop.c > =================================================================== > --- mine.orig/gcc/ipa-prop.c > +++ mine/gcc/ipa-prop.c > @@ -456,6 +456,22 @@ fill_member_ptr_cst_jump_function (struc > jfunc->value.member_cst.delta = delta; > } > > +/* If RHS is an SSA_NAMe and it is defined by a simple copy assign statement, > + return the rhs of its defining statement. */ > + > +static inline tree > +get_ssa_def_if_simple_copy (tree rhs) > +{ > + if (TREE_CODE (rhs) == SSA_NAME && !SSA_NAME_IS_DEFAULT_DEF (rhs)) > + { > + gimple def_stmt = SSA_NAME_DEF_STMT (rhs); > + > + if (is_gimple_assign (def_stmt) && gimple_num_ops (def_stmt) == 2) > + rhs = gimple_assign_rhs1 (def_stmt); > + } > + return rhs; > +} IMHO this function should loop. Also use gimple_assign_single_p instead of the assign && num_ops check. You also have to check the gimple_assign_rhs_code to be SSA_NAME, otherwise you happily look through all unary operations. Richard. > + > /* Traverse statements from CALL backwards, scanning whether the argument ARG > which is a member pointer is filled in with constant values. If it is, fill > the jump function JFUNC in appropriately. METHOD_FIELD and DELTA_FIELD are > @@ -495,6 +511,7 @@ determine_cst_member_ptr (gimple call, t > fld = TREE_OPERAND (lhs, 1); > if (!method && fld == method_field) > { > + rhs = get_ssa_def_if_simple_copy (rhs); > if (TREE_CODE (rhs) == ADDR_EXPR > && TREE_CODE (TREE_OPERAND (rhs, 0)) == FUNCTION_DECL > && TREE_CODE (TREE_TYPE (TREE_OPERAND (rhs, 0))) == METHOD_TYPE) > @@ -512,6 +529,7 @@ determine_cst_member_ptr (gimple call, t > > if (!delta && fld == delta_field) > { > + rhs = get_ssa_def_if_simple_copy (rhs); > if (TREE_CODE (rhs) == INTEGER_CST) > { > delta = rhs; > > ^ permalink raw reply [flat|nested] 25+ messages in thread
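For instance (a hypothetical GIMPLE snippet, not from the thread), a conversion statement also satisfies the is_gimple_assign() && gimple_num_ops() == 2 test, so the posted check would wrongly treat it as a plain copy:

    D.1_1 = (int) delta_2;   /* NOP_EXPR: two operands, but not a copy */
    f.__delta = D.1_1;

gimple_assign_single_p() rejects such unary right hand sides, which is what the review asks for.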
* Re: [PATCH 4/5] Fix indirect inlining fallout with new intra-SRA 2009-04-28 12:15 ` Richard Guenther @ 2009-04-29 12:39 ` Martin Jambor 2009-04-29 13:13 ` Richard Guenther 0 siblings, 1 reply; 25+ messages in thread From: Martin Jambor @ 2009-04-29 12:39 UTC (permalink / raw) To: Richard Guenther; +Cc: GCC Patches, Richard Guenther, Jan Hubicka Hi, On Tue, Apr 28, 2009 at 01:48:55PM +0200, Richard Guenther wrote: > On Tue, Apr 28, 2009 at 12:04 PM, Martin Jambor <mjambor@suse.cz> wrote: > > The new intra-SRA produces an extra copy assignment and that breaks > > ipa-prop.c pattern matching. The following patch fixes that. > > > > Thanks, > > > > Martin > > > > > > 2009-04-27 Martin Jambor <mjambor@suse.cz> > > > > * ipa-prop.c (get_ssa_def_if_simple_copy): New function. > > (determine_cst_member_ptr): Call get_ssa_def_if_simple_copy to skip > > simple copies. > > > > > > Index: mine/gcc/ipa-prop.c > > =================================================================== > > --- mine.orig/gcc/ipa-prop.c > > +++ mine/gcc/ipa-prop.c > > @@ -456,6 +456,22 @@ fill_member_ptr_cst_jump_function (struc > > jfunc->value.member_cst.delta = delta; > > } > > > > +/* If RHS is an SSA_NAMe and it is defined by a simple copy assign statement, > > + return the rhs of its defining statement. */ > > + > > +static inline tree > > +get_ssa_def_if_simple_copy (tree rhs) > > +{ > > + if (TREE_CODE (rhs) == SSA_NAME && !SSA_NAME_IS_DEFAULT_DEF (rhs)) > > + { > > + gimple def_stmt = SSA_NAME_DEF_STMT (rhs); > > + > > + if (is_gimple_assign (def_stmt) && gimple_num_ops (def_stmt) == 2) > > + rhs = gimple_assign_rhs1 (def_stmt); > > + } > > + return rhs; > > +} > > IMHO this function should loop. Also use gimple_assign_single_p > instead of the assign && num_ops check. OK > You also have to check the gimple_assign_rhs_code to be SSA_NAME, > otherwise you happily look through all unary operations. > Will the RHS code be SSA_NAME even when the RHS is an invariant? (I am eventually looking for an invariant, specifically an ADDR_EXPR of a FUNCTION_DECL and an integer constant, not an ssa name.) Thanks, Martin ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 4/5] Fix indirect inlining fallout with new intra-SRA 2009-04-29 12:39 ` Martin Jambor @ 2009-04-29 13:13 ` Richard Guenther 2009-05-20 10:23 ` Martin Jambor 0 siblings, 1 reply; 25+ messages in thread From: Richard Guenther @ 2009-04-29 13:13 UTC (permalink / raw) To: Martin Jambor; +Cc: Richard Guenther, GCC Patches, Jan Hubicka On Wed, 29 Apr 2009, Martin Jambor wrote: > Hi, > > On Tue, Apr 28, 2009 at 01:48:55PM +0200, Richard Guenther wrote: > > On Tue, Apr 28, 2009 at 12:04 PM, Martin Jambor <mjambor@suse.cz> wrote: > > > The new intra-SRA produces an extra copy assignment and that breaks > > > ipa-prop.c pattern matching. The following patch fixes that. > > > > > > Thanks, > > > > > > Martin > > > > > > > > > 2009-04-27 Martin Jambor <mjambor@suse.cz> > > > > > > * ipa-prop.c (get_ssa_def_if_simple_copy): New function. > > > (determine_cst_member_ptr): Call get_ssa_def_if_simple_copy to skip > > > simple copies. > > > > > > > > > Index: mine/gcc/ipa-prop.c > > > =================================================================== > > > --- mine.orig/gcc/ipa-prop.c > > > +++ mine/gcc/ipa-prop.c > > > @@ -456,6 +456,22 @@ fill_member_ptr_cst_jump_function (struc > > > jfunc->value.member_cst.delta = delta; > > > } > > > > > > +/* If RHS is an SSA_NAMe and it is defined by a simple copy assign statement, > > > + return the rhs of its defining statement. */ > > > + > > > +static inline tree > > > +get_ssa_def_if_simple_copy (tree rhs) > > > +{ > > > + if (TREE_CODE (rhs) == SSA_NAME && !SSA_NAME_IS_DEFAULT_DEF (rhs)) > > > + { > > > + gimple def_stmt = SSA_NAME_DEF_STMT (rhs); > > > + > > > + if (is_gimple_assign (def_stmt) && gimple_num_ops (def_stmt) == 2) > > > + rhs = gimple_assign_rhs1 (def_stmt); > > > + } > > > + return rhs; > > > +} > > > > IMHO this function should loop. Also use gimple_assign_single_p > > instead of the assign && num_ops check. > > OK > > > You also have to check the gimple_assign_rhs_code to be SSA_NAME, > > otherwise you happily look through all unary operations. > > > > Will the RHS code be SSA_NAME even when the RHS is an invariant? (I am > eventually looking for an invariant, specifically an ADDR_EXPR of a > FUNCTION_DECL and an integer constant, not an ssa name.) No. In that case you want to check if (gimple_assign_single_p (def_stmt) && (gimple_assign_rhs_code (def_stmt) == SSA_NAME || is_gimple_min_invariant (gimple_assign_rhs1 (def_stmt))) Richard. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 4/5] Fix indirect inlining fallout with new intra-SRA 2009-04-29 13:13 ` Richard Guenther @ 2009-05-20 10:23 ` Martin Jambor 0 siblings, 0 replies; 25+ messages in thread From: Martin Jambor @ 2009-05-20 10:23 UTC (permalink / raw) To: GCC Patches On Wed, Apr 29, 2009 at 02:57:01PM +0200, Richard Guenther wrote: > On Wed, 29 Apr 2009, Martin Jambor wrote: > > > Hi, > > > > On Tue, Apr 28, 2009 at 01:48:55PM +0200, Richard Guenther wrote: > > > On Tue, Apr 28, 2009 at 12:04 PM, Martin Jambor <mjambor@suse.cz> wrote: > > > > The new intra-SRA produces an extra copy assignment and that breaks > > > > ipa-prop.c pattern matching. The following patch fixes that. > > > > > > > > Thanks, > > > > > > > > Martin > > > > > > > > > > > > 2009-04-27 Martin Jambor <mjambor@suse.cz> > > > > > > > > * ipa-prop.c (get_ssa_def_if_simple_copy): New function. > > > > (determine_cst_member_ptr): Call get_ssa_def_if_simple_copy to skip > > > > simple copies. > > > > > > > > > > > > Index: mine/gcc/ipa-prop.c > > > > =================================================================== > > > > --- mine.orig/gcc/ipa-prop.c > > > > +++ mine/gcc/ipa-prop.c > > > > @@ -456,6 +456,22 @@ fill_member_ptr_cst_jump_function (struc > > > > jfunc->value.member_cst.delta = delta; > > > > } > > > > > > > > +/* If RHS is an SSA_NAMe and it is defined by a simple copy assign statement, > > > > + return the rhs of its defining statement. */ > > > > + > > > > +static inline tree > > > > +get_ssa_def_if_simple_copy (tree rhs) > > > > +{ > > > > + if (TREE_CODE (rhs) == SSA_NAME && !SSA_NAME_IS_DEFAULT_DEF (rhs)) > > > > + { > > > > + gimple def_stmt = SSA_NAME_DEF_STMT (rhs); > > > > + > > > > + if (is_gimple_assign (def_stmt) && gimple_num_ops (def_stmt) == 2) > > > > + rhs = gimple_assign_rhs1 (def_stmt); > > > > + } > > > > + return rhs; > > > > +} > > > > > > IMHO this function should loop. Also use gimple_assign_single_p > > > instead of the assign && num_ops check. > > > > OK > > > > > You also have to check the gimple_assign_rhs_code to be SSA_NAME, > > > otherwise you happily look through all unary operations. > > > > > > > Will the RHS code be SSA_NAME even when the RHS is an invariant? (I am > > eventually looking for an invariant, specifically an ADDR_EXPR of a > > FUNCTION_DECL and an integer constant, not an ssa name.) > > No. In that case you want to check > > if (gimple_assign_single_p (def_stmt) > && (gimple_assign_rhs_code (def_stmt) == SSA_NAME > || is_gimple_min_invariant (gimple_assign_rhs1 (def_stmt))) > > Richard. The following was approved by Richi on IRC and so I re-bootstrapped, re-tested and committed as revision 147733. Thanks, Martin 2009-05-20 Martin Jambor <mjambor@suse.cz> * ipa-prop.c (get_ssa_def_if_simple_copy): New function. (determine_cst_member_ptr): Call get_ssa_def_if_simple_copy to skip simple copies. Index: mine/gcc/ipa-prop.c =================================================================== --- mine.orig/gcc/ipa-prop.c +++ mine/gcc/ipa-prop.c @@ -428,6 +428,22 @@ fill_member_ptr_cst_jump_function (struc jfunc->value.member_cst.delta = delta; } +/* If RHS is an SSA_NAMe and it is defined by a simple copy assign statement, + return the rhs of its defining statement. 
*/ + +static inline tree +get_ssa_def_if_simple_copy (tree rhs) +{ + while (TREE_CODE (rhs) == SSA_NAME && !SSA_NAME_IS_DEFAULT_DEF (rhs)) + { + gimple def_stmt = SSA_NAME_DEF_STMT (rhs); + + if (gimple_assign_single_p (def_stmt)) + rhs = gimple_assign_rhs1 (def_stmt); + } + return rhs; +} + /* Traverse statements from CALL backwards, scanning whether the argument ARG which is a member pointer is filled in with constant values. If it is, fill the jump function JFUNC in appropriately. METHOD_FIELD and DELTA_FIELD are @@ -467,6 +483,7 @@ determine_cst_member_ptr (gimple call, t fld = TREE_OPERAND (lhs, 1); if (!method && fld == method_field) { + rhs = get_ssa_def_if_simple_copy (rhs); if (TREE_CODE (rhs) == ADDR_EXPR && TREE_CODE (TREE_OPERAND (rhs, 0)) == FUNCTION_DECL && TREE_CODE (TREE_TYPE (TREE_OPERAND (rhs, 0))) == METHOD_TYPE) @@ -484,6 +501,7 @@ determine_cst_member_ptr (gimple call, t if (!delta && fld == delta_field) { + rhs = get_ssa_def_if_simple_copy (rhs); if (TREE_CODE (rhs) == INTEGER_CST) { delta = rhs; ^ permalink raw reply [flat|nested] 25+ messages in thread
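As a usage illustration (hypothetical GIMPLE, not from the thread), the committed loop can now walk a whole chain of single-rhs assignments:

    SR.1_1 = &foo;
    D.2_2 = SR.1_1;
    f.__pfn = D.2_2;

Starting from D.2_2, get_ssa_def_if_simple_copy() follows D.2_2 -> SR.1_1 -> &foo and returns the ADDR_EXPR, which is exactly what determine_cst_member_ptr() looks for.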
* [PATCH 2/5] Make tree-complex.c:extract_component() handle V_C_Es 2009-04-28 10:10 [PATCH 0/5] New implementation of SRA Martin Jambor 2009-04-28 10:10 ` [PATCH 4/5] Fix indirect inlining fallout with new intra-SRA Martin Jambor @ 2009-04-28 10:10 ` Martin Jambor 2009-04-28 11:52 ` Richard Guenther 2009-04-28 10:11 ` [PATCH 1/5] Get rid off old external tree-sra.c stuff Martin Jambor ` (2 subsequent siblings) 4 siblings, 1 reply; 25+ messages in thread From: Martin Jambor @ 2009-04-28 10:10 UTC (permalink / raw) To: GCC Patches; +Cc: Richard Guenther, Jan Hubicka [-- Attachment #1: complex_vce.diff --] [-- Type: text/plain, Size: 868 bytes --] Currently tree-complex.c:extract_component() cannot handle VIEW_CONVERT_EXPRs, which makes the new SRA ICE during bootstrap (IIRC). This seems to be an omission, so I added a case label so that these expressions are handled just like the other handled components. Thanks, Martin 2009-04-27 Martin Jambor <mjambor@suse.cz> * tree-complex.c (extract_component): Added VIEW_CONVERT_EXPR switch case. Index: mine/gcc/tree-complex.c =================================================================== --- mine.orig/gcc/tree-complex.c 2009-04-25 19:11:37.000000000 +0200 +++ mine/gcc/tree-complex.c 2009-04-25 19:11:47.000000000 +0200 @@ -601,6 +601,7 @@ extract_component (gimple_stmt_iterator case INDIRECT_REF: case COMPONENT_REF: case ARRAY_REF: + case VIEW_CONVERT_EXPR: { tree inner_type = TREE_TYPE (TREE_TYPE (t)); ^ permalink raw reply [flat|nested] 25+ messages in thread
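For context, a hedged example (hypothetical, not from the thread) of where such a tree can arise: once the new SRA scalarizes storage that overlays a complex value, the complex lowering pass can be asked to extract a component of a VIEW_CONVERT_EXPR of the replacement variable, e.g.

    /* SR.5 is a scalar replacement whose declared type is not complex;
       extract_component() now treats the V_C_E wrapper like any other
       handled component.  */
    REALPART_EXPR <VIEW_CONVERT_EXPR <complex float> (SR.5)>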
* Re: [PATCH 2/5] Make tree-complex.c:extract_component() handle V_C_Es 2009-04-28 10:10 ` [PATCH 2/5] Make tree-complex.c:extract_component() handle V_C_Es Martin Jambor @ 2009-04-28 11:52 ` Richard Guenther 2009-05-20 10:20 ` Martin Jambor 0 siblings, 1 reply; 25+ messages in thread From: Richard Guenther @ 2009-04-28 11:52 UTC (permalink / raw) To: Martin Jambor; +Cc: GCC Patches, Jan Hubicka On Tue, 28 Apr 2009, Martin Jambor wrote: > Currently tree-complex.c:extract_component() cannot handle > VIEW_CONVERT_EXPRs which makes the new SRA ICE during bootstrap > (IIRC). This seems to be an ommision so I added a label for this code > so that they are handled just like other handled components. > > Thanks, Ok. Thanks, Richard. > 2009-04-27 Martin Jambor <mjambor@suse.cz> > > * tree-complex.c (extract_component): Added VIEW_CONVERT_EXPR switch > case. > > > Index: mine/gcc/tree-complex.c > =================================================================== > --- mine.orig/gcc/tree-complex.c 2009-04-25 19:11:37.000000000 +0200 > +++ mine/gcc/tree-complex.c 2009-04-25 19:11:47.000000000 +0200 > @@ -601,6 +601,7 @@ extract_component (gimple_stmt_iterator > case INDIRECT_REF: > case COMPONENT_REF: > case ARRAY_REF: > + case VIEW_CONVERT_EXPR: > { > tree inner_type = TREE_TYPE (TREE_TYPE (t)); ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 2/5] Make tree-complex.c:extract_component() handle V_C_Es 2009-04-28 11:52 ` Richard Guenther @ 2009-05-20 10:20 ` Martin Jambor 0 siblings, 0 replies; 25+ messages in thread From: Martin Jambor @ 2009-05-20 10:20 UTC (permalink / raw) To: GCC Patches On Tue, Apr 28, 2009 at 01:43:16PM +0200, Richard Guenther wrote: > On Tue, 28 Apr 2009, Martin Jambor wrote: > > > Currently tree-complex.c:extract_component() cannot handle > > VIEW_CONVERT_EXPRs which makes the new SRA ICE during bootstrap > > (IIRC). This seems to be an ommision so I added a label for this code > > so that they are handled just like other handled components. > > > > Thanks, > > Ok. > > Thanks, > Richard. Re-bootstrapped, re-tested and committed as revision 147733. Thanks, Martin > > > 2009-04-27 Martin Jambor <mjambor@suse.cz> > > > > * tree-complex.c (extract_component): Added VIEW_CONVERT_EXPR switch > > case. > > > > > > Index: mine/gcc/tree-complex.c > > =================================================================== > > --- mine.orig/gcc/tree-complex.c 2009-04-25 19:11:37.000000000 +0200 > > +++ mine/gcc/tree-complex.c 2009-04-25 19:11:47.000000000 +0200 > > @@ -601,6 +601,7 @@ extract_component (gimple_stmt_iterator > > case INDIRECT_REF: > > case COMPONENT_REF: > > case ARRAY_REF: > > + case VIEW_CONVERT_EXPR: > > { > > tree inner_type = TREE_TYPE (TREE_TYPE (t)); ^ permalink raw reply [flat|nested] 25+ messages in thread
* [PATCH 1/5] Get rid off old external tree-sra.c stuff 2009-04-28 10:10 [PATCH 0/5] New implementation of SRA Martin Jambor 2009-04-28 10:10 ` [PATCH 4/5] Fix indirect inlining fallout with new intra-SRA Martin Jambor 2009-04-28 10:10 ` [PATCH 2/5] Make tree-complex.c:extract_component() handle V_C_Es Martin Jambor @ 2009-04-28 10:11 ` Martin Jambor 2009-04-28 12:55 ` Richard Guenther 2009-04-28 10:12 ` [PATCH 5/5] "Fix" the rest of the fallouts of new intra-SRA Martin Jambor 2009-04-28 10:14 ` [PATCH 3/5] New intraprocedural Scalar Reduction of Aggregates Martin Jambor 4 siblings, 1 reply; 25+ messages in thread From: Martin Jambor @ 2009-04-28 10:11 UTC (permalink / raw) To: GCC Patches; +Cc: Richard Guenther, Jan Hubicka [-- Attachment #1: move_insert_edge_copies_seq.diff --] [-- Type: text/plain, Size: 5261 bytes --] This patch gets rid of all external things in the old tree-sra. sra_insert_before, sra_insert_after, sra_init_cache and sra_type_can_be_decomposed_p are not actually used anywhere, so they are made static. insert_edge_copies_seq is used in mudflap, so I copied the function there and made it static too. The original one had to be moved upwards in the file so that tree-sra compiles. Yes, this patch duplicates the function, but the original copy is nuked with the rest of the file by the next patch. Thanks, Martin 2009-04-27 Martin Jambor <mjambor@suse.cz> * tree-flow.h (insert_edge_copies_seq): Undeclare. (sra_insert_before): Likewise. (sra_insert_after): Likewise. (sra_init_cache): Likewise. (sra_type_can_be_decomposed_p): Likewise. * tree-mudflap.c (insert_edge_copies_seq): Copied here from tree-sra.c * tree-sra.c (sra_type_can_be_decomposed_p): Made static. (sra_insert_before): Likewise. (sra_insert_after): Likewise. (sra_init_cache): Likewise. (insert_edge_copies_seq): Made static and moved upwards. Index: mine/gcc/tree-flow.h =================================================================== --- mine.orig/gcc/tree-flow.h +++ mine/gcc/tree-flow.h @@ -873,13 +873,6 @@ tree vn_lookup_with_vuses (tree, VEC (tr /* In tree-ssa-sink.c */ bool is_hidden_global_store (gimple); -/* In tree-sra.c */ -void insert_edge_copies_seq (gimple_seq, basic_block); -void sra_insert_before (gimple_stmt_iterator *, gimple_seq); -void sra_insert_after (gimple_stmt_iterator *, gimple_seq); -void sra_init_cache (void); -bool sra_type_can_be_decomposed_p (tree); - /* In tree-loop-linear.c */ extern void linear_transform_loops (void); extern unsigned perfect_loop_nest_depth (struct loop *); Index: mine/gcc/tree-mudflap.c =================================================================== --- mine.orig/gcc/tree-mudflap.c +++ mine/gcc/tree-mudflap.c @@ -447,6 +447,26 @@ execute_mudflap_function_ops (void) return 0; } +/* Insert a gimple_seq SEQ on all the outgoing edges out of BB. Note that + if BB has more than one edge, STMT will be replicated for each edge. + Also, abnormal edges will be ignored. */ + +static void +insert_edge_copies_seq (gimple_seq seq, basic_block bb) +{ + edge e; + edge_iterator ei; + unsigned n_copies = -1; + + FOR_EACH_EDGE (e, ei, bb->succs) + if (!(e->flags & EDGE_ABNORMAL)) + n_copies++; + + FOR_EACH_EDGE (e, ei, bb->succs) + if (!(e->flags & EDGE_ABNORMAL)) + gsi_insert_seq_on_edge (e, n_copies-- > 0 ? gimple_seq_copy (seq) : seq); +} + /* Create and initialize local shadow variables for the lookup cache globals. Put their decls in the *_l globals for use by mf_build_check_statement_for. 
*/ Index: mine/gcc/tree-sra.c =================================================================== --- mine.orig/gcc/tree-sra.c +++ mine/gcc/tree-sra.c @@ -236,7 +236,7 @@ is_sra_scalar_type (tree type) instantiated, just that if we decide to break up the type into separate pieces that it can be done. */ -bool +static bool sra_type_can_be_decomposed_p (tree type) { unsigned int cache = TYPE_UID (TYPE_MAIN_VARIANT (type)) * 2; @@ -1263,6 +1263,26 @@ build_element_name (struct sra_elt *elt) return XOBFINISH (&sra_obstack, char *); } +/* Insert a gimple_seq SEQ on all the outgoing edges out of BB. Note that + if BB has more than one edge, STMT will be replicated for each edge. + Also, abnormal edges will be ignored. */ + +static void +insert_edge_copies_seq (gimple_seq seq, basic_block bb) +{ + edge e; + edge_iterator ei; + unsigned n_copies = -1; + + FOR_EACH_EDGE (e, ei, bb->succs) + if (!(e->flags & EDGE_ABNORMAL)) + n_copies++; + + FOR_EACH_EDGE (e, ei, bb->succs) + if (!(e->flags & EDGE_ABNORMAL)) + gsi_insert_seq_on_edge (e, n_copies-- > 0 ? gimple_seq_copy (seq) : seq); +} + /* Instantiate an element as an independent variable. */ static void @@ -2785,29 +2805,9 @@ generate_element_init (struct sra_elt *e return ret; } -/* Insert a gimple_seq SEQ on all the outgoing edges out of BB. Note that - if BB has more than one edge, STMT will be replicated for each edge. - Also, abnormal edges will be ignored. */ - -void -insert_edge_copies_seq (gimple_seq seq, basic_block bb) -{ - edge e; - edge_iterator ei; - unsigned n_copies = -1; - - FOR_EACH_EDGE (e, ei, bb->succs) - if (!(e->flags & EDGE_ABNORMAL)) - n_copies++; - - FOR_EACH_EDGE (e, ei, bb->succs) - if (!(e->flags & EDGE_ABNORMAL)) - gsi_insert_seq_on_edge (e, n_copies-- > 0 ? gimple_seq_copy (seq) : seq); -} - /* Helper function to insert LIST before GSI, and set up line number info. */ -void +static void sra_insert_before (gimple_stmt_iterator *gsi, gimple_seq seq) { gimple stmt = gsi_stmt (*gsi); @@ -2819,7 +2819,7 @@ sra_insert_before (gimple_stmt_iterator /* Similarly, but insert after GSI. Handles insertion onto edges as well. */ -void +static void sra_insert_after (gimple_stmt_iterator *gsi, gimple_seq seq) { gimple stmt = gsi_stmt (*gsi); @@ -3597,7 +3597,7 @@ debug_sra_elt_name (struct sra_elt *elt) fputc ('\n', stderr); } -void +static void sra_init_cache (void) { if (sra_type_decomp_cache) ^ permalink raw reply [flat|nested] 25+ messages in thread
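A side note on the duplicated helper (my annotation, hedged; it is not spelled out in the thread): n_copies starts at (unsigned) -1, so after the first loop it holds the number of non-abnormal outgoing edges minus one. The second loop then inserts a fresh gimple_seq_copy (seq) on all but the last such edge and reuses SEQ itself on the last one, saving one copy:

    /* With three non-abnormal successor edges:
       after the counting loop: n_copies == 2
       edge 1: n_copies-- > 0 with n_copies == 2 -> insert a copy
       edge 2: n_copies-- > 0 with n_copies == 1 -> insert a copy
       edge 3: n_copies-- > 0 with n_copies == 0 -> insert SEQ itself  */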
* Re: [PATCH 1/5] Get rid off old external tree-sra.c stuff 2009-04-28 10:11 ` [PATCH 1/5] Get rid off old external tree-sra.c stuff Martin Jambor @ 2009-04-28 12:55 ` Richard Guenther 2009-05-20 10:19 ` Martin Jambor 0 siblings, 1 reply; 25+ messages in thread From: Richard Guenther @ 2009-04-28 12:55 UTC (permalink / raw) To: Martin Jambor; +Cc: GCC Patches, Jan Hubicka On Tue, 28 Apr 2009, Martin Jambor wrote: > This patch gets rid off all extermal things in the old tree-sra. > sra_insert_before, sra_insert_after sra_init_cache and > sra_type_can_be_decomposed_p are not actually used anywhere so they > are made static. insert_edge_copies_seq is used in mudflap and so I > copid the function there and made it static too. The original one had > to be moved upwards in the file do that tree-sra compiles. Yes, ths > patch duplicates the function but the origial copy is nuked with the > rest of the file by the next patch. > > Thanks, > > Martin This is ok. Thanks, Richard. > > 2009-04-27 Martin Jambor <mjambor@suse.cz> > > * tree-flow.h (insert_edge_copies_seq): Undeclare. > (sra_insert_before): Likewise. > (sra_insert_after): Likewise. > (sra_init_cache): Likewise. > (sra_type_can_be_decomposed_p): Likewise. > > * tree-mudflap.c (insert_edge_copies_seq): Copied here from tree-sra.c > > * tree-sra.c (sra_type_can_be_decomposed_p): Made static. > (sra_insert_before): Likewise. > (sra_insert_after): Likewise. > (sra_init_cache): Likewise. > (insert_edge_copies_seq): Made static and moved upwards. > > > Index: mine/gcc/tree-flow.h > =================================================================== > --- mine.orig/gcc/tree-flow.h > +++ mine/gcc/tree-flow.h > @@ -873,13 +873,6 @@ tree vn_lookup_with_vuses (tree, VEC (tr > /* In tree-ssa-sink.c */ > bool is_hidden_global_store (gimple); > > -/* In tree-sra.c */ > -void insert_edge_copies_seq (gimple_seq, basic_block); > -void sra_insert_before (gimple_stmt_iterator *, gimple_seq); > -void sra_insert_after (gimple_stmt_iterator *, gimple_seq); > -void sra_init_cache (void); > -bool sra_type_can_be_decomposed_p (tree); > - > /* In tree-loop-linear.c */ > extern void linear_transform_loops (void); > extern unsigned perfect_loop_nest_depth (struct loop *); > Index: mine/gcc/tree-mudflap.c > =================================================================== > --- mine.orig/gcc/tree-mudflap.c > +++ mine/gcc/tree-mudflap.c > @@ -447,6 +447,26 @@ execute_mudflap_function_ops (void) > return 0; > } > > +/* Insert a gimple_seq SEQ on all the outgoing edges out of BB. Note that > + if BB has more than one edge, STMT will be replicated for each edge. > + Also, abnormal edges will be ignored. */ > + > +static void > +insert_edge_copies_seq (gimple_seq seq, basic_block bb) > +{ > + edge e; > + edge_iterator ei; > + unsigned n_copies = -1; > + > + FOR_EACH_EDGE (e, ei, bb->succs) > + if (!(e->flags & EDGE_ABNORMAL)) > + n_copies++; > + > + FOR_EACH_EDGE (e, ei, bb->succs) > + if (!(e->flags & EDGE_ABNORMAL)) > + gsi_insert_seq_on_edge (e, n_copies-- > 0 ? gimple_seq_copy (seq) : seq); > +} > + > /* Create and initialize local shadow variables for the lookup cache > globals. Put their decls in the *_l globals for use by > mf_build_check_statement_for. */ > Index: mine/gcc/tree-sra.c > =================================================================== > --- mine.orig/gcc/tree-sra.c > +++ mine/gcc/tree-sra.c > @@ -236,7 +236,7 @@ is_sra_scalar_type (tree type) > instantiated, just that if we decide to break up the type into > separate pieces that it can be done. 
*/ > > -bool > +static bool > sra_type_can_be_decomposed_p (tree type) > { > unsigned int cache = TYPE_UID (TYPE_MAIN_VARIANT (type)) * 2; > @@ -1263,6 +1263,26 @@ build_element_name (struct sra_elt *elt) > return XOBFINISH (&sra_obstack, char *); > } > > +/* Insert a gimple_seq SEQ on all the outgoing edges out of BB. Note that > + if BB has more than one edge, STMT will be replicated for each edge. > + Also, abnormal edges will be ignored. */ > + > +static void > +insert_edge_copies_seq (gimple_seq seq, basic_block bb) > +{ > + edge e; > + edge_iterator ei; > + unsigned n_copies = -1; > + > + FOR_EACH_EDGE (e, ei, bb->succs) > + if (!(e->flags & EDGE_ABNORMAL)) > + n_copies++; > + > + FOR_EACH_EDGE (e, ei, bb->succs) > + if (!(e->flags & EDGE_ABNORMAL)) > + gsi_insert_seq_on_edge (e, n_copies-- > 0 ? gimple_seq_copy (seq) : seq); > +} > + > /* Instantiate an element as an independent variable. */ > > static void > @@ -2785,29 +2805,9 @@ generate_element_init (struct sra_elt *e > return ret; > } > > -/* Insert a gimple_seq SEQ on all the outgoing edges out of BB. Note that > - if BB has more than one edge, STMT will be replicated for each edge. > - Also, abnormal edges will be ignored. */ > - > -void > -insert_edge_copies_seq (gimple_seq seq, basic_block bb) > -{ > - edge e; > - edge_iterator ei; > - unsigned n_copies = -1; > - > - FOR_EACH_EDGE (e, ei, bb->succs) > - if (!(e->flags & EDGE_ABNORMAL)) > - n_copies++; > - > - FOR_EACH_EDGE (e, ei, bb->succs) > - if (!(e->flags & EDGE_ABNORMAL)) > - gsi_insert_seq_on_edge (e, n_copies-- > 0 ? gimple_seq_copy (seq) : seq); > -} > - > /* Helper function to insert LIST before GSI, and set up line number info. */ > > -void > +static void > sra_insert_before (gimple_stmt_iterator *gsi, gimple_seq seq) > { > gimple stmt = gsi_stmt (*gsi); > @@ -2819,7 +2819,7 @@ sra_insert_before (gimple_stmt_iterator > > /* Similarly, but insert after GSI. Handles insertion onto edges as well. */ > > -void > +static void > sra_insert_after (gimple_stmt_iterator *gsi, gimple_seq seq) > { > gimple stmt = gsi_stmt (*gsi); > @@ -3597,7 +3597,7 @@ debug_sra_elt_name (struct sra_elt *elt) > fputc ('\n', stderr); > } > > -void > +static void > sra_init_cache (void) > { > if (sra_type_decomp_cache) > > -- Richard Guenther <rguenther@suse.de> Novell / SUSE Labs SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746 - GF: Markus Rex ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 1/5] Get rid off old external tree-sra.c stuff 2009-04-28 12:55 ` Richard Guenther @ 2009-05-20 10:19 ` Martin Jambor 0 siblings, 0 replies; 25+ messages in thread From: Martin Jambor @ 2009-05-20 10:19 UTC (permalink / raw) To: GCC Patches On Tue, Apr 28, 2009 at 02:46:45PM +0200, Richard Guenther wrote: > On Tue, 28 Apr 2009, Martin Jambor wrote: > > > This patch gets rid off all extermal things in the old tree-sra. > > sra_insert_before, sra_insert_after sra_init_cache and > > sra_type_can_be_decomposed_p are not actually used anywhere so they > > are made static. insert_edge_copies_seq is used in mudflap and so I > > copid the function there and made it static too. The original one had > > to be moved upwards in the file do that tree-sra compiles. Yes, ths > > patch duplicates the function but the origial copy is nuked with the > > rest of the file by the next patch. > > > > Thanks, > > > > Martin > > This is ok. Re-bootstrapped, re-tested and committed as revision 147733. Thanks, Martin > > Thanks, > Richard. > > > > > 2009-04-27 Martin Jambor <mjambor@suse.cz> > > > > * tree-flow.h (insert_edge_copies_seq): Undeclare. > > (sra_insert_before): Likewise. > > (sra_insert_after): Likewise. > > (sra_init_cache): Likewise. > > (sra_type_can_be_decomposed_p): Likewise. > > > > * tree-mudflap.c (insert_edge_copies_seq): Copied here from tree-sra.c > > > > * tree-sra.c (sra_type_can_be_decomposed_p): Made static. > > (sra_insert_before): Likewise. > > (sra_insert_after): Likewise. > > (sra_init_cache): Likewise. > > (insert_edge_copies_seq): Made static and moved upwards. > > > > > > Index: mine/gcc/tree-flow.h > > =================================================================== > > --- mine.orig/gcc/tree-flow.h > > +++ mine/gcc/tree-flow.h > > @@ -873,13 +873,6 @@ tree vn_lookup_with_vuses (tree, VEC (tr > > /* In tree-ssa-sink.c */ > > bool is_hidden_global_store (gimple); > > > > -/* In tree-sra.c */ > > -void insert_edge_copies_seq (gimple_seq, basic_block); > > -void sra_insert_before (gimple_stmt_iterator *, gimple_seq); > > -void sra_insert_after (gimple_stmt_iterator *, gimple_seq); > > -void sra_init_cache (void); > > -bool sra_type_can_be_decomposed_p (tree); > > - > > /* In tree-loop-linear.c */ > > extern void linear_transform_loops (void); > > extern unsigned perfect_loop_nest_depth (struct loop *); > > Index: mine/gcc/tree-mudflap.c > > =================================================================== > > --- mine.orig/gcc/tree-mudflap.c > > +++ mine/gcc/tree-mudflap.c > > @@ -447,6 +447,26 @@ execute_mudflap_function_ops (void) > > return 0; > > } > > > > +/* Insert a gimple_seq SEQ on all the outgoing edges out of BB. Note that > > + if BB has more than one edge, STMT will be replicated for each edge. > > + Also, abnormal edges will be ignored. */ > > + > > +static void > > +insert_edge_copies_seq (gimple_seq seq, basic_block bb) > > +{ > > + edge e; > > + edge_iterator ei; > > + unsigned n_copies = -1; > > + > > + FOR_EACH_EDGE (e, ei, bb->succs) > > + if (!(e->flags & EDGE_ABNORMAL)) > > + n_copies++; > > + > > + FOR_EACH_EDGE (e, ei, bb->succs) > > + if (!(e->flags & EDGE_ABNORMAL)) > > + gsi_insert_seq_on_edge (e, n_copies-- > 0 ? gimple_seq_copy (seq) : seq); > > +} > > + > > /* Create and initialize local shadow variables for the lookup cache > > globals. Put their decls in the *_l globals for use by > > mf_build_check_statement_for. 
*/ > > Index: mine/gcc/tree-sra.c > > =================================================================== > > --- mine.orig/gcc/tree-sra.c > > +++ mine/gcc/tree-sra.c > > @@ -236,7 +236,7 @@ is_sra_scalar_type (tree type) > > instantiated, just that if we decide to break up the type into > > separate pieces that it can be done. */ > > > > -bool > > +static bool > > sra_type_can_be_decomposed_p (tree type) > > { > > unsigned int cache = TYPE_UID (TYPE_MAIN_VARIANT (type)) * 2; > > @@ -1263,6 +1263,26 @@ build_element_name (struct sra_elt *elt) > > return XOBFINISH (&sra_obstack, char *); > > } > > > > +/* Insert a gimple_seq SEQ on all the outgoing edges out of BB. Note that > > + if BB has more than one edge, STMT will be replicated for each edge. > > + Also, abnormal edges will be ignored. */ > > + > > +static void > > +insert_edge_copies_seq (gimple_seq seq, basic_block bb) > > +{ > > + edge e; > > + edge_iterator ei; > > + unsigned n_copies = -1; > > + > > + FOR_EACH_EDGE (e, ei, bb->succs) > > + if (!(e->flags & EDGE_ABNORMAL)) > > + n_copies++; > > + > > + FOR_EACH_EDGE (e, ei, bb->succs) > > + if (!(e->flags & EDGE_ABNORMAL)) > > + gsi_insert_seq_on_edge (e, n_copies-- > 0 ? gimple_seq_copy (seq) : seq); > > +} > > + > > /* Instantiate an element as an independent variable. */ > > > > static void > > @@ -2785,29 +2805,9 @@ generate_element_init (struct sra_elt *e > > return ret; > > } > > > > -/* Insert a gimple_seq SEQ on all the outgoing edges out of BB. Note that > > - if BB has more than one edge, STMT will be replicated for each edge. > > - Also, abnormal edges will be ignored. */ > > - > > -void > > -insert_edge_copies_seq (gimple_seq seq, basic_block bb) > > -{ > > - edge e; > > - edge_iterator ei; > > - unsigned n_copies = -1; > > - > > - FOR_EACH_EDGE (e, ei, bb->succs) > > - if (!(e->flags & EDGE_ABNORMAL)) > > - n_copies++; > > - > > - FOR_EACH_EDGE (e, ei, bb->succs) > > - if (!(e->flags & EDGE_ABNORMAL)) > > - gsi_insert_seq_on_edge (e, n_copies-- > 0 ? gimple_seq_copy (seq) : seq); > > -} > > - > > /* Helper function to insert LIST before GSI, and set up line number info. */ > > > > -void > > +static void > > sra_insert_before (gimple_stmt_iterator *gsi, gimple_seq seq) > > { > > gimple stmt = gsi_stmt (*gsi); > > @@ -2819,7 +2819,7 @@ sra_insert_before (gimple_stmt_iterator > > > > /* Similarly, but insert after GSI. Handles insertion onto edges as well. */ > > > > -void > > +static void > > sra_insert_after (gimple_stmt_iterator *gsi, gimple_seq seq) > > { > > gimple stmt = gsi_stmt (*gsi); > > @@ -3597,7 +3597,7 @@ debug_sra_elt_name (struct sra_elt *elt) > > fputc ('\n', stderr); > > } > > > > -void > > +static void > > sra_init_cache (void) > > { > > if (sra_type_decomp_cache) > > > > > > -- > Richard Guenther <rguenther@suse.de> > Novell / SUSE Labs > SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746 - GF: Markus Rex ^ permalink raw reply [flat|nested] 25+ messages in thread
* [PATCH 5/5] "Fix" the rest of the fallouts of new intra-SRA 2009-04-28 10:10 [PATCH 0/5] New implementation of SRA Martin Jambor ` (2 preceding siblings ...) 2009-04-28 10:11 ` [PATCH 1/5] Get rid off old external tree-sra.c stuff Martin Jambor @ 2009-04-28 10:12 ` Martin Jambor 2009-04-28 13:05 ` Richard Guenther 2009-04-28 10:14 ` [PATCH 3/5] New intraprocedural Scalar Reduction of Aggregates Martin Jambor 4 siblings, 1 reply; 25+ messages in thread From: Martin Jambor @ 2009-04-28 10:12 UTC (permalink / raw) To: GCC Patches; +Cc: Richard Guenther, Jan Hubicka [-- Attachment #1: testcase_fixes.diff --] [-- Type: text/plain, Size: 3075 bytes --] The following patch amends testcases (rather than the compiler) so that they do not fail with the new intra-SRA. The FRE testcases rely on the fact that SRA does not scalarize unions. The new one, however, does. Therefore I simply switched SRA off for them. Hopefully that is the correct thing to do. The gfortran.dg/pr25923.f90 testcase expects a weird warning that I do not give out, because the new SRA does not scalarize anything in that particular testcase; the individual fields of the structure are never accessed individually. An unpatched compiler with -fno-tree-sra also does not give any warning. I believe the warning was left there as potentially useful and not misleading, but I tend to believe it's not really required... With the previous and this patch, there are no regressions in the testsuite on x86_64-linux-gnu. I have even bootstrapped and tested Ada. Thanks, Martin 2009-04-27 Martin Jambor <mjambor@suse.cz> * gfortran.dg/pr25923.f90: Remove warning expectation. * gcc.dg/tree-ssa/ssa-fre-7.c: Compile with -fno-tree-sra. * gcc.dg/tree-ssa/ssa-fre-8.c: Likewise. * gcc.dg/tree-ssa/ssa-fre-9.c: Likewise. Index: mine/gcc/testsuite/gfortran.dg/pr25923.f90 =================================================================== --- mine.orig/gcc/testsuite/gfortran.dg/pr25923.f90 +++ mine/gcc/testsuite/gfortran.dg/pr25923.f90 @@ -10,7 +10,7 @@ implicit none contains - function baz(arg) result(res) ! 
{ dg-warning "res.yr' may be" } + function baz(arg) result(res) type(bar), intent(in) :: arg type(bar) :: res logical, external:: some_func Index: mine/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-7.c =================================================================== --- mine.orig/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-7.c +++ mine/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-7.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-O -fdump-tree-fre-details -fdump-tree-optimized" } */ +/* { dg-options "-O -fno-tree-sra -fdump-tree-fre-details -fdump-tree-optimized" } */ #if (__SIZEOF_INT__ == __SIZEOF_FLOAT__) typedef int intflt; #elif (__SIZEOF_LONG__ == __SIZEOF_FLOAT__) Index: mine/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-8.c =================================================================== --- mine.orig/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-8.c +++ mine/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-8.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-O -fdump-tree-fre-details" } */ +/* { dg-options "-O -fno-tree-sra -fdump-tree-fre-details" } */ #if (__SIZEOF_INT__ == __SIZEOF_FLOAT__) typedef int intflt; #elif (__SIZEOF_LONG__ == __SIZEOF_FLOAT__) Index: mine/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-9.c =================================================================== --- mine.orig/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-9.c +++ mine/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-9.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-O -fdump-tree-fre-stats" } */ +/* { dg-options "-O -fno-tree-sra -fdump-tree-fre-stats" } */ union loc { unsigned reg; ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 5/5] "Fix" the rest of the fallouts of new intra-SRA 2009-04-28 10:12 ` [PATCH 5/5] "Fix" the rest of the fallouts of new intra-SRA Martin Jambor @ 2009-04-28 13:05 ` Richard Guenther 0 siblings, 0 replies; 25+ messages in thread From: Richard Guenther @ 2009-04-28 13:05 UTC (permalink / raw) To: Martin Jambor; +Cc: GCC Patches, Jan Hubicka, fortran On Tue, 28 Apr 2009, Martin Jambor wrote: > The following patch amends testcases (rather than the compiler) so > that they not fail with the new intra-SRA. > > The FRE testcases rely on the fact that SRA does not scalarize unions. > The new one, however, does. Therefore I simply switched SRA off for > them. Hopefully that is the correct thing to do. > > The gfortran.dg/pr25923.f90 expects a weird warning that I do not give > out because the new SRA does not scalarize anything in that particular > testcase because the individual fields of the structure are never > accessed individually. An unpatched compiler with -fno-tree-sra also > does not give any warning. I believe it was left there as potentially > uselful and not misleading but I tent to believe it's not really > required... Hm. The PR was about garbled warning messages and the warning itself does look correct (and useful). But it indeed does not make sense for SRA to scalarize the return value. Thus I think it is more appropriate to XFAIL the warning. > With the previous and this patch, there are no regressions in the > testsuite on x86_64-linux-gnu. I have even bootstrapped and tested > Ada. The FRE testcase changes are ok. XFAILing the fortran testcase is also ok if there are no objections from the Fortran maintainers (CCed). Thanks, Richard. > Thanks, > > Martin > > > 2009-04-27 Martin Jambor <mjambor@suse.cz> > > * gfortran.dg/pr25923.f90: Remove warning expectation. > * gcc.dg/tree-ssa/ssa-fre-7.c: Compile with -fno-tree-sra. > * gcc.dg/tree-ssa/ssa-fre-8.c: Likewise. > * gcc.dg/tree-ssa/ssa-fre-9.c: Likewise. > > > Index: mine/gcc/testsuite/gfortran.dg/pr25923.f90 > =================================================================== > --- mine.orig/gcc/testsuite/gfortran.dg/pr25923.f90 > +++ mine/gcc/testsuite/gfortran.dg/pr25923.f90 > @@ -10,7 +10,7 @@ implicit none > > contains > > - function baz(arg) result(res) ! 
{ dg-warning "res.yr' may be" } > + function baz(arg) result(res) > type(bar), intent(in) :: arg > type(bar) :: res > logical, external:: some_func > Index: mine/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-7.c > =================================================================== > --- mine.orig/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-7.c > +++ mine/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-7.c > @@ -1,5 +1,5 @@ > /* { dg-do compile } */ > -/* { dg-options "-O -fdump-tree-fre-details -fdump-tree-optimized" } */ > +/* { dg-options "-O -fno-tree-sra -fdump-tree-fre-details -fdump-tree-optimized" } */ > #if (__SIZEOF_INT__ == __SIZEOF_FLOAT__) > typedef int intflt; > #elif (__SIZEOF_LONG__ == __SIZEOF_FLOAT__) > Index: mine/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-8.c > =================================================================== > --- mine.orig/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-8.c > +++ mine/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-8.c > @@ -1,5 +1,5 @@ > /* { dg-do compile } */ > -/* { dg-options "-O -fdump-tree-fre-details" } */ > +/* { dg-options "-O -fno-tree-sra -fdump-tree-fre-details" } */ > #if (__SIZEOF_INT__ == __SIZEOF_FLOAT__) > typedef int intflt; > #elif (__SIZEOF_LONG__ == __SIZEOF_FLOAT__) > Index: mine/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-9.c > =================================================================== > --- mine.orig/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-9.c > +++ mine/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-9.c > @@ -1,5 +1,5 @@ > /* { dg-do compile } */ > -/* { dg-options "-O -fdump-tree-fre-stats" } */ > +/* { dg-options "-O -fno-tree-sra -fdump-tree-fre-stats" } */ > > union loc { > unsigned reg; > > -- Richard Guenther <rguenther@suse.de> Novell / SUSE Labs SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746 - GF: Markus Rex ^ permalink raw reply [flat|nested] 25+ messages in thread
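For reference, this is roughly what XFAILing the directive could look like (a hedged sketch; hypothetical DejaGnu markup, not posted in the thread):

    function baz(arg) result(res) ! { dg-warning "res.yr' may be" "" { xfail *-*-* } }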
* [PATCH 3/5] New intraprocedural Scalar Reduction of Aggregates. 2009-04-28 10:10 [PATCH 0/5] New implementation of SRA Martin Jambor ` (3 preceding siblings ...) 2009-04-28 10:12 ` [PATCH 5/5] "Fix" the rest of the fallouts of new intra-SRA Martin Jambor @ 2009-04-28 10:14 ` Martin Jambor 2009-04-28 10:27 ` Martin Jambor 2009-04-29 10:59 ` Richard Guenther 4 siblings, 2 replies; 25+ messages in thread From: Martin Jambor @ 2009-04-28 10:14 UTC (permalink / raw) To: GCC Patches; +Cc: Richard Guenther, Jan Hubicka [-- Attachment #1: trunk_intra_sra.diff --] [-- Type: text/plain, Size: 181047 bytes --] This is the new intraprocedural SRA. I have stripped off the interprocedural part and will propose to commit it separately later. I have tried to remove almost every trace of IPA-SRA, however, two provisions for it have remained in the patch. First, an enumeration (rather than a boolean) is used to distuinguish between "early" and "late" SRA so that other SRA modes can be added later on. Second, scan_function() has a hook parameter and a void pointer parameter which are not used in this patch but will be by IPA-SRA. Otherwise, the patch is hopefully self-contained and the bases of its operation is described by the initial comment. The patch bootstraps (on x86_64-linux-gnu but I am about to try it on hppa-linux-gnu too) but produces a small number of testsuite failures which are handled by the two following patches. Thanks, Martin 2009-04-27 Martin Jambor <mjambor@suse.cz> * tree-sra.c (enum sra_mode): The whole contents of the file was replaced. Index: mine/gcc/tree-sra.c =================================================================== --- mine.orig/gcc/tree-sra.c +++ mine/gcc/tree-sra.c @@ -1,19 +1,18 @@ /* Scalar Replacement of Aggregates (SRA) converts some structure references into scalar references, exposing them to the scalar optimizers. - Copyright (C) 2003, 2004, 2005, 2006, 2007, 2008, 2009 - Free Software Foundation, Inc. - Contributed by Diego Novillo <dnovillo@redhat.com> + Copyright (C) 2008, 2009 Free Software Foundation, Inc. + Contributed by Martin Jambor <mjambor@suse.cz> This file is part of GCC. -GCC is free software; you can redistribute it and/or modify it -under the terms of the GNU General Public License as published by the -Free Software Foundation; either version 3, or (at your option) any -later version. +GCC is free software; you can redistribute it and/or modify it under +the terms of the GNU General Public License as published by the Free +Software Foundation; either version 3, or (at your option) any later +version. -GCC is distributed in the hope that it will be useful, but WITHOUT -ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or +GCC is distributed in the hope that it will be useful, but WITHOUT ANY +WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. @@ -21,3656 +20,2436 @@ You should have received a copy of the G along with GCC; see the file COPYING3. If not see <http://www.gnu.org/licenses/>. */ +/* This file implements Scalar Reduction of Aggregates (SRA). SRA is run + twice, once in the early stages of compilation (early SRA) and once in the + late stages (late SRA). The aim of both is to turn references to scalar + parts of aggregates into uses of independent scalar variables. 
+ + The two passes are nearly identical, the only difference is that early SRA + does not scalarize unions which are used as the result in a GIMPLE_RETURN + statement because together with inlining this can lead to weird type + conversions. + + Both passes operate in four stages: + + 1. The declarations that have properties which make them candidates for + scalarization are identified in function find_var_candidates(). The + candidates are stored in candidate_bitmap. + + 2. The function body is scanned. In the process, declarations which are + used in a manner that prevent their scalarization are removed from the + candidate bitmap. More importantly, for every access into an aggregate, + an access structure (struct access) is created by create_access() and + stored in a vector associated with the aggregate. Among other + information, the aggregate declaration, the offset and size of the access + and its type are stored in the structure. + + On a related note, assign_link structures are created for every assign + statement between candidate aggregates and attached to the related + accesses. + + 3. The vectors of accesses are analyzed. They are first sorted according to + their offset and size and then scanned for partially overlapping accesses + (i.e. those which overlap but one is not entirely within another). Such + an access disqualifies the whole aggregate from being scalarized. + + If there is no such inhibiting overlap, a representative access structure + is chosen for every unique combination of offset and size. Afterwards, + the pass builds a set of trees from these structures, in which children + of an access are within their parent (in terms of offset and size). + + Then accesses are propagated whenever possible (i.e. in cases when it + does not create a partially overlapping access) across assign_links from + the right hand side to the left hand side. + + Then the set of trees for each declaration is traversed again and those + accesses which should be replaced by a scalar are identified. + + 4. The function is traversed again, and for every reference into an + aggregate that has some component which is about to be scalarized, + statements are amended and new statements are created as necessary. + Finally, if a parameter got scalarized, the scalar replacements are + initialized with values from respective parameter aggregates. +*/ + #include "config.h" #include "system.h" #include "coretypes.h" +#include "alloc-pool.h" #include "tm.h" -#include "ggc.h" #include "tree.h" - -/* These RTL headers are needed for basic-block.h. */ -#include "rtl.h" -#include "tm_p.h" -#include "hard-reg-set.h" -#include "basic-block.h" -#include "diagnostic.h" -#include "langhooks.h" -#include "tree-inline.h" -#include "tree-flow.h" #include "gimple.h" +#include "tree-flow.h" +#include "diagnostic.h" #include "tree-dump.h" -#include "tree-pass.h" #include "timevar.h" -#include "flags.h" -#include "bitmap.h" -#include "obstack.h" -#include "target.h" -/* expr.h is needed for MOVE_RATIO. */ -#include "expr.h" #include "params.h" +#include "target.h" +#include "flags.h" - -/* This object of this pass is to replace a non-addressable aggregate with a - set of independent variables. Most of the time, all of these variables - will be scalars. But a secondary objective is to break up larger - aggregates into smaller aggregates. In the process we may find that some - bits of the larger aggregate can be deleted as unreferenced. - - This substitution is done globally. 
More localized substitutions would - be the purvey of a load-store motion pass. - - The optimization proceeds in phases: - - (1) Identify variables that have types that are candidates for - decomposition. - - (2) Scan the function looking for the ways these variables are used. - In particular we're interested in the number of times a variable - (or member) is needed as a complete unit, and the number of times - a variable (or member) is copied. - - (3) Based on the usage profile, instantiate substitution variables. - - (4) Scan the function making replacements. +/* Enumeration of all aggregate reductions we can do. */ +enum sra_mode {SRA_MODE_EARLY_INTRA, /* early intraprocedural SRA */ + SRA_MODE_INTRA}; /* late intraprocedural SRA */ + +/* Global variable describing which aggregate reduction we are performing at + the moment. */ +static enum sra_mode sra_mode; + +struct assign_link; + +/* ACCESS represents each access to an aggregate variable (as a whole or a + part). It can also represent a group of accesses that refer to exactly the + same fragment of an aggregate (i.e. those that have exactly the same offset + and size). Such representatives for a single aggregate, once determined, + are linked in a linked list and have the group fields set. + + Moreover, when doing intraprocedural SRA, a tree is built from those + representatives (by the means of first_child and next_sibling pointers), in + which all items in a subtree are "within" the root, i.e. their offset is + greater or equal to offset of the root and offset+size is smaller or equal + to offset+size of the root. Children of an access are sorted by offset. */ - -/* True if this is the "early" pass, before inlining. */ -static bool early_sra; - -/* The set of aggregate variables that are candidates for scalarization. */ -static bitmap sra_candidates; - -/* Set of scalarizable PARM_DECLs that need copy-in operations at the - beginning of the function. */ -static bitmap needs_copy_in; - -/* Sets of bit pairs that cache type decomposition and instantiation. */ -static bitmap sra_type_decomp_cache; -static bitmap sra_type_inst_cache; - -/* One of these structures is created for each candidate aggregate and - each (accessed) member or group of members of such an aggregate. */ -struct sra_elt -{ - /* A tree of the elements. Used when we want to traverse everything. */ - struct sra_elt *parent; - struct sra_elt *groups; - struct sra_elt *children; - struct sra_elt *sibling; - - /* If this element is a root, then this is the VAR_DECL. If this is - a sub-element, this is some token used to identify the reference. - In the case of COMPONENT_REF, this is the FIELD_DECL. In the case - of an ARRAY_REF, this is the (constant) index. In the case of an - ARRAY_RANGE_REF, this is the (constant) RANGE_EXPR. In the case - of a complex number, this is a zero or one. */ - tree element; - - /* The type of the element. */ - tree type; - - /* A VAR_DECL, for any sub-element we've decided to replace. */ - tree replacement; - - /* The number of times the element is referenced as a whole. I.e. - given "a.b.c", this would be incremented for C, but not for A or B. */ - unsigned int n_uses; - - /* The number of times the element is copied to or from another - scalarizable element. */ - unsigned int n_copies; - - /* True if TYPE is scalar. */ - bool is_scalar; - - /* True if this element is a group of members of its parent. */ - bool is_group; - - /* True if we saw something about this element that prevents scalarization, - such as non-constant indexing. 
*/ - bool cannot_scalarize; - - /* True if we've decided that structure-to-structure assignment - should happen via memcpy and not per-element. */ - bool use_block_copy; - - /* True if everything under this element has been marked TREE_NO_WARNING. */ - bool all_no_warning; - - /* A flag for use with/after random access traversals. */ - bool visited; - - /* True if there is BIT_FIELD_REF on the lhs with a vector. */ - bool is_vector_lhs; - - /* 1 if the element is a field that is part of a block, 2 if the field - is the block itself, 0 if it's neither. */ - char in_bitfld_block; -}; - -#define IS_ELEMENT_FOR_GROUP(ELEMENT) (TREE_CODE (ELEMENT) == RANGE_EXPR) - -#define FOR_EACH_ACTUAL_CHILD(CHILD, ELT) \ - for ((CHILD) = (ELT)->is_group \ - ? next_child_for_group (NULL, (ELT)) \ - : (ELT)->children; \ - (CHILD); \ - (CHILD) = (ELT)->is_group \ - ? next_child_for_group ((CHILD), (ELT)) \ - : (CHILD)->sibling) - -/* Helper function for above macro. Return next child in group. */ -static struct sra_elt * -next_child_for_group (struct sra_elt *child, struct sra_elt *group) -{ - gcc_assert (group->is_group); - - /* Find the next child in the parent. */ - if (child) - child = child->sibling; - else - child = group->parent->children; - - /* Skip siblings that do not belong to the group. */ - while (child) - { - tree g_elt = group->element; - if (TREE_CODE (g_elt) == RANGE_EXPR) - { - if (!tree_int_cst_lt (child->element, TREE_OPERAND (g_elt, 0)) - && !tree_int_cst_lt (TREE_OPERAND (g_elt, 1), child->element)) - break; - } - else - gcc_unreachable (); - - child = child->sibling; - } - - return child; -} - -/* Random access to the child of a parent is performed by hashing. - This prevents quadratic behavior, and allows SRA to function - reasonably on larger records. */ -static htab_t sra_map; - -/* All structures are allocated out of the following obstack. */ -static struct obstack sra_obstack; - -/* Debugging functions. */ -static void dump_sra_elt_name (FILE *, struct sra_elt *); -extern void debug_sra_elt_name (struct sra_elt *); - -/* Forward declarations. */ -static tree generate_element_ref (struct sra_elt *); -static gimple_seq sra_build_assignment (tree dst, tree src); -static void mark_all_v_defs_seq (gimple_seq); - -\f -/* Return true if DECL is an SRA candidate. */ - -static bool -is_sra_candidate_decl (tree decl) -{ - return DECL_P (decl) && bitmap_bit_p (sra_candidates, DECL_UID (decl)); -} - -/* Return true if TYPE is a scalar type. */ - -static bool -is_sra_scalar_type (tree type) -{ - enum tree_code code = TREE_CODE (type); - return (code == INTEGER_TYPE || code == REAL_TYPE || code == VECTOR_TYPE - || code == FIXED_POINT_TYPE - || code == ENUMERAL_TYPE || code == BOOLEAN_TYPE - || code == POINTER_TYPE || code == OFFSET_TYPE - || code == REFERENCE_TYPE); -} - -/* Return true if TYPE can be decomposed into a set of independent variables. - - Note that this doesn't imply that all elements of TYPE can be - instantiated, just that if we decide to break up the type into - separate pieces that it can be done. */ - -static bool -sra_type_can_be_decomposed_p (tree type) -{ - unsigned int cache = TYPE_UID (TYPE_MAIN_VARIANT (type)) * 2; - tree t; - - /* Avoid searching the same type twice. */ - if (bitmap_bit_p (sra_type_decomp_cache, cache+0)) - return true; - if (bitmap_bit_p (sra_type_decomp_cache, cache+1)) - return false; - - /* The type must have a definite nonzero size. 
*/ - if (TYPE_SIZE (type) == NULL || TREE_CODE (TYPE_SIZE (type)) != INTEGER_CST - || integer_zerop (TYPE_SIZE (type))) - goto fail; - - /* The type must be a non-union aggregate. */ - switch (TREE_CODE (type)) - { - case RECORD_TYPE: - { - bool saw_one_field = false; - - for (t = TYPE_FIELDS (type); t ; t = TREE_CHAIN (t)) - if (TREE_CODE (t) == FIELD_DECL) - { - /* Reject incorrectly represented bit fields. */ - if (DECL_BIT_FIELD (t) - && INTEGRAL_TYPE_P (TREE_TYPE (t)) - && (tree_low_cst (DECL_SIZE (t), 1) - != TYPE_PRECISION (TREE_TYPE (t)))) - goto fail; - - saw_one_field = true; - } - - /* Record types must have at least one field. */ - if (!saw_one_field) - goto fail; - } - break; - - case ARRAY_TYPE: - /* Array types must have a fixed lower and upper bound. */ - t = TYPE_DOMAIN (type); - if (t == NULL) - goto fail; - if (TYPE_MIN_VALUE (t) == NULL || !TREE_CONSTANT (TYPE_MIN_VALUE (t))) - goto fail; - if (TYPE_MAX_VALUE (t) == NULL || !TREE_CONSTANT (TYPE_MAX_VALUE (t))) - goto fail; - break; - - case COMPLEX_TYPE: - break; - - default: - goto fail; - } - - bitmap_set_bit (sra_type_decomp_cache, cache+0); - return true; - - fail: - bitmap_set_bit (sra_type_decomp_cache, cache+1); - return false; -} - -/* Returns true if the TYPE is one of the available va_list types. - Otherwise it returns false. - Note, that for multiple calling conventions there can be more - than just one va_list type present. */ - -static bool -is_va_list_type (tree type) +struct access { - tree h; - - if (type == NULL_TREE) - return false; - h = targetm.canonical_va_list_type (type); - if (h == NULL_TREE) - return false; - if (TYPE_MAIN_VARIANT (type) == TYPE_MAIN_VARIANT (h)) - return true; - return false; -} - -/* Return true if DECL can be decomposed into a set of independent - (though not necessarily scalar) variables. */ - -static bool -decl_can_be_decomposed_p (tree var) -{ - /* Early out for scalars. */ - if (is_sra_scalar_type (TREE_TYPE (var))) - return false; - - /* The variable must not be aliased. */ - if (!is_gimple_non_addressable (var)) - { - if (dump_file && (dump_flags & TDF_DETAILS)) - { - fprintf (dump_file, "Cannot scalarize variable "); - print_generic_expr (dump_file, var, dump_flags); - fprintf (dump_file, " because it must live in memory\n"); - } - return false; - } - - /* The variable must not be volatile. */ - if (TREE_THIS_VOLATILE (var)) - { - if (dump_file && (dump_flags & TDF_DETAILS)) - { - fprintf (dump_file, "Cannot scalarize variable "); - print_generic_expr (dump_file, var, dump_flags); - fprintf (dump_file, " because it is declared volatile\n"); - } - return false; - } - - /* We must be able to decompose the variable's type. */ - if (!sra_type_can_be_decomposed_p (TREE_TYPE (var))) - { - if (dump_file && (dump_flags & TDF_DETAILS)) - { - fprintf (dump_file, "Cannot scalarize variable "); - print_generic_expr (dump_file, var, dump_flags); - fprintf (dump_file, " because its type cannot be decomposed\n"); - } - return false; - } - - /* HACK: if we decompose a va_list_type_node before inlining, then we'll - confuse tree-stdarg.c, and we won't be able to figure out which and - how many arguments are accessed. This really should be improved in - tree-stdarg.c, as the decomposition is truly a win. This could also - be fixed if the stdarg pass ran early, but this can't be done until - we've aliasing information early too. See PR 30791. 
*/ - if (early_sra && is_va_list_type (TREE_TYPE (var))) - return false; - - return true; -} - -/* Return true if TYPE can be *completely* decomposed into scalars. */ - -static bool -type_can_instantiate_all_elements (tree type) -{ - if (is_sra_scalar_type (type)) - return true; - if (!sra_type_can_be_decomposed_p (type)) - return false; - - switch (TREE_CODE (type)) - { - case RECORD_TYPE: - { - unsigned int cache = TYPE_UID (TYPE_MAIN_VARIANT (type)) * 2; - tree f; - - if (bitmap_bit_p (sra_type_inst_cache, cache+0)) - return true; - if (bitmap_bit_p (sra_type_inst_cache, cache+1)) - return false; - - for (f = TYPE_FIELDS (type); f ; f = TREE_CHAIN (f)) - if (TREE_CODE (f) == FIELD_DECL) - { - if (!type_can_instantiate_all_elements (TREE_TYPE (f))) - { - bitmap_set_bit (sra_type_inst_cache, cache+1); - return false; - } - } - - bitmap_set_bit (sra_type_inst_cache, cache+0); - return true; - } - - case ARRAY_TYPE: - return type_can_instantiate_all_elements (TREE_TYPE (type)); - - case COMPLEX_TYPE: - return true; - - default: - gcc_unreachable (); - } -} - -/* Test whether ELT or some sub-element cannot be scalarized. */ - -static bool -can_completely_scalarize_p (struct sra_elt *elt) -{ - struct sra_elt *c; - - if (elt->cannot_scalarize) - return false; - - for (c = elt->children; c; c = c->sibling) - if (!can_completely_scalarize_p (c)) - return false; - - for (c = elt->groups; c; c = c->sibling) - if (!can_completely_scalarize_p (c)) - return false; - - return true; -} - -\f -/* A simplified tree hashing algorithm that only handles the types of - trees we expect to find in sra_elt->element. */ - -static hashval_t -sra_hash_tree (tree t) -{ - hashval_t h; - - switch (TREE_CODE (t)) - { - case VAR_DECL: - case PARM_DECL: - case RESULT_DECL: - h = DECL_UID (t); - break; - - case INTEGER_CST: - h = TREE_INT_CST_LOW (t) ^ TREE_INT_CST_HIGH (t); - break; - - case RANGE_EXPR: - h = iterative_hash_expr (TREE_OPERAND (t, 0), 0); - h = iterative_hash_expr (TREE_OPERAND (t, 1), h); - break; - - case FIELD_DECL: - /* We can have types that are compatible, but have different member - lists, so we can't hash fields by ID. Use offsets instead. */ - h = iterative_hash_expr (DECL_FIELD_OFFSET (t), 0); - h = iterative_hash_expr (DECL_FIELD_BIT_OFFSET (t), h); - break; - - case BIT_FIELD_REF: - /* Don't take operand 0 into account, that's our parent. */ - h = iterative_hash_expr (TREE_OPERAND (t, 1), 0); - h = iterative_hash_expr (TREE_OPERAND (t, 2), h); - break; - - default: - gcc_unreachable (); - } - - return h; -} - -/* Hash function for type SRA_PAIR. */ - -static hashval_t -sra_elt_hash (const void *x) -{ - const struct sra_elt *const e = (const struct sra_elt *) x; - const struct sra_elt *p; - hashval_t h; - - h = sra_hash_tree (e->element); - - /* Take into account everything except bitfield blocks back up the - chain. Given that chain lengths are rarely very long, this - should be acceptable. If we truly identify this as a performance - problem, it should work to hash the pointer value - "e->parent". */ - for (p = e->parent; p ; p = p->parent) - if (!p->in_bitfld_block) - h = (h * 65521) ^ sra_hash_tree (p->element); - - return h; -} - -/* Equality function for type SRA_PAIR. 
*/ - -static int -sra_elt_eq (const void *x, const void *y) -{ - const struct sra_elt *const a = (const struct sra_elt *) x; - const struct sra_elt *const b = (const struct sra_elt *) y; - tree ae, be; - const struct sra_elt *ap = a->parent; - const struct sra_elt *bp = b->parent; - - if (ap) - while (ap->in_bitfld_block) - ap = ap->parent; - if (bp) - while (bp->in_bitfld_block) - bp = bp->parent; - - if (ap != bp) - return false; - - ae = a->element; - be = b->element; - - if (ae == be) - return true; - if (TREE_CODE (ae) != TREE_CODE (be)) - return false; - - switch (TREE_CODE (ae)) - { - case VAR_DECL: - case PARM_DECL: - case RESULT_DECL: - /* These are all pointer unique. */ - return false; - - case INTEGER_CST: - /* Integers are not pointer unique, so compare their values. */ - return tree_int_cst_equal (ae, be); - - case RANGE_EXPR: - return - tree_int_cst_equal (TREE_OPERAND (ae, 0), TREE_OPERAND (be, 0)) - && tree_int_cst_equal (TREE_OPERAND (ae, 1), TREE_OPERAND (be, 1)); - - case FIELD_DECL: - /* Fields are unique within a record, but not between - compatible records. */ - if (DECL_FIELD_CONTEXT (ae) == DECL_FIELD_CONTEXT (be)) - return false; - return fields_compatible_p (ae, be); - - case BIT_FIELD_REF: - return - tree_int_cst_equal (TREE_OPERAND (ae, 1), TREE_OPERAND (be, 1)) - && tree_int_cst_equal (TREE_OPERAND (ae, 2), TREE_OPERAND (be, 2)); - - default: - gcc_unreachable (); - } -} - -/* Create or return the SRA_ELT structure for CHILD in PARENT. PARENT - may be null, in which case CHILD must be a DECL. */ - -static struct sra_elt * -lookup_element (struct sra_elt *parent, tree child, tree type, - enum insert_option insert) -{ - struct sra_elt dummy; - struct sra_elt **slot; - struct sra_elt *elt; - - if (parent) - dummy.parent = parent->is_group ? parent->parent : parent; - else - dummy.parent = NULL; - dummy.element = child; - - slot = (struct sra_elt **) htab_find_slot (sra_map, &dummy, insert); - if (!slot && insert == NO_INSERT) - return NULL; - - elt = *slot; - if (!elt && insert == INSERT) - { - *slot = elt = XOBNEW (&sra_obstack, struct sra_elt); - memset (elt, 0, sizeof (*elt)); - - elt->parent = parent; - elt->element = child; - elt->type = type; - elt->is_scalar = is_sra_scalar_type (type); - - if (parent) - { - if (IS_ELEMENT_FOR_GROUP (elt->element)) - { - elt->is_group = true; - elt->sibling = parent->groups; - parent->groups = elt; - } - else - { - elt->sibling = parent->children; - parent->children = elt; - } - } - - /* If this is a parameter, then if we want to scalarize, we have - one copy from the true function parameter. Count it now. */ - if (TREE_CODE (child) == PARM_DECL) - { - elt->n_copies = 1; - bitmap_set_bit (needs_copy_in, DECL_UID (child)); - } - } - - return elt; -} - -/* Create or return the SRA_ELT structure for EXPR if the expression - refers to a scalarizable variable. */ - -static struct sra_elt * -maybe_lookup_element_for_expr (tree expr) -{ - struct sra_elt *elt; - tree child; - - switch (TREE_CODE (expr)) - { - case VAR_DECL: - case PARM_DECL: - case RESULT_DECL: - if (is_sra_candidate_decl (expr)) - return lookup_element (NULL, expr, TREE_TYPE (expr), INSERT); - return NULL; - - case ARRAY_REF: - /* We can't scalarize variable array indices. */ - if (in_array_bounds_p (expr)) - child = TREE_OPERAND (expr, 1); - else - return NULL; - break; - - case ARRAY_RANGE_REF: - /* We can't scalarize variable array indices. 
*/ - if (range_in_array_bounds_p (expr)) - { - tree domain = TYPE_DOMAIN (TREE_TYPE (expr)); - child = build2 (RANGE_EXPR, integer_type_node, - TYPE_MIN_VALUE (domain), TYPE_MAX_VALUE (domain)); - } - else - return NULL; - break; - - case COMPONENT_REF: - { - tree type = TREE_TYPE (TREE_OPERAND (expr, 0)); - /* Don't look through unions. */ - if (TREE_CODE (type) != RECORD_TYPE) - return NULL; - /* Neither through variable-sized records. */ - if (TYPE_SIZE (type) == NULL_TREE - || TREE_CODE (TYPE_SIZE (type)) != INTEGER_CST) - return NULL; - child = TREE_OPERAND (expr, 1); - } - break; - - case REALPART_EXPR: - child = integer_zero_node; - break; - case IMAGPART_EXPR: - child = integer_one_node; - break; + /* Values returned by `get_ref_base_and_extent' for each COMPONENT_REF + If EXPR isn't a COMPONENT_REF just set `BASE = EXPR', `OFFSET = 0', + `SIZE = TREE_SIZE (TREE_TYPE (expr))'. */ + HOST_WIDE_INT offset; + HOST_WIDE_INT size; + tree base; + + /* Expression. */ + tree expr; + /* Type. */ + tree type; - default: - return NULL; - } + /* Next group representative for this aggregate. */ + struct access *next_grp; - elt = maybe_lookup_element_for_expr (TREE_OPERAND (expr, 0)); - if (elt) - return lookup_element (elt, child, TREE_TYPE (expr), INSERT); - return NULL; -} - -\f -/* Functions to walk just enough of the tree to see all scalarizable - references, and categorize them. */ - -/* A set of callbacks for phases 2 and 4. They'll be invoked for the - various kinds of references seen. In all cases, *GSI is an iterator - pointing to the statement being processed. */ -struct sra_walk_fns -{ - /* Invoked when ELT is required as a unit. Note that ELT might refer to - a leaf node, in which case this is a simple scalar reference. *EXPR_P - points to the location of the expression. IS_OUTPUT is true if this - is a left-hand-side reference. USE_ALL is true if we saw something we - couldn't quite identify and had to force the use of the entire object. */ - void (*use) (struct sra_elt *elt, tree *expr_p, - gimple_stmt_iterator *gsi, bool is_output, bool use_all); - - /* Invoked when we have a copy between two scalarizable references. */ - void (*copy) (struct sra_elt *lhs_elt, struct sra_elt *rhs_elt, - gimple_stmt_iterator *gsi); - - /* Invoked when ELT is initialized from a constant. VALUE may be NULL, - in which case it should be treated as an empty CONSTRUCTOR. */ - void (*init) (struct sra_elt *elt, tree value, gimple_stmt_iterator *gsi); - - /* Invoked when we have a copy between one scalarizable reference ELT - and one non-scalarizable reference OTHER without side-effects. - IS_OUTPUT is true if ELT is on the left-hand side. */ - void (*ldst) (struct sra_elt *elt, tree other, - gimple_stmt_iterator *gsi, bool is_output); - - /* True during phase 2, false during phase 4. */ - /* ??? This is a hack. */ - bool initial_scan; + /* Pointer to the group representative. Pointer to itself if the struct is + the representative. */ + struct access *group_representative; + + /* If this access has any children (in terms of the definition above), this + points to the first one. */ + struct access *first_child; + + /* Pointer to the next sibling in the access tree as described above. */ + struct access *next_sibling; + + /* Pointers to the first and last element in the linked list of assign + links. */ + struct assign_link *first_link, *last_link; + /* Pointer to the next access in the work queue. */ + struct access *next_queued; + + /* Replacement variable for this access "region." 
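+     For illustration: if the fragment at offset 32 and size 32 of a
+     hypothetical aggregate 's' is scalarized, this field would be set to a
+     fresh scalar temporary of the access's type which stands for that
+     fragment from then on (names and layout invented for the example).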
Never to be accessed + directly, always only by the means of get_access_replacement() and only + when grp_to_be_replaced flag is set. */ + tree replacement_decl; + + /* Is this particular access write access? */ + unsigned write : 1; + + /* Is this access currently in the work queue? */ + unsigned grp_queued : 1; + /* Does this group contain a write access? This flag is propagated down the + access tree. */ + unsigned grp_write : 1; + /* Does this group contain a read access? This flag is propagated down the + access tree. */ + unsigned grp_read : 1; + /* Is the subtree rooted in this access fully covered by scalar + replacements? */ + unsigned grp_covered : 1; + /* If set to true, this access and all below it in an access tree must not be + scalarized. */ + unsigned grp_unscalarizable_region : 1; + /* Whether data have been written to parts of the aggregate covered by this + access which is not to be scalarized. This flag is propagated up in the + access tree. */ + unsigned grp_unscalarized_data : 1; + /* Does this access and/or group contain a write access through a + BIT_FIELD_REF? */ + unsigned grp_bfr_lhs : 1; + + /* Set when a scalar replacement should be created for this variable. We do + the decision and creation at different places because create_tmp_var + cannot be called from within FOR_EACH_REFERENCED_VAR. */ + unsigned grp_to_be_replaced : 1; }; -#ifdef ENABLE_CHECKING -/* Invoked via walk_tree, if *TP contains a candidate decl, return it. */ - -static tree -sra_find_candidate_decl (tree *tp, int *walk_subtrees, - void *data ATTRIBUTE_UNUSED) -{ - tree t = *tp; - enum tree_code code = TREE_CODE (t); - - if (code == VAR_DECL || code == PARM_DECL || code == RESULT_DECL) - { - *walk_subtrees = 0; - if (is_sra_candidate_decl (t)) - return t; - } - else if (TYPE_P (t)) - *walk_subtrees = 0; - - return NULL; -} -#endif - -/* Walk most expressions looking for a scalarizable aggregate. - If we find one, invoke FNS->USE. */ - -static void -sra_walk_expr (tree *expr_p, gimple_stmt_iterator *gsi, bool is_output, - const struct sra_walk_fns *fns) -{ - tree expr = *expr_p; - tree inner = expr; - bool disable_scalarization = false; - bool use_all_p = false; - - /* We're looking to collect a reference expression between EXPR and INNER, - such that INNER is a scalarizable decl and all other nodes through EXPR - are references that we can scalarize. If we come across something that - we can't scalarize, we reset EXPR. This has the effect of making it - appear that we're referring to the larger expression as a whole. */ - - while (1) - switch (TREE_CODE (inner)) - { - case VAR_DECL: - case PARM_DECL: - case RESULT_DECL: - /* If there is a scalarizable decl at the bottom, then process it. */ - if (is_sra_candidate_decl (inner)) - { - struct sra_elt *elt = maybe_lookup_element_for_expr (expr); - if (disable_scalarization) - elt->cannot_scalarize = true; - else - fns->use (elt, expr_p, gsi, is_output, use_all_p); - } - return; - - case ARRAY_REF: - /* Non-constant index means any member may be accessed. Prevent the - expression from being scalarized. If we were to treat this as a - reference to the whole array, we can wind up with a single dynamic - index reference inside a loop being overridden by several constant - index references during loop setup. It's possible that this could - be avoided by using dynamic usage counts based on BB trip counts - (based on loop analysis or profiling), but that hardly seems worth - the effort. */ - /* ??? Hack. 
Figure out how to push this into the scan routines - without duplicating too much code. */ - if (!in_array_bounds_p (inner)) - { - disable_scalarization = true; - goto use_all; - } - /* ??? Are we assured that non-constant bounds and stride will have - the same value everywhere? I don't think Fortran will... */ - if (TREE_OPERAND (inner, 2) || TREE_OPERAND (inner, 3)) - goto use_all; - inner = TREE_OPERAND (inner, 0); - break; - - case ARRAY_RANGE_REF: - if (!range_in_array_bounds_p (inner)) - { - disable_scalarization = true; - goto use_all; - } - /* ??? See above non-constant bounds and stride . */ - if (TREE_OPERAND (inner, 2) || TREE_OPERAND (inner, 3)) - goto use_all; - inner = TREE_OPERAND (inner, 0); - break; - - case COMPONENT_REF: - { - tree type = TREE_TYPE (TREE_OPERAND (inner, 0)); - /* Don't look through unions. */ - if (TREE_CODE (type) != RECORD_TYPE) - goto use_all; - /* Neither through variable-sized records. */ - if (TYPE_SIZE (type) == NULL_TREE - || TREE_CODE (TYPE_SIZE (type)) != INTEGER_CST) - goto use_all; - inner = TREE_OPERAND (inner, 0); - } - break; - - case REALPART_EXPR: - case IMAGPART_EXPR: - inner = TREE_OPERAND (inner, 0); - break; - - case BIT_FIELD_REF: - /* A bit field reference to a specific vector is scalarized but for - ones for inputs need to be marked as used on the left hand size so - when we scalarize it, we can mark that variable as non renamable. */ - if (is_output - && TREE_CODE (TREE_TYPE (TREE_OPERAND (inner, 0))) == VECTOR_TYPE) - { - struct sra_elt *elt - = maybe_lookup_element_for_expr (TREE_OPERAND (inner, 0)); - if (elt) - elt->is_vector_lhs = true; - } +typedef struct access *access_p; - /* A bit field reference (access to *multiple* fields simultaneously) - is not currently scalarized. Consider this an access to the full - outer element, to which walk_tree will bring us next. */ - goto use_all; - - CASE_CONVERT: - /* Similarly, a nop explicitly wants to look at an object in a - type other than the one we've scalarized. */ - goto use_all; - - case VIEW_CONVERT_EXPR: - /* Likewise for a view conversion, but with an additional twist: - it can be on the LHS and, in this case, an access to the full - outer element would mean a killing def. So we need to punt - if we haven't already a full access to the current element, - because we cannot pretend to have a killing def if we only - have a partial access at some level. */ - if (is_output && !use_all_p && inner != expr) - disable_scalarization = true; - goto use_all; - - case WITH_SIZE_EXPR: - /* This is a transparent wrapper. The entire inner expression really - is being used. */ - goto use_all; - - use_all: - expr_p = &TREE_OPERAND (inner, 0); - inner = expr = *expr_p; - use_all_p = true; - break; - - default: -#ifdef ENABLE_CHECKING - /* Validate that we're not missing any references. */ - gcc_assert (!walk_tree (&inner, sra_find_candidate_decl, NULL, NULL)); -#endif - return; - } -} - -/* Walk the arguments of a GIMPLE_CALL looking for scalarizable aggregates. - If we find one, invoke FNS->USE. 
*/ - -static void -sra_walk_gimple_call (gimple stmt, gimple_stmt_iterator *gsi, - const struct sra_walk_fns *fns) -{ - int i; - int nargs = gimple_call_num_args (stmt); +DEF_VEC_P (access_p); +DEF_VEC_ALLOC_P (access_p, heap); - for (i = 0; i < nargs; i++) - sra_walk_expr (gimple_call_arg_ptr (stmt, i), gsi, false, fns); - - if (gimple_call_lhs (stmt)) - sra_walk_expr (gimple_call_lhs_ptr (stmt), gsi, true, fns); -} - -/* Walk the inputs and outputs of a GIMPLE_ASM looking for scalarizable - aggregates. If we find one, invoke FNS->USE. */ - -static void -sra_walk_gimple_asm (gimple stmt, gimple_stmt_iterator *gsi, - const struct sra_walk_fns *fns) -{ - size_t i; - for (i = 0; i < gimple_asm_ninputs (stmt); i++) - sra_walk_expr (&TREE_VALUE (gimple_asm_input_op (stmt, i)), gsi, false, fns); - for (i = 0; i < gimple_asm_noutputs (stmt); i++) - sra_walk_expr (&TREE_VALUE (gimple_asm_output_op (stmt, i)), gsi, true, fns); -} - -/* Walk a GIMPLE_ASSIGN and categorize the assignment appropriately. */ - -static void -sra_walk_gimple_assign (gimple stmt, gimple_stmt_iterator *gsi, - const struct sra_walk_fns *fns) -{ - struct sra_elt *lhs_elt = NULL, *rhs_elt = NULL; - tree lhs, rhs; +/* Alloc pool for allocating access structures. */ +static alloc_pool access_pool; - /* If there is more than 1 element on the RHS, only walk the lhs. */ - if (!gimple_assign_single_p (stmt)) - { - sra_walk_expr (gimple_assign_lhs_ptr (stmt), gsi, true, fns); - return; - } - - lhs = gimple_assign_lhs (stmt); - rhs = gimple_assign_rhs1 (stmt); - lhs_elt = maybe_lookup_element_for_expr (lhs); - rhs_elt = maybe_lookup_element_for_expr (rhs); - - /* If both sides are scalarizable, this is a COPY operation. */ - if (lhs_elt && rhs_elt) - { - fns->copy (lhs_elt, rhs_elt, gsi); - return; - } - - /* If the RHS is scalarizable, handle it. There are only two cases. */ - if (rhs_elt) - { - if (!rhs_elt->is_scalar && !TREE_SIDE_EFFECTS (lhs)) - fns->ldst (rhs_elt, lhs, gsi, false); - else - fns->use (rhs_elt, gimple_assign_rhs1_ptr (stmt), gsi, false, false); - } - - /* If it isn't scalarizable, there may be scalarizable variables within, so - check for a call or else walk the RHS to see if we need to do any - copy-in operations. We need to do it before the LHS is scalarized so - that the statements get inserted in the proper place, before any - copy-out operations. */ - else - sra_walk_expr (gimple_assign_rhs1_ptr (stmt), gsi, false, fns); - - /* Likewise, handle the LHS being scalarizable. We have cases similar - to those above, but also want to handle RHS being constant. */ - if (lhs_elt) - { - /* If this is an assignment from a constant, or constructor, then - we have access to all of the elements individually. Invoke INIT. */ - if (TREE_CODE (rhs) == COMPLEX_EXPR - || TREE_CODE (rhs) == COMPLEX_CST - || TREE_CODE (rhs) == CONSTRUCTOR) - fns->init (lhs_elt, rhs, gsi); - - /* If this is an assignment from read-only memory, treat this as if - we'd been passed the constructor directly. Invoke INIT. */ - else if (TREE_CODE (rhs) == VAR_DECL - && TREE_STATIC (rhs) - && !DECL_EXTERNAL (rhs) - && TREE_READONLY (rhs) - && targetm.binds_local_p (rhs)) - fns->init (lhs_elt, DECL_INITIAL (rhs), gsi); - - /* If this is a copy from a non-scalarizable lvalue, invoke LDST. - The lvalue requirement prevents us from trying to directly scalarize - the result of a function call. Which would result in trying to call - the function multiple times, and other evil things. 
*/ - else if (!lhs_elt->is_scalar - && !TREE_SIDE_EFFECTS (rhs) && is_gimple_addressable (rhs)) - fns->ldst (lhs_elt, rhs, gsi, true); - - /* Otherwise we're being used in some context that requires the - aggregate to be seen as a whole. Invoke USE. */ - else - fns->use (lhs_elt, gimple_assign_lhs_ptr (stmt), gsi, true, false); - } - - /* Similarly to above, LHS_ELT being null only means that the LHS as a - whole is not a scalarizable reference. There may be occurrences of - scalarizable variables within, which implies a USE. */ - else - sra_walk_expr (gimple_assign_lhs_ptr (stmt), gsi, true, fns); -} - -/* Entry point to the walk functions. Search the entire function, - invoking the callbacks in FNS on each of the references to - scalarizable variables. */ - -static void -sra_walk_function (const struct sra_walk_fns *fns) -{ - basic_block bb; - gimple_stmt_iterator si, ni; - - /* ??? Phase 4 could derive some benefit to walking the function in - dominator tree order. */ - - FOR_EACH_BB (bb) - for (si = gsi_start_bb (bb); !gsi_end_p (si); si = ni) - { - gimple stmt; - - stmt = gsi_stmt (si); - - ni = si; - gsi_next (&ni); - - /* If the statement does not reference memory, then it doesn't - make any structure references that we care about. */ - if (!gimple_references_memory_p (stmt)) - continue; - - switch (gimple_code (stmt)) - { - case GIMPLE_RETURN: - /* If we have "return <retval>" then the return value is - already exposed for our pleasure. Walk it as a USE to - force all the components back in place for the return. - */ - if (gimple_return_retval (stmt) == NULL_TREE) - ; - else - sra_walk_expr (gimple_return_retval_ptr (stmt), &si, false, - fns); - break; - - case GIMPLE_ASSIGN: - sra_walk_gimple_assign (stmt, &si, fns); - break; - case GIMPLE_CALL: - sra_walk_gimple_call (stmt, &si, fns); - break; - case GIMPLE_ASM: - sra_walk_gimple_asm (stmt, &si, fns); - break; - - default: - break; - } - } -} -\f -/* Phase One: Scan all referenced variables in the program looking for - structures that could be decomposed. */ - -static bool -find_candidates_for_sra (void) +/* A structure linking lhs and rhs accesses from an aggregate assignment. They + are used to propagate subaccesses from rhs to lhs as long as they don't + conflict with what is already there. */ +struct assign_link { - bool any_set = false; - tree var; - referenced_var_iterator rvi; + struct access *lacc, *racc; + struct assign_link *next; +}; - FOR_EACH_REFERENCED_VAR (var, rvi) - { - if (decl_can_be_decomposed_p (var)) - { - bitmap_set_bit (sra_candidates, DECL_UID (var)); - any_set = true; - } - } +/* Alloc pool for allocating assign link structures. */ +static alloc_pool link_pool; - return any_set; -} +/* Base (tree) -> Vector (VEC(access_p,heap) *) map. */ +static struct pointer_map_t *base_access_vec; + +/* Bitmap of bases (candidates). */ +static bitmap candidate_bitmap; +/* Bitmap of declarations used in a return statement. */ +static bitmap retvals_bitmap; +/* Obstack for creation of fancy names. */ +static struct obstack name_obstack; -\f -/* Phase Two: Scan all references to scalarizable variables. Count the - number of times they are used or copied respectively. */ +/* Head of a linked list of accesses that need to have its subaccesses + propagated to their assignment counterparts. */ +static struct access *work_queue_head; -/* Callbacks to fill in SRA_WALK_FNS. Everything but USE is - considered a copy, because we can decompose the reference such that - the sub-elements needn't be contiguous. 
*/ +/* Dump contents of ACCESS to file F in a human friendly way. If GRP is true, + representative fields are dumped, otherwise those which only describe the + individual access are. */ static void -scan_use (struct sra_elt *elt, tree *expr_p ATTRIBUTE_UNUSED, - gimple_stmt_iterator *gsi ATTRIBUTE_UNUSED, - bool is_output ATTRIBUTE_UNUSED, bool use_all ATTRIBUTE_UNUSED) +dump_access (FILE *f, struct access *access, bool grp) { - elt->n_uses += 1; + fprintf (f, "access { "); + fprintf (f, "base = (%d)'", DECL_UID (access->base)); + print_generic_expr (f, access->base, 0); + fprintf (f, "', offset = %d", (int) access->offset); + fprintf (f, ", size = %d", (int) access->size); + fprintf (f, ", expr = "); + print_generic_expr (f, access->expr, 0); + fprintf (f, ", type = "); + print_generic_expr (f, access->type, 0); + if (grp) + fprintf (f, ", grp_write = %d, grp_read = %d, grp_covered = %d, " + "grp_unscalarizable_region = %d, grp_unscalarized_data = %d, " + "grp_to_be_replaced = %d\n", + access->grp_write, access->grp_read, access->grp_covered, + access->grp_unscalarizable_region, access->grp_unscalarized_data, + access->grp_to_be_replaced); + else + fprintf (f, ", write = %d'\n", access->write); } +/* Dump a subtree rooted in ACCESS to file F, indent by LEVEL. */ + static void -scan_copy (struct sra_elt *lhs_elt, struct sra_elt *rhs_elt, - gimple_stmt_iterator *gsi ATTRIBUTE_UNUSED) +dump_access_tree_1 (FILE *f, struct access *access, int level) { - lhs_elt->n_copies += 1; - rhs_elt->n_copies += 1; + do + { + int i; + + for (i = 0; i < level; i++) + fputs ("* ", dump_file); + + dump_access (f, access, true); + + if (access->first_child) + dump_access_tree_1 (f, access->first_child, level + 1); + + access = access->next_sibling; + } + while (access); } +/* Dump all access trees for a variable, given the pointer to the first root in + ACCESS. */ + static void -scan_init (struct sra_elt *lhs_elt, tree rhs ATTRIBUTE_UNUSED, - gimple_stmt_iterator *gsi ATTRIBUTE_UNUSED) +dump_access_tree (FILE *f, struct access *access) { - lhs_elt->n_copies += 1; + for (; access; access = access->next_grp) + dump_access_tree_1 (f, access, 0); } -static void -scan_ldst (struct sra_elt *elt, tree other ATTRIBUTE_UNUSED, - gimple_stmt_iterator *gsi ATTRIBUTE_UNUSED, - bool is_output ATTRIBUTE_UNUSED) +/* Return a vector of pointers to accesses for the variable given in BASE or + NULL if there is none. */ + +static VEC (access_p, heap) * +get_base_access_vector (tree base) { - elt->n_copies += 1; + void **slot; + + slot = pointer_map_contains (base_access_vec, base); + if (!slot) + return NULL; + else + return *(VEC (access_p, heap) **) slot; } -/* Dump the values we collected during the scanning phase. */ +/* Find an access with required OFFSET and SIZE in a subtree of accesses rooted + in ACCESS. Return NULL if it cannot be found. 
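+   For instance, given hypothetical representatives for the whole of a
+   64-bit aggregate 's' (offset 0, size 64) with children 's.i' (offset 0,
+   size 32) and 's.f' (offset 32, size 32), a lookup of offset 32 and size
+   32 starting at the representative of 's' skips past the child for 's.i'
+   and stops at the one for 's.f'.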
*/ -static void -scan_dump (struct sra_elt *elt) +static struct access * +find_access_in_subtree (struct access *access, HOST_WIDE_INT offset, + HOST_WIDE_INT size) { - struct sra_elt *c; - - dump_sra_elt_name (dump_file, elt); - fprintf (dump_file, ": n_uses=%u n_copies=%u\n", elt->n_uses, elt->n_copies); + while (access && (access->offset != offset || access->size != size)) + { + struct access *child = access->first_child; - for (c = elt->children; c ; c = c->sibling) - scan_dump (c); + while (child && (child->offset + child->size <= offset)) + child = child->next_sibling; + access = child; + } - for (c = elt->groups; c ; c = c->sibling) - scan_dump (c); + return access; } -/* Entry point to phase 2. Scan the entire function, building up - scalarization data structures, recording copies and uses. */ +/* Return the first group representative for DECL or NULL if none exists. */ -static void -scan_function (void) +static struct access * +get_first_repr_for_decl (tree base) { - static const struct sra_walk_fns fns = { - scan_use, scan_copy, scan_init, scan_ldst, true - }; - bitmap_iterator bi; - - sra_walk_function (&fns); + VEC (access_p, heap) *access_vec; - if (dump_file && (dump_flags & TDF_DETAILS)) - { - unsigned i; + access_vec = get_base_access_vector (base); + if (!access_vec) + return NULL; - fputs ("\nScan results:\n", dump_file); - EXECUTE_IF_SET_IN_BITMAP (sra_candidates, 0, i, bi) - { - tree var = referenced_var (i); - struct sra_elt *elt = lookup_element (NULL, var, NULL, NO_INSERT); - if (elt) - scan_dump (elt); - } - fputc ('\n', dump_file); - } + return VEC_index (access_p, access_vec, 0); } -\f -/* Phase Three: Make decisions about which variables to scalarize, if any. - All elements to be scalarized have replacement variables made for them. */ -/* A subroutine of build_element_name. Recursively build the element - name on the obstack. */ +/* Find an access representative for the variable BASE and given OFFSET and + SIZE. Requires that access trees have already been built. Return NULL if + it cannot be found. */ + +static struct access * +get_var_base_offset_size_access (tree base, HOST_WIDE_INT offset, + HOST_WIDE_INT size) +{ + struct access *access; + + access = get_first_repr_for_decl (base); + while (access && (access->offset + access->size <= offset)) + access = access->next_grp; + if (!access) + return NULL; + + return find_access_in_subtree (access, offset, size); +} +/* Add LINK to the linked list of assign links of RACC. */ static void -build_element_name_1 (struct sra_elt *elt) +add_link_to_rhs (struct access *racc, struct assign_link *link) { - tree t; - char buffer[32]; + gcc_assert (link->racc == racc); - if (elt->parent) + if (!racc->first_link) { - build_element_name_1 (elt->parent); - obstack_1grow (&sra_obstack, '$'); + gcc_assert (!racc->last_link); + racc->first_link = link; + } + else + racc->last_link->next = link; - if (TREE_CODE (elt->parent->type) == COMPLEX_TYPE) - { - if (elt->element == integer_zero_node) - obstack_grow (&sra_obstack, "real", 4); - else - obstack_grow (&sra_obstack, "imag", 4); - return; - } + racc->last_link = link; + link->next = NULL; +} + +/* Move all link structures in their linked list in OLD_RACC to the linked list + in NEW_RACC. */ +static void +relink_to_new_repr (struct access *new_racc, struct access *old_racc) +{ + if (!old_racc->first_link) + { + gcc_assert (!old_racc->last_link); + return; } - t = elt->element; - if (TREE_CODE (t) == INTEGER_CST) + if (new_racc->first_link) { - /* ??? Eh. 
Don't bother doing double-wide printing. */ - sprintf (buffer, HOST_WIDE_INT_PRINT_DEC, TREE_INT_CST_LOW (t)); - obstack_grow (&sra_obstack, buffer, strlen (buffer)); - } - else if (TREE_CODE (t) == BIT_FIELD_REF) - { - sprintf (buffer, "B" HOST_WIDE_INT_PRINT_DEC, - tree_low_cst (TREE_OPERAND (t, 2), 1)); - obstack_grow (&sra_obstack, buffer, strlen (buffer)); - sprintf (buffer, "F" HOST_WIDE_INT_PRINT_DEC, - tree_low_cst (TREE_OPERAND (t, 1), 1)); - obstack_grow (&sra_obstack, buffer, strlen (buffer)); + gcc_assert (!new_racc->last_link->next); + gcc_assert (!old_racc->last_link || !old_racc->last_link->next); + + new_racc->last_link->next = old_racc->first_link; + new_racc->last_link = old_racc->last_link; } else { - tree name = DECL_NAME (t); - if (name) - obstack_grow (&sra_obstack, IDENTIFIER_POINTER (name), - IDENTIFIER_LENGTH (name)); - else - { - sprintf (buffer, "D%u", DECL_UID (t)); - obstack_grow (&sra_obstack, buffer, strlen (buffer)); - } + gcc_assert (!new_racc->last_link); + + new_racc->first_link = old_racc->first_link; + new_racc->last_link = old_racc->last_link; } + old_racc->first_link = old_racc->last_link = NULL; } -/* Construct a pretty variable name for an element's replacement variable. - The name is built on the obstack. */ +/* Add ACCESS to the work queue (which is actually a stack). */ -static char * -build_element_name (struct sra_elt *elt) +static void +add_access_to_work_queue (struct access *access) +{ + if (!access->grp_queued) + { + gcc_assert (!access->next_queued); + access->next_queued = work_queue_head; + access->grp_queued = 1; + work_queue_head = access; + } +} + +/* Pop an access from the work queue, and return it, assuming there is one. */ + +static struct access * +pop_access_from_work_queue (void) { - build_element_name_1 (elt); - obstack_1grow (&sra_obstack, '\0'); - return XOBFINISH (&sra_obstack, char *); + struct access *access = work_queue_head; + + work_queue_head = access->next_queued; + access->next_queued = NULL; + access->grp_queued = 0; + return access; } -/* Insert a gimple_seq SEQ on all the outgoing edges out of BB. Note that - if BB has more than one edge, STMT will be replicated for each edge. - Also, abnormal edges will be ignored. */ + +/* Allocate necessary structures. */ static void -insert_edge_copies_seq (gimple_seq seq, basic_block bb) +sra_initialize (void) { - edge e; - edge_iterator ei; - unsigned n_copies = -1; + candidate_bitmap = BITMAP_ALLOC (NULL); + retvals_bitmap = BITMAP_ALLOC (NULL); + gcc_obstack_init (&name_obstack); + access_pool = create_alloc_pool ("SRA accesses", sizeof (struct access), 16); + link_pool = create_alloc_pool ("SRA links", sizeof (struct assign_link), 16); + base_access_vec = pointer_map_create (); +} - FOR_EACH_EDGE (e, ei, bb->succs) - if (!(e->flags & EDGE_ABNORMAL)) - n_copies++; +/* Hook fed to pointer_map_traverse, deallocate stored vectors. */ - FOR_EACH_EDGE (e, ei, bb->succs) - if (!(e->flags & EDGE_ABNORMAL)) - gsi_insert_seq_on_edge (e, n_copies-- > 0 ? gimple_seq_copy (seq) : seq); +static bool +delete_base_accesses (const void *key ATTRIBUTE_UNUSED, void **value, + void *data ATTRIBUTE_UNUSED) +{ + VEC (access_p, heap) *access_vec; + access_vec = (VEC (access_p, heap) *) *value; + VEC_free (access_p, heap, access_vec); + + return true; } -/* Instantiate an element as an independent variable. */ +/* Deallocate all general structures. 
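+   It is the counterpart of sra_initialize above; the expected call order,
+   sketched here for illustration only, is
+
+     sra_initialize ();
+     ... scan the function, analyze and transform the accesses ...
+     sra_deinitialize ();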
*/ static void -instantiate_element (struct sra_elt *elt) +sra_deinitialize (void) { - struct sra_elt *base_elt; - tree var, base; - bool nowarn = TREE_NO_WARNING (elt->element); + BITMAP_FREE (candidate_bitmap); + BITMAP_FREE (retvals_bitmap); + free_alloc_pool (access_pool); + free_alloc_pool (link_pool); + obstack_free (&name_obstack, NULL); - for (base_elt = elt; base_elt->parent; base_elt = base_elt->parent) - if (!nowarn) - nowarn = TREE_NO_WARNING (base_elt->parent->element); - base = base_elt->element; + pointer_map_traverse (base_access_vec, delete_base_accesses, NULL); + pointer_map_destroy (base_access_vec); +} - elt->replacement = var = make_rename_temp (elt->type, "SR"); +/* Remove DECL from candidates for SRA and write REASON to the dump file if + there is one. */ +static void +disqualify_candidate (tree decl, const char *reason) +{ + bitmap_clear_bit (candidate_bitmap, DECL_UID (decl)); - if (DECL_P (elt->element) - && !tree_int_cst_equal (DECL_SIZE (var), DECL_SIZE (elt->element))) + if (dump_file) { - DECL_SIZE (var) = DECL_SIZE (elt->element); - DECL_SIZE_UNIT (var) = DECL_SIZE_UNIT (elt->element); - - elt->in_bitfld_block = 1; - elt->replacement = fold_build3 (BIT_FIELD_REF, elt->type, var, - DECL_SIZE (var), - BYTES_BIG_ENDIAN - ? size_binop (MINUS_EXPR, - TYPE_SIZE (elt->type), - DECL_SIZE (var)) - : bitsize_int (0)); + fprintf (dump_file, "! Disqualifying "); + print_generic_expr (dump_file, decl, 0); + fprintf (dump_file, " - %s\n", reason); } +} - /* For vectors, if used on the left hand side with BIT_FIELD_REF, - they are not a gimple register. */ - if (TREE_CODE (TREE_TYPE (var)) == VECTOR_TYPE && elt->is_vector_lhs) - DECL_GIMPLE_REG_P (var) = 0; +/* Return true iff the type contains a field or an element which does not allow + scalarization. */ - DECL_SOURCE_LOCATION (var) = DECL_SOURCE_LOCATION (base); - DECL_ARTIFICIAL (var) = 1; +static bool +type_internals_preclude_sra_p (tree type) +{ + tree fld; + tree et; - if (TREE_THIS_VOLATILE (elt->type)) + switch (TREE_CODE (type)) { - TREE_THIS_VOLATILE (var) = 1; - TREE_SIDE_EFFECTS (var) = 1; + case RECORD_TYPE: + case UNION_TYPE: + case QUAL_UNION_TYPE: + for (fld = TYPE_FIELDS (type); fld; fld = TREE_CHAIN (fld)) + if (TREE_CODE (fld) == FIELD_DECL) + { + tree ft = TREE_TYPE (fld); + + if (TREE_THIS_VOLATILE (fld) + || !DECL_FIELD_OFFSET (fld) || !DECL_SIZE (fld) + || !host_integerp (DECL_FIELD_OFFSET (fld), 1) + || !host_integerp (DECL_SIZE (fld), 1)) + return true; + + if (AGGREGATE_TYPE_P (ft) + && type_internals_preclude_sra_p (ft)) + return true; + } + + return false; + + case ARRAY_TYPE: + et = TREE_TYPE (type); + + if (AGGREGATE_TYPE_P (et)) + return type_internals_preclude_sra_p (et); + else + return false; + + default: + return false; } +} - if (DECL_NAME (base) && !DECL_IGNORED_P (base)) +/* Create and insert access for EXPR. Return created access, or NULL if it is + not possible. 
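+   For example, given a hypothetical 'struct S { int i; float f; } s' with
+   32-bit int and float, EXPR 's.f' yields an access with base 's', offset
+   32 and size 32 as computed by get_ref_base_and_extent, and WRITE depends
+   on which side of an assignment the expression appears on.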
*/ + +static struct access * +create_access (tree expr, bool write) +{ + struct access *access; + void **slot; + VEC (access_p,heap) *vec; + HOST_WIDE_INT offset, size, max_size; + tree base = expr; + bool unscalarizable_region = false; + + if (handled_component_p (expr)) + base = get_ref_base_and_extent (expr, &offset, &size, &max_size); + else { - char *pretty_name = build_element_name (elt); - DECL_NAME (var) = get_identifier (pretty_name); - obstack_free (&sra_obstack, pretty_name); + tree tree_size; - SET_DECL_DEBUG_EXPR (var, generate_element_ref (elt)); - DECL_DEBUG_EXPR_IS_FROM (var) = 1; - - DECL_IGNORED_P (var) = 0; - TREE_NO_WARNING (var) = nowarn; + tree_size = TYPE_SIZE (TREE_TYPE (base)); + if (tree_size && host_integerp (tree_size, 1)) + size = max_size = tree_low_cst (tree_size, 1); + else + size = max_size = -1; + + offset = 0; } - else + + if (!base || !DECL_P (base) + || !bitmap_bit_p (candidate_bitmap, DECL_UID (base))) + return NULL; + + if (size != max_size) { - DECL_IGNORED_P (var) = 1; - /* ??? We can't generate any warning that would be meaningful. */ - TREE_NO_WARNING (var) = 1; - } - - /* Zero-initialize bit-field scalarization variables, to avoid - triggering undefined behavior. */ - if (TREE_CODE (elt->element) == BIT_FIELD_REF - || (var != elt->replacement - && TREE_CODE (elt->replacement) == BIT_FIELD_REF)) - { - gimple_seq init = sra_build_assignment (var, - fold_convert (TREE_TYPE (var), - integer_zero_node) - ); - insert_edge_copies_seq (init, ENTRY_BLOCK_PTR); - mark_all_v_defs_seq (init); + size = max_size; + unscalarizable_region = true; } - if (dump_file) + if (size < 0) { - fputs (" ", dump_file); - dump_sra_elt_name (dump_file, elt); - fputs (" -> ", dump_file); - print_generic_expr (dump_file, var, dump_flags); - fputc ('\n', dump_file); + disqualify_candidate (base, "Encountered an ultra variable sized " + "access."); + return NULL; } + + access = (struct access *) pool_alloc (access_pool); + memset (access, 0, sizeof (struct access)); + + access->base = base; + access->offset = offset; + access->size = size; + access->expr = expr; + access->type = TREE_TYPE (expr); + access->write = write; + access->grp_unscalarizable_region = unscalarizable_region; + + slot = pointer_map_contains (base_access_vec, base); + if (slot) + vec = (VEC (access_p, heap) *) *slot; + else + vec = VEC_alloc (access_p, heap, 32); + + VEC_safe_push (access_p, heap, vec, access); + + *((struct VEC (access_p,heap) **) + pointer_map_insert (base_access_vec, base)) = vec; + + return access; } -/* Make one pass across an element tree deciding whether or not it's - profitable to instantiate individual leaf scalars. - PARENT_USES and PARENT_COPIES are the sum of the N_USES and N_COPIES - fields all the way up the tree. */ +/* Callback of walk_tree. Search the given tree for a declaration and exclude + it from the candidates. */ -static void -decide_instantiation_1 (struct sra_elt *elt, unsigned int parent_uses, - unsigned int parent_copies) +static tree +disqualify_all (tree *tp, int *walk_subtrees, void *data ATTRIBUTE_UNUSED) { - if (dump_file && !elt->parent) - { - fputs ("Initial instantiation for ", dump_file); - dump_sra_elt_name (dump_file, elt); - fputc ('\n', dump_file); - } + tree base = *tp; + - if (elt->cannot_scalarize) - return; + if (TREE_CODE (base) == SSA_NAME) + base = SSA_NAME_VAR (base); - if (elt->is_scalar) + if (DECL_P (base)) { - /* The decision is simple: instantiate if we're used more frequently - than the parent needs to be seen as a complete unit. 
*/ - if (elt->n_uses + elt->n_copies + parent_copies > parent_uses) - instantiate_element (elt); + disqualify_candidate (base, "From within disqualify_all()."); + *walk_subtrees = 0; } else - { - struct sra_elt *c, *group; - unsigned int this_uses = elt->n_uses + parent_uses; - unsigned int this_copies = elt->n_copies + parent_copies; - - /* Consider groups of sub-elements as weighing in favour of - instantiation whatever their size. */ - for (group = elt->groups; group ; group = group->sibling) - FOR_EACH_ACTUAL_CHILD (c, group) - { - c->n_uses += group->n_uses; - c->n_copies += group->n_copies; - } + *walk_subtrees = 1; - for (c = elt->children; c ; c = c->sibling) - decide_instantiation_1 (c, this_uses, this_copies); - } + + return NULL_TREE; } -/* Compute the size and number of all instantiated elements below ELT. - We will only care about this if the size of the complete structure - fits in a HOST_WIDE_INT, so we don't have to worry about overflow. */ +/* Scan expression EXPR and create access structures for all accesses to + candidates for scalarization. Return the created access or NULL if none is + created. */ -static unsigned int -sum_instantiated_sizes (struct sra_elt *elt, unsigned HOST_WIDE_INT *sizep) +static struct access * +build_access_from_expr_1 (tree *expr_ptr, + gimple_stmt_iterator *gsi ATTRIBUTE_UNUSED, bool write) { - if (elt->replacement) + struct access *ret = NULL; + tree expr = *expr_ptr; + tree safe_expr = expr; + bool bit_ref; + + if (TREE_CODE (expr) == BIT_FIELD_REF) { - *sizep += TREE_INT_CST_LOW (TYPE_SIZE_UNIT (elt->type)); - return 1; + expr = TREE_OPERAND (expr, 0); + bit_ref = true; } else + bit_ref = false; + + while (TREE_CODE (expr) == NOP_EXPR + || TREE_CODE (expr) == VIEW_CONVERT_EXPR + || TREE_CODE (expr) == REALPART_EXPR + || TREE_CODE (expr) == IMAGPART_EXPR) + expr = TREE_OPERAND (expr, 0); + + switch (TREE_CODE (expr)) { - struct sra_elt *c; - unsigned int count = 0; + case ADDR_EXPR: + case SSA_NAME: + case INDIRECT_REF: + break; + + case VAR_DECL: + case PARM_DECL: + case RESULT_DECL: + case COMPONENT_REF: + case ARRAY_REF: + ret = create_access (expr, write); + break; - for (c = elt->children; c ; c = c->sibling) - count += sum_instantiated_sizes (c, sizep); + case REALPART_EXPR: + case IMAGPART_EXPR: + expr = TREE_OPERAND (expr, 0); + ret = create_access (expr, write); + break; - return count; + case ARRAY_RANGE_REF: + default: + walk_tree (&safe_expr, disqualify_all, NULL, NULL); + break; } + + if (write && bit_ref && ret) + ret->grp_bfr_lhs = 1; + + return ret; } -/* Instantiate fields in ELT->TYPE that are not currently present as - children of ELT. */ +/* Scan expression EXPR and create access structures for all accesses to + candidates for scalarization. Return true if any access has been + inserted. */ -static void instantiate_missing_elements (struct sra_elt *elt); +static bool +build_access_from_expr (tree *expr_ptr, + gimple_stmt_iterator *gsi ATTRIBUTE_UNUSED, bool write, + void *data ATTRIBUTE_UNUSED) +{ + return build_access_from_expr_1 (expr_ptr, gsi, write) != NULL; +} -static struct sra_elt * -instantiate_missing_elements_1 (struct sra_elt *elt, tree child, tree type) +/* Disqualify LHS and RHS for scalarization if STMT must end its basic block in + modes in which it matters, return true iff they have been disqualified. RHS + may be NULL, in that case ignore it. If we scalarize an aggregate in + intra-SRA we may need to add statements after each statement. 
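+   (For example, if the left hand side of an aggregate assignment is
+   scalarized, loads of the new scalar replacements have to be inserted
+   immediately after that assignment; a hypothetical scenario given only to
+   illustrate why such statements are disqualifying.)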
This is not + possible if a statement unconditionally has to end the basic block. */ +static bool +disqualify_ops_if_throwing_stmt (gimple stmt, tree *lhs, tree *rhs) { - struct sra_elt *sub = lookup_element (elt, child, type, INSERT); - if (sub->is_scalar) + if (stmt_can_throw_internal (stmt) || stmt_ends_bb_p (stmt)) { - if (sub->replacement == NULL) - instantiate_element (sub); + walk_tree (lhs, disqualify_all, NULL, NULL); + if (rhs) + walk_tree (rhs, disqualify_all, NULL, NULL); + return true; } - else - instantiate_missing_elements (sub); - return sub; + return false; } -/* Obtain the canonical type for field F of ELEMENT. */ -static tree -canon_type_for_field (tree f, tree element) -{ - tree field_type = TREE_TYPE (f); +/* Result code for scan_assign callback for scan_function. */ +enum scan_assign_result {SRA_SA_NONE, /* nothing done for the stmt */ + SRA_SA_PROCESSED, /* stmt analyzed/changed */ + SRA_SA_REMOVED}; /* stmt redundant and eliminated */ - /* canonicalize_component_ref() unwidens some bit-field types (not - marked as DECL_BIT_FIELD in C++), so we must do the same, lest we - may introduce type mismatches. */ - if (INTEGRAL_TYPE_P (field_type) - && DECL_MODE (f) != TYPE_MODE (field_type)) - field_type = TREE_TYPE (get_unwidened (build3 (COMPONENT_REF, - field_type, - element, - f, NULL_TREE), - NULL_TREE)); - - return field_type; -} - -/* Look for adjacent fields of ELT starting at F that we'd like to - scalarize as a single variable. Return the last field of the - group. */ -static tree -try_instantiate_multiple_fields (struct sra_elt *elt, tree f) +/* Scan expressions occuring in the statement pointed to by STMT_EXPR, create + access structures for all accesses to candidates for scalarization and + remove those candidates which occur in statements or expressions that + prevent them from being split apart. Return true if any access has been + inserted. */ + +static enum scan_assign_result +build_accesses_from_assign (gimple *stmt_ptr, + gimple_stmt_iterator *gsi ATTRIBUTE_UNUSED, + void *data ATTRIBUTE_UNUSED) { - int count; - unsigned HOST_WIDE_INT align, bit, size, alchk; - enum machine_mode mode; - tree first = f, prev; - tree type, var; - struct sra_elt *block; - - /* Point fields are typically best handled as standalone entities. */ - if (POINTER_TYPE_P (TREE_TYPE (f))) - return f; - - if (!is_sra_scalar_type (TREE_TYPE (f)) - || !host_integerp (DECL_FIELD_OFFSET (f), 1) - || !host_integerp (DECL_FIELD_BIT_OFFSET (f), 1) - || !host_integerp (DECL_SIZE (f), 1) - || lookup_element (elt, f, NULL, NO_INSERT)) - return f; - - block = elt; - - /* For complex and array objects, there are going to be integer - literals as child elements. In this case, we can't just take the - alignment and mode of the decl, so we instead rely on the element - type. - - ??? We could try to infer additional alignment from the full - object declaration and the location of the sub-elements we're - accessing. */ - for (count = 0; !DECL_P (block->element); count++) - block = block->parent; - - align = DECL_ALIGN (block->element); - alchk = GET_MODE_BITSIZE (DECL_MODE (block->element)); - - if (count) - { - type = TREE_TYPE (block->element); - while (count--) - type = TREE_TYPE (type); - - align = TYPE_ALIGN (type); - alchk = GET_MODE_BITSIZE (TYPE_MODE (type)); - } - - if (align < alchk) - align = alchk; - - /* Coalescing wider fields is probably pointless and - inefficient. 
*/ - if (align > BITS_PER_WORD) - align = BITS_PER_WORD; - - bit = tree_low_cst (DECL_FIELD_OFFSET (f), 1) * BITS_PER_UNIT - + tree_low_cst (DECL_FIELD_BIT_OFFSET (f), 1); - size = tree_low_cst (DECL_SIZE (f), 1); - - alchk = align - 1; - alchk = ~alchk; - - if ((bit & alchk) != ((bit + size - 1) & alchk)) - return f; - - /* Find adjacent fields in the same alignment word. */ - - for (prev = f, f = TREE_CHAIN (f); - f && TREE_CODE (f) == FIELD_DECL - && is_sra_scalar_type (TREE_TYPE (f)) - && host_integerp (DECL_FIELD_OFFSET (f), 1) - && host_integerp (DECL_FIELD_BIT_OFFSET (f), 1) - && host_integerp (DECL_SIZE (f), 1) - && !lookup_element (elt, f, NULL, NO_INSERT); - prev = f, f = TREE_CHAIN (f)) - { - unsigned HOST_WIDE_INT nbit, nsize; - - nbit = tree_low_cst (DECL_FIELD_OFFSET (f), 1) * BITS_PER_UNIT - + tree_low_cst (DECL_FIELD_BIT_OFFSET (f), 1); - nsize = tree_low_cst (DECL_SIZE (f), 1); + gimple stmt = *stmt_ptr; + tree *lhs_ptr, *rhs_ptr; + struct access *lacc, *racc; - if (bit + size == nbit) - { - if ((bit & alchk) != ((nbit + nsize - 1) & alchk)) - { - /* If we're at an alignment boundary, don't bother - growing alignment such that we can include this next - field. */ - if ((nbit & alchk) - || GET_MODE_BITSIZE (DECL_MODE (f)) <= align) - break; - - align = GET_MODE_BITSIZE (DECL_MODE (f)); - alchk = align - 1; - alchk = ~alchk; + if (gimple_assign_rhs2 (stmt)) + return SRA_SA_NONE; - if ((bit & alchk) != ((nbit + nsize - 1) & alchk)) - break; - } - size += nsize; - } - else if (nbit + nsize == bit) - { - if ((nbit & alchk) != ((bit + size - 1) & alchk)) - { - if ((bit & alchk) - || GET_MODE_BITSIZE (DECL_MODE (f)) <= align) - break; - - align = GET_MODE_BITSIZE (DECL_MODE (f)); - alchk = align - 1; - alchk = ~alchk; + lhs_ptr = gimple_assign_lhs_ptr (stmt); + rhs_ptr = gimple_assign_rhs1_ptr (stmt); - if ((nbit & alchk) != ((bit + size - 1) & alchk)) - break; - } - bit = nbit; - size += nsize; - } - else - break; - } + if (disqualify_ops_if_throwing_stmt (stmt, lhs_ptr, rhs_ptr)) + return SRA_SA_NONE; + + racc = build_access_from_expr_1 (rhs_ptr, gsi, false); + lacc = build_access_from_expr_1 (lhs_ptr, gsi, true); + + if (lacc && racc + && !lacc->grp_unscalarizable_region + && !racc->grp_unscalarizable_region + && AGGREGATE_TYPE_P (TREE_TYPE (*lhs_ptr)) + && lacc->size <= racc->size + && useless_type_conversion_p (lacc->type, racc->type)) + { + struct assign_link *link; - f = prev; + link = (struct assign_link *) pool_alloc (link_pool); + memset (link, 0, sizeof (struct assign_link)); - if (f == first) - return f; + link->lacc = lacc; + link->racc = racc; + + add_link_to_rhs (racc, link); + } + + return (lacc || racc) ? SRA_SA_PROCESSED : SRA_SA_NONE; +} - gcc_assert ((bit & alchk) == ((bit + size - 1) & alchk)); +/* Scan function and look for interesting statements. Return true if any has + been found or processed, as indicated by callbacks. SCAN_EXPR is a callback + called on all expressions within statements except assign statements and + those deemed entirely unsuitable for some reason (all operands in such + statements and expression are removed from candidate_bitmap). SCAN_ASSIGN + is a callback called on all assign statements, HANDLE_SSA_DEFS is a callback + called on assign statements and those call statements which have a lhs and + it is the only callback which can be NULL. ANALYSIS_STAGE is true when + running in the analysis stage of a pass and thus no statement is being + modified. DATA is a pointer passed to all callbacks. 
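+   For illustration, the scanning stage described in the initial comment
+   could be invoked as
+
+     scan_function (build_access_from_expr, build_accesses_from_assign,
+                    NULL, true, NULL);
+
+   (a sketch only; the actual call sites appear later in this file).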
If any single + callback returns true, this function also returns true, otherwise it returns + false. */ - /* Try to widen the bit range so as to cover padding bits as well. */ +static bool +scan_function (bool (*scan_expr) (tree *, gimple_stmt_iterator *, bool, void *), + enum scan_assign_result (*scan_assign) (gimple *, + gimple_stmt_iterator *, + void *), + bool (*handle_ssa_defs)(gimple, void *), + bool analysis_stage, void *data) +{ + gimple_stmt_iterator gsi; + basic_block bb; + unsigned i; + tree *t; + bool ret = false; - if ((bit & ~alchk) || size != align) + FOR_EACH_BB (bb) { - unsigned HOST_WIDE_INT mbit = bit & alchk; - unsigned HOST_WIDE_INT msize = align; + bool bb_changed = false; - for (f = TYPE_FIELDS (elt->type); - f; f = TREE_CHAIN (f)) + gsi = gsi_start_bb (bb); + while (!gsi_end_p (gsi)) { - unsigned HOST_WIDE_INT fbit, fsize; + gimple stmt = gsi_stmt (gsi); + enum scan_assign_result assign_result; + bool any = false, deleted = false; - /* Skip the fields from first to prev. */ - if (f == first) + switch (gimple_code (stmt)) { - f = prev; - continue; - } - - if (!(TREE_CODE (f) == FIELD_DECL - && host_integerp (DECL_FIELD_OFFSET (f), 1) - && host_integerp (DECL_FIELD_BIT_OFFSET (f), 1))) - continue; - - fbit = tree_low_cst (DECL_FIELD_OFFSET (f), 1) * BITS_PER_UNIT - + tree_low_cst (DECL_FIELD_BIT_OFFSET (f), 1); + case GIMPLE_RETURN: + t = gimple_return_retval_ptr (stmt); + if (*t != NULL_TREE) + { + if (DECL_P (*t)) + { + tree ret_type = TREE_TYPE (*t); + if (sra_mode == SRA_MODE_EARLY_INTRA + && (TREE_CODE (ret_type) == UNION_TYPE + || TREE_CODE (ret_type) == QUAL_UNION_TYPE)) + disqualify_candidate (*t, + "Union in a return statement."); + else + bitmap_set_bit (retvals_bitmap, DECL_UID (*t)); + } + any |= scan_expr (t, &gsi, false, data); + } + break; - /* If we're past the selected word, we're fine. */ - if ((bit & alchk) < (fbit & alchk)) - continue; + case GIMPLE_ASSIGN: + assign_result = scan_assign (&stmt, &gsi, data); + any |= assign_result == SRA_SA_PROCESSED; + deleted = assign_result == SRA_SA_REMOVED; + if (handle_ssa_defs && assign_result != SRA_SA_REMOVED) + any |= handle_ssa_defs (stmt, data); + break; - if (host_integerp (DECL_SIZE (f), 1)) - fsize = tree_low_cst (DECL_SIZE (f), 1); - else - /* Assume a variable-sized field takes up all space till - the end of the word. ??? Endianness issues? */ - fsize = align - (fbit & alchk); + case GIMPLE_CALL: + /* Operands must be processed before the lhs. */ + for (i = 0; i < gimple_call_num_args (stmt); i++) + { + tree *argp = gimple_call_arg_ptr (stmt, i); + any |= scan_expr (argp, &gsi, false, data); + } - if ((fbit & alchk) < (bit & alchk)) - { - /* A large field might start at a previous word and - extend into the selected word. Exclude those - bits. ??? Endianness issues? 
*/
-	      HOST_WIDE_INT diff = fbit + fsize - mbit;
+	  if (gimple_call_lhs (stmt))
+	    {
+	      tree *lhs_ptr = gimple_call_lhs_ptr (stmt);
+	      if (!analysis_stage
+		  || !disqualify_ops_if_throwing_stmt (stmt, lhs_ptr, NULL))
+		{
+		  any |= scan_expr (lhs_ptr, &gsi, true, data);
+		  if (handle_ssa_defs)
+		    any |= handle_ssa_defs (stmt, data);
+		}
+	    }
+	  break;
 
-	      if (diff <= 0)
-		continue;
+	case GIMPLE_ASM:
+	  for (i = 0; i < gimple_asm_ninputs (stmt); i++)
+	    {
+	      tree *op = &TREE_VALUE (gimple_asm_input_op (stmt, i));
+	      any |= scan_expr (op, &gsi, false, data);
+	    }
+	  for (i = 0; i < gimple_asm_noutputs (stmt); i++)
+	    {
+	      tree *op = &TREE_VALUE (gimple_asm_output_op (stmt, i));
+	      any |= scan_expr (op, &gsi, true, data);
+	    }
+	  break;
 
-	      mbit += diff;
-	      msize -= diff;
+	default:
+	  if (analysis_stage)
+	    walk_gimple_op (stmt, disqualify_all, NULL);
+	  break;
 	}
-	  else
+
+	  if (any)
 	    {
-	      /* Non-overlapping, great.  */
-	      if (fbit + fsize <= mbit
-		  || mbit + msize <= fbit)
-		continue;
+	      ret = true;
+	      bb_changed = true;
 
-	      if (fbit <= mbit)
+	      if (!analysis_stage)
 		{
-		  unsigned HOST_WIDE_INT diff = fbit + fsize - mbit;
-		  mbit += diff;
-		  msize -= diff;
+		  update_stmt (stmt);
+		  if (!stmt_could_throw_p (stmt))
+		    remove_stmt_from_eh_region (stmt);
 		}
-	      else if (fbit > mbit)
-		msize -= (mbit + msize - fbit);
-	      else
-		gcc_unreachable ();
+	    }
+	  if (deleted)
+	    bb_changed = true;
+	  else
+	    gsi_next (&gsi);
 	}
-
-      bit = mbit;
-      size = msize;
+      if (!analysis_stage && bb_changed)
+	gimple_purge_dead_eh_edges (bb);
     }
 
-  /* Now we know the bit range we're interested in.  Find the smallest
-     machine mode we can use to access it.  */
-
-  for (mode = smallest_mode_for_size (size, MODE_INT);
-       ;
-       mode = GET_MODE_WIDER_MODE (mode))
-    {
-      gcc_assert (mode != VOIDmode);
-
-      alchk = GET_MODE_PRECISION (mode) - 1;
-      alchk = ~alchk;
+  return ret;
+}
 
-      if ((bit & alchk) == ((bit + size - 1) & alchk))
-	break;
-    }
+/* Helper of QSORT function.  There are pointers to accesses in the array.  An
+   access is considered smaller than another if it has a smaller offset or if
+   the offsets are the same but its size is bigger.  */
 
-  gcc_assert (~alchk < align);
+static int
+compare_access_positions (const void *a, const void *b)
+{
+  const access_p *fp1 = (const access_p *) a;
+  const access_p *fp2 = (const access_p *) b;
+  const access_p f1 = *fp1;
+  const access_p f2 = *fp2;
 
-  /* Create the field group as a single variable.  */
+  if (f1->offset != f2->offset)
+    return f1->offset < f2->offset ? -1 : 1;
 
-  /* We used to create a type for the mode above, but size turns
-     to be out not of mode-size.  As we need a matching type
-     to build a BIT_FIELD_REF, use a nonstandard integer type as
-     fallback.  */
-  type = lang_hooks.types.type_for_size (size, 1);
-  if (!type || TYPE_PRECISION (type) != size)
-    type = build_nonstandard_integer_type (size, 1);
-  gcc_assert (type);
-  var = build3 (BIT_FIELD_REF, type, NULL_TREE,
-		bitsize_int (size), bitsize_int (bit));
+  if (f1->size == f2->size)
+    return 0;
+  /* We want the bigger accesses first, thus the opposite operator in the next
+     line:  */
+  return f1->size > f2->size ? -1 : 1;
+}
 
-  block = instantiate_missing_elements_1 (elt, var, type);
-  gcc_assert (block && block->is_scalar);
-  var = block->replacement;
-  block->in_bitfld_block = 2;
+/* Append a name of the declaration to the name obstack.  A helper function for
+   make_fancy_name.  */
 
-  /* Add the member fields to the group, such that they access
-     portions of the group variable.  
*/ +static void +make_fancy_decl_name (tree decl) +{ + char buffer[32]; - for (f = first; f != TREE_CHAIN (prev); f = TREE_CHAIN (f)) + tree name = DECL_NAME (decl); + if (name) + obstack_grow (&name_obstack, IDENTIFIER_POINTER (name), + IDENTIFIER_LENGTH (name)); + else { - tree field_type = canon_type_for_field (f, elt->element); - struct sra_elt *fld = lookup_element (block, f, field_type, INSERT); - - gcc_assert (fld && fld->is_scalar && !fld->replacement); - - fld->replacement = fold_build3 (BIT_FIELD_REF, field_type, var, - bitsize_int (TYPE_PRECISION (field_type)), - bitsize_int - ((TREE_INT_CST_LOW (DECL_FIELD_OFFSET (f)) - * BITS_PER_UNIT - + (TREE_INT_CST_LOW - (DECL_FIELD_BIT_OFFSET (f))) - - (TREE_INT_CST_LOW - (TREE_OPERAND (block->element, 2)))) - & ~alchk)); - fld->in_bitfld_block = 1; + sprintf (buffer, "D%u", DECL_UID (decl)); + obstack_grow (&name_obstack, buffer, strlen (buffer)); } - - return prev; } +/* Helper for make_fancy_name. */ + static void -instantiate_missing_elements (struct sra_elt *elt) +make_fancy_name_1 (tree expr) { - tree type = elt->type; + char buffer[32]; + tree index; - switch (TREE_CODE (type)) + if (DECL_P (expr)) { - case RECORD_TYPE: - { - tree f; - for (f = TYPE_FIELDS (type); f ; f = TREE_CHAIN (f)) - if (TREE_CODE (f) == FIELD_DECL) - { - tree last = try_instantiate_multiple_fields (elt, f); - - if (last != f) - { - f = last; - continue; - } - - instantiate_missing_elements_1 (elt, f, - canon_type_for_field - (f, elt->element)); - } - break; - } - - case ARRAY_TYPE: - { - tree i, max, subtype; - - i = TYPE_MIN_VALUE (TYPE_DOMAIN (type)); - max = TYPE_MAX_VALUE (TYPE_DOMAIN (type)); - subtype = TREE_TYPE (type); + make_fancy_decl_name (expr); + return; + } - while (1) - { - instantiate_missing_elements_1 (elt, i, subtype); - if (tree_int_cst_equal (i, max)) - break; - i = int_const_binop (PLUS_EXPR, i, integer_one_node, true); - } + switch (TREE_CODE (expr)) + { + case COMPONENT_REF: + make_fancy_name_1 (TREE_OPERAND (expr, 0)); + obstack_1grow (&name_obstack, '$'); + make_fancy_decl_name (TREE_OPERAND (expr, 1)); + break; + case ARRAY_REF: + make_fancy_name_1 (TREE_OPERAND (expr, 0)); + obstack_1grow (&name_obstack, '$'); + /* Arrays with only one element may not have a constant as their + index. */ + index = TREE_OPERAND (expr, 1); + if (TREE_CODE (index) != INTEGER_CST) break; - } + sprintf (buffer, HOST_WIDE_INT_PRINT_DEC, TREE_INT_CST_LOW (index)); + obstack_grow (&name_obstack, buffer, strlen (buffer)); - case COMPLEX_TYPE: - type = TREE_TYPE (type); - instantiate_missing_elements_1 (elt, integer_zero_node, type); - instantiate_missing_elements_1 (elt, integer_one_node, type); break; + case BIT_FIELD_REF: + case REALPART_EXPR: + case IMAGPART_EXPR: + gcc_unreachable (); /* we treat these as scalars. */ + break; default: - gcc_unreachable (); + break; } } -/* Return true if there is only one non aggregate field in the record, TYPE. - Return false otherwise. */ +/* Create a human readable name for replacement variable of ACCESS. 
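+   For example, for an access to field b of field a of a variable s this
+   produces the name "s$a$b", and for an access like s.v[2].x the name
+   "s$v$2$x" (the variable and field names here are purely illustrative).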
*/ -static bool -single_scalar_field_in_record_p (tree type) +static char * +make_fancy_name (tree expr) { - int num_fields = 0; - tree field; - if (TREE_CODE (type) != RECORD_TYPE) - return false; - - for (field = TYPE_FIELDS (type); field; field = TREE_CHAIN (field)) - if (TREE_CODE (field) == FIELD_DECL) - { - num_fields++; - - if (num_fields == 2) - return false; - - if (AGGREGATE_TYPE_P (TREE_TYPE (field))) - return false; - } - - return true; -} - -/* Make one pass across an element tree deciding whether to perform block - or element copies. If we decide on element copies, instantiate all - elements. Return true if there are any instantiated sub-elements. */ + make_fancy_name_1 (expr); + obstack_1grow (&name_obstack, '\0'); + return XOBFINISH (&name_obstack, char *); +} + +/* Helper function for build_ref_for_offset. */ static bool -decide_block_copy (struct sra_elt *elt) +build_ref_for_offset_1 (tree *res, tree type, HOST_WIDE_INT offset, + tree exp_type) { - struct sra_elt *c; - bool any_inst; - - /* We shouldn't be invoked on groups of sub-elements as they must - behave like their parent as far as block copy is concerned. */ - gcc_assert (!elt->is_group); - - /* If scalarization is disabled, respect it. */ - if (elt->cannot_scalarize) + while (1) { - elt->use_block_copy = 1; + tree fld; + tree tr_size, index; + HOST_WIDE_INT el_size; - if (dump_file) - { - fputs ("Scalarization disabled for ", dump_file); - dump_sra_elt_name (dump_file, elt); - fputc ('\n', dump_file); - } + if (offset == 0 && exp_type + && useless_type_conversion_p (exp_type, type)) + return true; - /* Disable scalarization of sub-elements */ - for (c = elt->children; c; c = c->sibling) + switch (TREE_CODE (type)) { - c->cannot_scalarize = 1; - decide_block_copy (c); - } + case UNION_TYPE: + case QUAL_UNION_TYPE: + case RECORD_TYPE: + /* Some ADA records are half-unions, treat all of them the same. */ + for (fld = TYPE_FIELDS (type); fld; fld = TREE_CHAIN (fld)) + { + HOST_WIDE_INT pos, size; + tree expr, *expr_ptr; - /* Groups behave like their parent. */ - for (c = elt->groups; c; c = c->sibling) - { - c->cannot_scalarize = 1; - c->use_block_copy = 1; - } + if (TREE_CODE (fld) != FIELD_DECL) + continue; - return false; - } + pos = int_bit_position (fld); + gcc_assert (TREE_CODE (type) == RECORD_TYPE || pos == 0); + size = tree_low_cst (DECL_SIZE (fld), 1); + if (pos > offset || (pos + size) <= offset) + continue; - /* Don't decide if we've no uses and no groups. */ - if (elt->n_uses == 0 && elt->n_copies == 0 && elt->groups == NULL) - ; - - else if (!elt->is_scalar) - { - tree size_tree = TYPE_SIZE_UNIT (elt->type); - bool use_block_copy = true; - - /* Tradeoffs for COMPLEX types pretty much always make it better - to go ahead and split the components. */ - if (TREE_CODE (elt->type) == COMPLEX_TYPE) - use_block_copy = false; - - /* Don't bother trying to figure out the rest if the structure is - so large we can't do easy arithmetic. This also forces block - copies for variable sized structures. */ - else if (host_integerp (size_tree, 1)) - { - unsigned HOST_WIDE_INT full_size, inst_size = 0; - unsigned int max_size, max_count, inst_count, full_count; - - /* If the sra-max-structure-size parameter is 0, then the - user has not overridden the parameter and we can choose a - sensible default. */ - max_size = SRA_MAX_STRUCTURE_SIZE - ? SRA_MAX_STRUCTURE_SIZE - : MOVE_RATIO (optimize_function_for_speed_p (cfun)) * UNITS_PER_WORD; - max_count = SRA_MAX_STRUCTURE_COUNT - ? 
SRA_MAX_STRUCTURE_COUNT - : MOVE_RATIO (optimize_function_for_speed_p (cfun)); - - full_size = tree_low_cst (size_tree, 1); - full_count = count_type_elements (elt->type, false); - inst_count = sum_instantiated_sizes (elt, &inst_size); - - /* If there is only one scalar field in the record, don't block copy. */ - if (single_scalar_field_in_record_p (elt->type)) - use_block_copy = false; - - /* ??? What to do here. If there are two fields, and we've only - instantiated one, then instantiating the other is clearly a win. - If there are a large number of fields then the size of the copy - is much more of a factor. */ - - /* If the structure is small, and we've made copies, go ahead - and instantiate, hoping that the copies will go away. */ - if (full_size <= max_size - && (full_count - inst_count) <= max_count - && elt->n_copies > elt->n_uses) - use_block_copy = false; - else if (inst_count * 100 >= full_count * SRA_FIELD_STRUCTURE_RATIO - && inst_size * 100 >= full_size * SRA_FIELD_STRUCTURE_RATIO) - use_block_copy = false; - - /* In order to avoid block copy, we have to be able to instantiate - all elements of the type. See if this is possible. */ - if (!use_block_copy - && (!can_completely_scalarize_p (elt) - || !type_can_instantiate_all_elements (elt->type))) - use_block_copy = true; - } - - elt->use_block_copy = use_block_copy; - - /* Groups behave like their parent. */ - for (c = elt->groups; c; c = c->sibling) - c->use_block_copy = use_block_copy; + if (res) + { + expr = build3 (COMPONENT_REF, TREE_TYPE (fld), *res, fld, + NULL_TREE); + expr_ptr = &expr; + } + else + expr_ptr = NULL; + if (build_ref_for_offset_1 (expr_ptr, TREE_TYPE (fld), + offset - pos, exp_type)) + { + if (res) + *res = expr; + return true; + } + } + return false; - if (dump_file) - { - fprintf (dump_file, "Using %s for ", - use_block_copy ? "block-copy" : "element-copy"); - dump_sra_elt_name (dump_file, elt); - fputc ('\n', dump_file); - } + case ARRAY_TYPE: + tr_size = TYPE_SIZE (TREE_TYPE (type)); + if (!tr_size || !host_integerp (tr_size, 1)) + return false; + el_size = tree_low_cst (tr_size, 1); + + index = build_int_cst (TYPE_DOMAIN (type), offset / el_size); + if (!integer_zerop (TYPE_MIN_VALUE (TYPE_DOMAIN (type)))) + index = int_const_binop (PLUS_EXPR, index, + TYPE_MIN_VALUE (TYPE_DOMAIN (type)), 0); + if (res) + *res = build4 (ARRAY_REF, TREE_TYPE (type), *res, index, NULL_TREE, + NULL_TREE); + offset = offset % el_size; + type = TREE_TYPE (type); + break; + + default: + if (offset != 0) + return false; - if (!use_block_copy) - { - instantiate_missing_elements (elt); - return true; + if (exp_type) + return false; + else + return true; } } - - any_inst = elt->replacement != NULL; - - for (c = elt->children; c ; c = c->sibling) - any_inst |= decide_block_copy (c); - - return any_inst; } -/* Entry point to phase 3. Instantiate scalar replacement variables. */ - -static void -decide_instantiations (void) -{ - unsigned int i; - bool cleared_any; - bitmap_head done_head; - bitmap_iterator bi; - - /* We cannot clear bits from a bitmap we're iterating over, - so save up all the bits to clear until the end. */ - bitmap_initialize (&done_head, &bitmap_default_obstack); - cleared_any = false; +/* Construct an expression that would reference a part of aggregate *EXPR of + type TYPE at the given OFFSET of the type EXP_TYPE. If EXPR is NULL, the + function only determines whether it can build such a reference without + actually doing it. 
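+
+   For example, for a variable v of type struct { int i; struct { float f;
+   int a[4]; } in; }, an OFFSET of 96 and an integer EXP_TYPE, the function
+   would build v.in.a[1] (an illustrative sketch assuming 32-bit int and
+   float).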
- EXECUTE_IF_SET_IN_BITMAP (sra_candidates, 0, i, bi) - { - tree var = referenced_var (i); - struct sra_elt *elt = lookup_element (NULL, var, NULL, NO_INSERT); - if (elt) - { - decide_instantiation_1 (elt, 0, 0); - if (!decide_block_copy (elt)) - elt = NULL; - } - if (!elt) - { - bitmap_set_bit (&done_head, i); - cleared_any = true; - } - } + FIXME: Eventually this should be replaced with + maybe_fold_offset_to_reference() from tree-ssa-ccp.c but that requires a + minor rewrite of fold_stmt. + */ - if (cleared_any) +static bool +build_ref_for_offset (tree *expr, tree type, HOST_WIDE_INT offset, + tree exp_type, bool allow_ptr) +{ + if (allow_ptr && POINTER_TYPE_P (type)) { - bitmap_and_compl_into (sra_candidates, &done_head); - bitmap_and_compl_into (needs_copy_in, &done_head); + type = TREE_TYPE (type); + if (expr) + *expr = fold_build1 (INDIRECT_REF, type, *expr); } - bitmap_clear (&done_head); - - mark_set_for_renaming (sra_candidates); - if (dump_file) - fputc ('\n', dump_file); + return build_ref_for_offset_1 (expr, type, offset, exp_type); } -\f -/* Phase Four: Update the function to match the replacements created. */ +/* The very first phase of intraprocedural SRA. It marks in candidate_bitmap + those with type which is suitable for scalarization. */ -/* Mark all the variables in virtual operands in all the statements in - LIST for renaming. */ - -static void -mark_all_v_defs_seq (gimple_seq seq) +static bool +find_var_candidates (void) { - gimple_stmt_iterator gsi; + tree var, type; + referenced_var_iterator rvi; + bool ret = false; - for (gsi = gsi_start (seq); !gsi_end_p (gsi); gsi_next (&gsi)) - update_stmt_if_modified (gsi_stmt (gsi)); -} + FOR_EACH_REFERENCED_VAR (var, rvi) + { + if (TREE_CODE (var) != VAR_DECL && TREE_CODE (var) != PARM_DECL) + continue; + type = TREE_TYPE (var); + + if (!AGGREGATE_TYPE_P (type) + || needs_to_live_in_memory (var) + || TREE_THIS_VOLATILE (var) + || !COMPLETE_TYPE_P (type) + || !host_integerp (TYPE_SIZE (type), 1) + || tree_low_cst (TYPE_SIZE (type), 1) == 0 + || type_internals_preclude_sra_p (type)) + continue; -/* Mark every replacement under ELT with TREE_NO_WARNING. */ + bitmap_set_bit (candidate_bitmap, DECL_UID (var)); -static void -mark_no_warning (struct sra_elt *elt) -{ - if (!elt->all_no_warning) - { - if (elt->replacement) - TREE_NO_WARNING (elt->replacement) = 1; - else + if (dump_file) { - struct sra_elt *c; - FOR_EACH_ACTUAL_CHILD (c, elt) - mark_no_warning (c); + fprintf (dump_file, "Candidate (%d): ", DECL_UID (var)); + print_generic_expr (dump_file, var, 0); + fprintf (dump_file, "\n"); } - elt->all_no_warning = true; + ret = true; } + + return ret; } -/* Build a single level component reference to ELT rooted at BASE. */ +/* Return true if TYPE should be considered a scalar type by SRA. */ -static tree -generate_one_element_ref (struct sra_elt *elt, tree base) +static bool +is_sra_scalar_type (tree type) { - switch (TREE_CODE (TREE_TYPE (base))) - { - case RECORD_TYPE: - { - tree field = elt->element; - - /* We can't test elt->in_bitfld_block here because, when this is - called from instantiate_element, we haven't set this field - yet. */ - if (TREE_CODE (field) == BIT_FIELD_REF) - { - tree ret = unshare_expr (field); - TREE_OPERAND (ret, 0) = base; - return ret; - } - - /* Watch out for compatible records with differing field lists. 
*/ - if (DECL_FIELD_CONTEXT (field) != TYPE_MAIN_VARIANT (TREE_TYPE (base))) - field = find_compatible_field (TREE_TYPE (base), field); - - return build3 (COMPONENT_REF, elt->type, base, field, NULL); - } - - case ARRAY_TYPE: - if (TREE_CODE (elt->element) == RANGE_EXPR) - return build4 (ARRAY_RANGE_REF, elt->type, base, - TREE_OPERAND (elt->element, 0), NULL, NULL); - else - return build4 (ARRAY_REF, elt->type, base, elt->element, NULL, NULL); + enum tree_code code = TREE_CODE (type); + return (INTEGRAL_TYPE_P (type) || SCALAR_FLOAT_TYPE_P (type) + || FIXED_POINT_TYPE_P (type) || POINTER_TYPE_P (type) + || code == VECTOR_TYPE || code == COMPLEX_TYPE + || code == OFFSET_TYPE); +} - case COMPLEX_TYPE: - if (elt->element == integer_zero_node) - return build1 (REALPART_EXPR, elt->type, base); - else - return build1 (IMAGPART_EXPR, elt->type, base); - default: - gcc_unreachable (); - } -} +/* Sort all accesses for the given variable, check for partial overlaps and + return NULL if there are any. If there are none, pick a representative for + each combination of offset and size and create a linked list out of them. + Return the pointer to the first representative and make sure it is the first + one in the vector of accesses. */ + +static struct access * +sort_and_splice_var_accesses (tree var) +{ + int i, j, access_count; + struct access *res, **prev_acc_ptr = &res; + VEC (access_p, heap) *access_vec; + bool first = true; + HOST_WIDE_INT low = -1, high = 0; -/* Build a full component reference to ELT rooted at its native variable. */ + access_vec = get_base_access_vector (var); + if (!access_vec) + return NULL; + access_count = VEC_length (access_p, access_vec); -static tree -generate_element_ref (struct sra_elt *elt) -{ - if (elt->parent) - return generate_one_element_ref (elt, generate_element_ref (elt->parent)); - else - return elt->element; -} + /* Sort by <OFFSET, SIZE>. */ + qsort (VEC_address (access_p, access_vec), access_count, sizeof (access_p), + compare_access_positions); -/* Return true if BF is a bit-field that we can handle like a scalar. */ + i = 0; + while (i < access_count) + { + struct access *access = VEC_index (access_p, access_vec, i); + bool modification = access->write; + bool grp_read = !access->write; + bool grp_bfr_lhs = access->grp_bfr_lhs; + bool first_scalar = is_sra_scalar_type (access->type); + bool unscalarizable_region = access->grp_unscalarizable_region; -static bool -scalar_bitfield_p (tree bf) -{ - return (TREE_CODE (bf) == BIT_FIELD_REF - && (is_gimple_reg (TREE_OPERAND (bf, 0)) - || (TYPE_MODE (TREE_TYPE (TREE_OPERAND (bf, 0))) != BLKmode - && (!TREE_SIDE_EFFECTS (TREE_OPERAND (bf, 0)) - || (GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE - (TREE_OPERAND (bf, 0)))) - <= BITS_PER_WORD))))); -} - -/* Create an assignment statement from SRC to DST. */ - -static gimple_seq -sra_build_assignment (tree dst, tree src) -{ - gimple stmt; - gimple_seq seq = NULL, seq2 = NULL; - /* Turning BIT_FIELD_REFs into bit operations enables other passes - to do a much better job at optimizing the code. - From dst = BIT_FIELD_REF <var, sz, off> we produce - - SR.1 = (scalar type) var; - SR.2 = SR.1 >> off; - SR.3 = SR.2 & ((1 << sz) - 1); - ... possible sign extension of SR.3 ... - dst = (destination type) SR.3; - */ - if (scalar_bitfield_p (src)) - { - tree var, shift, width; - tree utype, stype; - bool unsignedp = (INTEGRAL_TYPE_P (TREE_TYPE (src)) - ? 
TYPE_UNSIGNED (TREE_TYPE (src)) : true);
-      struct gimplify_ctx gctx;
-
-      var = TREE_OPERAND (src, 0);
-      width = TREE_OPERAND (src, 1);
-      /* The offset needs to be adjusted to a right shift quantity
-	 depending on the endianness.  */
-      if (BYTES_BIG_ENDIAN)
+      if (first || access->offset >= high)
 	{
-	  tree tmp = size_binop (PLUS_EXPR, width, TREE_OPERAND (src, 2));
-	  shift = size_binop (MINUS_EXPR, TYPE_SIZE (TREE_TYPE (var)), tmp);
+	  first = false;
+	  low = access->offset;
+	  high = access->offset + access->size;
 	}
+      else if (access->offset > low && access->offset + access->size > high)
+	return NULL;
       else
-	shift = TREE_OPERAND (src, 2);
+	gcc_assert (access->offset >= low
+		    && access->offset + access->size <= high);
 
-      /* In weird cases we have non-integral types for the source or
-	 destination object.
-	 ??? For unknown reasons we also want an unsigned scalar type.  */
-      stype = TREE_TYPE (var);
-      if (!INTEGRAL_TYPE_P (stype))
-	stype = lang_hooks.types.type_for_size (TREE_INT_CST_LOW
-						(TYPE_SIZE (stype)), 1);
-      else if (!TYPE_UNSIGNED (stype))
-	stype = unsigned_type_for (stype);
-
-      utype = TREE_TYPE (dst);
-      if (!INTEGRAL_TYPE_P (utype))
-	utype = lang_hooks.types.type_for_size (TREE_INT_CST_LOW
-						(TYPE_SIZE (utype)), 1);
-      else if (!TYPE_UNSIGNED (utype))
-	utype = unsigned_type_for (utype);
+      j = i + 1;
+      while (j < access_count)
+	{
+	  struct access *ac2 = VEC_index (access_p, access_vec, j);
+	  if (ac2->offset != access->offset || ac2->size != access->size)
+	    break;
+	  modification |= ac2->write;
+	  grp_read |= !ac2->write;
+	  grp_bfr_lhs |= ac2->grp_bfr_lhs;
+	  unscalarizable_region |= ac2->grp_unscalarizable_region;
+	  relink_to_new_repr (access, ac2);
 
-      /* Convert the base var of the BIT_FIELD_REF to the scalar type
-	 we use for computation if we cannot use it directly.  */
-      if (INTEGRAL_TYPE_P (TREE_TYPE (var)))
-	var = fold_convert (stype, var);
-      else
-	var = fold_build1 (VIEW_CONVERT_EXPR, stype, var);
+	  /* If one of the equivalent accesses is scalar, use it as a
+	     representative (this happens when, for example, there is only a
+	     single scalar field in a structure).  */
+	  if (!first_scalar && is_sra_scalar_type (ac2->type))
+	    {
+	      struct access tmp_acc;
+	      first_scalar = true;
 
-      if (!integer_zerop (shift))
-	var = fold_build2 (RSHIFT_EXPR, stype, var, shift);
+	      memcpy (&tmp_acc, ac2, sizeof (struct access));
+	      memcpy (ac2, access, sizeof (struct access));
+	      memcpy (access, &tmp_acc, sizeof (struct access));
+	    }
+	  ac2->group_representative = access;
+	  j++;
 	}
 
-      /* If we need a masking operation, produce one.  */
-      if (TREE_INT_CST_LOW (width) == TYPE_PRECISION (stype))
-	unsignedp = true;
-      else
-	{
-	  tree one = build_int_cst_wide (stype, 1, 0);
-	  tree mask = int_const_binop (LSHIFT_EXPR, one, width, 0);
-	  mask = int_const_binop (MINUS_EXPR, mask, one, 0);
-	  var = fold_build2 (BIT_AND_EXPR, stype, var, mask);
-	}
+      i = j;
 
-      /* After shifting and masking, convert to the target type.  */
-      var = fold_convert (utype, var);
+      access->group_representative = access;
+      access->grp_write = modification;
+      access->grp_read = grp_read;
+      access->grp_bfr_lhs = grp_bfr_lhs;
+      access->grp_unscalarizable_region = unscalarizable_region;
+      if (access->first_link)
+	add_access_to_work_queue (access);
 
-      /* Perform sign extension, if required.
-	 ??? This should never be necessary.  
*/ - if (!unsignedp) - { - tree signbit = int_const_binop (LSHIFT_EXPR, - build_int_cst_wide (utype, 1, 0), - size_binop (MINUS_EXPR, width, - bitsize_int (1)), 0); + access->group_representative = access; + access->grp_write = modification; + access->grp_read = grp_read; + access->grp_bfr_lhs = grp_bfr_lhs; + access->grp_unscalarizable_region = unscalarizable_region; + if (access->first_link) + add_access_to_work_queue (access); - var = fold_build2 (BIT_XOR_EXPR, utype, var, signbit); - var = fold_build2 (MINUS_EXPR, utype, var, signbit); - } + *prev_acc_ptr = access; + prev_acc_ptr = &access->next_grp; + } - /* fold_build3 (BIT_FIELD_REF, ...) sometimes returns a cast. */ - STRIP_NOPS (dst); + gcc_assert (res == VEC_index (access_p, access_vec, 0)); + return res; +} - /* Finally, move and convert to the destination. */ - if (INTEGRAL_TYPE_P (TREE_TYPE (dst))) - var = fold_convert (TREE_TYPE (dst), var); - else - var = fold_build1 (VIEW_CONVERT_EXPR, TREE_TYPE (dst), var); +/* Create a variable for the given ACCESS which determines the type, name and a + few other properties. Return the variable declaration and store it also to + ACCESS->replacement. */ - push_gimplify_context (&gctx); - gctx.allow_rhs_cond_expr = true; +static tree +create_access_replacement (struct access *access) +{ + tree repl; - gimplify_assign (dst, var, &seq); + repl = make_rename_temp (access->type, "SR"); + get_var_ann (repl); + add_referenced_var (repl); - if (gimple_referenced_vars (cfun)) - for (var = gctx.temps; var; var = TREE_CHAIN (var)) - { - add_referenced_var (var); - mark_sym_for_renaming (var); - } - pop_gimplify_context (NULL); + DECL_SOURCE_LOCATION (repl) = DECL_SOURCE_LOCATION (access->base); + DECL_ARTIFICIAL (repl) = 1; - return seq; - } + if (DECL_NAME (access->base) && !DECL_IGNORED_P (access->base)) + { + char *pretty_name = make_fancy_name (access->expr); - /* fold_build3 (BIT_FIELD_REF, ...) sometimes returns a cast. */ - if (CONVERT_EXPR_P (dst)) + DECL_NAME (repl) = get_identifier (pretty_name); + obstack_free (&name_obstack, pretty_name); + + SET_DECL_DEBUG_EXPR (repl, access->expr); + DECL_DEBUG_EXPR_IS_FROM (repl) = 1; + DECL_IGNORED_P (repl) = 0; + TREE_NO_WARNING (repl) = TREE_NO_WARNING (access->base); + } + else { - STRIP_NOPS (dst); - src = fold_convert (TREE_TYPE (dst), src); + DECL_IGNORED_P (repl) = 1; + TREE_NO_WARNING (repl) = 1; } - /* It was hoped that we could perform some type sanity checking - here, but since front-ends can emit accesses of fields in types - different from their nominal types and copy structures containing - them as a whole, we'd have to handle such differences here. - Since such accesses under different types require compatibility - anyway, there's little point in making tests and/or adding - conversions to ensure the types of src and dst are the same. - So we just assume type differences at this point are ok. - The only exception we make here are pointer types, which can be different - in e.g. structurally equal, but non-identical RECORD_TYPEs. */ - else if (POINTER_TYPE_P (TREE_TYPE (dst)) - && !useless_type_conversion_p (TREE_TYPE (dst), TREE_TYPE (src))) - src = fold_convert (TREE_TYPE (dst), src); - /* ??? Only call the gimplifier if we need to. Otherwise we may - end up substituting with DECL_VALUE_EXPR - see PR37380. 
*/ - if (!handled_component_p (src) - && !SSA_VAR_P (src)) + if (access->grp_bfr_lhs) + DECL_GIMPLE_REG_P (repl) = 0; + + if (dump_file) { - src = force_gimple_operand (src, &seq2, false, NULL_TREE); - gimple_seq_add_seq (&seq, seq2); + fprintf (dump_file, "Created a replacement for "); + print_generic_expr (dump_file, access->base, 0); + fprintf (dump_file, " offset: %u, size: %u: ", + (unsigned) access->offset, (unsigned) access->size); + print_generic_expr (dump_file, repl, 0); + fprintf (dump_file, "\n"); } - stmt = gimple_build_assign (dst, src); - gimple_seq_add_stmt (&seq, stmt); - return seq; -} -/* BIT_FIELD_REFs must not be shared. sra_build_elt_assignment() - takes care of assignments, but we must create copies for uses. */ -#define REPLDUP(t) (TREE_CODE (t) != BIT_FIELD_REF ? (t) : unshare_expr (t)) + return repl; +} -/* Emit an assignment from SRC to DST, but if DST is a scalarizable - BIT_FIELD_REF, turn it into bit operations. */ +/* Return ACCESS scalar replacement, create it if it does not exist yet. */ -static gimple_seq -sra_build_bf_assignment (tree dst, tree src) +static inline tree +get_access_replacement (struct access *access) { - tree var, type, utype, tmp, tmp2, tmp3; - gimple_seq seq; - gimple stmt; - tree cst, cst2, mask; - tree minshift, maxshift; - - if (TREE_CODE (dst) != BIT_FIELD_REF) - return sra_build_assignment (dst, src); + gcc_assert (access->grp_to_be_replaced); - var = TREE_OPERAND (dst, 0); + if (access->replacement_decl) + return access->replacement_decl; - if (!scalar_bitfield_p (dst)) - return sra_build_assignment (REPLDUP (dst), src); + access->replacement_decl = create_access_replacement (access); + return access->replacement_decl; +} - seq = NULL; +/* Build a subtree of accesses rooted in *ACCESS, and move the pointer in the + linked list along the way. Stop when *ACCESS is NULL or the access pointed + to it is not "within" the root. */ - cst = fold_convert (bitsizetype, TREE_OPERAND (dst, 2)); - cst2 = size_binop (PLUS_EXPR, - fold_convert (bitsizetype, TREE_OPERAND (dst, 1)), - cst); +static void +build_access_subtree (struct access **access) +{ + struct access *root = *access, *last_child = NULL; + HOST_WIDE_INT limit = root->offset + root->size; - if (BYTES_BIG_ENDIAN) + *access = (*access)->next_grp; + while (*access && (*access)->offset + (*access)->size <= limit) { - maxshift = size_binop (MINUS_EXPR, TYPE_SIZE (TREE_TYPE (var)), cst); - minshift = size_binop (MINUS_EXPR, TYPE_SIZE (TREE_TYPE (var)), cst2); + if (!last_child) + root->first_child = *access; + else + last_child->next_sibling = *access; + last_child = *access; + + build_access_subtree (access); } - else +} + +/* Build a tree of access representatives, ACCESS is the pointer to the first + one, others are linked in a list by the next_grp field. Decide about scalar + replacements on the way, return true iff any are to be created. */ + +static void +build_access_trees (struct access *access) +{ + while (access) { - maxshift = cst2; - minshift = cst; + struct access *root = access; + + build_access_subtree (&access); + root->next_grp = access; } +} - type = TREE_TYPE (var); - if (!INTEGRAL_TYPE_P (type)) - type = lang_hooks.types.type_for_size - (TREE_INT_CST_LOW (TYPE_SIZE (TREE_TYPE (var))), 1); - if (TYPE_UNSIGNED (type)) - utype = type; - else - utype = unsigned_type_for (type); +/* Analyze the subtree of accesses rooted in ROOT, scheduling replacements when + both seeming beneficial and when ALLOW_REPLACEMENTS allows it. 
Also set + all sorts of access flags appropriately along the way, notably always ser + grp_read when MARK_READ is true and grp_write when MARK_WRITE is true. */ - mask = build_int_cst_wide (utype, 1, 0); - if (TREE_INT_CST_LOW (maxshift) == TYPE_PRECISION (utype)) - cst = build_int_cst_wide (utype, 0, 0); - else - cst = int_const_binop (LSHIFT_EXPR, mask, maxshift, true); - if (integer_zerop (minshift)) - cst2 = mask; - else - cst2 = int_const_binop (LSHIFT_EXPR, mask, minshift, true); - mask = int_const_binop (MINUS_EXPR, cst, cst2, true); - mask = fold_build1 (BIT_NOT_EXPR, utype, mask); - - if (TYPE_MAIN_VARIANT (utype) != TYPE_MAIN_VARIANT (TREE_TYPE (var)) - && !integer_zerop (mask)) - { - tmp = var; - if (!is_gimple_variable (tmp)) - tmp = unshare_expr (var); - else - TREE_NO_WARNING (var) = true; +static bool +analyze_access_subtree (struct access *root, bool allow_replacements, + bool mark_read, bool mark_write) +{ + struct access *child; + HOST_WIDE_INT limit = root->offset + root->size; + HOST_WIDE_INT covered_to = root->offset; + bool scalar = is_sra_scalar_type (root->type); + bool hole = false, sth_created = false; - tmp2 = make_rename_temp (utype, "SR"); + if (mark_read) + root->grp_read = true; + else if (root->grp_read) + mark_read = true; - if (INTEGRAL_TYPE_P (TREE_TYPE (var))) - tmp = fold_convert (utype, tmp); - else - tmp = fold_build1 (VIEW_CONVERT_EXPR, utype, tmp); + if (mark_write) + root->grp_write = true; + else if (root->grp_write) + mark_write = true; - stmt = gimple_build_assign (tmp2, tmp); - gimple_seq_add_stmt (&seq, stmt); - } - else - tmp2 = var; + if (root->grp_unscalarizable_region) + allow_replacements = false; - if (!integer_zerop (mask)) + for (child = root->first_child; child; child = child->next_sibling) { - tmp = make_rename_temp (utype, "SR"); - stmt = gimple_build_assign (tmp, fold_build2 (BIT_AND_EXPR, utype, - tmp2, mask)); - gimple_seq_add_stmt (&seq, stmt); - } - else - tmp = mask; + if (!hole && child->offset < covered_to) + hole = true; + else + covered_to += child->size; - if (is_gimple_reg (src) && INTEGRAL_TYPE_P (TREE_TYPE (src))) - tmp2 = src; - else if (INTEGRAL_TYPE_P (TREE_TYPE (src))) - { - gimple_seq tmp_seq; - tmp2 = make_rename_temp (TREE_TYPE (src), "SR"); - tmp_seq = sra_build_assignment (tmp2, src); - gimple_seq_add_seq (&seq, tmp_seq); - } - else - { - gimple_seq tmp_seq; - tmp2 = make_rename_temp - (lang_hooks.types.type_for_size - (TREE_INT_CST_LOW (TYPE_SIZE (TREE_TYPE (src))), - 1), "SR"); - tmp_seq = sra_build_assignment (tmp2, fold_build1 (VIEW_CONVERT_EXPR, - TREE_TYPE (tmp2), src)); - gimple_seq_add_seq (&seq, tmp_seq); + sth_created |= analyze_access_subtree (child, + allow_replacements && !scalar, + mark_read, mark_write); + + root->grp_unscalarized_data |= child->grp_unscalarized_data; + hole |= !child->grp_covered; } - if (!TYPE_UNSIGNED (TREE_TYPE (tmp2))) + if (allow_replacements && scalar && !root->first_child) { - gimple_seq tmp_seq; - tree ut = unsigned_type_for (TREE_TYPE (tmp2)); - tmp3 = make_rename_temp (ut, "SR"); - tmp2 = fold_convert (ut, tmp2); - tmp_seq = sra_build_assignment (tmp3, tmp2); - gimple_seq_add_seq (&seq, tmp_seq); - - tmp2 = fold_build1 (BIT_NOT_EXPR, utype, mask); - tmp2 = int_const_binop (RSHIFT_EXPR, tmp2, minshift, true); - tmp2 = fold_convert (ut, tmp2); - tmp2 = fold_build2 (BIT_AND_EXPR, ut, tmp3, tmp2); - - if (tmp3 != tmp2) + if (dump_file) { - tmp3 = make_rename_temp (ut, "SR"); - tmp_seq = sra_build_assignment (tmp3, tmp2); - gimple_seq_add_seq (&seq, tmp_seq); + fprintf 
(dump_file, "Marking ");
+	      print_generic_expr (dump_file, root->base, 0);
+	      fprintf (dump_file, " offset: %u, size: %u: ",
+		       (unsigned) root->offset, (unsigned) root->size);
+	      fprintf (dump_file, " to be replaced.\n");
+	    }
+
+      root->grp_to_be_replaced = 1;
+      sth_created = true;
+      hole = false;
+    }
+  else if (covered_to < limit)
+    hole = true;
 
-  if (TYPE_MAIN_VARIANT (TREE_TYPE (tmp2)) != TYPE_MAIN_VARIANT (utype))
+  if (sth_created && !hole)
     {
-      gimple_seq tmp_seq;
-      tmp3 = make_rename_temp (utype, "SR");
-      tmp2 = fold_convert (utype, tmp2);
-      tmp_seq = sra_build_assignment (tmp3, tmp2);
-      gimple_seq_add_seq (&seq, tmp_seq);
-      tmp2 = tmp3;
+      root->grp_covered = 1;
+      return true;
     }
+  if (root->grp_write || TREE_CODE (root->base) == PARM_DECL)
+    root->grp_unscalarized_data = 1; /* not covered and written to */
+  if (sth_created)
+    return true;
+  return false;
+}
 
-  if (!integer_zerop (minshift))
+/* Analyze all access trees linked by next_grp by means of
+   analyze_access_subtree.  */
+static bool
+analyze_access_trees (struct access *access)
+{
+  bool ret = false;
+
+  while (access)
     {
-      tmp3 = make_rename_temp (utype, "SR");
-      stmt = gimple_build_assign (tmp3, fold_build2 (LSHIFT_EXPR, utype,
-						     tmp2, minshift));
-      gimple_seq_add_stmt (&seq, stmt);
-      tmp2 = tmp3;
+      if (analyze_access_subtree (access, true, false, false))
+	ret = true;
+      access = access->next_grp;
     }
 
-  if (utype != TREE_TYPE (var))
-    tmp3 = make_rename_temp (utype, "SR");
-  else
-    tmp3 = var;
-  stmt = gimple_build_assign (tmp3, fold_build2 (BIT_IOR_EXPR, utype,
-						 tmp, tmp2));
-  gimple_seq_add_stmt (&seq, stmt);
+  return ret;
+}
+
+/* Return true iff a potential new child of LACC at offset NORM_OFFSET and
+   with size SIZE would conflict with an already existing one.  If exactly
+   such a child already exists in LACC, store a pointer to it in
+   EXACT_MATCH.  */
+
+static bool
+child_would_conflict_in_lacc (struct access *lacc, HOST_WIDE_INT norm_offset,
+			      HOST_WIDE_INT size, struct access **exact_match)
+{
+  struct access *child;
 
-  if (tmp3 != var)
+  for (child = lacc->first_child; child; child = child->next_sibling)
     {
-      if (TREE_TYPE (var) == type)
-	stmt = gimple_build_assign (var, fold_convert (type, tmp3));
-      else
-	stmt = gimple_build_assign (var, fold_build1 (VIEW_CONVERT_EXPR,
-						      TREE_TYPE (var), tmp3));
-      gimple_seq_add_stmt (&seq, stmt);
+      if (child->offset == norm_offset && child->size == size)
+	{
+	  *exact_match = child;
+	  return true;
+	}
+
+      if (child->offset < norm_offset + size
+	  && child->offset + child->size > norm_offset)
+	return true;
     }
 
-  return seq;
+  return false;
 }
 
-/* Expand an assignment of SRC to the scalarized representation of
-   ELT.  If it is a field group, try to widen the assignment to cover
-   the full variable.  */
+/* Set the expr of TARGET to one just like MODEL but with its own base at the
+   bottom of the handled components.  
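+   For instance, if MODEL's expr is s1.in.f with base s1 and TARGET's base
+   is s2, TARGET's expr becomes s2.in.f (names purely illustrative).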
*/
 
-static gimple_seq
-sra_build_elt_assignment (struct sra_elt *elt, tree src)
+static void
+duplicate_expr_for_different_base (struct access *target,
+				   struct access *model)
 {
-  tree dst = elt->replacement;
-  tree var, tmp, cst, cst2;
-  gimple stmt;
-  gimple_seq seq;
+  tree t, expr = unshare_expr (model->expr);
 
-  if (TREE_CODE (dst) != BIT_FIELD_REF
-      || !elt->in_bitfld_block)
-    return sra_build_assignment (REPLDUP (dst), src);
+  gcc_assert (handled_component_p (expr));
+  t = expr;
+  while (handled_component_p (TREE_OPERAND (t, 0)))
+    t = TREE_OPERAND (t, 0);
+  gcc_assert (TREE_OPERAND (t, 0) == model->base);
+  TREE_OPERAND (t, 0) = target->base;
 
-  var = TREE_OPERAND (dst, 0);
-
-  /* Try to widen the assignment to the entire variable.
-     We need the source to be a BIT_FIELD_REF as well, such that, for
-     BIT_FIELD_REF<d,sz,dp> = BIT_FIELD_REF<s,sz,sp>,
-     by design, conditions are met such that we can turn it into
-     d = BIT_FIELD_REF<s,dw,sp-dp>.  */
-  if (elt->in_bitfld_block == 2
-      && TREE_CODE (src) == BIT_FIELD_REF)
-    {
-      tmp = src;
-      cst = TYPE_SIZE (TREE_TYPE (var));
-      cst2 = size_binop (MINUS_EXPR, TREE_OPERAND (src, 2),
-			 TREE_OPERAND (dst, 2));
+  target->expr = expr;
+}
 
-      src = TREE_OPERAND (src, 0);
-
-      /* Avoid full-width bit-fields.  */
-      if (integer_zerop (cst2)
-	  && tree_int_cst_equal (cst, TYPE_SIZE (TREE_TYPE (src))))
-	{
-	  if (INTEGRAL_TYPE_P (TREE_TYPE (src))
-	      && !TYPE_UNSIGNED (TREE_TYPE (src)))
-	    src = fold_convert (unsigned_type_for (TREE_TYPE (src)), src);
+/* Create a new child access of PARENT, with all properties just like MODEL
+   except for its offset, and with grp_write true and grp_read false.  Return
+   the new access.  Note that this access is created long after all splicing
+   and sorting, it is not located in any access vector and is automatically a
+   representative of its group.  */
 
-	  /* If a single conversion won't do, we'll need a statement
-	     list.  
*/ - if (TYPE_MAIN_VARIANT (TREE_TYPE (var)) - != TYPE_MAIN_VARIANT (TREE_TYPE (src))) - { - gimple_seq tmp_seq; - seq = NULL; +static struct access * +create_artificial_child_access (struct access *parent, struct access *model, + HOST_WIDE_INT new_offset) +{ + struct access *access; + struct access **child; - if (!INTEGRAL_TYPE_P (TREE_TYPE (src))) - src = fold_build1 (VIEW_CONVERT_EXPR, - lang_hooks.types.type_for_size - (TREE_INT_CST_LOW - (TYPE_SIZE (TREE_TYPE (src))), - 1), src); - gcc_assert (TYPE_UNSIGNED (TREE_TYPE (src))); - - tmp = make_rename_temp (TREE_TYPE (src), "SR"); - stmt = gimple_build_assign (tmp, src); - gimple_seq_add_stmt (&seq, stmt); - - tmp_seq = sra_build_assignment (var, - fold_convert (TREE_TYPE (var), - tmp)); - gimple_seq_add_seq (&seq, tmp_seq); + gcc_assert (!model->grp_unscalarizable_region); - return seq; - } + access = (struct access *) pool_alloc (access_pool); + memset (access, 0, sizeof (struct access)); + access->base = parent->base; + access->offset = new_offset; + access->size = model->size; + duplicate_expr_for_different_base (access, model); + access->type = model->type; + access->grp_write = true; + access->grp_read = false; - src = fold_convert (TREE_TYPE (var), src); - } - else - { - src = fold_convert (TREE_TYPE (var), tmp); - } + child = &parent->first_child; + while (*child && (*child)->offset < new_offset) + child = &(*child)->next_sibling; - return sra_build_assignment (var, src); - } + access->next_sibling = *child; + *child = access; - return sra_build_bf_assignment (dst, src); + return access; } -/* Generate a set of assignment statements in *LIST_P to copy all - instantiated elements under ELT to or from the equivalent structure - rooted at EXPR. COPY_OUT controls the direction of the copy, with - true meaning to copy out of EXPR into ELT. */ -static void -generate_copy_inout (struct sra_elt *elt, bool copy_out, tree expr, - gimple_seq *seq_p) +/* Propagate all subaccesses of RACC across an assignment link to LACC. Return + true if any new subaccess was created. Additionally, if RACC is a scalar + access but LACC is not, change the type of the latter. 
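+
+   As an illustration (with made-up names), for an assignment dst = src in
+   which src.a and src.b are accessed elsewhere but only dst.a is, this
+   creates an artificial subaccess for dst.b, so that the aggregate copy
+   can later be replaced by copies of the individual scalar replacements.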
*/ + +static bool +propagate_subacesses_accross_link (struct access *lacc, struct access *racc) { - struct sra_elt *c; - gimple_seq tmp_seq; - tree t; + struct access *rchild; + HOST_WIDE_INT norm_delta = lacc->offset - racc->offset; + bool ret = false; + + if (is_sra_scalar_type (lacc->type) + || lacc->grp_unscalarizable_region + || racc->grp_unscalarizable_region) + return false; - if (!copy_out && TREE_CODE (expr) == SSA_NAME - && TREE_CODE (TREE_TYPE (expr)) == COMPLEX_TYPE) + if (!lacc->first_child && !racc->first_child + && is_sra_scalar_type (racc->type) + && (sra_mode == SRA_MODE_INTRA + || !bitmap_bit_p (retvals_bitmap, DECL_UID (lacc->base)))) { - tree r, i; + duplicate_expr_for_different_base (lacc, racc); + lacc->type = racc->type; + return false; + } - c = lookup_element (elt, integer_zero_node, NULL, NO_INSERT); - r = c->replacement; - c = lookup_element (elt, integer_one_node, NULL, NO_INSERT); - i = c->replacement; + gcc_assert (lacc->size <= racc->size); - t = build2 (COMPLEX_EXPR, elt->type, r, i); - tmp_seq = sra_build_bf_assignment (expr, t); - SSA_NAME_DEF_STMT (expr) = gimple_seq_last_stmt (tmp_seq); - gimple_seq_add_seq (seq_p, tmp_seq); - } - else if (elt->replacement) - { - if (copy_out) - tmp_seq = sra_build_elt_assignment (elt, expr); - else - tmp_seq = sra_build_bf_assignment (expr, REPLDUP (elt->replacement)); - gimple_seq_add_seq (seq_p, tmp_seq); - } - else + for (rchild = racc->first_child; rchild; rchild = rchild->next_sibling) { - FOR_EACH_ACTUAL_CHILD (c, elt) + struct access *new_acc = NULL; + HOST_WIDE_INT norm_offset = rchild->offset + norm_delta; + + if (rchild->grp_unscalarizable_region) + continue; + + if (child_would_conflict_in_lacc (lacc, norm_offset, rchild->size, + &new_acc)) { - t = generate_one_element_ref (c, unshare_expr (expr)); - generate_copy_inout (c, copy_out, t, seq_p); + if (new_acc && rchild->first_child) + ret |= propagate_subacesses_accross_link (new_acc, rchild); + continue; } + + new_acc = create_artificial_child_access (lacc, rchild, norm_offset); + if (racc->first_child) + propagate_subacesses_accross_link (new_acc, rchild); + + ret = true; } + + return ret; } -/* Generate a set of assignment statements in *LIST_P to copy all instantiated - elements under SRC to their counterparts under DST. There must be a 1-1 - correspondence of instantiated elements. */ +/* Propagate all subaccesses across assignment links. */ static void -generate_element_copy (struct sra_elt *dst, struct sra_elt *src, gimple_seq *seq_p) +propagate_all_subaccesses (void) { - struct sra_elt *dc, *sc; - - FOR_EACH_ACTUAL_CHILD (dc, dst) + while (work_queue_head) { - sc = lookup_element (src, dc->element, NULL, NO_INSERT); - if (!sc && dc->in_bitfld_block == 2) - { - struct sra_elt *dcs; + struct access *racc = pop_access_from_work_queue (); + struct assign_link *link; - FOR_EACH_ACTUAL_CHILD (dcs, dc) - { - sc = lookup_element (src, dcs->element, NULL, NO_INSERT); - gcc_assert (sc); - generate_element_copy (dcs, sc, seq_p); - } + gcc_assert (racc->first_link); - continue; - } + for (link = racc->first_link; link; link = link->next) + { + struct access *lacc = link->lacc; - /* If DST and SRC are structs with the same elements, but do not have - the same TYPE_MAIN_VARIANT, then lookup of DST FIELD_DECL in SRC - will fail. Try harder by finding the corresponding FIELD_DECL - in SRC. 
*/ - if (!sc) - { - tree f; - - gcc_assert (useless_type_conversion_p (dst->type, src->type)); - gcc_assert (TREE_CODE (dc->element) == FIELD_DECL); - for (f = TYPE_FIELDS (src->type); f ; f = TREE_CHAIN (f)) - if (simple_cst_equal (DECL_FIELD_OFFSET (f), - DECL_FIELD_OFFSET (dc->element)) > 0 - && simple_cst_equal (DECL_FIELD_BIT_OFFSET (f), - DECL_FIELD_BIT_OFFSET (dc->element)) > 0 - && simple_cst_equal (DECL_SIZE (f), - DECL_SIZE (dc->element)) > 0 - && (useless_type_conversion_p (TREE_TYPE (dc->element), - TREE_TYPE (f)) - || (POINTER_TYPE_P (TREE_TYPE (dc->element)) - && POINTER_TYPE_P (TREE_TYPE (f))))) - break; - gcc_assert (f != NULL_TREE); - sc = lookup_element (src, f, NULL, NO_INSERT); + if (!bitmap_bit_p (candidate_bitmap, DECL_UID (lacc->base))) + continue; + lacc = lacc->group_representative; + if (propagate_subacesses_accross_link (lacc, racc) + && lacc->first_link) + add_access_to_work_queue (lacc); } - - generate_element_copy (dc, sc, seq_p); - } - - if (dst->replacement) - { - gimple_seq tmp_seq; - - gcc_assert (src->replacement); - - tmp_seq = sra_build_elt_assignment (dst, REPLDUP (src->replacement)); - gimple_seq_add_seq (seq_p, tmp_seq); } } -/* Generate a set of assignment statements in *LIST_P to zero all instantiated - elements under ELT. In addition, do not assign to elements that have been - marked VISITED but do reset the visited flag; this allows easy coordination - with generate_element_init. */ +/* Go through all accesses collected throughout the (intraprocedural) analysis + stage, exclude overlapping ones, identify representatives and build trees + out of them, making decisions about scalarization on the way. Return true + iff there are any to-be-scalarized variables after this stage. */ -static void -generate_element_zero (struct sra_elt *elt, gimple_seq *seq_p) +static bool +analyze_all_variable_accesses (void) { - struct sra_elt *c; + tree var; + referenced_var_iterator rvi; + bool res = false; - if (elt->visited) - { - elt->visited = false; - return; - } + FOR_EACH_REFERENCED_VAR (var, rvi) + if (bitmap_bit_p (candidate_bitmap, DECL_UID (var))) + { + struct access *access; - if (!elt->in_bitfld_block) - FOR_EACH_ACTUAL_CHILD (c, elt) - generate_element_zero (c, seq_p); + access = sort_and_splice_var_accesses (var); + if (access) + build_access_trees (access); + else + disqualify_candidate (var, + "No or inhibitingly overlapping accesses."); + } - if (elt->replacement) - { - tree t; - gimple_seq tmp_seq; + propagate_all_subaccesses (); - gcc_assert (elt->is_scalar); - t = fold_convert (elt->type, integer_zero_node); + FOR_EACH_REFERENCED_VAR (var, rvi) + if (bitmap_bit_p (candidate_bitmap, DECL_UID (var))) + { + struct access *access = get_first_repr_for_decl (var); - tmp_seq = sra_build_elt_assignment (elt, t); - gimple_seq_add_seq (seq_p, tmp_seq); - } + if (analyze_access_trees (access)) + { + res = true; + if (dump_file) + { + fprintf (dump_file, "\nAccess trees for "); + print_generic_expr (dump_file, var, 0); + fprintf (dump_file, " (UID: %u): \n", DECL_UID (var)); + dump_access_tree (dump_file, access); + fprintf (dump_file, "\n"); + } + } + else + disqualify_candidate (var, "No scalar replacements to be created."); + } + + return res; } -/* Generate an assignment VAR = INIT, where INIT may need gimplification. - Add the result to *LIST_P. */ +/* Return true iff a reference statement into aggregate AGG can be built for + every single to-be-replaced accesses that is a child of ACCESS, its sibling + or a child of its sibling. 
TOP_OFFSET is the offset from the processed + access subtree that has to be subtracted from offset of each access. */ -static void -generate_one_element_init (struct sra_elt *elt, tree init, gimple_seq *seq_p) +static bool +ref_expr_for_all_replacements_p (struct access *access, tree agg, + HOST_WIDE_INT top_offset) { - gimple_seq tmp_seq = sra_build_elt_assignment (elt, init); - gimple_seq_add_seq (seq_p, tmp_seq); + do + { + if (access->grp_to_be_replaced + && !build_ref_for_offset (NULL, TREE_TYPE (agg), + access->offset - top_offset, + access->type, false)) + return false; + + if (access->first_child + && !ref_expr_for_all_replacements_p (access->first_child, agg, + top_offset)) + return false; + + access = access->next_sibling; + } + while (access); + + return true; } -/* Generate a set of assignment statements in *LIST_P to set all instantiated - elements under ELT with the contents of the initializer INIT. In addition, - mark all assigned elements VISITED; this allows easy coordination with - generate_element_zero. Return false if we found a case we couldn't - handle. */ -static bool -generate_element_init_1 (struct sra_elt *elt, tree init, gimple_seq *seq_p) +/* Generate statements copying scalar replacements of accesses within a subtree + into or out of AGG. ACCESS is the first child of the root of the subtree to + be processed. AGG is an aggregate type expression (can be a declaration but + does not have to be, it can for example also be an indirect_ref). + TOP_OFFSET is the offset of the processed subtree which has to be subtracted + from offsets of individual accesses to get corresponding offsets for AGG. + If CHUNK_SIZE is non-null, copy only replacements in the interval + <start_offset, start_offset + chunk_size>, otherwise copy all. GSI is a + statement iterator used to place the new statements. WRITE should be true + when the statements should write from AGG to the replacement and false if + vice versa. if INSERT_AFTER is true, new statements will be added after the + current statement in GSI, they will be added before the statement + otherwise. */ + +static void +generate_subtree_copies (struct access *access, tree agg, + HOST_WIDE_INT top_offset, + HOST_WIDE_INT start_offset, HOST_WIDE_INT chunk_size, + gimple_stmt_iterator *gsi, bool write, + bool insert_after) { - bool result = true; - enum tree_code init_code; - struct sra_elt *sub; - tree t; - unsigned HOST_WIDE_INT idx; - tree value, purpose; + do + { + tree expr = unshare_expr (agg); - /* We can be passed DECL_INITIAL of a static variable. It might have a - conversion, which we strip off here. */ - STRIP_USELESS_TYPE_CONVERSION (init); - init_code = TREE_CODE (init); + if (chunk_size && access->offset >= start_offset + chunk_size) + return; - if (elt->is_scalar) - { - if (elt->replacement) - { - generate_one_element_init (elt, init, seq_p); - elt->visited = true; - } - return result; - } + if (access->grp_to_be_replaced + && (chunk_size == 0 + || access->offset + access->size > start_offset)) + { + bool repl_found; + gimple stmt; + + repl_found = build_ref_for_offset (&expr, TREE_TYPE (agg), + access->offset - top_offset, + access->type, false); + gcc_assert (repl_found); - switch (init_code) - { - case COMPLEX_CST: - case COMPLEX_EXPR: - FOR_EACH_ACTUAL_CHILD (sub, elt) - { - if (sub->element == integer_zero_node) - t = (init_code == COMPLEX_EXPR - ? 
TREE_OPERAND (init, 0) : TREE_REALPART (init)); + if (write) + stmt = gimple_build_assign (get_access_replacement (access), expr); else - t = (init_code == COMPLEX_EXPR - ? TREE_OPERAND (init, 1) : TREE_IMAGPART (init)); - result &= generate_element_init_1 (sub, t, seq_p); - } - break; - - case CONSTRUCTOR: - FOR_EACH_CONSTRUCTOR_ELT (CONSTRUCTOR_ELTS (init), idx, purpose, value) - { - /* Array constructors are routinely created with NULL indices. */ - if (purpose == NULL_TREE) { - result = false; - break; + tree repl = get_access_replacement (access); + TREE_NO_WARNING (repl) = 1; + stmt = gimple_build_assign (expr, repl); } - if (TREE_CODE (purpose) == RANGE_EXPR) - { - tree lower = TREE_OPERAND (purpose, 0); - tree upper = TREE_OPERAND (purpose, 1); - while (1) - { - sub = lookup_element (elt, lower, NULL, NO_INSERT); - if (sub != NULL) - result &= generate_element_init_1 (sub, value, seq_p); - if (tree_int_cst_equal (lower, upper)) - break; - lower = int_const_binop (PLUS_EXPR, lower, - integer_one_node, true); - } - } + if (insert_after) + gsi_insert_after (gsi, stmt, GSI_NEW_STMT); else - { - sub = lookup_element (elt, purpose, NULL, NO_INSERT); - if (sub != NULL) - result &= generate_element_init_1 (sub, value, seq_p); - } + gsi_insert_before (gsi, stmt, GSI_SAME_STMT); + update_stmt (stmt); } - break; - default: - elt->visited = true; - result = false; - } + if (access->first_child) + generate_subtree_copies (access->first_child, agg, top_offset, + start_offset, chunk_size, gsi, + write, insert_after); - return result; + access = access->next_sibling; + } + while (access); } -/* A wrapper function for generate_element_init_1 that handles cleanup after - gimplification. */ +/* Assign zero to all scalar replacements in an access subtree. ACCESS is the + the root of the subtree to be processed. GSI is the statement iterator used + for inserting statements which are added after the current statement if + INSERT_AFTER is true or before it otherwise. */ -static bool -generate_element_init (struct sra_elt *elt, tree init, gimple_seq *seq_p) -{ - bool ret; - struct gimplify_ctx gctx; +static void +init_subtree_with_zero (struct access *access, gimple_stmt_iterator *gsi, + bool insert_after) - push_gimplify_context (&gctx); - ret = generate_element_init_1 (elt, init, seq_p); - pop_gimplify_context (NULL); +{ + struct access *child; - /* The replacement can expose previously unreferenced variables. */ - if (ret && *seq_p) + if (access->grp_to_be_replaced) { - gimple_stmt_iterator i; + gimple stmt; - for (i = gsi_start (*seq_p); !gsi_end_p (i); gsi_next (&i)) - find_new_referenced_vars (gsi_stmt (i)); + stmt = gimple_build_assign (get_access_replacement (access), + fold_convert (access->type, + integer_zero_node)); + if (insert_after) + gsi_insert_after (gsi, stmt, GSI_NEW_STMT); + else + gsi_insert_before (gsi, stmt, GSI_SAME_STMT); + update_stmt (stmt); } - return ret; + for (child = access->first_child; child; child = child->next_sibling) + init_subtree_with_zero (child, gsi, insert_after); } -/* Helper function to insert LIST before GSI, and set up line number info. */ +/* Search for an access representative for the given expression EXPR and + return it or NULL if it cannot be found. 
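+
+   For instance, for an expression like s.in.f (names illustrative), the
+   base s together with the offset and size computed by
+   get_ref_base_and_extent is used for the lookup; NULL is returned if s is
+   not a candidate or if no access with exactly this offset and size has
+   been recorded.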
*/ -static void -sra_insert_before (gimple_stmt_iterator *gsi, gimple_seq seq) +static struct access * +get_access_for_expr (tree expr) { - gimple stmt = gsi_stmt (*gsi); - - if (gimple_has_location (stmt)) - annotate_all_with_location (seq, gimple_location (stmt)); - gsi_insert_seq_before (gsi, seq, GSI_SAME_STMT); -} + HOST_WIDE_INT offset, size, max_size; + tree base; -/* Similarly, but insert after GSI. Handles insertion onto edges as well. */ + if (TREE_CODE (expr) == NOP_EXPR + || TREE_CODE (expr) == VIEW_CONVERT_EXPR) + expr = TREE_OPERAND (expr, 0); -static void -sra_insert_after (gimple_stmt_iterator *gsi, gimple_seq seq) -{ - gimple stmt = gsi_stmt (*gsi); + if (handled_component_p (expr)) + { + base = get_ref_base_and_extent (expr, &offset, &size, &max_size); + size = max_size; + if (size == -1 || !base || !DECL_P (base)) + return NULL; + } + else if (DECL_P (expr)) + { + tree tree_size; - if (gimple_has_location (stmt)) - annotate_all_with_location (seq, gimple_location (stmt)); + base = expr; + tree_size = TYPE_SIZE (TREE_TYPE (base)); + if (tree_size && host_integerp (tree_size, 1)) + size = max_size = tree_low_cst (tree_size, 1); + else + return NULL; - if (stmt_ends_bb_p (stmt)) - insert_edge_copies_seq (seq, gsi_bb (*gsi)); + offset = 0; + } else - gsi_insert_seq_after (gsi, seq, GSI_SAME_STMT); + return NULL; + + if (!bitmap_bit_p (candidate_bitmap, DECL_UID (base))) + return NULL; + + return get_var_base_offset_size_access (base, offset, size); } -/* Similarly, but replace the statement at GSI. */ +/* Substitute into *EXPR an expression of type TYPE with the value of the + replacement of ACCESS. This is done either by producing a special V_C_E + assignment statement converting the replacement to a new temporary of the + requested type if TYPE is not TREE_ADDRESSABLE or by going through the base + aggregate if it is. */ static void -sra_replace (gimple_stmt_iterator *gsi, gimple_seq seq) +fix_incompatible_types_for_expr (tree *expr, tree type, struct access *access, + gimple_stmt_iterator *gsi, bool write) { - sra_insert_before (gsi, seq); - unlink_stmt_vdef (gsi_stmt (*gsi)); - gsi_remove (gsi, false); - if (gsi_end_p (*gsi)) - *gsi = gsi_last (gsi_seq (*gsi)); - else - gsi_prev (gsi); -} + tree repl = get_access_replacement (access); + if (!TREE_ADDRESSABLE (type)) + { + tree tmp = create_tmp_var (type, "SRvce"); -/* Data structure that bitfield_overlaps_p fills in with information - about the element passed in and how much of it overlaps with the - bit-range passed it to. */ + add_referenced_var (tmp); + if (is_gimple_reg_type (type)) + tmp = make_ssa_name (tmp, NULL); -struct bitfield_overlap_info -{ - /* The bit-length of an element. */ - tree field_len; + if (write) + { + gimple stmt; + tree conv = fold_build1 (VIEW_CONVERT_EXPR, TREE_TYPE (repl), tmp); - /* The bit-position of the element in its parent. */ - tree field_pos; + *expr = tmp; + if (is_gimple_reg_type (type)) + SSA_NAME_DEF_STMT (tmp) = gsi_stmt (*gsi); + stmt = gimple_build_assign (repl, conv); + gsi_insert_after (gsi, stmt, GSI_NEW_STMT); + update_stmt (stmt); + } + else + { + gimple stmt; + tree conv = fold_build1 (VIEW_CONVERT_EXPR, type, repl); - /* The number of bits of the element that overlap with the incoming - bit range. 
*/ - tree overlap_len; + stmt = gimple_build_assign (tmp, conv); + gsi_insert_before (gsi, stmt, GSI_SAME_STMT); + if (is_gimple_reg_type (type)) + SSA_NAME_DEF_STMT (tmp) = stmt; + *expr = tmp; + update_stmt (stmt); + } + } + else + { + if (write) + { + gimple stmt; - /* The first bit of the element that overlaps with the incoming bit - range. */ - tree overlap_pos; -}; + stmt = gimple_build_assign (repl, unshare_expr (access->expr)); + gsi_insert_after (gsi, stmt, GSI_NEW_STMT); + update_stmt (stmt); + } + else + { + gimple stmt; + + stmt = gimple_build_assign (unshare_expr (access->expr), repl); + gsi_insert_before (gsi, stmt, GSI_SAME_STMT); + update_stmt (stmt); + } + } +} -/* Return true if a BIT_FIELD_REF<(FLD->parent), BLEN, BPOS> - expression (referenced as BF below) accesses any of the bits in FLD, - false if it doesn't. If DATA is non-null, its field_len and - field_pos are filled in such that BIT_FIELD_REF<(FLD->parent), - field_len, field_pos> (referenced as BFLD below) represents the - entire field FLD->element, and BIT_FIELD_REF<BFLD, overlap_len, - overlap_pos> represents the portion of the entire field that - overlaps with BF. */ + +/* Callback for scan_function. Replace the expression EXPR with a scalar + replacement if there is one and generate other statements to do type + conversion or subtree copying if necessary. GSI is used to place newly + created statements, WRITE is true if the expression is being written to (it + is on a LHS of a statement or output in an assembly statement). */ static bool -bitfield_overlaps_p (tree blen, tree bpos, struct sra_elt *fld, - struct bitfield_overlap_info *data) +sra_modify_expr (tree *expr, gimple_stmt_iterator *gsi, bool write, + void *data ATTRIBUTE_UNUSED) { - tree flen, fpos; - bool ret; + struct access *access; + tree type, bfr; - if (TREE_CODE (fld->element) == FIELD_DECL) + if (TREE_CODE (*expr) == BIT_FIELD_REF) { - flen = fold_convert (bitsizetype, DECL_SIZE (fld->element)); - fpos = fold_convert (bitsizetype, DECL_FIELD_OFFSET (fld->element)); - fpos = size_binop (MULT_EXPR, fpos, bitsize_int (BITS_PER_UNIT)); - fpos = size_binop (PLUS_EXPR, fpos, DECL_FIELD_BIT_OFFSET (fld->element)); - } - else if (TREE_CODE (fld->element) == BIT_FIELD_REF) - { - flen = fold_convert (bitsizetype, TREE_OPERAND (fld->element, 1)); - fpos = fold_convert (bitsizetype, TREE_OPERAND (fld->element, 2)); - } - else if (TREE_CODE (fld->element) == INTEGER_CST) - { - tree domain_type = TYPE_DOMAIN (TREE_TYPE (fld->parent->element)); - flen = fold_convert (bitsizetype, TYPE_SIZE (fld->type)); - fpos = fold_convert (bitsizetype, fld->element); - if (domain_type && TYPE_MIN_VALUE (domain_type)) - fpos = size_binop (MINUS_EXPR, fpos, - fold_convert (bitsizetype, - TYPE_MIN_VALUE (domain_type))); - fpos = size_binop (MULT_EXPR, flen, fpos); + bfr = *expr; + expr = &TREE_OPERAND (*expr, 0); } else - gcc_unreachable (); - - gcc_assert (host_integerp (blen, 1) - && host_integerp (bpos, 1) - && host_integerp (flen, 1) - && host_integerp (fpos, 1)); + bfr = NULL_TREE; - ret = ((!tree_int_cst_lt (fpos, bpos) - && tree_int_cst_lt (size_binop (MINUS_EXPR, fpos, bpos), - blen)) - || (!tree_int_cst_lt (bpos, fpos) - && tree_int_cst_lt (size_binop (MINUS_EXPR, bpos, fpos), - flen))); + if (TREE_CODE (*expr) == REALPART_EXPR || TREE_CODE (*expr) == IMAGPART_EXPR) + expr = &TREE_OPERAND (*expr, 0); + type = TREE_TYPE (*expr); - if (!ret) - return ret; + access = get_access_for_expr (*expr); + if (!access) + return false; - if (data) + if 
(access->grp_to_be_replaced) { - tree bend, fend; - - data->field_len = flen; - data->field_pos = fpos; - - fend = size_binop (PLUS_EXPR, fpos, flen); - bend = size_binop (PLUS_EXPR, bpos, blen); - - if (tree_int_cst_lt (bend, fend)) - data->overlap_len = size_binop (MINUS_EXPR, bend, fpos); + if (!useless_type_conversion_p (type, access->type)) + fix_incompatible_types_for_expr (expr, type, access, gsi, write); else - data->overlap_len = NULL; + *expr = get_access_replacement (access); + } - if (tree_int_cst_lt (fpos, bpos)) + if (access->first_child) + { + HOST_WIDE_INT start_offset, chunk_size; + if (bfr + && host_integerp (TREE_OPERAND (bfr, 1), 1) + && host_integerp (TREE_OPERAND (bfr, 2), 1)) { - data->overlap_pos = size_binop (MINUS_EXPR, bpos, fpos); - data->overlap_len = size_binop (MINUS_EXPR, - data->overlap_len - ? data->overlap_len - : data->field_len, - data->overlap_pos); + start_offset = tree_low_cst (TREE_OPERAND (bfr, 1), 1); + chunk_size = tree_low_cst (TREE_OPERAND (bfr, 2), 1); } else - data->overlap_pos = NULL; - } + start_offset = chunk_size = 0; - return ret; + generate_subtree_copies (access->first_child, access->base, 0, + start_offset, chunk_size, gsi, write, write); + } + return true; } -/* Add to LISTP a sequence of statements that copies BLEN bits between - VAR and the scalarized elements of ELT, starting a bit VPOS of VAR - and at bit BPOS of ELT. The direction of the copy is given by - TO_VAR. */ +/* Store all replacements in the access tree rooted in TOP_RACC either to their + base aggregate if there are unscalarized data or directly to LHS + otherwise. */ static void -sra_explode_bitfield_assignment (tree var, tree vpos, bool to_var, - gimple_seq *seq_p, tree blen, tree bpos, - struct sra_elt *elt) +handle_unscalarized_data_in_subtree (struct access *top_racc, tree lhs, + gimple_stmt_iterator *gsi) { - struct sra_elt *fld; - struct bitfield_overlap_info flp; - - FOR_EACH_ACTUAL_CHILD (fld, elt) - { - tree flen, fpos; + if (top_racc->grp_unscalarized_data) + generate_subtree_copies (top_racc->first_child, top_racc->base, 0, 0, 0, + gsi, false, false); + else + generate_subtree_copies (top_racc->first_child, lhs, top_racc->offset, + 0, 0, gsi, false, false); +} - if (!bitfield_overlaps_p (blen, bpos, fld, &flp)) - continue; - flen = flp.overlap_len ? flp.overlap_len : flp.field_len; - fpos = flp.overlap_pos ? flp.overlap_pos : bitsize_int (0); +/* Try to generate statements to load all sub-replacements in an access + (sub)tree (LACC is the first child) from scalar replacements in the TOP_RACC + (sub)tree. If that is not possible, refresh the TOP_RACC base aggregate and + load the accesses from it. LEFT_OFFSET is the offset of the left whole + subtree being copied, RIGHT_OFFSET is the same thing for the right subtree. + GSI is stmt iterator used for statement insertions. *REFRESHED is true iff + the rhs top aggregate has already been refreshed by contents of its scalar + reductions and is set to true if this function has to do it. 
*/ - if (fld->replacement) +static void +load_assign_lhs_subreplacements (struct access *lacc, struct access *top_racc, + HOST_WIDE_INT left_offset, + HOST_WIDE_INT right_offset, + gimple_stmt_iterator *old_gsi, + gimple_stmt_iterator *new_gsi, + bool *refreshed, tree lhs) +{ + do + { + if (lacc->grp_to_be_replaced) { - tree infld, invar, type; - gimple_seq st; - - infld = fld->replacement; - - type = unsigned_type_for (TREE_TYPE (infld)); - if (TYPE_PRECISION (type) != TREE_INT_CST_LOW (flen)) - type = build_nonstandard_integer_type (TREE_INT_CST_LOW (flen), 1); + struct access *racc; + HOST_WIDE_INT offset = lacc->offset - left_offset + right_offset; - if (TREE_CODE (infld) == BIT_FIELD_REF) + racc = find_access_in_subtree (top_racc, offset, lacc->size); + if (racc && racc->grp_to_be_replaced) { - fpos = size_binop (PLUS_EXPR, fpos, TREE_OPERAND (infld, 2)); - infld = TREE_OPERAND (infld, 0); - } - else if (BYTES_BIG_ENDIAN && DECL_P (fld->element) - && !tree_int_cst_equal (TYPE_SIZE (TREE_TYPE (infld)), - DECL_SIZE (fld->element))) - { - fpos = size_binop (PLUS_EXPR, fpos, - TYPE_SIZE (TREE_TYPE (infld))); - fpos = size_binop (MINUS_EXPR, fpos, - DECL_SIZE (fld->element)); - } - - infld = fold_build3 (BIT_FIELD_REF, type, infld, flen, fpos); + gimple stmt; - invar = size_binop (MINUS_EXPR, flp.field_pos, bpos); - if (flp.overlap_pos) - invar = size_binop (PLUS_EXPR, invar, flp.overlap_pos); - invar = size_binop (PLUS_EXPR, invar, vpos); - - invar = fold_build3 (BIT_FIELD_REF, type, var, flen, invar); + if (useless_type_conversion_p (lacc->type, racc->type)) + stmt = gimple_build_assign (get_access_replacement (lacc), + get_access_replacement (racc)); + else + { + tree rhs = fold_build1 (VIEW_CONVERT_EXPR, lacc->type, + get_access_replacement (racc)); + stmt = gimple_build_assign (get_access_replacement (lacc), + rhs); + } - if (to_var) - st = sra_build_bf_assignment (invar, infld); + gsi_insert_after (new_gsi, stmt, GSI_NEW_STMT); + update_stmt (stmt); + } else - st = sra_build_bf_assignment (infld, invar); + { + tree expr = unshare_expr (top_racc->base); + bool repl_found; + gimple stmt; + + /* No suitable access on the right hand side, need to load from + the aggregate. See if we have to update it first... 
*/ + if (!*refreshed) + { + gcc_assert (top_racc->first_child); + handle_unscalarized_data_in_subtree (top_racc, lhs, old_gsi); + *refreshed = true; + } - gimple_seq_add_seq (seq_p, st); + repl_found = build_ref_for_offset (&expr, + TREE_TYPE (top_racc->base), + lacc->offset - left_offset, + lacc->type, false); + gcc_assert (repl_found); + stmt = gimple_build_assign (get_access_replacement (lacc), + expr); + gsi_insert_after (new_gsi, stmt, GSI_NEW_STMT); + update_stmt (stmt); + } } - else + else if (lacc->grp_read && !lacc->grp_covered && !*refreshed) { - tree sub = size_binop (MINUS_EXPR, flp.field_pos, bpos); - sub = size_binop (PLUS_EXPR, vpos, sub); - if (flp.overlap_pos) - sub = size_binop (PLUS_EXPR, sub, flp.overlap_pos); - - sra_explode_bitfield_assignment (var, sub, to_var, seq_p, - flen, fpos, fld); + handle_unscalarized_data_in_subtree (top_racc, lhs, old_gsi); + *refreshed = true; } + + if (lacc->first_child) + load_assign_lhs_subreplacements (lacc->first_child, top_racc, + left_offset, right_offset, + old_gsi, new_gsi, refreshed, lhs); + lacc = lacc->next_sibling; } + while (lacc); } -/* Add to LISTBEFOREP statements that copy scalarized members of ELT - that overlap with BIT_FIELD_REF<(ELT->element), BLEN, BPOS> back - into the full variable, and to LISTAFTERP, if non-NULL, statements - that copy the (presumably modified) overlapping portions of the - full variable back to the scalarized variables. */ +/* Return true iff ACC is non-NULL and has subaccesses. */ -static void -sra_sync_for_bitfield_assignment (gimple_seq *seq_before_p, - gimple_seq *seq_after_p, - tree blen, tree bpos, - struct sra_elt *elt) +static inline bool +access_has_children_p (struct access *acc) { - struct sra_elt *fld; - struct bitfield_overlap_info flp; - - FOR_EACH_ACTUAL_CHILD (fld, elt) - if (bitfield_overlaps_p (blen, bpos, fld, &flp)) - { - if (fld->replacement || (!flp.overlap_len && !flp.overlap_pos)) - { - generate_copy_inout (fld, false, generate_element_ref (fld), - seq_before_p); - mark_no_warning (fld); - if (seq_after_p) - generate_copy_inout (fld, true, generate_element_ref (fld), - seq_after_p); - } - else - { - tree flen = flp.overlap_len ? flp.overlap_len : flp.field_len; - tree fpos = flp.overlap_pos ? flp.overlap_pos : bitsize_int (0); - - sra_sync_for_bitfield_assignment (seq_before_p, seq_after_p, - flen, fpos, fld); - } - } + return acc && acc->first_child; } -/* Scalarize a USE. To recap, this is either a simple reference to ELT, - if elt is scalar, or some occurrence of ELT that requires a complete - aggregate. IS_OUTPUT is true if ELT is being modified. */ - -static void -scalarize_use (struct sra_elt *elt, tree *expr_p, gimple_stmt_iterator *gsi, - bool is_output, bool use_all) -{ - gimple stmt = gsi_stmt (*gsi); - tree bfexpr; - - if (elt->replacement) - { - tree replacement = elt->replacement; - - /* If we have a replacement, then updating the reference is as - simple as modifying the existing statement in place. */ - if (is_output - && TREE_CODE (elt->replacement) == BIT_FIELD_REF - && is_gimple_reg (TREE_OPERAND (elt->replacement, 0)) - && is_gimple_assign (stmt) - && gimple_assign_lhs_ptr (stmt) == expr_p) - { - gimple_seq newseq; - /* RHS must be a single operand. 
*/ - gcc_assert (gimple_assign_single_p (stmt)); - newseq = sra_build_elt_assignment (elt, gimple_assign_rhs1 (stmt)); - sra_replace (gsi, newseq); - return; - } - else if (!is_output - && TREE_CODE (elt->replacement) == BIT_FIELD_REF - && is_gimple_assign (stmt) - && gimple_assign_rhs1_ptr (stmt) == expr_p) - { - tree tmp = make_rename_temp - (TREE_TYPE (gimple_assign_lhs (stmt)), "SR"); - gimple_seq newseq = sra_build_assignment (tmp, REPLDUP (elt->replacement)); - - sra_insert_before (gsi, newseq); - replacement = tmp; - } - if (is_output) - update_stmt_if_modified (stmt); - *expr_p = REPLDUP (replacement); - update_stmt (stmt); - } - else if (use_all && is_output - && is_gimple_assign (stmt) - && TREE_CODE (bfexpr - = gimple_assign_lhs (stmt)) == BIT_FIELD_REF - && &TREE_OPERAND (bfexpr, 0) == expr_p - && INTEGRAL_TYPE_P (TREE_TYPE (bfexpr)) - && TREE_CODE (TREE_TYPE (*expr_p)) == RECORD_TYPE) - { - gimple_seq seq_before = NULL; - gimple_seq seq_after = NULL; - tree blen = fold_convert (bitsizetype, TREE_OPERAND (bfexpr, 1)); - tree bpos = fold_convert (bitsizetype, TREE_OPERAND (bfexpr, 2)); - bool update = false; - - if (!elt->use_block_copy) - { - tree type = TREE_TYPE (bfexpr); - tree var = make_rename_temp (type, "SR"), tmp, vpos; - gimple st; - - gimple_assign_set_lhs (stmt, var); - update = true; +/* Modify assignments with a CONSTRUCTOR on their RHS. STMT contains a pointer + to the assignment and GSI is the statement iterator pointing at it. Returns + the same values as sra_modify_assign. */ - if (!TYPE_UNSIGNED (type)) - { - type = unsigned_type_for (type); - tmp = make_rename_temp (type, "SR"); - st = gimple_build_assign (tmp, fold_convert (type, var)); - gimple_seq_add_stmt (&seq_after, st); - var = tmp; - } +static enum scan_assign_result +sra_modify_constructor_assign (gimple *stmt, gimple_stmt_iterator *gsi) +{ + tree lhs = gimple_assign_lhs (*stmt); + struct access *acc; - /* If VAR is wider than BLEN bits, it is padded at the - most-significant end. We want to set VPOS such that - <BIT_FIELD_REF VAR BLEN VPOS> would refer to the - least-significant BLEN bits of VAR. */ - if (BYTES_BIG_ENDIAN) - vpos = size_binop (MINUS_EXPR, TYPE_SIZE (type), blen); - else - vpos = bitsize_int (0); - sra_explode_bitfield_assignment - (var, vpos, false, &seq_after, blen, bpos, elt); - } - else - sra_sync_for_bitfield_assignment - (&seq_before, &seq_after, blen, bpos, elt); + gcc_assert (TREE_CODE (lhs) != REALPART_EXPR + && TREE_CODE (lhs) != IMAGPART_EXPR); + acc = get_access_for_expr (lhs); + if (!acc) + return SRA_SA_NONE; - if (seq_before) - { - mark_all_v_defs_seq (seq_before); - sra_insert_before (gsi, seq_before); - } - if (seq_after) - { - mark_all_v_defs_seq (seq_after); - sra_insert_after (gsi, seq_after); - } + if (VEC_length (constructor_elt, + CONSTRUCTOR_ELTS (gimple_assign_rhs1 (*stmt))) > 0) + { + /* I have never seen this code path trigger but if it can happen the + following should handle it gracefully. 
*/ + if (access_has_children_p (acc)) + generate_subtree_copies (acc->first_child, acc->base, 0, 0, 0, gsi, + true, true); + return SRA_SA_PROCESSED; + } - if (update) - update_stmt (stmt); + if (acc->grp_covered) + { + init_subtree_with_zero (acc, gsi, false); + unlink_stmt_vdef (*stmt); + gsi_remove (gsi, true); + return SRA_SA_REMOVED; } - else if (use_all && !is_output - && is_gimple_assign (stmt) - && TREE_CODE (bfexpr - = gimple_assign_rhs1 (stmt)) == BIT_FIELD_REF - && &TREE_OPERAND (gimple_assign_rhs1 (stmt), 0) == expr_p - && INTEGRAL_TYPE_P (TREE_TYPE (bfexpr)) - && TREE_CODE (TREE_TYPE (*expr_p)) == RECORD_TYPE) + else { - gimple_seq seq = NULL; - tree blen = fold_convert (bitsizetype, TREE_OPERAND (bfexpr, 1)); - tree bpos = fold_convert (bitsizetype, TREE_OPERAND (bfexpr, 2)); - bool update = false; + init_subtree_with_zero (acc, gsi, true); + return SRA_SA_PROCESSED; + } +} - if (!elt->use_block_copy) - { - tree type = TREE_TYPE (bfexpr); - tree var = make_rename_temp (type, "SR"), tmp, vpos; - gimple st = NULL; - gimple_assign_set_rhs1 (stmt, var); - update = true; +/* Modify statements with IMAGPART_EXPR or REALPART_EXPR on their lhs with + to-be-scalarized expressions with them. STMT is the statement and GSI is + the iterator used to place new helper statements. Returns the same values + as sra_modify_assign. */ - if (!TYPE_UNSIGNED (type)) - { - type = unsigned_type_for (type); - tmp = make_rename_temp (type, "SR"); - st = gimple_build_assign (var, - fold_convert (TREE_TYPE (var), tmp)); - var = tmp; - } +static enum scan_assign_result +sra_modify_partially_complex_lhs (gimple stmt, gimple_stmt_iterator *gsi) +{ + tree lhs, complex, ptype, rp, ip; + struct access *access; + gimple new_stmt, aux_stmt; - gimple_seq_add_stmt (&seq, - gimple_build_assign - (var, build_int_cst_wide (type, 0, 0))); - - /* If VAR is wider than BLEN bits, it is padded at the - most-significant end. We want to set VPOS such that - <BIT_FIELD_REF VAR BLEN VPOS> would refer to the - least-significant BLEN bits of VAR. */ - if (BYTES_BIG_ENDIAN) - vpos = size_binop (MINUS_EXPR, TYPE_SIZE (type), blen); - else - vpos = bitsize_int (0); - sra_explode_bitfield_assignment - (var, vpos, true, &seq, blen, bpos, elt); + lhs = gimple_assign_lhs (stmt); + complex = TREE_OPERAND (lhs, 0); - if (st) - gimple_seq_add_stmt (&seq, st); - } - else - sra_sync_for_bitfield_assignment - (&seq, NULL, blen, bpos, elt); + access = get_access_for_expr (complex); - if (seq) - { - mark_all_v_defs_seq (seq); - sra_insert_before (gsi, seq); - } + if (!access || !access->grp_to_be_replaced) + return SRA_SA_NONE; + + ptype = TREE_TYPE (TREE_TYPE (complex)); + rp = create_tmp_var (ptype, "SRr"); + add_referenced_var (rp); + rp = make_ssa_name (rp, NULL); - if (update) - update_stmt (stmt); + ip = create_tmp_var (ptype, "SRp"); + add_referenced_var (ip); + ip = make_ssa_name (ip, NULL); + + if (TREE_CODE (lhs) == IMAGPART_EXPR) + { + aux_stmt = gimple_build_assign (rp, fold_build1 (REALPART_EXPR, ptype, + get_access_replacement (access))); + SSA_NAME_DEF_STMT (rp) = aux_stmt; + gimple_assign_set_lhs (stmt, ip); + SSA_NAME_DEF_STMT (ip) = stmt; } else { - gimple_seq seq = NULL; + aux_stmt = gimple_build_assign (ip, fold_build1 (IMAGPART_EXPR, ptype, + get_access_replacement (access))); + SSA_NAME_DEF_STMT (ip) = aux_stmt; + gimple_assign_set_lhs (stmt, rp); + SSA_NAME_DEF_STMT (rp) = stmt; + } - /* Otherwise we need some copies. 
If ELT is being read, then we - want to store all (modified) sub-elements back into the - structure before the reference takes place. If ELT is being - written, then we want to load the changed values back into - our shadow variables. */ - /* ??? We don't check modified for reads, we just always write all of - the values. We should be able to record the SSA number of the VOP - for which the values were last read. If that number matches the - SSA number of the VOP in the current statement, then we needn't - emit an assignment. This would also eliminate double writes when - a structure is passed as more than one argument to a function call. - This optimization would be most effective if sra_walk_function - processed the blocks in dominator order. */ + gsi_insert_before (gsi, aux_stmt, GSI_SAME_STMT); + update_stmt (aux_stmt); + new_stmt = gimple_build_assign (get_access_replacement (access), + fold_build2 (COMPLEX_EXPR, access->type, + rp, ip)); + gsi_insert_after (gsi, new_stmt, GSI_NEW_STMT); + update_stmt (new_stmt); + return SRA_SA_PROCESSED; +} - generate_copy_inout (elt, is_output, generate_element_ref (elt), &seq); - if (seq == NULL) - return; - mark_all_v_defs_seq (seq); - if (is_output) - sra_insert_after (gsi, seq); - else - { - sra_insert_before (gsi, seq); - if (use_all) - mark_no_warning (elt); - } +/* Return true iff T has a VIEW_CONVERT_EXPR among its handled components. */ + +static bool +contains_view_convert_expr_p (tree t) +{ + while (1) + { + if (TREE_CODE (t) == VIEW_CONVERT_EXPR) + return true; + if (!handled_component_p (t)) + return false; + t = TREE_OPERAND (t, 0); } } -/* Scalarize a COPY. To recap, this is an assignment statement between - two scalarizable references, LHS_ELT and RHS_ELT. */ +/* Change STMT to assign compatible types by means of adding component or array + references or VIEW_CONVERT_EXPRs. All parameters have the same meaning as + variable with the same names in sra_modify_assign. This is done in a + such a complicated way in order to make + testsuite/g++.dg/tree-ssa/ssa-sra-2.C happy and so it helps in at least some + cases. */ static void -scalarize_copy (struct sra_elt *lhs_elt, struct sra_elt *rhs_elt, - gimple_stmt_iterator *gsi) +fix_modified_assign_compatibility (gimple_stmt_iterator *gsi, gimple *stmt, + struct access *lacc, struct access *racc, + tree lhs, tree *rhs, tree ltype, tree rtype) { - gimple_seq seq; - gimple stmt; - - if (lhs_elt->replacement && rhs_elt->replacement) + if (racc && racc->grp_to_be_replaced && AGGREGATE_TYPE_P (ltype) + && !access_has_children_p (lacc)) { - /* If we have two scalar operands, modify the existing statement. */ - stmt = gsi_stmt (*gsi); - - /* See the commentary in sra_walk_function concerning - RETURN_EXPR, and why we should never see one here. 
*/ - gcc_assert (is_gimple_assign (stmt)); - gcc_assert (gimple_assign_copy_p (stmt)); - - - gimple_assign_set_lhs (stmt, lhs_elt->replacement); - gimple_assign_set_rhs1 (stmt, REPLDUP (rhs_elt->replacement)); - update_stmt (stmt); + tree expr = unshare_expr (lhs); + bool found = build_ref_for_offset (&expr, ltype, racc->offset, rtype, + false); + if (found) + { + gimple_assign_set_lhs (*stmt, expr); + return; + } } - else if (lhs_elt->use_block_copy || rhs_elt->use_block_copy) + + if (lacc && lacc->grp_to_be_replaced && AGGREGATE_TYPE_P (rtype) + && !access_has_children_p (racc)) { - /* If either side requires a block copy, then sync the RHS back - to the original structure, leave the original assignment - statement (which will perform the block copy), then load the - LHS values out of its now-updated original structure. */ - /* ??? Could perform a modified pair-wise element copy. That - would at least allow those elements that are instantiated in - both structures to be optimized well. */ - - seq = NULL; - generate_copy_inout (rhs_elt, false, - generate_element_ref (rhs_elt), &seq); - if (seq) - { - mark_all_v_defs_seq (seq); - sra_insert_before (gsi, seq); - } - - seq = NULL; - generate_copy_inout (lhs_elt, true, - generate_element_ref (lhs_elt), &seq); - if (seq) + tree expr = unshare_expr (*rhs); + bool found = build_ref_for_offset (&expr, rtype, lacc->offset, ltype, + false); + if (found) { - mark_all_v_defs_seq (seq); - sra_insert_after (gsi, seq); + gimple_assign_set_rhs1 (*stmt, expr); + return; } } - else - { - /* Otherwise both sides must be fully instantiated. In which - case perform pair-wise element assignments and replace the - original block copy statement. */ - stmt = gsi_stmt (*gsi); - update_stmt_if_modified (stmt); + *rhs = fold_build1 (VIEW_CONVERT_EXPR, ltype, *rhs); + gimple_assign_set_rhs_from_tree (gsi, *rhs); + *stmt = gsi_stmt (*gsi); +} + +/* Callback of scan_function to process assign statements. It examines both + sides of the statement, replaces them with a scalare replacement if there is + one and generating copying of replacements if scalarized aggregates have been + used in the assignment. STMT is a pointer to the assign statement, GSI is + used to hold generated statements for type conversions and subtree + copying. */ + +static enum scan_assign_result +sra_modify_assign (gimple *stmt, gimple_stmt_iterator *gsi, + void *data ATTRIBUTE_UNUSED) +{ + struct access *lacc, *racc; + tree ltype, rtype; + tree lhs, rhs; + bool modify_this_stmt; - seq = NULL; - generate_element_copy (lhs_elt, rhs_elt, &seq); - gcc_assert (seq); - mark_all_v_defs_seq (seq); - sra_replace (gsi, seq); - } -} + if (gimple_assign_rhs2 (*stmt)) + return SRA_SA_NONE; + lhs = gimple_assign_lhs (*stmt); + rhs = gimple_assign_rhs1 (*stmt); -/* Scalarize an INIT. To recap, this is an assignment to a scalarizable - reference from some form of constructor: CONSTRUCTOR, COMPLEX_CST or - COMPLEX_EXPR. If RHS is NULL, it should be treated as an empty - CONSTRUCTOR. */ + if (TREE_CODE (rhs) == CONSTRUCTOR) + return sra_modify_constructor_assign (stmt, gsi); -static void -scalarize_init (struct sra_elt *lhs_elt, tree rhs, gimple_stmt_iterator *gsi) -{ - bool result = true; - gimple_seq seq = NULL, init_seq = NULL; + if (TREE_CODE (lhs) == REALPART_EXPR || TREE_CODE (lhs) == IMAGPART_EXPR) + return sra_modify_partially_complex_lhs (*stmt, gsi); - /* Generate initialization statements for all members extant in the RHS. 
*/ - if (rhs) + if (TREE_CODE (rhs) == REALPART_EXPR || TREE_CODE (rhs) == IMAGPART_EXPR + || TREE_CODE (rhs) == BIT_FIELD_REF || TREE_CODE (lhs) == BIT_FIELD_REF) { - /* Unshare the expression just in case this is from a decl's initial. */ - rhs = unshare_expr (rhs); - result = generate_element_init (lhs_elt, rhs, &init_seq); + modify_this_stmt = sra_modify_expr (gimple_assign_rhs1_ptr (*stmt), + gsi, false, data); + modify_this_stmt |= sra_modify_expr (gimple_assign_lhs_ptr (*stmt), + gsi, true, data); + return modify_this_stmt ? SRA_SA_PROCESSED : SRA_SA_NONE; } - if (!result) + lacc = get_access_for_expr (lhs); + racc = get_access_for_expr (rhs); + if (!lacc && !racc) + return SRA_SA_NONE; + + modify_this_stmt = ((lacc && lacc->grp_to_be_replaced) + || (racc && racc->grp_to_be_replaced)); + + if (lacc && lacc->grp_to_be_replaced) { - /* If we failed to convert the entire initializer, then we must - leave the structure assignment in place and must load values - from the structure into the slots for which we did not find - constants. The easiest way to do this is to generate a complete - copy-out, and then follow that with the constant assignments - that we were able to build. DCE will clean things up. */ - gimple_seq seq0 = NULL; - generate_copy_inout (lhs_elt, true, generate_element_ref (lhs_elt), - &seq0); - gimple_seq_add_seq (&seq0, seq); - seq = seq0; + lhs = get_access_replacement (lacc); + gimple_assign_set_lhs (*stmt, lhs); + ltype = lacc->type; } else - { - /* CONSTRUCTOR is defined such that any member not mentioned is assigned - a zero value. Initialize the rest of the instantiated elements. */ - generate_element_zero (lhs_elt, &seq); - gimple_seq_add_seq (&seq, init_seq); - } + ltype = TREE_TYPE (lhs); - if (lhs_elt->use_block_copy || !result) + if (racc && racc->grp_to_be_replaced) { - /* Since LHS is not fully instantiated, we must leave the structure - assignment in place. Treating this case differently from a USE - exposes constants to later optimizations. */ - if (seq) - { - mark_all_v_defs_seq (seq); - sra_insert_after (gsi, seq); - } + rhs = get_access_replacement (racc); + gimple_assign_set_rhs1 (*stmt, rhs); + rtype = racc->type; } else - { - /* The LHS is fully instantiated. The list of initializations - replaces the original structure assignment. */ - gcc_assert (seq); - update_stmt_if_modified (gsi_stmt (*gsi)); - mark_all_v_defs_seq (seq); - sra_replace (gsi, seq); - } -} + rtype = TREE_TYPE (rhs); -/* A subroutine of scalarize_ldst called via walk_tree. Set TREE_NO_TRAP - on all INDIRECT_REFs. */ - -static tree -mark_notrap (tree *tp, int *walk_subtrees, void *data ATTRIBUTE_UNUSED) -{ - tree t = *tp; - - if (TREE_CODE (t) == INDIRECT_REF) + /* The possibility that gimple_assign_set_rhs_from_tree() might reallocate + the statement makes the position of this pop_stmt_changes() a bit awkward + but hopefully make some sense. */ + if (modify_this_stmt) { - TREE_THIS_NOTRAP (t) = 1; - *walk_subtrees = 0; + if (!useless_type_conversion_p (ltype, rtype)) + fix_modified_assign_compatibility (gsi, stmt, lacc, racc, + lhs, &rhs, ltype, rtype); } - else if (IS_TYPE_OR_DECL_P (t)) - *walk_subtrees = 0; - - return NULL; -} - -/* Scalarize a LDST. To recap, this is an assignment between one scalarizable - reference ELT and one non-scalarizable reference OTHER. IS_OUTPUT is true - if ELT is on the left-hand side. */ -static void -scalarize_ldst (struct sra_elt *elt, tree other, - gimple_stmt_iterator *gsi, bool is_output) -{ - /* Shouldn't have gotten called for a scalar. 
*/ - gcc_assert (!elt->replacement); - - if (elt->use_block_copy) + if (contains_view_convert_expr_p (rhs) || contains_view_convert_expr_p (lhs) + || (access_has_children_p (racc) + && !ref_expr_for_all_replacements_p (racc, lhs, racc->offset)) + || (access_has_children_p (lacc) + && !ref_expr_for_all_replacements_p (lacc, rhs, lacc->offset))) { - /* Since ELT is not fully instantiated, we have to leave the - block copy in place. Treat this as a USE. */ - scalarize_use (elt, NULL, gsi, is_output, false); + if (access_has_children_p (racc)) + generate_subtree_copies (racc->first_child, racc->base, 0, 0, 0, + gsi, false, false); + if (access_has_children_p (lacc)) + generate_subtree_copies (lacc->first_child, lacc->base, 0, 0, 0, + gsi, true, true); } else { - /* The interesting case is when ELT is fully instantiated. In this - case we can have each element stored/loaded directly to/from the - corresponding slot in OTHER. This avoids a block copy. */ - - gimple_seq seq = NULL; - gimple stmt = gsi_stmt (*gsi); - - update_stmt_if_modified (stmt); - generate_copy_inout (elt, is_output, other, &seq); - gcc_assert (seq); - mark_all_v_defs_seq (seq); - - /* Preserve EH semantics. */ - if (stmt_ends_bb_p (stmt)) - { - gimple_stmt_iterator si; - gimple first; - gimple_seq blist = NULL; - bool thr = stmt_could_throw_p (stmt); - - /* If the last statement of this BB created an EH edge - before scalarization, we have to locate the first - statement that can throw in the new statement list and - use that as the last statement of this BB, such that EH - semantics is preserved. All statements up to this one - are added to the same BB. All other statements in the - list will be added to normal outgoing edges of the same - BB. If they access any memory, it's the same memory, so - we can assume they won't throw. */ - si = gsi_start (seq); - for (first = gsi_stmt (si); - thr && !gsi_end_p (si) && !stmt_could_throw_p (first); - first = gsi_stmt (si)) + if (access_has_children_p (lacc) && access_has_children_p (racc)) + { + gimple_stmt_iterator orig_gsi = *gsi; + bool refreshed; + + if (lacc->grp_read && !lacc->grp_covered) { - gsi_remove (&si, false); - gimple_seq_add_stmt (&blist, first); + handle_unscalarized_data_in_subtree (racc, lhs, gsi); + refreshed = true; } + else + refreshed = false; - /* Extract the first remaining statement from LIST, this is - the EH statement if there is one. */ - gsi_remove (&si, false); - - if (blist) - sra_insert_before (gsi, blist); - - /* Replace the old statement with this new representative. */ - gsi_replace (gsi, first, true); + load_assign_lhs_subreplacements (lacc->first_child, racc, + lacc->offset, racc->offset, + &orig_gsi, gsi, &refreshed, lhs); + if (!refreshed || !racc->grp_unscalarized_data) + { + if (*stmt == gsi_stmt (*gsi)) + gsi_next (gsi); - if (!gsi_end_p (si)) + unlink_stmt_vdef (*stmt); + gsi_remove (&orig_gsi, true); + return SRA_SA_REMOVED; + } + } + else + { + if (access_has_children_p (racc)) { - /* If any reference would trap, then they all would. And more - to the point, the first would. Therefore none of the rest - will trap since the first didn't. Indicate this by - iterating over the remaining statements and set - TREE_THIS_NOTRAP in all INDIRECT_REFs. 
*/ - do + if (!racc->grp_unscalarized_data) { - walk_gimple_stmt (&si, NULL, mark_notrap, NULL); - gsi_next (&si); + generate_subtree_copies (racc->first_child, lhs, + racc->offset, 0, 0, gsi, + false, false); + gcc_assert (*stmt == gsi_stmt (*gsi)); + unlink_stmt_vdef (*stmt); + gsi_remove (gsi, true); + return SRA_SA_REMOVED; } - while (!gsi_end_p (si)); - - insert_edge_copies_seq (seq, gsi_bb (*gsi)); + else + generate_subtree_copies (racc->first_child, lhs, + racc->offset, 0, 0, gsi, false, true); } + else if (access_has_children_p (lacc)) + generate_subtree_copies (lacc->first_child, rhs, lacc->offset, + 0, 0, gsi, true, true); } - else - sra_replace (gsi, seq); } + + return modify_this_stmt ? SRA_SA_PROCESSED : SRA_SA_NONE; } -/* Generate initializations for all scalarizable parameters. */ +/* Generate statements initializing scalar replacements of parts of function + parameters. */ static void -scalarize_parms (void) +initialize_parameter_reductions (void) { + gimple_stmt_iterator gsi; gimple_seq seq = NULL; - unsigned i; - bitmap_iterator bi; + tree parm; - EXECUTE_IF_SET_IN_BITMAP (needs_copy_in, 0, i, bi) + for (parm = DECL_ARGUMENTS (current_function_decl); + parm; + parm = TREE_CHAIN (parm)) { - tree var = referenced_var (i); - struct sra_elt *elt = lookup_element (NULL, var, NULL, NO_INSERT); - generate_copy_inout (elt, true, var, &seq); - } + VEC (access_p, heap) *access_vec; + struct access *access; - if (seq) - { - insert_edge_copies_seq (seq, ENTRY_BLOCK_PTR); - mark_all_v_defs_seq (seq); - } -} + if (!bitmap_bit_p (candidate_bitmap, DECL_UID (parm))) + continue; + access_vec = get_base_access_vector (parm); + if (!access_vec) + continue; -/* Entry point to phase 4. Update the function to match replacements. */ + if (!seq) + { + seq = gimple_seq_alloc (); + gsi = gsi_start (seq); + } -static void -scalarize_function (void) -{ - static const struct sra_walk_fns fns = { - scalarize_use, scalarize_copy, scalarize_init, scalarize_ldst, false - }; + for (access = VEC_index (access_p, access_vec, 0); + access; + access = access->next_grp) + generate_subtree_copies (access, parm, 0, 0, 0, &gsi, true, true); + } - sra_walk_function (&fns); - scalarize_parms (); - gsi_commit_edge_inserts (); + if (seq) + gsi_insert_seq_on_edge_immediate (single_succ_edge (ENTRY_BLOCK_PTR), seq); } -\f -/* Debug helper function. Print ELT in a nice human-readable format. */ - -static void -dump_sra_elt_name (FILE *f, struct sra_elt *elt) +/* The "main" function of intraprocedural SRA passes. Runs the analysis and if + it reveals there are components of some aggregates to be scalarized, it runs + the required transformations. */ +static unsigned int +perform_intra_sra (void) { - if (elt->parent && TREE_CODE (elt->parent->type) == COMPLEX_TYPE) - { - fputs (elt->element == integer_zero_node ? 
"__real__ " : "__imag__ ", f); - dump_sra_elt_name (f, elt->parent); - } - else - { - if (elt->parent) - dump_sra_elt_name (f, elt->parent); - if (DECL_P (elt->element)) - { - if (TREE_CODE (elt->element) == FIELD_DECL) - fputc ('.', f); - print_generic_expr (f, elt->element, dump_flags); - } - else if (TREE_CODE (elt->element) == BIT_FIELD_REF) - fprintf (f, "$B" HOST_WIDE_INT_PRINT_DEC "F" HOST_WIDE_INT_PRINT_DEC, - tree_low_cst (TREE_OPERAND (elt->element, 2), 1), - tree_low_cst (TREE_OPERAND (elt->element, 1), 1)); - else if (TREE_CODE (elt->element) == RANGE_EXPR) - fprintf (f, "["HOST_WIDE_INT_PRINT_DEC".."HOST_WIDE_INT_PRINT_DEC"]", - TREE_INT_CST_LOW (TREE_OPERAND (elt->element, 0)), - TREE_INT_CST_LOW (TREE_OPERAND (elt->element, 1))); - else - fprintf (f, "[" HOST_WIDE_INT_PRINT_DEC "]", - TREE_INT_CST_LOW (elt->element)); - } -} + int ret = 0; + sra_initialize (); -/* Likewise, but callable from the debugger. */ + if (!find_var_candidates ()) + goto out; -void -debug_sra_elt_name (struct sra_elt *elt) -{ - dump_sra_elt_name (stderr, elt); - fputc ('\n', stderr); -} + if (!scan_function (build_access_from_expr, build_accesses_from_assign, NULL, + true, NULL)) + goto out; -static void -sra_init_cache (void) -{ - if (sra_type_decomp_cache) - return; + if (!analyze_all_variable_accesses ()) + goto out; - sra_type_decomp_cache = BITMAP_ALLOC (NULL); - sra_type_inst_cache = BITMAP_ALLOC (NULL); -} + scan_function (sra_modify_expr, sra_modify_assign, NULL, + false, NULL); + initialize_parameter_reductions (); + ret = TODO_update_ssa; -/* Main entry point. */ + if (sra_mode == SRA_MODE_EARLY_INTRA) + ret = TODO_update_ssa; + else + ret = TODO_update_ssa | TODO_rebuild_alias; + out: + sra_deinitialize (); + return ret; +} +/* Perform early intraprocedural SRA. */ static unsigned int -tree_sra (void) +early_intra_sra (void) { - /* Initialize local variables. */ - gcc_obstack_init (&sra_obstack); - sra_candidates = BITMAP_ALLOC (NULL); - needs_copy_in = BITMAP_ALLOC (NULL); - sra_init_cache (); - sra_map = htab_create (101, sra_elt_hash, sra_elt_eq, NULL); - - /* Scan. If we find anything, instantiate and scalarize. */ - if (find_candidates_for_sra ()) - { - scan_function (); - decide_instantiations (); - scalarize_function (); - } - - /* Free allocated memory. */ - htab_delete (sra_map); - sra_map = NULL; - BITMAP_FREE (sra_candidates); - BITMAP_FREE (needs_copy_in); - BITMAP_FREE (sra_type_decomp_cache); - BITMAP_FREE (sra_type_inst_cache); - obstack_free (&sra_obstack, NULL); - return 0; + sra_mode = SRA_MODE_EARLY_INTRA; + return perform_intra_sra (); } +/* Perform "late" intraprocedural SRA. 
*/ static unsigned int -tree_sra_early (void) +late_intra_sra (void) { - unsigned int ret; - - early_sra = true; - ret = tree_sra (); - early_sra = false; - - return ret; + sra_mode = SRA_MODE_INTRA; + return perform_intra_sra (); } + static bool -gate_sra (void) +gate_intra_sra (void) { return flag_tree_sra != 0; } + struct gimple_opt_pass pass_sra_early = { { GIMPLE_PASS, - "esra", /* name */ - gate_sra, /* gate */ - tree_sra_early, /* execute */ + "esra", /* name */ + gate_intra_sra, /* gate */ + early_intra_sra, /* execute */ NULL, /* sub */ NULL, /* next */ 0, /* static_pass_number */ TV_TREE_SRA, /* tv_id */ - PROP_cfg | PROP_ssa, /* properties_required */ + PROP_cfg | PROP_ssa, /* properties_required */ 0, /* properties_provided */ - 0, /* properties_destroyed */ + 0, /* properties_destroyed */ 0, /* todo_flags_start */ TODO_dump_func | TODO_update_ssa @@ -3679,20 +2458,21 @@ struct gimple_opt_pass pass_sra_early = } }; + struct gimple_opt_pass pass_sra = { { GIMPLE_PASS, - "sra", /* name */ - gate_sra, /* gate */ - tree_sra, /* execute */ + "sra", /* name */ + gate_intra_sra, /* gate */ + late_intra_sra, /* execute */ NULL, /* sub */ NULL, /* next */ 0, /* static_pass_number */ TV_TREE_SRA, /* tv_id */ - PROP_cfg | PROP_ssa, /* properties_required */ + PROP_cfg | PROP_ssa, /* properties_required */ 0, /* properties_provided */ - 0, /* properties_destroyed */ + 0, /* properties_destroyed */ TODO_update_address_taken, /* todo_flags_start */ TODO_dump_func | TODO_update_ssa Index: mine/gcc/Makefile.in =================================================================== --- mine.orig/gcc/Makefile.in +++ mine/gcc/Makefile.in @@ -2732,11 +2732,9 @@ tree-ssa-ccp.o : tree-ssa-ccp.c $(TREE_F $(DIAGNOSTIC_H) $(FUNCTION_H) $(TIMEVAR_H) $(TM_H) coretypes.h \ $(TREE_DUMP_H) $(BASIC_BLOCK_H) $(TREE_PASS_H) langhooks.h \ tree-ssa-propagate.h value-prof.h $(FLAGS_H) $(TARGET_H) $(TOPLEV_H) -tree-sra.o : tree-sra.c $(CONFIG_H) $(SYSTEM_H) $(TREE_H) $(RTL_H) \ - $(TM_P_H) $(TREE_FLOW_H) $(DIAGNOSTIC_H) $(TREE_INLINE_H) \ - $(TIMEVAR_H) $(TM_H) coretypes.h $(TREE_DUMP_H) $(GIMPLE_H) \ - langhooks.h $(TREE_PASS_H) $(FLAGS_H) $(EXPR_H) $(BASIC_BLOCK_H) \ - $(BITMAP_H) $(GGC_H) hard-reg-set.h $(OBSTACK_H) $(PARAMS_H) $(TARGET_H) +tree-sra.o : tree-sra.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(TREE_H) \ + $(GIMPLE_H) $(TREE_FLOW_H) $(DIAGNOSTIC_H) $(TREE_DUMP_H) \ + $(TIMEVAR_H) $(PARAMS_H) $(TARGET_H) $(FLAGS_H) tree-switch-conversion.o : tree-switch-conversion.c $(CONFIG_H) $(SYSTEM_H) \ $(TREE_H) $(TM_P_H) $(TREE_FLOW_H) $(DIAGNOSTIC_H) $(TREE_INLINE_H) \ $(TIMEVAR_H) $(TM_H) coretypes.h $(TREE_DUMP_H) $(GIMPLE_H) \ ^ permalink raw reply [flat|nested] 25+ messages in thread
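To make the transformation performed by the patch above concrete, here is a small, purely illustrative sketch. The replacement names t$f and t$g are hypothetical dump-style names; the pass creates its temporaries via get_access_replacement (with "SR" prefixes as seen in the patch), and the real rewriting happens on GIMPLE, not on C source:

    /* Before SRA: t is a scalarization candidate.  */
    struct pair { int f; int g; };

    int
    foo (struct pair p)
    {
      struct pair t;

      t = p;                /* aggregate assignment */
      return t.f + t.g;     /* individual component reads */
    }

    /* After SRA, conceptually: t is gone, each component lives in its
       own scalar replacement.  */
    int
    foo (struct pair p)
    {
      int t$f, t$g;

      t$f = p.f;            /* copies emitted by generate_subtree_copies */
      t$g = p.g;
      return t$f + t$g;
    }

Because only components that are actually accessed get replacements, a field of t that foo never touches would simply stay unscalarized.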
* Re: [PATCH 3/5] New intraprocedural Scalar Reduction of Aggregates. 2009-04-28 10:14 ` [PATCH 3/5] New intraprocedural Scalar Reduction of Aggregates Martin Jambor @ 2009-04-28 10:27 ` Martin Jambor 2009-04-29 12:56 ` Richard Guenther 2009-04-29 10:59 ` Richard Guenther 1 sibling, 1 reply; 25+ messages in thread From: Martin Jambor @ 2009-04-28 10:27 UTC (permalink / raw) To: GCC Patches; +Cc: Richard Guenther, Jan Hubicka

On Tue, Apr 28, 2009 at 12:04:32PM +0200, Martin Jambor wrote:
> This is the new intraprocedural SRA.  I have stripped off the
> interprocedural part and will propose to commit it separately later.
> I have tried to remove almost every trace of IPA-SRA, however, two
> provisions for it have remained in the patch.  First, an enumeration
> (rather than a boolean) is used to distinguish between "early" and
> "late" SRA so that other SRA modes can be added later on.  Second,
> scan_function() has a hook parameter and a void pointer parameter
> which are not used in this patch but will be by IPA-SRA.
>
> Otherwise, the patch is hopefully self-contained and the basis of its
> operation is described by the initial comment.
>
> The patch bootstraps (on x86_64-linux-gnu but I am about to try it on
> hppa-linux-gnu too) but produces a small number of testsuite failures
> which are handled by the two following patches.
>
> Thanks,
>
> Martin
>
>
> 2009-04-27  Martin Jambor  <mjambor@suse.cz>
>
> 	* tree-sra.c (enum sra_mode): The whole contents of the file were
> 	replaced.

Hm, the patch is quite unreadable; below is the new tree-sra.c file which
entirely replaces the old one (note that the patch also modifies the
Makefile, though):

/* Scalar Replacement of Aggregates (SRA) converts some structure
   references into scalar references, exposing them to the scalar
   optimizers.
   Copyright (C) 2008, 2009 Free Software Foundation, Inc.
   Contributed by Martin Jambor <mjambor@suse.cz>

This file is part of GCC.

GCC is free software; you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free
Software Foundation; either version 3, or (at your option) any later
version.

GCC is distributed in the hope that it will be useful, but WITHOUT ANY
WARRANTY; without even the implied warranty of MERCHANTABILITY or
FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
for more details.

You should have received a copy of the GNU General Public License
along with GCC; see the file COPYING3.  If not see
<http://www.gnu.org/licenses/>.  */

/* This file implements Scalar Reduction of Aggregates (SRA).  SRA is run
   twice, once in the early stages of compilation (early SRA) and once in the
   late stages (late SRA).  The aim of both is to turn references to scalar
   parts of aggregates into uses of independent scalar variables.

   The two passes are nearly identical; the only difference is that early SRA
   does not scalarize unions which are used as the result in a GIMPLE_RETURN
   statement because together with inlining this can lead to weird type
   conversions.

   Both passes operate in four stages:

   1. The declarations that have properties which make them candidates for
      scalarization are identified in function find_var_candidates().  The
      candidates are stored in candidate_bitmap.

   2. The function body is scanned.  In the process, declarations which are
      used in a manner that prevents their scalarization are removed from the
      candidate bitmap.
More importantly, for every access into an aggregate, an access structure (struct access) is created by create_access() and stored in a vector associated with the aggregate. Among other information, the aggregate declaration, the offset and size of the access and its type are stored in the structure. On a related note, assign_link structures are created for every assign statement between candidate aggregates and attached to the related accesses. 3. The vectors of accesses are analyzed. They are first sorted according to their offset and size and then scanned for partially overlapping accesses (i.e. those which overlap but one is not entirely within another). Such an access disqualifies the whole aggregate from being scalarized. If there is no such inhibiting overlap, a representative access structure is chosen for every unique combination of offset and size. Afterwards, the pass builds a set of trees from these structures, in which children of an access are within their parent (in terms of offset and size). Then accesses are propagated whenever possible (i.e. in cases when it does not create a partially overlapping access) across assign_links from the right hand side to the left hand side. Then the set of trees for each declaration is traversed again and those accesses which should be replaced by a scalar are identified. 4. The function is traversed again, and for every reference into an aggregate that has some component which is about to be scalarized, statements are amended and new statements are created as necessary. Finally, if a parameter got scalarized, the scalar replacements are initialized with values from respective parameter aggregates. */ #include "config.h" #include "system.h" #include "coretypes.h" #include "alloc-pool.h" #include "tm.h" #include "tree.h" #include "gimple.h" #include "tree-flow.h" #include "diagnostic.h" #include "tree-dump.h" #include "timevar.h" #include "params.h" #include "target.h" #include "flags.h" /* Enumeration of all aggregate reductions we can do. */ enum sra_mode {SRA_MODE_EARLY_INTRA, /* early intraprocedural SRA */ SRA_MODE_INTRA}; /* late intraprocedural SRA */ /* Global variable describing which aggregate reduction we are performing at the moment. */ static enum sra_mode sra_mode; struct assign_link; /* ACCESS represents each access to an aggregate variable (as a whole or a part). It can also represent a group of accesses that refer to exactly the same fragment of an aggregate (i.e. those that have exactly the same offset and size). Such representatives for a single aggregate, once determined, are linked in a linked list and have the group fields set. Moreover, when doing intraprocedural SRA, a tree is built from those representatives (by the means of first_child and next_sibling pointers), in which all items in a subtree are "within" the root, i.e. their offset is greater or equal to offset of the root and offset+size is smaller or equal to offset+size of the root. Children of an access are sorted by offset. */ struct access { /* Values returned by `get_ref_base_and_extent' for each COMPONENT_REF If EXPR isn't a COMPONENT_REF just set `BASE = EXPR', `OFFSET = 0', `SIZE = TREE_SIZE (TREE_TYPE (expr))'. */ HOST_WIDE_INT offset; HOST_WIDE_INT size; tree base; /* Expression. */ tree expr; /* Type. */ tree type; /* Next group representative for this aggregate. */ struct access *next_grp; /* Pointer to the group representative. Pointer to itself if the struct is the representative. 
*/ struct access *group_representative; /* If this access has any children (in terms of the definition above), this points to the first one. */ struct access *first_child; /* Pointer to the next sibling in the access tree as described above. */ struct access *next_sibling; /* Pointers to the first and last element in the linked list of assign links. */ struct assign_link *first_link, *last_link; /* Pointer to the next access in the work queue. */ struct access *next_queued; /* Replacement variable for this access "region." Never to be accessed directly, always only by the means of get_access_replacement() and only when grp_to_be_replaced flag is set. */ tree replacement_decl; /* Is this particular access write access? */ unsigned write : 1; /* Is this access currently in the work queue? */ unsigned grp_queued : 1; /* Does this group contain a write access? This flag is propagated down the access tree. */ unsigned grp_write : 1; /* Does this group contain a read access? This flag is propagated down the access tree. */ unsigned grp_read : 1; /* Is the subtree rooted in this access fully covered by scalar replacements? */ unsigned grp_covered : 1; /* If set to true, this access and all below it in an access tree must not be scalarized. */ unsigned grp_unscalarizable_region : 1; /* Whether data have been written to parts of the aggregate covered by this access which is not to be scalarized. This flag is propagated up in the access tree. */ unsigned grp_unscalarized_data : 1; /* Does this access and/or group contain a write access through a BIT_FIELD_REF? */ unsigned grp_bfr_lhs : 1; /* Set when a scalar replacement should be created for this variable. We do the decision and creation at different places because create_tmp_var cannot be called from within FOR_EACH_REFERENCED_VAR. */ unsigned grp_to_be_replaced : 1; }; typedef struct access *access_p; DEF_VEC_P (access_p); DEF_VEC_ALLOC_P (access_p, heap); /* Alloc pool for allocating access structures. */ static alloc_pool access_pool; /* A structure linking lhs and rhs accesses from an aggregate assignment. They are used to propagate subaccesses from rhs to lhs as long as they don't conflict with what is already there. */ struct assign_link { struct access *lacc, *racc; struct assign_link *next; }; /* Alloc pool for allocating assign link structures. */ static alloc_pool link_pool; /* Base (tree) -> Vector (VEC(access_p,heap) *) map. */ static struct pointer_map_t *base_access_vec; /* Bitmap of bases (candidates). */ static bitmap candidate_bitmap; /* Bitmap of declarations used in a return statement. */ static bitmap retvals_bitmap; /* Obstack for creation of fancy names. */ static struct obstack name_obstack; /* Head of a linked list of accesses that need to have its subaccesses propagated to their assignment counterparts. */ static struct access *work_queue_head; /* Dump contents of ACCESS to file F in a human friendly way. If GRP is true, representative fields are dumped, otherwise those which only describe the individual access are. 
*/ static void dump_access (FILE *f, struct access *access, bool grp) { fprintf (f, "access { "); fprintf (f, "base = (%d)'", DECL_UID (access->base)); print_generic_expr (f, access->base, 0); fprintf (f, "', offset = %d", (int) access->offset); fprintf (f, ", size = %d", (int) access->size); fprintf (f, ", expr = "); print_generic_expr (f, access->expr, 0); fprintf (f, ", type = "); print_generic_expr (f, access->type, 0); if (grp) fprintf (f, ", grp_write = %d, grp_read = %d, grp_covered = %d, " "grp_unscalarizable_region = %d, grp_unscalarized_data = %d, " "grp_to_be_replaced = %d\n", access->grp_write, access->grp_read, access->grp_covered, access->grp_unscalarizable_region, access->grp_unscalarized_data, access->grp_to_be_replaced); else fprintf (f, ", write = %d'\n", access->write); } /* Dump a subtree rooted in ACCESS to file F, indent by LEVEL. */ static void dump_access_tree_1 (FILE *f, struct access *access, int level) { do { int i; for (i = 0; i < level; i++) fputs ("* ", dump_file); dump_access (f, access, true); if (access->first_child) dump_access_tree_1 (f, access->first_child, level + 1); access = access->next_sibling; } while (access); } /* Dump all access trees for a variable, given the pointer to the first root in ACCESS. */ static void dump_access_tree (FILE *f, struct access *access) { for (; access; access = access->next_grp) dump_access_tree_1 (f, access, 0); } /* Return a vector of pointers to accesses for the variable given in BASE or NULL if there is none. */ static VEC (access_p, heap) * get_base_access_vector (tree base) { void **slot; slot = pointer_map_contains (base_access_vec, base); if (!slot) return NULL; else return *(VEC (access_p, heap) **) slot; } /* Find an access with required OFFSET and SIZE in a subtree of accesses rooted in ACCESS. Return NULL if it cannot be found. */ static struct access * find_access_in_subtree (struct access *access, HOST_WIDE_INT offset, HOST_WIDE_INT size) { while (access && (access->offset != offset || access->size != size)) { struct access *child = access->first_child; while (child && (child->offset + child->size <= offset)) child = child->next_sibling; access = child; } return access; } /* Return the first group representative for DECL or NULL if none exists. */ static struct access * get_first_repr_for_decl (tree base) { VEC (access_p, heap) *access_vec; access_vec = get_base_access_vector (base); if (!access_vec) return NULL; return VEC_index (access_p, access_vec, 0); } /* Find an access representative for the variable BASE and given OFFSET and SIZE. Requires that access trees have already been built. Return NULL if it cannot be found. */ static struct access * get_var_base_offset_size_access (tree base, HOST_WIDE_INT offset, HOST_WIDE_INT size) { struct access *access; access = get_first_repr_for_decl (base); while (access && (access->offset + access->size <= offset)) access = access->next_grp; if (!access) return NULL; return find_access_in_subtree (access, offset, size); } /* Add LINK to the linked list of assign links of RACC. */ static void add_link_to_rhs (struct access *racc, struct assign_link *link) { gcc_assert (link->racc == racc); if (!racc->first_link) { gcc_assert (!racc->last_link); racc->first_link = link; } else racc->last_link->next = link; racc->last_link = link; link->next = NULL; } /* Move all link structures in their linked list in OLD_RACC to the linked list in NEW_RACC. 
*/ static void relink_to_new_repr (struct access *new_racc, struct access *old_racc) { if (!old_racc->first_link) { gcc_assert (!old_racc->last_link); return; } if (new_racc->first_link) { gcc_assert (!new_racc->last_link->next); gcc_assert (!old_racc->last_link || !old_racc->last_link->next); new_racc->last_link->next = old_racc->first_link; new_racc->last_link = old_racc->last_link; } else { gcc_assert (!new_racc->last_link); new_racc->first_link = old_racc->first_link; new_racc->last_link = old_racc->last_link; } old_racc->first_link = old_racc->last_link = NULL; } /* Add ACCESS to the work queue (which is actually a stack). */ static void add_access_to_work_queue (struct access *access) { if (!access->grp_queued) { gcc_assert (!access->next_queued); access->next_queued = work_queue_head; access->grp_queued = 1; work_queue_head = access; } } /* Pop an access from the work queue, and return it, assuming there is one. */ static struct access * pop_access_from_work_queue (void) { struct access *access = work_queue_head; work_queue_head = access->next_queued; access->next_queued = NULL; access->grp_queued = 0; return access; } /* Allocate necessary structures. */ static void sra_initialize (void) { candidate_bitmap = BITMAP_ALLOC (NULL); retvals_bitmap = BITMAP_ALLOC (NULL); gcc_obstack_init (&name_obstack); access_pool = create_alloc_pool ("SRA accesses", sizeof (struct access), 16); link_pool = create_alloc_pool ("SRA links", sizeof (struct assign_link), 16); base_access_vec = pointer_map_create (); } /* Hook fed to pointer_map_traverse, deallocate stored vectors. */ static bool delete_base_accesses (const void *key ATTRIBUTE_UNUSED, void **value, void *data ATTRIBUTE_UNUSED) { VEC (access_p, heap) *access_vec; access_vec = (VEC (access_p, heap) *) *value; VEC_free (access_p, heap, access_vec); return true; } /* Deallocate all general structures. */ static void sra_deinitialize (void) { BITMAP_FREE (candidate_bitmap); BITMAP_FREE (retvals_bitmap); free_alloc_pool (access_pool); free_alloc_pool (link_pool); obstack_free (&name_obstack, NULL); pointer_map_traverse (base_access_vec, delete_base_accesses, NULL); pointer_map_destroy (base_access_vec); } /* Remove DECL from candidates for SRA and write REASON to the dump file if there is one. */ static void disqualify_candidate (tree decl, const char *reason) { bitmap_clear_bit (candidate_bitmap, DECL_UID (decl)); if (dump_file) { fprintf (dump_file, "! Disqualifying "); print_generic_expr (dump_file, decl, 0); fprintf (dump_file, " - %s\n", reason); } } /* Return true iff the type contains a field or an element which does not allow scalarization. */ static bool type_internals_preclude_sra_p (tree type) { tree fld; tree et; switch (TREE_CODE (type)) { case RECORD_TYPE: case UNION_TYPE: case QUAL_UNION_TYPE: for (fld = TYPE_FIELDS (type); fld; fld = TREE_CHAIN (fld)) if (TREE_CODE (fld) == FIELD_DECL) { tree ft = TREE_TYPE (fld); if (TREE_THIS_VOLATILE (fld) || !DECL_FIELD_OFFSET (fld) || !DECL_SIZE (fld) || !host_integerp (DECL_FIELD_OFFSET (fld), 1) || !host_integerp (DECL_SIZE (fld), 1)) return true; if (AGGREGATE_TYPE_P (ft) && type_internals_preclude_sra_p (ft)) return true; } return false; case ARRAY_TYPE: et = TREE_TYPE (type); if (AGGREGATE_TYPE_P (et)) return type_internals_preclude_sra_p (et); else return false; default: return false; } } /* Create and insert access for EXPR. Return created access, or NULL if it is not possible. 
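   To give a concrete, purely illustrative example: for a candidate declared
   as 'struct { int f; int g; } s;' on a target with 32-bit int, the
   expression s.g would produce an access with BASE s, OFFSET 32, SIZE 32 and
   TYPE int, the offset and size being the bit values computed by
   get_ref_base_and_extent.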
*/ static struct access * create_access (tree expr, bool write) { struct access *access; void **slot; VEC (access_p,heap) *vec; HOST_WIDE_INT offset, size, max_size; tree base = expr; bool unscalarizable_region = false; if (handled_component_p (expr)) base = get_ref_base_and_extent (expr, &offset, &size, &max_size); else { tree tree_size; tree_size = TYPE_SIZE (TREE_TYPE (base)); if (tree_size && host_integerp (tree_size, 1)) size = max_size = tree_low_cst (tree_size, 1); else size = max_size = -1; offset = 0; } if (!base || !DECL_P (base) || !bitmap_bit_p (candidate_bitmap, DECL_UID (base))) return NULL; if (size != max_size) { size = max_size; unscalarizable_region = true; } if (size < 0) { disqualify_candidate (base, "Encountered an ultra variable sized " "access."); return NULL; } access = (struct access *) pool_alloc (access_pool); memset (access, 0, sizeof (struct access)); access->base = base; access->offset = offset; access->size = size; access->expr = expr; access->type = TREE_TYPE (expr); access->write = write; access->grp_unscalarizable_region = unscalarizable_region; slot = pointer_map_contains (base_access_vec, base); if (slot) vec = (VEC (access_p, heap) *) *slot; else vec = VEC_alloc (access_p, heap, 32); VEC_safe_push (access_p, heap, vec, access); *((struct VEC (access_p,heap) **) pointer_map_insert (base_access_vec, base)) = vec; return access; } /* Callback of walk_tree. Search the given tree for a declaration and exclude it from the candidates. */ static tree disqualify_all (tree *tp, int *walk_subtrees, void *data ATTRIBUTE_UNUSED) { tree base = *tp; if (TREE_CODE (base) == SSA_NAME) base = SSA_NAME_VAR (base); if (DECL_P (base)) { disqualify_candidate (base, "From within disqualify_all()."); *walk_subtrees = 0; } else *walk_subtrees = 1; return NULL_TREE; } /* Scan expression EXPR and create access structures for all accesses to candidates for scalarization. Return the created access or NULL if none is created. */ static struct access * build_access_from_expr_1 (tree *expr_ptr, gimple_stmt_iterator *gsi ATTRIBUTE_UNUSED, bool write) { struct access *ret = NULL; tree expr = *expr_ptr; tree safe_expr = expr; bool bit_ref; if (TREE_CODE (expr) == BIT_FIELD_REF) { expr = TREE_OPERAND (expr, 0); bit_ref = true; } else bit_ref = false; while (TREE_CODE (expr) == NOP_EXPR || TREE_CODE (expr) == VIEW_CONVERT_EXPR || TREE_CODE (expr) == REALPART_EXPR || TREE_CODE (expr) == IMAGPART_EXPR) expr = TREE_OPERAND (expr, 0); switch (TREE_CODE (expr)) { case ADDR_EXPR: case SSA_NAME: case INDIRECT_REF: break; case VAR_DECL: case PARM_DECL: case RESULT_DECL: case COMPONENT_REF: case ARRAY_REF: ret = create_access (expr, write); break; case REALPART_EXPR: case IMAGPART_EXPR: expr = TREE_OPERAND (expr, 0); ret = create_access (expr, write); break; case ARRAY_RANGE_REF: default: walk_tree (&safe_expr, disqualify_all, NULL, NULL); break; } if (write && bit_ref && ret) ret->grp_bfr_lhs = 1; return ret; } /* Scan expression EXPR and create access structures for all accesses to candidates for scalarization. Return true if any access has been inserted. */ static bool build_access_from_expr (tree *expr_ptr, gimple_stmt_iterator *gsi ATTRIBUTE_UNUSED, bool write, void *data ATTRIBUTE_UNUSED) { return build_access_from_expr_1 (expr_ptr, gsi, write) != NULL; } /* Disqualify LHS and RHS for scalarization if STMT must end its basic block in modes in which it matters, return true iff they have been disqualified. RHS may be NULL, in that case ignore it. 
If we scalarize an aggregate in intra-SRA we may need to add statements
   after each statement.  This is not possible if a statement unconditionally
   has to end the basic block.  */

static bool
disqualify_ops_if_throwing_stmt (gimple stmt, tree *lhs, tree *rhs)
{
  if (stmt_can_throw_internal (stmt) || stmt_ends_bb_p (stmt))
    {
      walk_tree (lhs, disqualify_all, NULL, NULL);
      if (rhs)
	walk_tree (rhs, disqualify_all, NULL, NULL);
      return true;
    }
  return false;
}

/* Result code for scan_assign callback for scan_function.  */
enum scan_assign_result {SRA_SA_NONE,       /* nothing done for the stmt */
			 SRA_SA_PROCESSED,  /* stmt analyzed/changed */
			 SRA_SA_REMOVED};   /* stmt redundant and eliminated */

/* Scan expressions occurring in the statement pointed to by STMT_PTR, create
   access structures for all accesses to candidates for scalarization and
   remove those candidates which occur in statements or expressions that
   prevent them from being split apart.  Return SRA_SA_PROCESSED if any
   access has been created, SRA_SA_NONE otherwise.  */

static enum scan_assign_result
build_accesses_from_assign (gimple *stmt_ptr,
			    gimple_stmt_iterator *gsi ATTRIBUTE_UNUSED,
			    void *data ATTRIBUTE_UNUSED)
{
  gimple stmt = *stmt_ptr;
  tree *lhs_ptr, *rhs_ptr;
  struct access *lacc, *racc;

  if (gimple_assign_rhs2 (stmt))
    return SRA_SA_NONE;

  lhs_ptr = gimple_assign_lhs_ptr (stmt);
  rhs_ptr = gimple_assign_rhs1_ptr (stmt);

  if (disqualify_ops_if_throwing_stmt (stmt, lhs_ptr, rhs_ptr))
    return SRA_SA_NONE;

  racc = build_access_from_expr_1 (rhs_ptr, gsi, false);
  lacc = build_access_from_expr_1 (lhs_ptr, gsi, true);

  if (lacc && racc
      && !lacc->grp_unscalarizable_region
      && !racc->grp_unscalarizable_region
      && AGGREGATE_TYPE_P (TREE_TYPE (*lhs_ptr))
      && lacc->size <= racc->size
      && useless_type_conversion_p (lacc->type, racc->type))
    {
      struct assign_link *link;

      link = (struct assign_link *) pool_alloc (link_pool);
      memset (link, 0, sizeof (struct assign_link));

      link->lacc = lacc;
      link->racc = racc;

      add_link_to_rhs (racc, link);
    }

  return (lacc || racc) ? SRA_SA_PROCESSED : SRA_SA_NONE;
}

/* Scan function and look for interesting statements.  Return true if any has
   been found or processed, as indicated by callbacks.  SCAN_EXPR is a
   callback called on all expressions within statements except assign
   statements and those deemed entirely unsuitable for some reason (all
   operands in such statements and expressions are removed from
   candidate_bitmap).  SCAN_ASSIGN is a callback called on all assign
   statements, HANDLE_SSA_DEFS is a callback called on assign statements and
   those call statements which have a lhs and it is the only callback which
   can be NULL.  ANALYSIS_STAGE is true when running in the analysis stage of
   a pass and thus no statement is being modified.  DATA is a pointer passed
   to all callbacks.  If any single callback returns true, this function also
   returns true, otherwise it returns false.
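
   For example, the analysis stage of this pass invokes this function as
   scan_function (build_access_from_expr, build_accesses_from_assign, NULL,
   true, NULL) whereas the statement modification stage invokes it as
   scan_function (sra_modify_expr, sra_modify_assign, NULL, false, NULL),
   see perform_intra_sra below.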
*/

static bool
scan_function (bool (*scan_expr) (tree *, gimple_stmt_iterator *, bool,
				  void *),
	       enum scan_assign_result (*scan_assign) (gimple *,
						       gimple_stmt_iterator *,
						       void *),
	       bool (*handle_ssa_defs)(gimple, void *),
	       bool analysis_stage, void *data)
{
  gimple_stmt_iterator gsi;
  basic_block bb;
  unsigned i;
  tree *t;
  bool ret = false;

  FOR_EACH_BB (bb)
    {
      bool bb_changed = false;

      gsi = gsi_start_bb (bb);
      while (!gsi_end_p (gsi))
	{
	  gimple stmt = gsi_stmt (gsi);
	  enum scan_assign_result assign_result;
	  bool any = false, deleted = false;

	  switch (gimple_code (stmt))
	    {
	    case GIMPLE_RETURN:
	      t = gimple_return_retval_ptr (stmt);
	      if (*t != NULL_TREE)
		{
		  if (DECL_P (*t))
		    {
		      tree ret_type = TREE_TYPE (*t);

		      if (sra_mode == SRA_MODE_EARLY_INTRA
			  && (TREE_CODE (ret_type) == UNION_TYPE
			      || TREE_CODE (ret_type) == QUAL_UNION_TYPE))
			disqualify_candidate (*t,
					      "Union in a return statement.");
		      else
			bitmap_set_bit (retvals_bitmap, DECL_UID (*t));
		    }
		  any |= scan_expr (t, &gsi, false, data);
		}
	      break;

	    case GIMPLE_ASSIGN:
	      assign_result = scan_assign (&stmt, &gsi, data);
	      any |= assign_result == SRA_SA_PROCESSED;
	      deleted = assign_result == SRA_SA_REMOVED;
	      if (handle_ssa_defs && assign_result != SRA_SA_REMOVED)
		any |= handle_ssa_defs (stmt, data);
	      break;

	    case GIMPLE_CALL:
	      /* Operands must be processed before the lhs.  */
	      for (i = 0; i < gimple_call_num_args (stmt); i++)
		{
		  tree *argp = gimple_call_arg_ptr (stmt, i);
		  any |= scan_expr (argp, &gsi, false, data);
		}

	      if (gimple_call_lhs (stmt))
		{
		  tree *lhs_ptr = gimple_call_lhs_ptr (stmt);
		  if (!analysis_stage
		      || !disqualify_ops_if_throwing_stmt (stmt,
							   lhs_ptr, NULL))
		    {
		      any |= scan_expr (lhs_ptr, &gsi, true, data);
		      if (handle_ssa_defs)
			any |= handle_ssa_defs (stmt, data);
		    }
		}
	      break;

	    case GIMPLE_ASM:
	      for (i = 0; i < gimple_asm_ninputs (stmt); i++)
		{
		  tree *op = &TREE_VALUE (gimple_asm_input_op (stmt, i));
		  any |= scan_expr (op, &gsi, false, data);
		}
	      for (i = 0; i < gimple_asm_noutputs (stmt); i++)
		{
		  tree *op = &TREE_VALUE (gimple_asm_output_op (stmt, i));
		  any |= scan_expr (op, &gsi, true, data);
		}
	      break;

	    default:
	      if (analysis_stage)
		walk_gimple_op (stmt, disqualify_all, NULL);
	      break;
	    }

	  if (any)
	    {
	      ret = true;
	      bb_changed = true;

	      if (!analysis_stage)
		{
		  update_stmt (stmt);
		  if (!stmt_could_throw_p (stmt))
		    remove_stmt_from_eh_region (stmt);
		}
	    }
	  if (deleted)
	    bb_changed = true;
	  else
	    gsi_next (&gsi);
	}
      if (!analysis_stage && bb_changed)
	gimple_purge_dead_eh_edges (bb);
    }

  return ret;
}

/* Helper of QSORT function.  There are pointers to accesses in the array.
   An access is considered smaller than another if it has smaller offset or
   if the offsets are the same but its size is bigger.  */

static int
compare_access_positions (const void *a, const void *b)
{
  const access_p *fp1 = (const access_p *) a;
  const access_p *fp2 = (const access_p *) b;
  const access_p f1 = *fp1;
  const access_p f2 = *fp2;

  if (f1->offset != f2->offset)
    return f1->offset < f2->offset ? -1 : 1;

  if (f1->size == f2->size)
    return 0;
  /* We want the bigger accesses first, thus the opposite operator in the
     next line: */
  return f1->size > f2->size ? -1 : 1;
}

/* Append a name of the declaration to the name obstack.  A helper function
   for make_fancy_name.  */

static void
make_fancy_decl_name (tree decl)
{
  char buffer[32];

  tree name = DECL_NAME (decl);
  if (name)
    obstack_grow (&name_obstack, IDENTIFIER_POINTER (name),
		  IDENTIFIER_LENGTH (name));
  else
    {
      sprintf (buffer, "D%u", DECL_UID (decl));
      obstack_grow (&name_obstack, buffer, strlen (buffer));
    }
}

/* Helper for make_fancy_name.
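
   As an illustrative example (not from the patch), for an access like
   a.data[3].x in

     struct T { int x; };
     struct S { struct T data[4]; };

   the replacement variable receives the pretty name "a$data$3$x".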
*/ static void make_fancy_name_1 (tree expr) { char buffer[32]; tree index; if (DECL_P (expr)) { make_fancy_decl_name (expr); return; } switch (TREE_CODE (expr)) { case COMPONENT_REF: make_fancy_name_1 (TREE_OPERAND (expr, 0)); obstack_1grow (&name_obstack, '$'); make_fancy_decl_name (TREE_OPERAND (expr, 1)); break; case ARRAY_REF: make_fancy_name_1 (TREE_OPERAND (expr, 0)); obstack_1grow (&name_obstack, '$'); /* Arrays with only one element may not have a constant as their index. */ index = TREE_OPERAND (expr, 1); if (TREE_CODE (index) != INTEGER_CST) break; sprintf (buffer, HOST_WIDE_INT_PRINT_DEC, TREE_INT_CST_LOW (index)); obstack_grow (&name_obstack, buffer, strlen (buffer)); break; case BIT_FIELD_REF: case REALPART_EXPR: case IMAGPART_EXPR: gcc_unreachable (); /* we treat these as scalars. */ break; default: break; } } /* Create a human readable name for replacement variable of ACCESS. */ static char * make_fancy_name (tree expr) { make_fancy_name_1 (expr); obstack_1grow (&name_obstack, '\0'); return XOBFINISH (&name_obstack, char *); } /* Helper function for build_ref_for_offset. */ static bool build_ref_for_offset_1 (tree *res, tree type, HOST_WIDE_INT offset, tree exp_type) { while (1) { tree fld; tree tr_size, index; HOST_WIDE_INT el_size; if (offset == 0 && exp_type && useless_type_conversion_p (exp_type, type)) return true; switch (TREE_CODE (type)) { case UNION_TYPE: case QUAL_UNION_TYPE: case RECORD_TYPE: /* Some ADA records are half-unions, treat all of them the same. */ for (fld = TYPE_FIELDS (type); fld; fld = TREE_CHAIN (fld)) { HOST_WIDE_INT pos, size; tree expr, *expr_ptr; if (TREE_CODE (fld) != FIELD_DECL) continue; pos = int_bit_position (fld); gcc_assert (TREE_CODE (type) == RECORD_TYPE || pos == 0); size = tree_low_cst (DECL_SIZE (fld), 1); if (pos > offset || (pos + size) <= offset) continue; if (res) { expr = build3 (COMPONENT_REF, TREE_TYPE (fld), *res, fld, NULL_TREE); expr_ptr = &expr; } else expr_ptr = NULL; if (build_ref_for_offset_1 (expr_ptr, TREE_TYPE (fld), offset - pos, exp_type)) { if (res) *res = expr; return true; } } return false; case ARRAY_TYPE: tr_size = TYPE_SIZE (TREE_TYPE (type)); if (!tr_size || !host_integerp (tr_size, 1)) return false; el_size = tree_low_cst (tr_size, 1); index = build_int_cst (TYPE_DOMAIN (type), offset / el_size); if (!integer_zerop (TYPE_MIN_VALUE (TYPE_DOMAIN (type)))) index = int_const_binop (PLUS_EXPR, index, TYPE_MIN_VALUE (TYPE_DOMAIN (type)), 0); if (res) *res = build4 (ARRAY_REF, TREE_TYPE (type), *res, index, NULL_TREE, NULL_TREE); offset = offset % el_size; type = TREE_TYPE (type); break; default: if (offset != 0) return false; if (exp_type) return false; else return true; } } } /* Construct an expression that would reference a part of aggregate *EXPR of type TYPE at the given OFFSET of the type EXP_TYPE. If EXPR is NULL, the function only determines whether it can build such a reference without actually doing it. FIXME: Eventually this should be replaced with maybe_fold_offset_to_reference() from tree-ssa-ccp.c but that requires a minor rewrite of fold_stmt. */ static bool build_ref_for_offset (tree *expr, tree type, HOST_WIDE_INT offset, tree exp_type, bool allow_ptr) { if (allow_ptr && POINTER_TYPE_P (type)) { type = TREE_TYPE (type); if (expr) *expr = fold_build1 (INDIRECT_REF, type, *expr); } return build_ref_for_offset_1 (expr, type, offset, exp_type); } /* The very first phase of intraprocedural SRA. It marks in candidate_bitmap those with type which is suitable for scalarization. 
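
   As an illustrative (hypothetical) example, in

     struct S { int i; float f; };
     extern void use (struct S *);

     void
     foo (void)
     {
       struct S s, v;
       s.i = 0;
       use (&v);
     }

   s becomes a candidate whereas v does not because its address is taken and
   it therefore needs to live in memory.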
*/

static bool
find_var_candidates (void)
{
  tree var, type;
  referenced_var_iterator rvi;
  bool ret = false;

  FOR_EACH_REFERENCED_VAR (var, rvi)
    {
      if (TREE_CODE (var) != VAR_DECL && TREE_CODE (var) != PARM_DECL)
	continue;
      type = TREE_TYPE (var);

      if (!AGGREGATE_TYPE_P (type)
	  || needs_to_live_in_memory (var)
	  || TREE_THIS_VOLATILE (var)
	  || !COMPLETE_TYPE_P (type)
	  || !host_integerp (TYPE_SIZE (type), 1)
	  || tree_low_cst (TYPE_SIZE (type), 1) == 0
	  || type_internals_preclude_sra_p (type))
	continue;

      bitmap_set_bit (candidate_bitmap, DECL_UID (var));

      if (dump_file)
	{
	  fprintf (dump_file, "Candidate (%d): ", DECL_UID (var));
	  print_generic_expr (dump_file, var, 0);
	  fprintf (dump_file, "\n");
	}
      ret = true;
    }

  return ret;
}

/* Return true if TYPE should be considered a scalar type by SRA.  */

static bool
is_sra_scalar_type (tree type)
{
  enum tree_code code = TREE_CODE (type);
  return (INTEGRAL_TYPE_P (type) || SCALAR_FLOAT_TYPE_P (type)
	  || FIXED_POINT_TYPE_P (type) || POINTER_TYPE_P (type)
	  || code == VECTOR_TYPE || code == COMPLEX_TYPE
	  || code == OFFSET_TYPE);
}

/* Sort all accesses for the given variable, check for partial overlaps and
   return NULL if there are any.  If there are none, pick a representative
   for each combination of offset and size and create a linked list out of
   them.  Return the pointer to the first representative and make sure it is
   the first one in the vector of accesses.  */

static struct access *
sort_and_splice_var_accesses (tree var)
{
  int i, j, access_count;
  struct access *res, **prev_acc_ptr = &res;
  VEC (access_p, heap) *access_vec;
  bool first = true;
  HOST_WIDE_INT low = -1, high = 0;

  access_vec = get_base_access_vector (var);
  if (!access_vec)
    return NULL;
  access_count = VEC_length (access_p, access_vec);

  /* Sort by <OFFSET, SIZE>.  */
  qsort (VEC_address (access_p, access_vec), access_count, sizeof (access_p),
	 compare_access_positions);

  i = 0;
  while (i < access_count)
    {
      struct access *access = VEC_index (access_p, access_vec, i);
      bool modification = access->write;
      bool grp_read = !access->write;
      bool grp_bfr_lhs = access->grp_bfr_lhs;
      bool first_scalar = is_sra_scalar_type (access->type);
      bool unscalarizable_region = access->grp_unscalarizable_region;

      if (first || access->offset >= high)
	{
	  first = false;
	  low = access->offset;
	  high = access->offset + access->size;
	}
      else if (access->offset > low && access->offset + access->size > high)
	return NULL;
      else
	gcc_assert (access->offset >= low
		    && access->offset + access->size <= high);

      j = i + 1;
      while (j < access_count)
	{
	  struct access *ac2 = VEC_index (access_p, access_vec, j);
	  if (ac2->offset != access->offset || ac2->size != access->size)
	    break;
	  modification |= ac2->write;
	  grp_read |= !ac2->write;
	  grp_bfr_lhs |= ac2->grp_bfr_lhs;
	  unscalarizable_region |= ac2->grp_unscalarizable_region;
	  relink_to_new_repr (access, ac2);

	  /* If one of the equivalent accesses is scalar, use it as a
	     representative (this happens when there is, for example, only a
	     single scalar field in a structure).
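
	     As an illustration, for a hypothetical

	       struct W { float f; };
	       struct W w;

	     the accesses w and w.f have the same offset and size but only
	     the latter has a scalar type, so it is made the representative
	     of the group.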
	  */
	  if (!first_scalar && is_sra_scalar_type (ac2->type))
	    {
	      struct access tmp_acc;

	      first_scalar = true;
	      memcpy (&tmp_acc, ac2, sizeof (struct access));
	      memcpy (ac2, access, sizeof (struct access));
	      memcpy (access, &tmp_acc, sizeof (struct access));
	    }
	  ac2->group_representative = access;
	  j++;
	}

      i = j;
      access->group_representative = access;
      access->grp_write = modification;
      access->grp_read = grp_read;
      access->grp_bfr_lhs = grp_bfr_lhs;
      access->grp_unscalarizable_region = unscalarizable_region;
      if (access->first_link)
	add_access_to_work_queue (access);

      *prev_acc_ptr = access;
      prev_acc_ptr = &access->next_grp;
    }

  gcc_assert (res == VEC_index (access_p, access_vec, 0));
  return res;
}

/* Create a variable for the given ACCESS which determines the type, name and
   a few other properties.  Return the variable declaration and store it also
   to ACCESS->replacement.  */

static tree
create_access_replacement (struct access *access)
{
  tree repl;

  repl = make_rename_temp (access->type, "SR");
  get_var_ann (repl);
  add_referenced_var (repl);

  DECL_SOURCE_LOCATION (repl) = DECL_SOURCE_LOCATION (access->base);
  DECL_ARTIFICIAL (repl) = 1;

  if (DECL_NAME (access->base) && !DECL_IGNORED_P (access->base))
    {
      char *pretty_name = make_fancy_name (access->expr);

      DECL_NAME (repl) = get_identifier (pretty_name);
      obstack_free (&name_obstack, pretty_name);

      SET_DECL_DEBUG_EXPR (repl, access->expr);
      DECL_DEBUG_EXPR_IS_FROM (repl) = 1;
      DECL_IGNORED_P (repl) = 0;
      TREE_NO_WARNING (repl) = TREE_NO_WARNING (access->base);
    }
  else
    {
      DECL_IGNORED_P (repl) = 1;
      TREE_NO_WARNING (repl) = 1;
    }

  if (access->grp_bfr_lhs)
    DECL_GIMPLE_REG_P (repl) = 0;

  if (dump_file)
    {
      fprintf (dump_file, "Created a replacement for ");
      print_generic_expr (dump_file, access->base, 0);
      fprintf (dump_file, " offset: %u, size: %u: ",
	       (unsigned) access->offset, (unsigned) access->size);
      print_generic_expr (dump_file, repl, 0);
      fprintf (dump_file, "\n");
    }

  return repl;
}

/* Return ACCESS scalar replacement, create it if it does not exist yet.  */

static inline tree
get_access_replacement (struct access *access)
{
  gcc_assert (access->grp_to_be_replaced);

  if (access->replacement_decl)
    return access->replacement_decl;

  access->replacement_decl = create_access_replacement (access);
  return access->replacement_decl;
}

/* Build a subtree of accesses rooted in *ACCESS, and move the pointer in the
   linked list along the way.  Stop when *ACCESS is NULL or the access
   pointed to by it is not "within" the root.  */

static void
build_access_subtree (struct access **access)
{
  struct access *root = *access, *last_child = NULL;
  HOST_WIDE_INT limit = root->offset + root->size;

  *access = (*access)->next_grp;
  while (*access && (*access)->offset + (*access)->size <= limit)
    {
      if (!last_child)
	root->first_child = *access;
      else
	last_child->next_sibling = *access;
      last_child = *access;

      build_access_subtree (access);
    }
}

/* Build a tree of access representatives, ACCESS is the pointer to the first
   one, others are linked in a list by the next_grp field.  */

static void
build_access_trees (struct access *access)
{
  while (access)
    {
      struct access *root = access;

      build_access_subtree (&access);
      root->next_grp = access;
    }
}

/* Analyze the subtree of accesses rooted in ROOT, scheduling replacements
   when they seem beneficial and when ALLOW_REPLACEMENTS allows it.  Also set
   all sorts of access flags appropriately along the way, notably always set
   grp_read when MARK_READ is true and grp_write when MARK_WRITE is true.
*/

static bool
analyze_access_subtree (struct access *root, bool allow_replacements,
			bool mark_read, bool mark_write)
{
  struct access *child;
  HOST_WIDE_INT limit = root->offset + root->size;
  HOST_WIDE_INT covered_to = root->offset;
  bool scalar = is_sra_scalar_type (root->type);
  bool hole = false, sth_created = false;

  if (mark_read)
    root->grp_read = true;
  else if (root->grp_read)
    mark_read = true;

  if (mark_write)
    root->grp_write = true;
  else if (root->grp_write)
    mark_write = true;

  if (root->grp_unscalarizable_region)
    allow_replacements = false;

  for (child = root->first_child; child; child = child->next_sibling)
    {
      if (!hole && child->offset < covered_to)
	hole = true;
      else
	covered_to += child->size;

      sth_created |= analyze_access_subtree (child,
					     allow_replacements && !scalar,
					     mark_read, mark_write);

      root->grp_unscalarized_data |= child->grp_unscalarized_data;
      hole |= !child->grp_covered;
    }

  if (allow_replacements && scalar && !root->first_child)
    {
      if (dump_file)
	{
	  fprintf (dump_file, "Marking ");
	  print_generic_expr (dump_file, root->base, 0);
	  fprintf (dump_file, " offset: %u, size: %u: ",
		   (unsigned) root->offset, (unsigned) root->size);
	  fprintf (dump_file, " to be replaced.\n");
	}

      root->grp_to_be_replaced = 1;
      sth_created = true;
      hole = false;
    }
  else if (covered_to < limit)
    hole = true;

  if (sth_created && !hole)
    {
      root->grp_covered = 1;
      return true;
    }
  if (root->grp_write || TREE_CODE (root->base) == PARM_DECL)
    root->grp_unscalarized_data = 1; /* not covered and written to */
  if (sth_created)
    return true;
  return false;
}

/* Analyze all access trees linked by next_grp by the means of
   analyze_access_subtree.  */

static bool
analyze_access_trees (struct access *access)
{
  bool ret = false;

  while (access)
    {
      if (analyze_access_subtree (access, true, false, false))
	ret = true;
      access = access->next_grp;
    }

  return ret;
}

/* Return true iff a potential new child of LACC at offset NORM_OFFSET and
   with size SIZE would conflict with an already existing one.  If exactly
   such a child already exists in LACC, store a pointer to it in
   EXACT_MATCH.  */

static bool
child_would_conflict_in_lacc (struct access *lacc, HOST_WIDE_INT norm_offset,
			      HOST_WIDE_INT size, struct access **exact_match)
{
  struct access *child;

  for (child = lacc->first_child; child; child = child->next_sibling)
    {
      if (child->offset == norm_offset && child->size == size)
	{
	  *exact_match = child;
	  return true;
	}

      if (child->offset < norm_offset + size
	  && child->offset + child->size > norm_offset)
	return true;
    }

  return false;
}

/* Set the expr of TARGET to one just like MODEL but with its own base at the
   bottom of the handled components.  */

static void
duplicate_expr_for_different_base (struct access *target,
				   struct access *model)
{
  tree t, expr = unshare_expr (model->expr);

  gcc_assert (handled_component_p (expr));
  t = expr;
  while (handled_component_p (TREE_OPERAND (t, 0)))
    t = TREE_OPERAND (t, 0);
  gcc_assert (TREE_OPERAND (t, 0) == model->base);
  TREE_OPERAND (t, 0) = target->base;

  target->expr = expr;
}

/* Create a new child access of PARENT, with all properties just like MODEL
   except for its offset and with its grp_write false and grp_read true.
   Return the new access.  Note that this access is created long after all
   splicing and sorting, it's not located in any access vector and is
   automatically a representative of its group.
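
   For instance (an illustrative sketch): given

     struct S { int x; int y; };
     struct S a, b;

   and statements a = b; followed by a use of b.x, the access to b.x is
   propagated across the assign link created for a = b, and an artificial
   child access for a at the offset of x is created so that a.x can be
   scalarized as well.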
*/ static struct access * create_artificial_child_access (struct access *parent, struct access *model, HOST_WIDE_INT new_offset) { struct access *access; struct access **child; gcc_assert (!model->grp_unscalarizable_region); access = (struct access *) pool_alloc (access_pool); memset (access, 0, sizeof (struct access)); access->base = parent->base; access->offset = new_offset; access->size = model->size; duplicate_expr_for_different_base (access, model); access->type = model->type; access->grp_write = true; access->grp_read = false; child = &parent->first_child; while (*child && (*child)->offset < new_offset) child = &(*child)->next_sibling; access->next_sibling = *child; *child = access; return access; } /* Propagate all subaccesses of RACC across an assignment link to LACC. Return true if any new subaccess was created. Additionally, if RACC is a scalar access but LACC is not, change the type of the latter. */ static bool propagate_subacesses_accross_link (struct access *lacc, struct access *racc) { struct access *rchild; HOST_WIDE_INT norm_delta = lacc->offset - racc->offset; bool ret = false; if (is_sra_scalar_type (lacc->type) || lacc->grp_unscalarizable_region || racc->grp_unscalarizable_region) return false; if (!lacc->first_child && !racc->first_child && is_sra_scalar_type (racc->type) && (sra_mode == SRA_MODE_INTRA || !bitmap_bit_p (retvals_bitmap, DECL_UID (lacc->base)))) { duplicate_expr_for_different_base (lacc, racc); lacc->type = racc->type; return false; } gcc_assert (lacc->size <= racc->size); for (rchild = racc->first_child; rchild; rchild = rchild->next_sibling) { struct access *new_acc = NULL; HOST_WIDE_INT norm_offset = rchild->offset + norm_delta; if (rchild->grp_unscalarizable_region) continue; if (child_would_conflict_in_lacc (lacc, norm_offset, rchild->size, &new_acc)) { if (new_acc && rchild->first_child) ret |= propagate_subacesses_accross_link (new_acc, rchild); continue; } new_acc = create_artificial_child_access (lacc, rchild, norm_offset); if (racc->first_child) propagate_subacesses_accross_link (new_acc, rchild); ret = true; } return ret; } /* Propagate all subaccesses across assignment links. */ static void propagate_all_subaccesses (void) { while (work_queue_head) { struct access *racc = pop_access_from_work_queue (); struct assign_link *link; gcc_assert (racc->first_link); for (link = racc->first_link; link; link = link->next) { struct access *lacc = link->lacc; if (!bitmap_bit_p (candidate_bitmap, DECL_UID (lacc->base))) continue; lacc = lacc->group_representative; if (propagate_subacesses_accross_link (lacc, racc) && lacc->first_link) add_access_to_work_queue (lacc); } } } /* Go through all accesses collected throughout the (intraprocedural) analysis stage, exclude overlapping ones, identify representatives and build trees out of them, making decisions about scalarization on the way. Return true iff there are any to-be-scalarized variables after this stage. 
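
   As a concrete illustrative example of the overall effect, in

     int
     f (void)
     {
       struct S { int a; int b; } s;
       s.a = 1;
       s.b = 2;
       return s.a + s.b;
     }

   a single access tree with two scalar leaves is built for s, both leaves
   are marked to be replaced and the modification stage then rewrites the
   function in terms of two independent scalars (appearing as s$a and s$b in
   the dumps), leaving the aggregate s unused.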
*/

static bool
analyze_all_variable_accesses (void)
{
  tree var;
  referenced_var_iterator rvi;
  bool res = false;

  FOR_EACH_REFERENCED_VAR (var, rvi)
    if (bitmap_bit_p (candidate_bitmap, DECL_UID (var)))
      {
	struct access *access;

	access = sort_and_splice_var_accesses (var);
	if (access)
	  build_access_trees (access);
	else
	  disqualify_candidate (var,
				"No or inhibitingly overlapping accesses.");
      }

  propagate_all_subaccesses ();

  FOR_EACH_REFERENCED_VAR (var, rvi)
    if (bitmap_bit_p (candidate_bitmap, DECL_UID (var)))
      {
	struct access *access = get_first_repr_for_decl (var);

	if (analyze_access_trees (access))
	  {
	    res = true;
	    if (dump_file)
	      {
		fprintf (dump_file, "\nAccess trees for ");
		print_generic_expr (dump_file, var, 0);
		fprintf (dump_file, " (UID: %u): \n", DECL_UID (var));
		dump_access_tree (dump_file, access);
		fprintf (dump_file, "\n");
	      }
	  }
	else
	  disqualify_candidate (var, "No scalar replacements to be created.");
      }

  return res;
}

/* Return true iff a reference statement into aggregate AGG can be built for
   every single to-be-replaced access that is a child of ACCESS, its sibling
   or a child of its sibling.  TOP_OFFSET is the offset from the processed
   access subtree that has to be subtracted from offset of each access.  */

static bool
ref_expr_for_all_replacements_p (struct access *access, tree agg,
				 HOST_WIDE_INT top_offset)
{
  do
    {
      if (access->grp_to_be_replaced
	  && !build_ref_for_offset (NULL, TREE_TYPE (agg),
				    access->offset - top_offset,
				    access->type, false))
	return false;

      if (access->first_child
	  && !ref_expr_for_all_replacements_p (access->first_child, agg,
					       top_offset))
	return false;

      access = access->next_sibling;
    }
  while (access);

  return true;
}

/* Generate statements copying scalar replacements of accesses within a
   subtree into or out of AGG.  ACCESS is the first child of the root of the
   subtree to be processed.  AGG is an aggregate type expression (can be a
   declaration but does not have to be, it can for example also be an
   indirect_ref).  TOP_OFFSET is the offset of the processed subtree which
   has to be subtracted from offsets of individual accesses to get
   corresponding offsets for AGG.  If CHUNK_SIZE is non-zero, copy only
   replacements in the interval <start_offset, start_offset + chunk_size>,
   otherwise copy all.  GSI is a statement iterator used to place the new
   statements.  WRITE should be true when the statements should write from
   AGG to the replacement and false if vice versa.  If INSERT_AFTER is true,
   new statements will be added after the current statement in GSI, they
   will be added before the statement otherwise.
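
   For example (illustrative), initialize_parameter_reductions below uses
   this function with both WRITE and INSERT_AFTER true to emit statements of
   the form

     p$a = p.a;
     p$b = p.b;

   at the start of a function with a scalarized two-field parameter p.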
*/

static void
generate_subtree_copies (struct access *access, tree agg,
			 HOST_WIDE_INT top_offset,
			 HOST_WIDE_INT start_offset, HOST_WIDE_INT chunk_size,
			 gimple_stmt_iterator *gsi, bool write,
			 bool insert_after)
{
  do
    {
      tree expr = unshare_expr (agg);

      if (chunk_size && access->offset >= start_offset + chunk_size)
	return;

      if (access->grp_to_be_replaced
	  && (chunk_size == 0
	      || access->offset + access->size > start_offset))
	{
	  bool repl_found;
	  gimple stmt;

	  repl_found = build_ref_for_offset (&expr, TREE_TYPE (agg),
					     access->offset - top_offset,
					     access->type, false);
	  gcc_assert (repl_found);

	  if (write)
	    stmt = gimple_build_assign (get_access_replacement (access),
					expr);
	  else
	    {
	      tree repl = get_access_replacement (access);
	      TREE_NO_WARNING (repl) = 1;
	      stmt = gimple_build_assign (expr, repl);
	    }

	  if (insert_after)
	    gsi_insert_after (gsi, stmt, GSI_NEW_STMT);
	  else
	    gsi_insert_before (gsi, stmt, GSI_SAME_STMT);
	  update_stmt (stmt);
	}

      if (access->first_child)
	generate_subtree_copies (access->first_child, agg, top_offset,
				 start_offset, chunk_size, gsi,
				 write, insert_after);

      access = access->next_sibling;
    }
  while (access);
}

/* Assign zero to all scalar replacements in an access subtree.  ACCESS is
   the root of the subtree to be processed.  GSI is the statement iterator
   used for inserting statements which are added after the current statement
   if INSERT_AFTER is true or before it otherwise.  */

static void
init_subtree_with_zero (struct access *access, gimple_stmt_iterator *gsi,
			bool insert_after)
{
  struct access *child;

  if (access->grp_to_be_replaced)
    {
      gimple stmt;

      stmt = gimple_build_assign (get_access_replacement (access),
				  fold_convert (access->type,
						integer_zero_node));
      if (insert_after)
	gsi_insert_after (gsi, stmt, GSI_NEW_STMT);
      else
	gsi_insert_before (gsi, stmt, GSI_SAME_STMT);
      update_stmt (stmt);
    }

  for (child = access->first_child; child; child = child->next_sibling)
    init_subtree_with_zero (child, gsi, insert_after);
}

/* Search for an access representative for the given expression EXPR and
   return it or NULL if it cannot be found.  */

static struct access *
get_access_for_expr (tree expr)
{
  HOST_WIDE_INT offset, size, max_size;
  tree base;

  if (TREE_CODE (expr) == NOP_EXPR
      || TREE_CODE (expr) == VIEW_CONVERT_EXPR)
    expr = TREE_OPERAND (expr, 0);

  if (handled_component_p (expr))
    {
      base = get_ref_base_and_extent (expr, &offset, &size, &max_size);
      size = max_size;
      if (size == -1 || !base || !DECL_P (base))
	return NULL;
    }
  else if (DECL_P (expr))
    {
      tree tree_size;

      base = expr;
      tree_size = TYPE_SIZE (TREE_TYPE (base));
      if (tree_size && host_integerp (tree_size, 1))
	size = max_size = tree_low_cst (tree_size, 1);
      else
	return NULL;
      offset = 0;
    }
  else
    return NULL;

  if (!bitmap_bit_p (candidate_bitmap, DECL_UID (base)))
    return NULL;

  return get_var_base_offset_size_access (base, offset, size);
}

/* Substitute into *EXPR an expression of type TYPE with the value of the
   replacement of ACCESS.  This is done either by producing a special V_C_E
   assignment statement converting the replacement to a new temporary of the
   requested type if TYPE is not TREE_ADDRESSABLE or by going through the
   base aggregate if it is.
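
   As an illustrative sketch (variable names hypothetical), when a union
   field is read as an int while the replacement u$f has type float, the
   read is rewritten to use a new temporary of the requested type:

     SRvce_5 = VIEW_CONVERT_EXPR<int>(u$f);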
*/ static void fix_incompatible_types_for_expr (tree *expr, tree type, struct access *access, gimple_stmt_iterator *gsi, bool write) { tree repl = get_access_replacement (access); if (!TREE_ADDRESSABLE (type)) { tree tmp = create_tmp_var (type, "SRvce"); add_referenced_var (tmp); if (is_gimple_reg_type (type)) tmp = make_ssa_name (tmp, NULL); if (write) { gimple stmt; tree conv = fold_build1 (VIEW_CONVERT_EXPR, TREE_TYPE (repl), tmp); *expr = tmp; if (is_gimple_reg_type (type)) SSA_NAME_DEF_STMT (tmp) = gsi_stmt (*gsi); stmt = gimple_build_assign (repl, conv); gsi_insert_after (gsi, stmt, GSI_NEW_STMT); update_stmt (stmt); } else { gimple stmt; tree conv = fold_build1 (VIEW_CONVERT_EXPR, type, repl); stmt = gimple_build_assign (tmp, conv); gsi_insert_before (gsi, stmt, GSI_SAME_STMT); if (is_gimple_reg_type (type)) SSA_NAME_DEF_STMT (tmp) = stmt; *expr = tmp; update_stmt (stmt); } } else { if (write) { gimple stmt; stmt = gimple_build_assign (repl, unshare_expr (access->expr)); gsi_insert_after (gsi, stmt, GSI_NEW_STMT); update_stmt (stmt); } else { gimple stmt; stmt = gimple_build_assign (unshare_expr (access->expr), repl); gsi_insert_before (gsi, stmt, GSI_SAME_STMT); update_stmt (stmt); } } } /* Callback for scan_function. Replace the expression EXPR with a scalar replacement if there is one and generate other statements to do type conversion or subtree copying if necessary. GSI is used to place newly created statements, WRITE is true if the expression is being written to (it is on a LHS of a statement or output in an assembly statement). */ static bool sra_modify_expr (tree *expr, gimple_stmt_iterator *gsi, bool write, void *data ATTRIBUTE_UNUSED) { struct access *access; tree type, bfr; if (TREE_CODE (*expr) == BIT_FIELD_REF) { bfr = *expr; expr = &TREE_OPERAND (*expr, 0); } else bfr = NULL_TREE; if (TREE_CODE (*expr) == REALPART_EXPR || TREE_CODE (*expr) == IMAGPART_EXPR) expr = &TREE_OPERAND (*expr, 0); type = TREE_TYPE (*expr); access = get_access_for_expr (*expr); if (!access) return false; if (access->grp_to_be_replaced) { if (!useless_type_conversion_p (type, access->type)) fix_incompatible_types_for_expr (expr, type, access, gsi, write); else *expr = get_access_replacement (access); } if (access->first_child) { HOST_WIDE_INT start_offset, chunk_size; if (bfr && host_integerp (TREE_OPERAND (bfr, 1), 1) && host_integerp (TREE_OPERAND (bfr, 2), 1)) { start_offset = tree_low_cst (TREE_OPERAND (bfr, 1), 1); chunk_size = tree_low_cst (TREE_OPERAND (bfr, 2), 1); } else start_offset = chunk_size = 0; generate_subtree_copies (access->first_child, access->base, 0, start_offset, chunk_size, gsi, write, write); } return true; } /* Store all replacements in the access tree rooted in TOP_RACC either to their base aggregate if there are unscalarized data or directly to LHS otherwise. */ static void handle_unscalarized_data_in_subtree (struct access *top_racc, tree lhs, gimple_stmt_iterator *gsi) { if (top_racc->grp_unscalarized_data) generate_subtree_copies (top_racc->first_child, top_racc->base, 0, 0, 0, gsi, false, false); else generate_subtree_copies (top_racc->first_child, lhs, top_racc->offset, 0, 0, gsi, false, false); } /* Try to generate statements to load all sub-replacements in an access (sub)tree (LACC is the first child) from scalar replacements in the TOP_RACC (sub)tree. If that is not possible, refresh the TOP_RACC base aggregate and load the accesses from it. LEFT_OFFSET is the offset of the left whole subtree being copied, RIGHT_OFFSET is the same thing for the right subtree. 
   GSI is stmt iterator used for statement insertions.  *REFRESHED is true
   iff the rhs top aggregate has already been refreshed by the contents of
   its scalar replacements and is set to true if this function has to do
   it.  */

static void
load_assign_lhs_subreplacements (struct access *lacc, struct access *top_racc,
				 HOST_WIDE_INT left_offset,
				 HOST_WIDE_INT right_offset,
				 gimple_stmt_iterator *old_gsi,
				 gimple_stmt_iterator *new_gsi,
				 bool *refreshed, tree lhs)
{
  do
    {
      if (lacc->grp_to_be_replaced)
	{
	  struct access *racc;
	  HOST_WIDE_INT offset = lacc->offset - left_offset + right_offset;

	  racc = find_access_in_subtree (top_racc, offset, lacc->size);
	  if (racc && racc->grp_to_be_replaced)
	    {
	      gimple stmt;

	      if (useless_type_conversion_p (lacc->type, racc->type))
		stmt = gimple_build_assign (get_access_replacement (lacc),
					    get_access_replacement (racc));
	      else
		{
		  tree rhs = fold_build1 (VIEW_CONVERT_EXPR, lacc->type,
					  get_access_replacement (racc));
		  stmt = gimple_build_assign (get_access_replacement (lacc),
					      rhs);
		}

	      gsi_insert_after (new_gsi, stmt, GSI_NEW_STMT);
	      update_stmt (stmt);
	    }
	  else
	    {
	      tree expr = unshare_expr (top_racc->base);
	      bool repl_found;
	      gimple stmt;

	      /* No suitable access on the right hand side, need to load from
		 the aggregate.  See if we have to update it first...  */
	      if (!*refreshed)
		{
		  gcc_assert (top_racc->first_child);
		  handle_unscalarized_data_in_subtree (top_racc, lhs,
						       old_gsi);
		  *refreshed = true;
		}

	      repl_found = build_ref_for_offset (&expr,
						 TREE_TYPE (top_racc->base),
						 lacc->offset - left_offset,
						 lacc->type, false);
	      gcc_assert (repl_found);

	      stmt = gimple_build_assign (get_access_replacement (lacc),
					  expr);
	      gsi_insert_after (new_gsi, stmt, GSI_NEW_STMT);
	      update_stmt (stmt);
	    }
	}
      else if (lacc->grp_read && !lacc->grp_covered && !*refreshed)
	{
	  handle_unscalarized_data_in_subtree (top_racc, lhs, old_gsi);
	  *refreshed = true;
	}

      if (lacc->first_child)
	load_assign_lhs_subreplacements (lacc->first_child, top_racc,
					 left_offset, right_offset,
					 old_gsi, new_gsi, refreshed, lhs);
      lacc = lacc->next_sibling;
    }
  while (lacc);
}

/* Return true iff ACC is non-NULL and has subaccesses.  */

static inline bool
access_has_children_p (struct access *acc)
{
  return acc && acc->first_child;
}

/* Modify assignments with a CONSTRUCTOR on their RHS.  STMT contains a
   pointer to the assignment and GSI is the statement iterator pointing at
   it.  Returns the same values as sra_modify_assign.  */

static enum scan_assign_result
sra_modify_constructor_assign (gimple *stmt, gimple_stmt_iterator *gsi)
{
  tree lhs = gimple_assign_lhs (*stmt);
  struct access *acc;

  gcc_assert (TREE_CODE (lhs) != REALPART_EXPR
	      && TREE_CODE (lhs) != IMAGPART_EXPR);
  acc = get_access_for_expr (lhs);
  if (!acc)
    return SRA_SA_NONE;

  if (VEC_length (constructor_elt,
		  CONSTRUCTOR_ELTS (gimple_assign_rhs1 (*stmt))) > 0)
    {
      /* I have never seen this code path trigger but if it can happen the
	 following should handle it gracefully.  */
      if (access_has_children_p (acc))
	generate_subtree_copies (acc->first_child, acc->base, 0, 0, 0,
				 gsi, true, true);
      return SRA_SA_PROCESSED;
    }

  if (acc->grp_covered)
    {
      init_subtree_with_zero (acc, gsi, false);
      unlink_stmt_vdef (*stmt);
      gsi_remove (gsi, true);
      return SRA_SA_REMOVED;
    }
  else
    {
      init_subtree_with_zero (acc, gsi, true);
      return SRA_SA_PROCESSED;
    }
}

/* Modify statements with an IMAGPART_EXPR or a REALPART_EXPR of a
   to-be-scalarized aggregate on their lhs.  STMT is the statement and GSI is
   the iterator used to place new helper statements.  Returns the same values
   as sra_modify_assign.
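
   For example (an illustrative sketch), if the complex variable c is
   scalarized into the replacement SR$c, a statement __imag__ c = x_1 is
   rewritten into

     SRr_2 = REALPART_EXPR <SR$c>;
     SRp_3 = x_1;
     SR$c = COMPLEX_EXPR <SRr_2, SRp_3>;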
*/

static enum scan_assign_result
sra_modify_partially_complex_lhs (gimple stmt, gimple_stmt_iterator *gsi)
{
  tree lhs, complex, ptype, rp, ip;
  struct access *access;
  gimple new_stmt, aux_stmt;

  lhs = gimple_assign_lhs (stmt);
  complex = TREE_OPERAND (lhs, 0);
  access = get_access_for_expr (complex);
  if (!access || !access->grp_to_be_replaced)
    return SRA_SA_NONE;

  ptype = TREE_TYPE (TREE_TYPE (complex));
  rp = create_tmp_var (ptype, "SRr");
  add_referenced_var (rp);
  rp = make_ssa_name (rp, NULL);

  ip = create_tmp_var (ptype, "SRp");
  add_referenced_var (ip);
  ip = make_ssa_name (ip, NULL);

  if (TREE_CODE (lhs) == IMAGPART_EXPR)
    {
      aux_stmt = gimple_build_assign (rp,
				      fold_build1 (REALPART_EXPR, ptype,
						   get_access_replacement
						     (access)));
      SSA_NAME_DEF_STMT (rp) = aux_stmt;
      gimple_assign_set_lhs (stmt, ip);
      SSA_NAME_DEF_STMT (ip) = stmt;
    }
  else
    {
      aux_stmt = gimple_build_assign (ip,
				      fold_build1 (IMAGPART_EXPR, ptype,
						   get_access_replacement
						     (access)));
      SSA_NAME_DEF_STMT (ip) = aux_stmt;
      gimple_assign_set_lhs (stmt, rp);
      SSA_NAME_DEF_STMT (rp) = stmt;
    }

  gsi_insert_before (gsi, aux_stmt, GSI_SAME_STMT);
  update_stmt (aux_stmt);
  new_stmt = gimple_build_assign (get_access_replacement (access),
				  fold_build2 (COMPLEX_EXPR, access->type,
					       rp, ip));
  gsi_insert_after (gsi, new_stmt, GSI_NEW_STMT);
  update_stmt (new_stmt);

  return SRA_SA_PROCESSED;
}

/* Return true iff T has a VIEW_CONVERT_EXPR among its handled
   components.  */

static bool
contains_view_convert_expr_p (tree t)
{
  while (1)
    {
      if (TREE_CODE (t) == VIEW_CONVERT_EXPR)
	return true;
      if (!handled_component_p (t))
	return false;
      t = TREE_OPERAND (t, 0);
    }
}

/* Change STMT to assign compatible types by means of adding component or
   array references or VIEW_CONVERT_EXPRs.  All parameters have the same
   meaning as variables with the same names in sra_modify_assign.  This is
   done in such a complicated way in order to make
   testsuite/g++.dg/tree-ssa/ssa-sra-2.C happy and so it helps in at least
   some cases.  */

static void
fix_modified_assign_compatibility (gimple_stmt_iterator *gsi, gimple *stmt,
				   struct access *lacc, struct access *racc,
				   tree lhs, tree *rhs, tree ltype,
				   tree rtype)
{
  if (racc && racc->grp_to_be_replaced && AGGREGATE_TYPE_P (ltype)
      && !access_has_children_p (lacc))
    {
      tree expr = unshare_expr (lhs);
      bool found = build_ref_for_offset (&expr, ltype, racc->offset, rtype,
					 false);
      if (found)
	{
	  gimple_assign_set_lhs (*stmt, expr);
	  return;
	}
    }

  if (lacc && lacc->grp_to_be_replaced && AGGREGATE_TYPE_P (rtype)
      && !access_has_children_p (racc))
    {
      tree expr = unshare_expr (*rhs);
      bool found = build_ref_for_offset (&expr, rtype, lacc->offset, ltype,
					 false);
      if (found)
	{
	  gimple_assign_set_rhs1 (*stmt, expr);
	  return;
	}
    }

  *rhs = fold_build1 (VIEW_CONVERT_EXPR, ltype, *rhs);
  gimple_assign_set_rhs_from_tree (gsi, *rhs);
  *stmt = gsi_stmt (*gsi);
}

/* Callback of scan_function to process assign statements.  It examines both
   sides of the statement, replaces them with a scalar replacement if there
   is one and generates copying of replacements if scalarized aggregates
   have been used in the assignment.  STMT is a pointer to the assign
   statement, GSI is used to hold generated statements for type conversions
   and subtree copying.
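
   For example (an illustrative sketch), an aggregate assignment dst = src
   in which both sides are fully scalarized and src has no unscalarized data
   is replaced by direct copies between the replacements,

     dst$a = src$a;
     dst$b = src$b;

   whereas in the presence of unscalarized data the replacements are flushed
   to and reloaded from the aggregates around the original statement
   instead.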
*/ static enum scan_assign_result sra_modify_assign (gimple *stmt, gimple_stmt_iterator *gsi, void *data ATTRIBUTE_UNUSED) { struct access *lacc, *racc; tree ltype, rtype; tree lhs, rhs; bool modify_this_stmt; if (gimple_assign_rhs2 (*stmt)) return SRA_SA_NONE; lhs = gimple_assign_lhs (*stmt); rhs = gimple_assign_rhs1 (*stmt); if (TREE_CODE (rhs) == CONSTRUCTOR) return sra_modify_constructor_assign (stmt, gsi); if (TREE_CODE (lhs) == REALPART_EXPR || TREE_CODE (lhs) == IMAGPART_EXPR) return sra_modify_partially_complex_lhs (*stmt, gsi); if (TREE_CODE (rhs) == REALPART_EXPR || TREE_CODE (rhs) == IMAGPART_EXPR || TREE_CODE (rhs) == BIT_FIELD_REF || TREE_CODE (lhs) == BIT_FIELD_REF) { modify_this_stmt = sra_modify_expr (gimple_assign_rhs1_ptr (*stmt), gsi, false, data); modify_this_stmt |= sra_modify_expr (gimple_assign_lhs_ptr (*stmt), gsi, true, data); return modify_this_stmt ? SRA_SA_PROCESSED : SRA_SA_NONE; } lacc = get_access_for_expr (lhs); racc = get_access_for_expr (rhs); if (!lacc && !racc) return SRA_SA_NONE; modify_this_stmt = ((lacc && lacc->grp_to_be_replaced) || (racc && racc->grp_to_be_replaced)); if (lacc && lacc->grp_to_be_replaced) { lhs = get_access_replacement (lacc); gimple_assign_set_lhs (*stmt, lhs); ltype = lacc->type; } else ltype = TREE_TYPE (lhs); if (racc && racc->grp_to_be_replaced) { rhs = get_access_replacement (racc); gimple_assign_set_rhs1 (*stmt, rhs); rtype = racc->type; } else rtype = TREE_TYPE (rhs); /* The possibility that gimple_assign_set_rhs_from_tree() might reallocate the statement makes the position of this pop_stmt_changes() a bit awkward but hopefully make some sense. */ if (modify_this_stmt) { if (!useless_type_conversion_p (ltype, rtype)) fix_modified_assign_compatibility (gsi, stmt, lacc, racc, lhs, &rhs, ltype, rtype); } if (contains_view_convert_expr_p (rhs) || contains_view_convert_expr_p (lhs) || (access_has_children_p (racc) && !ref_expr_for_all_replacements_p (racc, lhs, racc->offset)) || (access_has_children_p (lacc) && !ref_expr_for_all_replacements_p (lacc, rhs, lacc->offset))) { if (access_has_children_p (racc)) generate_subtree_copies (racc->first_child, racc->base, 0, 0, 0, gsi, false, false); if (access_has_children_p (lacc)) generate_subtree_copies (lacc->first_child, lacc->base, 0, 0, 0, gsi, true, true); } else { if (access_has_children_p (lacc) && access_has_children_p (racc)) { gimple_stmt_iterator orig_gsi = *gsi; bool refreshed; if (lacc->grp_read && !lacc->grp_covered) { handle_unscalarized_data_in_subtree (racc, lhs, gsi); refreshed = true; } else refreshed = false; load_assign_lhs_subreplacements (lacc->first_child, racc, lacc->offset, racc->offset, &orig_gsi, gsi, &refreshed, lhs); if (!refreshed || !racc->grp_unscalarized_data) { if (*stmt == gsi_stmt (*gsi)) gsi_next (gsi); unlink_stmt_vdef (*stmt); gsi_remove (&orig_gsi, true); return SRA_SA_REMOVED; } } else { if (access_has_children_p (racc)) { if (!racc->grp_unscalarized_data) { generate_subtree_copies (racc->first_child, lhs, racc->offset, 0, 0, gsi, false, false); gcc_assert (*stmt == gsi_stmt (*gsi)); unlink_stmt_vdef (*stmt); gsi_remove (gsi, true); return SRA_SA_REMOVED; } else generate_subtree_copies (racc->first_child, lhs, racc->offset, 0, 0, gsi, false, true); } else if (access_has_children_p (lacc)) generate_subtree_copies (lacc->first_child, rhs, lacc->offset, 0, 0, gsi, true, true); } } return modify_this_stmt ? SRA_SA_PROCESSED : SRA_SA_NONE; } /* Generate statements initializing scalar replacements of parts of function parameters. 
*/ static void initialize_parameter_reductions (void) { gimple_stmt_iterator gsi; gimple_seq seq = NULL; tree parm; for (parm = DECL_ARGUMENTS (current_function_decl); parm; parm = TREE_CHAIN (parm)) { VEC (access_p, heap) *access_vec; struct access *access; if (!bitmap_bit_p (candidate_bitmap, DECL_UID (parm))) continue; access_vec = get_base_access_vector (parm); if (!access_vec) continue; if (!seq) { seq = gimple_seq_alloc (); gsi = gsi_start (seq); } for (access = VEC_index (access_p, access_vec, 0); access; access = access->next_grp) generate_subtree_copies (access, parm, 0, 0, 0, &gsi, true, true); } if (seq) gsi_insert_seq_on_edge_immediate (single_succ_edge (ENTRY_BLOCK_PTR), seq); } /* The "main" function of intraprocedural SRA passes. Runs the analysis and if it reveals there are components of some aggregates to be scalarized, it runs the required transformations. */ static unsigned int perform_intra_sra (void) { int ret = 0; sra_initialize (); if (!find_var_candidates ()) goto out; if (!scan_function (build_access_from_expr, build_accesses_from_assign, NULL, true, NULL)) goto out; if (!analyze_all_variable_accesses ()) goto out; scan_function (sra_modify_expr, sra_modify_assign, NULL, false, NULL); initialize_parameter_reductions (); ret = TODO_update_ssa; if (sra_mode == SRA_MODE_EARLY_INTRA) ret = TODO_update_ssa; else ret = TODO_update_ssa | TODO_rebuild_alias; out: sra_deinitialize (); return ret; } /* Perform early intraprocedural SRA. */ static unsigned int early_intra_sra (void) { sra_mode = SRA_MODE_EARLY_INTRA; return perform_intra_sra (); } /* Perform "late" intraprocedural SRA. */ static unsigned int late_intra_sra (void) { sra_mode = SRA_MODE_INTRA; return perform_intra_sra (); } static bool gate_intra_sra (void) { return flag_tree_sra != 0; } struct gimple_opt_pass pass_sra_early = { { GIMPLE_PASS, "esra", /* name */ gate_intra_sra, /* gate */ early_intra_sra, /* execute */ NULL, /* sub */ NULL, /* next */ 0, /* static_pass_number */ TV_TREE_SRA, /* tv_id */ PROP_cfg | PROP_ssa, /* properties_required */ 0, /* properties_provided */ 0, /* properties_destroyed */ 0, /* todo_flags_start */ TODO_dump_func | TODO_update_ssa | TODO_ggc_collect | TODO_verify_ssa /* todo_flags_finish */ } }; struct gimple_opt_pass pass_sra = { { GIMPLE_PASS, "sra", /* name */ gate_intra_sra, /* gate */ late_intra_sra, /* execute */ NULL, /* sub */ NULL, /* next */ 0, /* static_pass_number */ TV_TREE_SRA, /* tv_id */ PROP_cfg | PROP_ssa, /* properties_required */ 0, /* properties_provided */ 0, /* properties_destroyed */ TODO_update_address_taken, /* todo_flags_start */ TODO_dump_func | TODO_update_ssa | TODO_ggc_collect | TODO_verify_ssa /* todo_flags_finish */ } }; ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 3/5] New intraprocedural Scalar Reduction of Aggregates.
  2009-04-28 10:27   ` Martin Jambor
@ 2009-04-29 12:56     ` Richard Guenther
  2009-05-10 10:33       ` Martin Jambor
  2009-05-10 10:39       ` Martin Jambor
  0 siblings, 2 replies; 25+ messages in thread
From: Richard Guenther @ 2009-04-29 12:56 UTC (permalink / raw)
  To: Martin Jambor; +Cc: GCC Patches, Jan Hubicka

On Tue, 28 Apr 2009, Martin Jambor wrote:

> On Tue, Apr 28, 2009 at 12:04:32PM +0200, Martin Jambor wrote:
> > This is the new intraprocedural SRA.  I have stripped off the
> > interprocedural part and will propose to commit it separately later.
> > I have tried to remove almost every trace of IPA-SRA, however, two
> > provisions for it have remained in the patch.  First, an enumeration
> > (rather than a boolean) is used to distinguish between "early" and
> > "late" SRA so that other SRA modes can be added later on.  Second,
> > scan_function() has a hook parameter and a void pointer parameter
> > which are not used in this patch but will be by IPA-SRA.
> >
> > Otherwise, the patch is hopefully self-contained and the basis of its
> > operation is described by the initial comment.
> >
> > The patch bootstraps (on x86_64-linux-gnu but I am about to try it on
> > hppa-linux-gnu too) but produces a small number of testsuite failures
> > which are handled by the two following patches.
> >
> > Thanks,
> >
> > Martin
> >
> >
> > 2009-04-27  Martin Jambor  <mjambor@suse.cz>
> >
> > 	* tree-sra.c (enum sra_mode): The whole contents of the file was
> > 	replaced.
>
> Hm, the patch is quite unreadable, below is the new tree-sra.c file
> which entirely replaces the old one (note that the patch also modifies
> the Makefile though):

Ah.  Here it is ... the comment to the changelog still applies.

> /* Scalar Replacement of Aggregates (SRA) converts some structure
>    references into scalar references, exposing them to the scalar
>    optimizers.
>    Copyright (C) 2008, 2009 Free Software Foundation, Inc.
>    Contributed by Martin Jambor <mjambor@suse.cz>
>
>    This file is part of GCC.
>
>    GCC is free software; you can redistribute it and/or modify it under
>    the terms of the GNU General Public License as published by the Free
>    Software Foundation; either version 3, or (at your option) any later
>    version.
>
>    GCC is distributed in the hope that it will be useful, but WITHOUT ANY
>    WARRANTY; without even the implied warranty of MERCHANTABILITY or
>    FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
>    for more details.
>
>    You should have received a copy of the GNU General Public License
>    along with GCC; see the file COPYING3.  If not see
>    <http://www.gnu.org/licenses/>.  */
>
> /* This file implements Scalar Reduction of Aggregates (SRA).  SRA is run
>    twice, once in the early stages of compilation (early SRA) and once in
>    the late stages (late SRA).  The aim of both is to turn references to
>    scalar parts of aggregates into uses of independent scalar variables.
>
>    The two passes are nearly identical, the only difference is that early
>    SRA does not scalarize unions which are used as the result in a
>    GIMPLE_RETURN statement because together with inlining this can lead to
>    weird type conversions.

Do you happen to have a testcase for this or can you describe that
problem some more?

>    Both passes operate in four stages:
>
>    1. The declarations that have properties which make them candidates for
>       scalarization are identified in function find_var_candidates().  The
>       candidates are stored in candidate_bitmap.
>
>    2. The function body is scanned.
In the process, declarations which are > used in a manner that prevent their scalarization are removed from the > candidate bitmap. More importantly, for every access into an aggregate, > an access structure (struct access) is created by create_access() and > stored in a vector associated with the aggregate. Among other > information, the aggregate declaration, the offset and size of the access > and its type are stored in the structure. > > On a related note, assign_link structures are created for every assign > statement between candidate aggregates and attached to the related > accesses. > > 3. The vectors of accesses are analyzed. They are first sorted according to > their offset and size and then scanned for partially overlapping accesses > (i.e. those which overlap but one is not entirely within another). Such > an access disqualifies the whole aggregate from being scalarized. This happens only when get_ref_base_and_extent punts and returns -1 for the access size? And of course with struct field accesses of different structs that are inside the same union. > If there is no such inhibiting overlap, a representative access structure > is chosen for every unique combination of offset and size. Afterwards, > the pass builds a set of trees from these structures, in which children > of an access are within their parent (in terms of offset and size). > > Then accesses are propagated whenever possible (i.e. in cases when it > does not create a partially overlapping access) across assign_links from > the right hand side to the left hand side. > > Then the set of trees for each declaration is traversed again and those > accesses which should be replaced by a scalar are identified. > > 4. The function is traversed again, and for every reference into an > aggregate that has some component which is about to be scalarized, > statements are amended and new statements are created as necessary. > Finally, if a parameter got scalarized, the scalar replacements are > initialized with values from respective parameter aggregates. > */ Closing */ goes to the previous line. > > #include "config.h" > #include "system.h" > #include "coretypes.h" > #include "alloc-pool.h" > #include "tm.h" > #include "tree.h" > #include "gimple.h" > #include "tree-flow.h" > #include "diagnostic.h" > #include "tree-dump.h" > #include "timevar.h" > #include "params.h" > #include "target.h" > #include "flags.h" > > /* Enumeration of all aggregate reductions we can do. */ > enum sra_mode {SRA_MODE_EARLY_INTRA, /* early intraprocedural SRA */ > SRA_MODE_INTRA}; /* late intraprocedural SRA */ Spaces after { and before }. > > /* Global variable describing which aggregate reduction we are performing at > the moment. */ > static enum sra_mode sra_mode; > > struct assign_link; > > /* ACCESS represents each access to an aggregate variable (as a whole or a > part). It can also represent a group of accesses that refer to exactly the > same fragment of an aggregate (i.e. those that have exactly the same offset > and size). Such representatives for a single aggregate, once determined, > are linked in a linked list and have the group fields set. > > Moreover, when doing intraprocedural SRA, a tree is built from those > representatives (by the means of first_child and next_sibling pointers), in > which all items in a subtree are "within" the root, i.e. their offset is > greater or equal to offset of the root and offset+size is smaller or equal > to offset+size of the root. Children of an access are sorted by offset. > */ */ to previous line. 
> struct access > { > /* Values returned by `get_ref_base_and_extent' for each COMPONENT_REF > If EXPR isn't a COMPONENT_REF just set `BASE = EXPR', `OFFSET = 0', > `SIZE = TREE_SIZE (TREE_TYPE (expr))'. */ s/COMPONENT_REF/component reference/g - it's not only COMPONENT_REF trees we handle. > HOST_WIDE_INT offset; > HOST_WIDE_INT size; > tree base; > > /* Expression. */ > tree expr; > /* Type. */ > tree type; > > /* Next group representative for this aggregate. */ > struct access *next_grp; > > /* Pointer to the group representative. Pointer to itself if the struct is > the representative. */ > struct access *group_representative; > > /* If this access has any children (in terms of the definition above), this > points to the first one. */ > struct access *first_child; > > /* Pointer to the next sibling in the access tree as described above. */ > struct access *next_sibling; > > /* Pointers to the first and last element in the linked list of assign > links. */ > struct assign_link *first_link, *last_link; vertical space missing. > /* Pointer to the next access in the work queue. */ > struct access *next_queued; > > /* Replacement variable for this access "region." Never to be accessed > directly, always only by the means of get_access_replacement() and only > when grp_to_be_replaced flag is set. */ > tree replacement_decl; > > /* Is this particular access write access? */ > unsigned write : 1; > > /* Is this access currently in the work queue? */ > unsigned grp_queued : 1; > /* Does this group contain a write access? This flag is propagated down the > access tree. */ > unsigned grp_write : 1; > /* Does this group contain a read access? This flag is propagated down the > access tree. */ > unsigned grp_read : 1; > /* Is the subtree rooted in this access fully covered by scalar > replacements? */ > unsigned grp_covered : 1; > /* If set to true, this access and all below it in an access tree must not be > scalarized. */ > unsigned grp_unscalarizable_region : 1; > /* Whether data have been written to parts of the aggregate covered by this > access which is not to be scalarized. This flag is propagated up in the > access tree. */ > unsigned grp_unscalarized_data : 1; > /* Does this access and/or group contain a write access through a > BIT_FIELD_REF? */ > unsigned grp_bfr_lhs : 1; > > /* Set when a scalar replacement should be created for this variable. We do > the decision and creation at different places because create_tmp_var > cannot be called from within FOR_EACH_REFERENCED_VAR. */ > unsigned grp_to_be_replaced : 1; > }; > > typedef struct access *access_p; > > DEF_VEC_P (access_p); > DEF_VEC_ALLOC_P (access_p, heap); > > /* Alloc pool for allocating access structures. */ > static alloc_pool access_pool; > > /* A structure linking lhs and rhs accesses from an aggregate assignment. They > are used to propagate subaccesses from rhs to lhs as long as they don't > conflict with what is already there. */ > struct assign_link > { > struct access *lacc, *racc; > struct assign_link *next; > }; > > /* Alloc pool for allocating assign link structures. */ > static alloc_pool link_pool; > > /* Base (tree) -> Vector (VEC(access_p,heap) *) map. */ > static struct pointer_map_t *base_access_vec; > > /* Bitmap of bases (candidates). */ > static bitmap candidate_bitmap; > /* Bitmap of declarations used in a return statement. */ > static bitmap retvals_bitmap; > /* Obstack for creation of fancy names. 
*/ > static struct obstack name_obstack; > > /* Head of a linked list of accesses that need to have its subaccesses > propagated to their assignment counterparts. */ > static struct access *work_queue_head; > > /* Dump contents of ACCESS to file F in a human friendly way. If GRP is true, > representative fields are dumped, otherwise those which only describe the > individual access are. */ > > static void > dump_access (FILE *f, struct access *access, bool grp) > { > fprintf (f, "access { "); > fprintf (f, "base = (%d)'", DECL_UID (access->base)); > print_generic_expr (f, access->base, 0); > fprintf (f, "', offset = %d", (int) access->offset); > fprintf (f, ", size = %d", (int) access->size); you can use ", offset = "HOST_WIDE_INT_PRINT_DEC, access->offset here. > fprintf (f, ", expr = "); > print_generic_expr (f, access->expr, 0); > fprintf (f, ", type = "); > print_generic_expr (f, access->type, 0); > if (grp) > fprintf (f, ", grp_write = %d, grp_read = %d, grp_covered = %d, " > "grp_unscalarizable_region = %d, grp_unscalarized_data = %d, " > "grp_to_be_replaced = %d\n", > access->grp_write, access->grp_read, access->grp_covered, > access->grp_unscalarizable_region, access->grp_unscalarized_data, > access->grp_to_be_replaced); > else > fprintf (f, ", write = %d'\n", access->write); > } > > /* Dump a subtree rooted in ACCESS to file F, indent by LEVEL. */ > > static void > dump_access_tree_1 (FILE *f, struct access *access, int level) > { > do > { > int i; > > for (i = 0; i < level; i++) > fputs ("* ", dump_file); > > dump_access (f, access, true); > > if (access->first_child) > dump_access_tree_1 (f, access->first_child, level + 1); > > access = access->next_sibling; > } > while (access); > } > > /* Dump all access trees for a variable, given the pointer to the first root in > ACCESS. */ > > static void > dump_access_tree (FILE *f, struct access *access) > { > for (; access; access = access->next_grp) > dump_access_tree_1 (f, access, 0); > } > > /* Return a vector of pointers to accesses for the variable given in BASE or > NULL if there is none. */ > > static VEC (access_p, heap) * > get_base_access_vector (tree base) > { > void **slot; > > slot = pointer_map_contains (base_access_vec, base); > if (!slot) > return NULL; > else > return *(VEC (access_p, heap) **) slot; > } > > /* Find an access with required OFFSET and SIZE in a subtree of accesses rooted > in ACCESS. Return NULL if it cannot be found. */ > > static struct access * > find_access_in_subtree (struct access *access, HOST_WIDE_INT offset, > HOST_WIDE_INT size) > { > while (access && (access->offset != offset || access->size != size)) > { > struct access *child = access->first_child; > > while (child && (child->offset + child->size <= offset)) > child = child->next_sibling; Do you limit the number of siblings? We should keep an eye on this for potential compile-time issues (not that I expect some). > access = child; > } > > return access; > } > > /* Return the first group representative for DECL or NULL if none exists. */ > > static struct access * > get_first_repr_for_decl (tree base) > { > VEC (access_p, heap) *access_vec; > > access_vec = get_base_access_vector (base); > if (!access_vec) > return NULL; > > return VEC_index (access_p, access_vec, 0); > } > > /* Find an access representative for the variable BASE and given OFFSET and > SIZE. Requires that access trees have already been built. Return NULL if > it cannot be found. 
*/ > > static struct access * > get_var_base_offset_size_access (tree base, HOST_WIDE_INT offset, > HOST_WIDE_INT size) > { > struct access *access; > > access = get_first_repr_for_decl (base); > while (access && (access->offset + access->size <= offset)) > access = access->next_grp; > if (!access) > return NULL; > > return find_access_in_subtree (access, offset, size); > } > > /* Add LINK to the linked list of assign links of RACC. */ > static void > add_link_to_rhs (struct access *racc, struct assign_link *link) > { > gcc_assert (link->racc == racc); > > if (!racc->first_link) > { > gcc_assert (!racc->last_link); > racc->first_link = link; > } > else > racc->last_link->next = link; > > racc->last_link = link; > link->next = NULL; > } > > /* Move all link structures in their linked list in OLD_RACC to the linked list > in NEW_RACC. */ > static void > relink_to_new_repr (struct access *new_racc, struct access *old_racc) > { > if (!old_racc->first_link) > { > gcc_assert (!old_racc->last_link); > return; > } > > if (new_racc->first_link) > { > gcc_assert (!new_racc->last_link->next); > gcc_assert (!old_racc->last_link || !old_racc->last_link->next); > > new_racc->last_link->next = old_racc->first_link; > new_racc->last_link = old_racc->last_link; > } > else > { > gcc_assert (!new_racc->last_link); > > new_racc->first_link = old_racc->first_link; > new_racc->last_link = old_racc->last_link; > } > old_racc->first_link = old_racc->last_link = NULL; > } > > /* Add ACCESS to the work queue (which is actually a stack). */ > > static void > add_access_to_work_queue (struct access *access) > { > if (!access->grp_queued) > { > gcc_assert (!access->next_queued); > access->next_queued = work_queue_head; > access->grp_queued = 1; > work_queue_head = access; > } > } > > /* Pop an access from the work queue, and return it, assuming there is one. */ > > static struct access * > pop_access_from_work_queue (void) > { > struct access *access = work_queue_head; > > work_queue_head = access->next_queued; > access->next_queued = NULL; > access->grp_queued = 0; > return access; > } > > > /* Allocate necessary structures. */ > > static void > sra_initialize (void) > { > candidate_bitmap = BITMAP_ALLOC (NULL); > retvals_bitmap = BITMAP_ALLOC (NULL); > gcc_obstack_init (&name_obstack); > access_pool = create_alloc_pool ("SRA accesses", sizeof (struct access), 16); > link_pool = create_alloc_pool ("SRA links", sizeof (struct assign_link), 16); > base_access_vec = pointer_map_create (); > } > > /* Hook fed to pointer_map_traverse, deallocate stored vectors. */ > > static bool > delete_base_accesses (const void *key ATTRIBUTE_UNUSED, void **value, > void *data ATTRIBUTE_UNUSED) > { > VEC (access_p, heap) *access_vec; > access_vec = (VEC (access_p, heap) *) *value; > VEC_free (access_p, heap, access_vec); > > return true; > } > > /* Deallocate all general structures. */ > > static void > sra_deinitialize (void) > { > BITMAP_FREE (candidate_bitmap); > BITMAP_FREE (retvals_bitmap); > free_alloc_pool (access_pool); > free_alloc_pool (link_pool); > obstack_free (&name_obstack, NULL); > > pointer_map_traverse (base_access_vec, delete_base_accesses, NULL); > pointer_map_destroy (base_access_vec); > } > > /* Remove DECL from candidates for SRA and write REASON to the dump file if > there is one. */ > static void > disqualify_candidate (tree decl, const char *reason) > { > bitmap_clear_bit (candidate_bitmap, DECL_UID (decl)); > > if (dump_file && (dump_flags & TDF_DETAILS)) > { > fprintf (dump_file, "!
Disqualifying "); > print_generic_expr (dump_file, decl, 0); > fprintf (dump_file, " - %s\n", reason); > } > } > > /* Return true iff the type contains a field or an element which does not allow > scalarization. */ > > static bool > type_internals_preclude_sra_p (tree type) > { > tree fld; > tree et; > > switch (TREE_CODE (type)) > { > case RECORD_TYPE: > case UNION_TYPE: > case QUAL_UNION_TYPE: > for (fld = TYPE_FIELDS (type); fld; fld = TREE_CHAIN (fld)) > if (TREE_CODE (fld) == FIELD_DECL) > { > tree ft = TREE_TYPE (fld); > > if (TREE_THIS_VOLATILE (fld) > || !DECL_FIELD_OFFSET (fld) || !DECL_SIZE (fld) > || !host_integerp (DECL_FIELD_OFFSET (fld), 1) > || !host_integerp (DECL_SIZE (fld), 1)) > return true; > > if (AGGREGATE_TYPE_P (ft) > && type_internals_preclude_sra_p (ft)) > return true; > } > > return false; > > case ARRAY_TYPE: > et = TREE_TYPE (type); > > if (AGGREGATE_TYPE_P (et)) > return type_internals_preclude_sra_p (et); > else > return false; > > default: > return false; > } > } > > /* Create and insert access for EXPR. Return created access, or NULL if it is > not possible. */ > > static struct access * > create_access (tree expr, bool write) > { > struct access *access; > void **slot; > VEC (access_p,heap) *vec; > HOST_WIDE_INT offset, size, max_size; > tree base = expr; > bool unscalarizable_region = false; > > if (handled_component_p (expr)) > base = get_ref_base_and_extent (expr, &offset, &size, &max_size); > else > { > tree tree_size; > > tree_size = TYPE_SIZE (TREE_TYPE (base)); > if (tree_size && host_integerp (tree_size, 1)) > size = max_size = tree_low_cst (tree_size, 1); > else > size = max_size = -1; > > offset = 0; > } get_ref_base_and_extent should now also work on plain DECLs (non-handled_component_p) and base should never be NULL. > if (!base || !DECL_P (base) > || !bitmap_bit_p (candidate_bitmap, DECL_UID (base))) > return NULL; > > if (size != max_size) > { > size = max_size; > unscalarizable_region = true; > } > > if (size < 0) > { > disqualify_candidate (base, "Encountered an ultra variable sized " > "access."); ultra variable sized? I would name it 'unconstrained access'. Note that there is still useful information in this case if offset is non-zero (namely accesses before [offset, -1] may still be scalarized). Maybe something for further improvements. This for example would happen for structs with trailing arrays. > return NULL; > } > > access = (struct access *) pool_alloc (access_pool); > memset (access, 0, sizeof (struct access)); > > access->base = base; > access->offset = offset; > access->size = size; > access->expr = expr; > access->type = TREE_TYPE (expr); > access->write = write; > access->grp_unscalarizable_region = unscalarizable_region; > > slot = pointer_map_contains (base_access_vec, base); > if (slot) > vec = (VEC (access_p, heap) *) *slot; > else > vec = VEC_alloc (access_p, heap, 32); > > VEC_safe_push (access_p, heap, vec, access); > > *((struct VEC (access_p,heap) **) > pointer_map_insert (base_access_vec, base)) = vec; > > return access; > } > > > /* Callback of walk_tree. Search the given tree for a declaration and exclude > it from the candidates. */ > > static tree > disqualify_all (tree *tp, int *walk_subtrees, void *data ATTRIBUTE_UNUSED) > { > tree base = *tp; > > > if (TREE_CODE (base) == SSA_NAME) > base = SSA_NAME_VAR (base); Err ... for SSA_NAME bases there is nothing to scalarize? So just bail out in that case? In fact, using walk_tree for disqualify_all looks a bit expensive (it also walks types). 
> if (DECL_P (base)) > { > disqualify_candidate (base, "From within disqualify_all()."); > *walk_subtrees = 0; > } > else > *walk_subtrees = 1; > > > return NULL_TREE; > } > > /* Scan expression EXPR and create access structures for all accesses to > candidates for scalarization. Return the created access or NULL if none is > created. */ > > static struct access * > build_access_from_expr_1 (tree *expr_ptr, > gimple_stmt_iterator *gsi ATTRIBUTE_UNUSED, bool write) > { > struct access *ret = NULL; > tree expr = *expr_ptr; > tree safe_expr = expr; > bool bit_ref; > > if (TREE_CODE (expr) == BIT_FIELD_REF) > { > expr = TREE_OPERAND (expr, 0); > bit_ref = true; > } > else > bit_ref = false; > > while (TREE_CODE (expr) == NOP_EXPR > || TREE_CODE (expr) == VIEW_CONVERT_EXPR > || TREE_CODE (expr) == REALPART_EXPR > || TREE_CODE (expr) == IMAGPART_EXPR) > expr = TREE_OPERAND (expr, 0); Use CONVERT_EXPR_P (expr) instead of checking for NOP_EXPR. And why do this here btw, and not just lump ... > switch (TREE_CODE (expr)) > { > case ADDR_EXPR: > case SSA_NAME: > case INDIRECT_REF: > break; > > case VAR_DECL: > case PARM_DECL: > case RESULT_DECL: > case COMPONENT_REF: > case ARRAY_REF: > ret = create_access (expr, write); > break; ... this ... > case REALPART_EXPR: > case IMAGPART_EXPR: > expr = TREE_OPERAND (expr, 0); > ret = create_access (expr, write); ... and this together? Won't you create bogus accesses if you strip for example IMAGPART_EXPR (which has non-zero offset)? > break; > > case ARRAY_RANGE_REF: it should just be handled fine I think. > default: > walk_tree (&safe_expr, disqualify_all, NULL, NULL); and if not, this should just disqualify the base of the access, like get_base_address (safe_expr) (save_expr you mean?) and then if that is a DECL, disqualify that decl. > break; > } > > if (write && bit_ref && ret) > ret->grp_bfr_lhs = 1; > > return ret; > } > > /* Scan expression EXPR and create access structures for all accesses to > candidates for scalarization. Return true if any access has been > inserted. */ > > static bool > build_access_from_expr (tree *expr_ptr, > gimple_stmt_iterator *gsi ATTRIBUTE_UNUSED, bool write, > void *data ATTRIBUTE_UNUSED) > { > return build_access_from_expr_1 (expr_ptr, gsi, write) != NULL; > } > > /* Disqualify LHS and RHS for scalarization if STMT must end its basic block in > modes in which it matters, return true iff they have been disqualified. RHS > may be NULL, in that case ignore it. If we scalarize an aggregate in > intra-SRA we may need to add statements after each statement. This is not > possible if a statement unconditionally has to end the basic block. */ > static bool > disqualify_ops_if_throwing_stmt (gimple stmt, tree *lhs, tree *rhs) > { > if (stmt_can_throw_internal (stmt) || stmt_ends_bb_p (stmt)) > { > walk_tree (lhs, disqualify_all, NULL, NULL); > if (rhs) > walk_tree (rhs, disqualify_all, NULL, NULL); > return true; > } > return false; > } > > > /* Result code for scan_assign callback for scan_function. */ > enum scan_assign_result {SRA_SA_NONE, /* nothing done for the stmt */ > SRA_SA_PROCESSED, /* stmt analyzed/changed */ > SRA_SA_REMOVED}; /* stmt redundant and eliminated */ space after { and before }. > > > /* Scan expressions occurring in the statement pointed to by STMT_EXPR, create > access structures for all accesses to candidates for scalarization and > remove those candidates which occur in statements or expressions that > prevent them from being split apart. Return true if any access has been > inserted.
*/ > > static enum scan_assign_result > build_accesses_from_assign (gimple *stmt_ptr, > gimple_stmt_iterator *gsi ATTRIBUTE_UNUSED, > void *data ATTRIBUTE_UNUSED) > { > gimple stmt = *stmt_ptr; > tree *lhs_ptr, *rhs_ptr; > struct access *lacc, *racc; > > if (gimple_assign_rhs2 (stmt)) > return SRA_SA_NONE; Use !gimple_assign_single_p (stmt) here instead. > > lhs_ptr = gimple_assign_lhs_ptr (stmt); > rhs_ptr = gimple_assign_rhs1_ptr (stmt); you probably don't need to pass pointers to trees everywhere as you are not changing them. > if (disqualify_ops_if_throwing_stmt (stmt, lhs_ptr, rhs_ptr)) > return SRA_SA_NONE; > > racc = build_access_from_expr_1 (rhs_ptr, gsi, false); > lacc = build_access_from_expr_1 (lhs_ptr, gsi, true); just avoid calling into build_access_from_expr_1 for SSA_NAMEs or is_gimple_min_invariant lhs/rhs, that should make that function more regular. > if (lacc && racc > && !lacc->grp_unscalarizable_region > && !racc->grp_unscalarizable_region > && AGGREGATE_TYPE_P (TREE_TYPE (*lhs_ptr)) > && lacc->size <= racc->size > && useless_type_conversion_p (lacc->type, racc->type)) useless_type_conversion_p should be always true here. > { > struct assign_link *link; > > link = (struct assign_link *) pool_alloc (link_pool); > memset (link, 0, sizeof (struct assign_link)); > > link->lacc = lacc; > link->racc = racc; > > add_link_to_rhs (racc, link); > } > > return (lacc || racc) ? SRA_SA_PROCESSED : SRA_SA_NONE; > } > > /* Scan function and look for interesting statements. Return true if any has > been found or processed, as indicated by callbacks. SCAN_EXPR is a callback > called on all expressions within statements except assign statements and > those deemed entirely unsuitable for some reason (all operands in such > statements and expression are removed from candidate_bitmap). SCAN_ASSIGN > is a callback called on all assign statements, HANDLE_SSA_DEFS is a callback > called on assign statements and those call statements which have a lhs and > it is the only callback which can be NULL. ANALYSIS_STAGE is true when > running in the analysis stage of a pass and thus no statement is being > modified. DATA is a pointer passed to all callbacks. If any single > callback returns true, this function also returns true, otherwise it returns > false. */ > > static bool > scan_function (bool (*scan_expr) (tree *, gimple_stmt_iterator *, bool, void *), > enum scan_assign_result (*scan_assign) (gimple *, > gimple_stmt_iterator *, > void *), > bool (*handle_ssa_defs)(gimple, void *), > bool analysis_stage, void *data) > { > gimple_stmt_iterator gsi; > basic_block bb; > unsigned i; > tree *t; > bool ret = false; > > FOR_EACH_BB (bb) > { > bool bb_changed = false; > > gsi = gsi_start_bb (bb); > while (!gsi_end_p (gsi)) > { > gimple stmt = gsi_stmt (gsi); > enum scan_assign_result assign_result; > bool any = false, deleted = false; > > switch (gimple_code (stmt)) > { > case GIMPLE_RETURN: > t = gimple_return_retval_ptr (stmt); > if (*t != NULL_TREE) > { > if (DECL_P (*t)) > { > tree ret_type = TREE_TYPE (*t); > if (sra_mode == SRA_MODE_EARLY_INTRA > && (TREE_CODE (ret_type) == UNION_TYPE > || TREE_CODE (ret_type) == QUAL_UNION_TYPE)) > disqualify_candidate (*t, > "Union in a return statement."); > else > bitmap_set_bit (retvals_bitmap, DECL_UID (*t)); > } > any |= scan_expr (t, &gsi, false, data); > } Likewise for passing pointers (why is gsi necessary and passing a stmt does not work?)
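As a source-level illustration of the GIMPLE_RETURN handling just quoted (made-up example):

union u { int i; float f; };

union u
make_u (int i)
{
  union u r;
  r.i = i;
  return r;   /* disqualifies r in early SRA */
}

In SRA_MODE_EARLY_INTRA, r is disqualified on the spot; a returned decl of non-union type is instead only noted in retvals_bitmap, which the propagation code consults in the early mode before collapsing a returned aggregate to a scalar type.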
> break; > > case GIMPLE_ASSIGN: > assign_result = scan_assign (&stmt, &gsi, data); > any |= assign_result == SRA_SA_PROCESSED; > deleted = assign_result == SRA_SA_REMOVED; > if (handle_ssa_defs && assign_result != SRA_SA_REMOVED) > any |= handle_ssa_defs (stmt, data); > break; > > case GIMPLE_CALL: > /* Operands must be processed before the lhs. */ > for (i = 0; i < gimple_call_num_args (stmt); i++) > { > tree *argp = gimple_call_arg_ptr (stmt, i); > any |= scan_expr (argp, &gsi, false, data); > } > > if (gimple_call_lhs (stmt)) > { > tree *lhs_ptr = gimple_call_lhs_ptr (stmt); > if (!analysis_stage || > !disqualify_ops_if_throwing_stmt (stmt, lhs_ptr, NULL)) > { > any |= scan_expr (lhs_ptr, &gsi, true, data); > if (handle_ssa_defs) > any |= handle_ssa_defs (stmt, data); > } > } > break; > > case GIMPLE_ASM: > for (i = 0; i < gimple_asm_ninputs (stmt); i++) > { > tree *op = &TREE_VALUE (gimple_asm_input_op (stmt, i)); > any |= scan_expr (op, &gsi, false, data); > } > for (i = 0; i < gimple_asm_noutputs (stmt); i++) > { > tree *op = &TREE_VALUE (gimple_asm_output_op (stmt, i)); > any |= scan_expr (op, &gsi, true, data); > } asm operands with memory constraints should be disqualified from SRA (see walk_stmt_load_store_addr_ops and/or the operand scanner). > default: > if (analysis_stage) > walk_gimple_op (stmt, disqualify_all, NULL); You seem to be very eager to disqualify anything unknown ;) (But I yet have to come across a TREE_ADDRESSABLE check ...) > break; > } > > if (any) > { > ret = true; > bb_changed = true; > > if (!analysis_stage) Oh. So we reuse this function. Hmm. > { > update_stmt (stmt); > if (!stmt_could_throw_p (stmt)) > remove_stmt_from_eh_region (stmt); Usually if (maybe_clean_or_replace_eh_stmt (stmt, stmt) && gimple_purge_dead_eh_edges (gimple_bb (stmt))) is the pattern for this. But then you disqualified all throwing expressions, no? > } > } > if (deleted) > bb_changed = true; > else > { > gsi_next (&gsi); > ret = true; > } > } > if (!analysis_stage && bb_changed) > gimple_purge_dead_eh_edges (bb); > } > > return ret; > } > > /* Helper of QSORT function. There are pointers to accesses in the array. An > access is considered smaller than another if it has smaller offset or if the > offsets are the same but is size is bigger. */ > > static int > compare_access_positions (const void *a, const void *b) > { > const access_p *fp1 = (const access_p *) a; > const access_p *fp2 = (const access_p *) b; > const access_p f1 = *fp1; > const access_p f2 = *fp2; > > if (f1->offset != f2->offset) > return f1->offset < f2->offset ? -1 : 1; > > if (f1->size == f2->size) > return 0; > /* We want the bigger accesses first, thus the opposite operator in the next > line: */ > return f1->size > f2->size ? -1 : 1; > } > > > /* Append a name of the declaration to the name obstack. A helper function for > make_fancy_name. */ > > static void > make_fancy_decl_name (tree decl) > { > char buffer[32]; > > tree name = DECL_NAME (decl); > if (name) > obstack_grow (&name_obstack, IDENTIFIER_POINTER (name), > IDENTIFIER_LENGTH (name)); > else > { > sprintf (buffer, "D%u", DECL_UID (decl)); That would just be useless information. I guess you copied this from old SRA? > obstack_grow (&name_obstack, buffer, strlen (buffer)); > } > } > > /* Helper for make_fancy_name. 
*/ > > static void > make_fancy_name_1 (tree expr) > { > char buffer[32]; > tree index; > > if (DECL_P (expr)) > { > make_fancy_decl_name (expr); > return; > } > > switch (TREE_CODE (expr)) > { > case COMPONENT_REF: > make_fancy_name_1 (TREE_OPERAND (expr, 0)); > obstack_1grow (&name_obstack, '$'); > make_fancy_decl_name (TREE_OPERAND (expr, 1)); > break; > > case ARRAY_REF: > make_fancy_name_1 (TREE_OPERAND (expr, 0)); > obstack_1grow (&name_obstack, '$'); > /* Arrays with only one element may not have a constant as their > index. */ > index = TREE_OPERAND (expr, 1); > if (TREE_CODE (index) != INTEGER_CST) > break; > sprintf (buffer, HOST_WIDE_INT_PRINT_DEC, TREE_INT_CST_LOW (index)); > obstack_grow (&name_obstack, buffer, strlen (buffer)); > > break; > > case BIT_FIELD_REF: > case REALPART_EXPR: > case IMAGPART_EXPR: > gcc_unreachable (); /* we treat these as scalars. */ > break; > default: > break; > } > } > > /* Create a human readable name for replacement variable of ACCESS. */ > > static char * > make_fancy_name (tree expr) > { > make_fancy_name_1 (expr); > obstack_1grow (&name_obstack, '\0'); > return XOBFINISH (&name_obstack, char *); > } As all new scalars are DECL_ARTIFICIAL anyway why bother to create a fancy name? .... > /* Helper function for build_ref_for_offset. */ > > static bool > build_ref_for_offset_1 (tree *res, tree type, HOST_WIDE_INT offset, > tree exp_type) > { > while (1) > { > tree fld; > tree tr_size, index; > HOST_WIDE_INT el_size; > > if (offset == 0 && exp_type > && useless_type_conversion_p (exp_type, type)) > return true; > > switch (TREE_CODE (type)) > { > case UNION_TYPE: > case QUAL_UNION_TYPE: > case RECORD_TYPE: > /* Some ADA records are half-unions, treat all of them the same. */ > for (fld = TYPE_FIELDS (type); fld; fld = TREE_CHAIN (fld)) > { > HOST_WIDE_INT pos, size; > tree expr, *expr_ptr; > > if (TREE_CODE (fld) != FIELD_DECL) > continue; > > pos = int_bit_position (fld); > gcc_assert (TREE_CODE (type) == RECORD_TYPE || pos == 0); > size = tree_low_cst (DECL_SIZE (fld), 1); > if (pos > offset || (pos + size) <= offset) > continue; > > if (res) > { > expr = build3 (COMPONENT_REF, TREE_TYPE (fld), *res, fld, > NULL_TREE); > expr_ptr = &expr; > } > else > expr_ptr = NULL; > if (build_ref_for_offset_1 (expr_ptr, TREE_TYPE (fld), > offset - pos, exp_type)) > { > if (res) > *res = expr; > return true; > } > } > return false; > > case ARRAY_TYPE: > tr_size = TYPE_SIZE (TREE_TYPE (type)); > if (!tr_size || !host_integerp (tr_size, 1)) > return false; > el_size = tree_low_cst (tr_size, 1); > > index = build_int_cst (TYPE_DOMAIN (type), offset / el_size); > if (!integer_zerop (TYPE_MIN_VALUE (TYPE_DOMAIN (type)))) > index = int_const_binop (PLUS_EXPR, index, > TYPE_MIN_VALUE (TYPE_DOMAIN (type)), 0); > if (res) > *res = build4 (ARRAY_REF, TREE_TYPE (type), *res, index, NULL_TREE, > NULL_TREE); > offset = offset % el_size; > type = TREE_TYPE (type); > break; > > default: > if (offset != 0) > return false; > > if (exp_type) > return false; > else > return true; > } > } > } > > /* Construct an expression that would reference a part of aggregate *EXPR of > type TYPE at the given OFFSET of the type EXP_TYPE. If EXPR is NULL, the > function only determines whether it can build such a reference without > actually doing it. > > FIXME: Eventually this should be replaced with > maybe_fold_offset_to_reference() from tree-ssa-ccp.c but that requires a > minor rewrite of fold_stmt. 
> */ > > static bool > build_ref_for_offset (tree *expr, tree type, HOST_WIDE_INT offset, > tree exp_type, bool allow_ptr) > { > if (allow_ptr && POINTER_TYPE_P (type)) > { > type = TREE_TYPE (type); > if (expr) > *expr = fold_build1 (INDIRECT_REF, type, *expr); > } > > return build_ref_for_offset_1 (expr, type, offset, exp_type); > } > > /* The very first phase of intraprocedural SRA. It marks in candidate_bitmap > those variables whose type is suitable for scalarization. */ > > static bool > find_var_candidates (void) > { > tree var, type; > referenced_var_iterator rvi; > bool ret = false; > > FOR_EACH_REFERENCED_VAR (var, rvi) > { > if (TREE_CODE (var) != VAR_DECL && TREE_CODE (var) != PARM_DECL) > continue; > type = TREE_TYPE (var); > > if (!AGGREGATE_TYPE_P (type) > || needs_to_live_in_memory (var) Ok, here's the TREE_ADDRESSABLE check. I'm finally convinced that disqualify_all should go ;) > || TREE_THIS_VOLATILE (var) > || !COMPLETE_TYPE_P (type) > || !host_integerp (TYPE_SIZE (type), 1) > || tree_low_cst (TYPE_SIZE (type), 1) == 0 > || type_internals_preclude_sra_p (type)) > continue; > > bitmap_set_bit (candidate_bitmap, DECL_UID (var)); > > if (dump_file && (dump_flags & TDF_DETAILS)) > { > fprintf (dump_file, "Candidate (%d): ", DECL_UID (var)); > print_generic_expr (dump_file, var, 0); > fprintf (dump_file, "\n"); > } > ret = true; > } > > return ret; > } > > /* Return true if TYPE should be considered a scalar type by SRA. */ > > static bool > is_sra_scalar_type (tree type) > { > enum tree_code code = TREE_CODE (type); > return (INTEGRAL_TYPE_P (type) || SCALAR_FLOAT_TYPE_P (type) > || FIXED_POINT_TYPE_P (type) || POINTER_TYPE_P (type) > || code == VECTOR_TYPE || code == COMPLEX_TYPE > || code == OFFSET_TYPE); > } Why is this anything different from is_gimple_reg_type ()? > /* Sort all accesses for the given variable, check for partial overlaps and > return NULL if there are any. If there are none, pick a representative for > each combination of offset and size and create a linked list out of them. > Return the pointer to the first representative and make sure it is the first > one in the vector of accesses. */ > > static struct access * > sort_and_splice_var_accesses (tree var) > { > int i, j, access_count; > struct access *res, **prev_acc_ptr = &res; > VEC (access_p, heap) *access_vec; > bool first = true; > HOST_WIDE_INT low = -1, high = 0; > > access_vec = get_base_access_vector (var); > if (!access_vec) > return NULL; > access_count = VEC_length (access_p, access_vec); > > /* Sort by <OFFSET, SIZE>.
*/ > qsort (VEC_address (access_p, access_vec), access_count, sizeof (access_p), > compare_access_positions); > > i = 0; > while (i < access_count) > { > struct access *access = VEC_index (access_p, access_vec, i); > bool modification = access->write; > bool grp_read = !access->write; > bool grp_bfr_lhs = access->grp_bfr_lhs; > bool first_scalar = is_sra_scalar_type (access->type); > bool unscalarizable_region = access->grp_unscalarizable_region; > > if (first || access->offset >= high) > { > first = false; > low = access->offset; > high = access->offset + access->size; > } > else if (access->offset > low && access->offset + access->size > high) > return NULL; > else > gcc_assert (access->offset >= low > && access->offset + access->size <= high); > > j = i + 1; > while (j < access_count) > { > struct access *ac2 = VEC_index (access_p, access_vec, j); > if (ac2->offset != access->offset || ac2->size != access->size) > break; > modification |= ac2->write; > grp_read |= !ac2->write; > grp_bfr_lhs |= ac2->grp_bfr_lhs; > unscalarizable_region |= ac2->grp_unscalarizable_region; > relink_to_new_repr (access, ac2); > > /* If one of the equivalent accesses is scalar, use it as a > representative (this happens when there is for example only a > single scalar field in a structure). */ > if (!first_scalar && is_sra_scalar_type (ac2->type)) > { > struct access tmp_acc; > first_scalar = true; > > memcpy (&tmp_acc, ac2, sizeof (struct access)); > memcpy (ac2, access, sizeof (struct access)); > memcpy (access, &tmp_acc, sizeof (struct access)); > } > ac2->group_representative = access; > j++; > } > > i = j; > > access->group_representative = access; > access->grp_write = modification; > access->grp_read = grp_read; > access->grp_bfr_lhs = grp_bfr_lhs; > access->grp_unscalarizable_region = unscalarizable_region; > if (access->first_link) > add_access_to_work_queue (access); > > *prev_acc_ptr = access; > prev_acc_ptr = &access->next_grp; > } > > gcc_assert (res == VEC_index (access_p, access_vec, 0)); > return res; > } > > /* Create a variable for the given ACCESS which determines the type, name and a > few other properties. Return the variable declaration and store it also to > ACCESS->replacement. */ > > static tree > create_access_replacement (struct access *access) > { > tree repl; > > repl = make_rename_temp (access->type, "SR"); > get_var_ann (repl); > add_referenced_var (repl); > > DECL_SOURCE_LOCATION (repl) = DECL_SOURCE_LOCATION (access->base); > DECL_ARTIFICIAL (repl) = 1; > > if (DECL_NAME (access->base) && !DECL_IGNORED_P (access->base)) at least && !DECL_ARTIFICIAL (access->base) I think. > { > char *pretty_name = make_fancy_name (access->expr); > > DECL_NAME (repl) = get_identifier (pretty_name); > obstack_free (&name_obstack, pretty_name); > > SET_DECL_DEBUG_EXPR (repl, access->expr); > DECL_DEBUG_EXPR_IS_FROM (repl) = 1; > DECL_IGNORED_P (repl) = 0; > TREE_NO_WARNING (repl) = TREE_NO_WARNING (access->base); > } > else > { > DECL_IGNORED_P (repl) = 1; > TREE_NO_WARNING (repl) = 1; > } So just copy DECL_IGNORED_P and TREE_NO_WARNING from access->base unconditionally? > if (access->grp_bfr_lhs) > DECL_GIMPLE_REG_P (repl) = 0; But you never set it (see update_address_taken for more cases, most notably VIEW_CONVERT_EXPR on the lhs which need to be taken care of). You should set it for COMPLEX_TYPE and VECTOR_TYPE replacements.
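The effect of the naming code is easiest to see on an example (mine, not from the patch):

struct inner { int x; int y; };
struct outer { struct inner in; int z; };

int
use_outer (struct outer o)
{
  return o.in.x + o.z;
}

make_fancy_name joins the components with '$', so the replacements should show up in dumps and debug information with names like o$in$x and o$z, falling back to the plain SR.<N> temporaries when the base has no usable name.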
> if (dump_file) > { > fprintf (dump_file, "Created a replacement for "); > print_generic_expr (dump_file, access->base, 0); > fprintf (dump_file, " offset: %u, size: %u: ", > (unsigned) access->offset, (unsigned) access->size); > print_generic_expr (dump_file, repl, 0); > fprintf (dump_file, "\n"); > } > > return repl; > } > > /* Return ACCESS scalar replacement, create it if it does not exist yet. */ > > static inline tree > get_access_replacement (struct access *access) > { > gcc_assert (access->grp_to_be_replaced); > > if (access->replacement_decl) > return access->replacement_decl; > > access->replacement_decl = create_access_replacement (access); > return access->replacement_decl; > } > > /* Build a subtree of accesses rooted in *ACCESS, and move the pointer in the > linked list along the way. Stop when *ACCESS is NULL or the access pointed > to it is not "within" the root. */ > > static void > build_access_subtree (struct access **access) > { > struct access *root = *access, *last_child = NULL; > HOST_WIDE_INT limit = root->offset + root->size; > > *access = (*access)->next_grp; > while (*access && (*access)->offset + (*access)->size <= limit) > { > if (!last_child) > root->first_child = *access; > else > last_child->next_sibling = *access; > last_child = *access; > > build_access_subtree (access); > } > } > > /* Build a tree of access representatives, ACCESS is the pointer to the first > one, others are linked in a list by the next_grp field. Decide about scalar > replacements on the way, return true iff any are to be created. */ > > static void > build_access_trees (struct access *access) > { > while (access) > { > struct access *root = access; > > build_access_subtree (&access); > root->next_grp = access; > } > } > > /* Analyze the subtree of accesses rooted in ROOT, scheduling replacements when > both seeming beneficial and when ALLOW_REPLACEMENTS allows it. Also set > all sorts of access flags appropriately along the way, notably always ser > grp_read when MARK_READ is true and grp_write when MARK_WRITE is true. 
*/ > > static bool > analyze_access_subtree (struct access *root, bool allow_replacements, > bool mark_read, bool mark_write) > { > struct access *child; > HOST_WIDE_INT limit = root->offset + root->size; > HOST_WIDE_INT covered_to = root->offset; > bool scalar = is_sra_scalar_type (root->type); > bool hole = false, sth_created = false; > > if (mark_read) > root->grp_read = true; > else if (root->grp_read) > mark_read = true; > > if (mark_write) > root->grp_write = true; > else if (root->grp_write) > mark_write = true; > > if (root->grp_unscalarizable_region) > allow_replacements = false; > > for (child = root->first_child; child; child = child->next_sibling) > { > if (!hole && child->offset < covered_to) > hole = true; > else > covered_to += child->size; > > sth_created |= analyze_access_subtree (child, > allow_replacements && !scalar, > mark_read, mark_write); > > root->grp_unscalarized_data |= child->grp_unscalarized_data; > hole |= !child->grp_covered; > } > > if (allow_replacements && scalar && !root->first_child) > { > if (dump_file && (dump_flags & TDF_DETAILS)) > { > fprintf (dump_file, "Marking "); > print_generic_expr (dump_file, root->base, 0); > fprintf (dump_file, " offset: %u, size: %u: ", > (unsigned) root->offset, (unsigned) root->size); > fprintf (dump_file, " to be replaced.\n"); > } > > root->grp_to_be_replaced = 1; > sth_created = true; > hole = false; > } > else if (covered_to < limit) > hole = true; > > if (sth_created && !hole) > { > root->grp_covered = 1; > return true; > } > if (root->grp_write || TREE_CODE (root->base) == PARM_DECL) > root->grp_unscalarized_data = 1; /* not covered and written to */ > if (sth_created) > return true; > return false; > } > > /* Analyze all access trees linked by next_grp by the means of > analyze_access_subtree. */ > static bool > analyze_access_trees (struct access *access) > { > bool ret = false; > > while (access) > { > if (analyze_access_subtree (access, true, false, false)) > ret = true; > access = access->next_grp; > } > > return ret; > } > > /* Return true iff a potential new child of LACC at offset OFFSET and with size > SIZE would conflict with an already existing one. If exactly such a child > already exists in LACC, store a pointer to it in EXACT_MATCH. */ > > static bool > child_would_conflict_in_lacc (struct access *lacc, HOST_WIDE_INT norm_offset, > HOST_WIDE_INT size, struct access **exact_match) > { > struct access *child; > > for (child = lacc->first_child; child; child = child->next_sibling) > { > if (child->offset == norm_offset && child->size == size) > { > *exact_match = child; > return true; > } > > if (child->offset < norm_offset + size > && child->offset + child->size > norm_offset) > return true; > } > > return false; > } > > /* Set the expr of TARGET to one just like MODEL but with its own base at the > bottom of the handled components. */ > > static void > duplicate_expr_for_different_base (struct access *target, > struct access *model) > { > tree t, expr = unshare_expr (model->expr); > > gcc_assert (handled_component_p (expr)); > t = expr; > while (handled_component_p (TREE_OPERAND (t, 0))) > t = TREE_OPERAND (t, 0); > gcc_assert (TREE_OPERAND (t, 0) == model->base); > TREE_OPERAND (t, 0) = target->base; > > target->expr = expr; > } > > > /* Create a new child access of PARENT, with all properties just like MODEL > except for its offset and with its grp_write false and grp_read true. > Return the new access.
Note that this access is created long after all > splicing and sorting, it's not located in any access vector and is > automatically a representative of its group. */ > > static struct access * > create_artificial_child_access (struct access *parent, struct access *model, > HOST_WIDE_INT new_offset) > { > struct access *access; > struct access **child; > > gcc_assert (!model->grp_unscalarizable_region); > > access = (struct access *) pool_alloc (access_pool); > memset (access, 0, sizeof (struct access)); > access->base = parent->base; > access->offset = new_offset; > access->size = model->size; > duplicate_expr_for_different_base (access, model); > access->type = model->type; > access->grp_write = true; > access->grp_read = false; > > child = &parent->first_child; > while (*child && (*child)->offset < new_offset) > child = &(*child)->next_sibling; > > access->next_sibling = *child; > *child = access; > > return access; > } > > > /* Propagate all subaccesses of RACC across an assignment link to LACC. Return > true if any new subaccess was created. Additionally, if RACC is a scalar > access but LACC is not, change the type of the latter. */ > > static bool > propagate_subacesses_accross_link (struct access *lacc, struct access *racc) > { > struct access *rchild; > HOST_WIDE_INT norm_delta = lacc->offset - racc->offset; > bool ret = false; > > if (is_sra_scalar_type (lacc->type) > || lacc->grp_unscalarizable_region > || racc->grp_unscalarizable_region) > return false; > > if (!lacc->first_child && !racc->first_child > && is_sra_scalar_type (racc->type) > && (sra_mode == SRA_MODE_INTRA > || !bitmap_bit_p (retvals_bitmap, DECL_UID (lacc->base)))) > { > duplicate_expr_for_different_base (lacc, racc); > lacc->type = racc->type; > return false; > } > > gcc_assert (lacc->size <= racc->size); > > for (rchild = racc->first_child; rchild; rchild = rchild->next_sibling) > { > struct access *new_acc = NULL; > HOST_WIDE_INT norm_offset = rchild->offset + norm_delta; > > if (rchild->grp_unscalarizable_region) > continue; > > if (child_would_conflict_in_lacc (lacc, norm_offset, rchild->size, > &new_acc)) > { > if (new_acc && rchild->first_child) > ret |= propagate_subacesses_accross_link (new_acc, rchild); > continue; > } > > new_acc = create_artificial_child_access (lacc, rchild, norm_offset); > if (racc->first_child) > propagate_subacesses_accross_link (new_acc, rchild); > > ret = true; > } > > return ret; > } > > /* Propagate all subaccesses across assignment links. */ > > static void > propagate_all_subaccesses (void) > { > while (work_queue_head) > { > struct access *racc = pop_access_from_work_queue (); > struct assign_link *link; > > gcc_assert (racc->first_link); > > for (link = racc->first_link; link; link = link->next) > { > struct access *lacc = link->lacc; > > if (!bitmap_bit_p (candidate_bitmap, DECL_UID (lacc->base))) > continue; > lacc = lacc->group_representative; > if (propagate_subacesses_accross_link (lacc, racc) > && lacc->first_link) > add_access_to_work_queue (lacc); > } > } > } > > /* Go through all accesses collected throughout the (intraprocedural) analysis > stage, exclude overlapping ones, identify representatives and build trees > out of them, making decisions about scalarization on the way. Return true > iff there are any to-be-scalarized variables after this stage. 
*/ > > static bool > analyze_all_variable_accesses (void) > { > tree var; > referenced_var_iterator rvi; > bool res = false; > > FOR_EACH_REFERENCED_VAR (var, rvi) > if (bitmap_bit_p (candidate_bitmap, DECL_UID (var))) > { > struct access *access; > > access = sort_and_splice_var_accesses (var); > if (access) > build_access_trees (access); > else > disqualify_candidate (var, > "No or inhibitingly overlapping accesses."); > } > > propagate_all_subaccesses (); > > FOR_EACH_REFERENCED_VAR (var, rvi) > if (bitmap_bit_p (candidate_bitmap, DECL_UID (var))) > { > struct access *access = get_first_repr_for_decl (var); > > if (analyze_access_trees (access)) > { > res = true; > if (dump_file && (dump_flags & TDF_DETAILS)) > { > fprintf (dump_file, "\nAccess trees for "); > print_generic_expr (dump_file, var, 0); > fprintf (dump_file, " (UID: %u): \n", DECL_UID (var)); > dump_access_tree (dump_file, access); > fprintf (dump_file, "\n"); > } > } > else > disqualify_candidate (var, "No scalar replacements to be created."); > } > > return res; > } > > /* Return true iff a reference statement into aggregate AGG can be built for > every single to-be-replaced access that is a child of ACCESS, its sibling > or a child of its sibling. TOP_OFFSET is the offset from the processed > access subtree that has to be subtracted from offset of each access. */ > > static bool > ref_expr_for_all_replacements_p (struct access *access, tree agg, > HOST_WIDE_INT top_offset) > { > do > { > if (access->grp_to_be_replaced > && !build_ref_for_offset (NULL, TREE_TYPE (agg), > access->offset - top_offset, > access->type, false)) > return false; > > if (access->first_child > && !ref_expr_for_all_replacements_p (access->first_child, agg, > top_offset)) > return false; > > access = access->next_sibling; > } > while (access); > > return true; > } > > > /* Generate statements copying scalar replacements of accesses within a subtree > into or out of AGG. ACCESS is the first child of the root of the subtree to > be processed. AGG is an aggregate type expression (can be a declaration but > does not have to be, it can for example also be an indirect_ref). > TOP_OFFSET is the offset of the processed subtree which has to be subtracted > from offsets of individual accesses to get corresponding offsets for AGG. > If CHUNK_SIZE is non-null, copy only replacements in the interval > <start_offset, start_offset + chunk_size>, otherwise copy all. GSI is a > statement iterator used to place the new statements. WRITE should be true > when the statements should write from AGG to the replacement and false if > vice versa. If INSERT_AFTER is true, new statements will be added after the > current statement in GSI, they will be added before the statement > otherwise.
*/ > > static void > generate_subtree_copies (struct access *access, tree agg, > HOST_WIDE_INT top_offset, > HOST_WIDE_INT start_offset, HOST_WIDE_INT chunk_size, > gimple_stmt_iterator *gsi, bool write, > bool insert_after) > { > do > { > tree expr = unshare_expr (agg); > > if (chunk_size && access->offset >= start_offset + chunk_size) > return; > > if (access->grp_to_be_replaced > && (chunk_size == 0 > || access->offset + access->size > start_offset)) > { > bool repl_found; > gimple stmt; > > repl_found = build_ref_for_offset (&expr, TREE_TYPE (agg), > access->offset - top_offset, > access->type, false); > gcc_assert (repl_found); > > if (write) > stmt = gimple_build_assign (get_access_replacement (access), expr); > else > { > tree repl = get_access_replacement (access); > TREE_NO_WARNING (repl) = 1; > stmt = gimple_build_assign (expr, repl); > } > > if (insert_after) > gsi_insert_after (gsi, stmt, GSI_NEW_STMT); > else > gsi_insert_before (gsi, stmt, GSI_SAME_STMT); > update_stmt (stmt); > } > > if (access->first_child) > generate_subtree_copies (access->first_child, agg, top_offset, > start_offset, chunk_size, gsi, > write, insert_after); > > access = access->next_sibling; > } > while (access); > } > > /* Assign zero to all scalar replacements in an access subtree. ACCESS is the > root of the subtree to be processed. GSI is the statement iterator used > for inserting statements which are added after the current statement if > INSERT_AFTER is true or before it otherwise. */ > > static void > init_subtree_with_zero (struct access *access, gimple_stmt_iterator *gsi, > bool insert_after) > > { > struct access *child; > > if (access->grp_to_be_replaced) > { > gimple stmt; > > stmt = gimple_build_assign (get_access_replacement (access), > fold_convert (access->type, > integer_zero_node)); > if (insert_after) > gsi_insert_after (gsi, stmt, GSI_NEW_STMT); > else > gsi_insert_before (gsi, stmt, GSI_SAME_STMT); > update_stmt (stmt); > } > > for (child = access->first_child; child; child = child->next_sibling) > init_subtree_with_zero (child, gsi, insert_after); > } > > /* Search for an access representative for the given expression EXPR and > return it or NULL if it cannot be found. */ > > static struct access * > get_access_for_expr (tree expr) > { > HOST_WIDE_INT offset, size, max_size; > tree base; > > if (TREE_CODE (expr) == NOP_EXPR > || TREE_CODE (expr) == VIEW_CONVERT_EXPR) The NOP_EXPR check would be spelled CONVERT_EXPR_P (expr), but VIEW_CONVERT_EXPR is also a handled_component_p. Note that NOP_EXPR should never occur here - that would be invalid gimple. So I think you can (and should) just delete the above. > expr = TREE_OPERAND (expr, 0); > > if (handled_component_p (expr)) > { > base = get_ref_base_and_extent (expr, &offset, &size, &max_size); > size = max_size; > if (size == -1 || !base || !DECL_P (base)) > return NULL; > } > else if (DECL_P (expr)) > { > tree tree_size; > > base = expr; > tree_size = TYPE_SIZE (TREE_TYPE (base)); > if (tree_size && host_integerp (tree_size, 1)) > size = max_size = tree_low_cst (tree_size, 1); > else > return NULL; > > offset = 0; See above. get_ref_base_and_extent handles plain DECLs just fine. > } > else > return NULL; > > if (!bitmap_bit_p (candidate_bitmap, DECL_UID (base))) > return NULL; > > return get_var_base_offset_size_access (base, offset, size); > } > > /* Substitute into *EXPR an expression of type TYPE with the value of the > replacement of ACCESS.
This is done either by producing a special V_C_E > assignment statement converting the replacement to a new temporary of the > requested type if TYPE is not TREE_ADDRESSABLE or by going through the base > aggregate if it is. */ > > static void > fix_incompatible_types_for_expr (tree *expr, tree type, struct access *access, > gimple_stmt_iterator *gsi, bool write) > { > tree repl = get_access_replacement (access); > if (!TREE_ADDRESSABLE (type)) > { > tree tmp = create_tmp_var (type, "SRvce"); > > add_referenced_var (tmp); > if (is_gimple_reg_type (type)) > tmp = make_ssa_name (tmp, NULL); Should be always is_gimple_reg_type () if it is a type suitable for a SRA scalar replacement. But you should set DECL_GIMPLE_REG_P for VECTOR and COMPLEX types here. > if (write) > { > gimple stmt; > tree conv = fold_build1 (VIEW_CONVERT_EXPR, TREE_TYPE (repl), tmp); This needs to either always fold to plain 'tmp' or tmp has to be a non-register. Otherwise you will create invalid gimple. > *expr = tmp; > if (is_gimple_reg_type (type)) > SSA_NAME_DEF_STMT (tmp) = gsi_stmt (*gsi); See above. > stmt = gimple_build_assign (repl, conv); > gsi_insert_after (gsi, stmt, GSI_NEW_STMT); > update_stmt (stmt); > } > else > { > gimple stmt; > tree conv = fold_build1 (VIEW_CONVERT_EXPR, type, repl); > > stmt = gimple_build_assign (tmp, conv); > gsi_insert_before (gsi, stmt, GSI_SAME_STMT); > if (is_gimple_reg_type (type)) > SSA_NAME_DEF_STMT (tmp) = stmt; See above. (I wonder if the patch still passes bootstrap & regtest after the typechecking patch) > *expr = tmp; > update_stmt (stmt); > } > } > else > { > if (write) > { > gimple stmt; > > stmt = gimple_build_assign (repl, unshare_expr (access->expr)); > gsi_insert_after (gsi, stmt, GSI_NEW_STMT); > update_stmt (stmt); > } > else > { > gimple stmt; > > stmt = gimple_build_assign (unshare_expr (access->expr), repl); > gsi_insert_before (gsi, stmt, GSI_SAME_STMT); > update_stmt (stmt); > } I don't understand this path. Are the types here always compatible? > } > } > > > /* Callback for scan_function. Replace the expression EXPR with a scalar > replacement if there is one and generate other statements to do type > conversion or subtree copying if necessary. GSI is used to place newly > created statements, WRITE is true if the expression is being written to (it > is on a LHS of a statement or output in an assembly statement). */ > > static bool > sra_modify_expr (tree *expr, gimple_stmt_iterator *gsi, bool write, > void *data ATTRIBUTE_UNUSED) > { > struct access *access; > tree type, bfr; > > if (TREE_CODE (*expr) == BIT_FIELD_REF) > { > bfr = *expr; > expr = &TREE_OPERAND (*expr, 0); > } > else > bfr = NULL_TREE; > > if (TREE_CODE (*expr) == REALPART_EXPR || TREE_CODE (*expr) == IMAGPART_EXPR) > expr = &TREE_OPERAND (*expr, 0); Why strip these early? I think this is wrong (or do you always want to produce complex type replacements, even if only the real or imaginary part is used? If so a strategic comment somewhere is missing.)
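A made-up case for the !TREE_ADDRESSABLE path above, where the type of the use and the type of the replacement differ:

union bits { int i; float f; };

float
int_bits_to_float (int i)
{
  union bits u;
  u.i = i;
  return u.f;
}

Assuming 32-bit int and float, both accesses have offset 0 and the same size, so u gets a single replacement; whichever type the representative ends up with, the use of the other type has to go through a VIEW_CONVERT_EXPR into one of the SRvce temporaries created above.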
> type = TREE_TYPE (*expr); > > access = get_access_for_expr (*expr); > if (!access) > return false; > > if (access->grp_to_be_replaced) > { > if (!useless_type_conversion_p (type, access->type)) > fix_incompatible_types_for_expr (expr, type, access, gsi, write); > else > *expr = get_access_replacement (access); > } > > if (access->first_child) > { > HOST_WIDE_INT start_offset, chunk_size; > if (bfr > && host_integerp (TREE_OPERAND (bfr, 1), 1) > && host_integerp (TREE_OPERAND (bfr, 2), 1)) > { > start_offset = tree_low_cst (TREE_OPERAND (bfr, 1), 1); > chunk_size = tree_low_cst (TREE_OPERAND (bfr, 2), 1); > } > else > start_offset = chunk_size = 0; > > generate_subtree_copies (access->first_child, access->base, 0, > start_offset, chunk_size, gsi, write, write); > } > return true; > } > > /* Store all replacements in the access tree rooted in TOP_RACC either to their > base aggregate if there are unscalarized data or directly to LHS > otherwise. */ > > static void > handle_unscalarized_data_in_subtree (struct access *top_racc, tree lhs, > gimple_stmt_iterator *gsi) > { > if (top_racc->grp_unscalarized_data) > generate_subtree_copies (top_racc->first_child, top_racc->base, 0, 0, 0, > gsi, false, false); > else > generate_subtree_copies (top_racc->first_child, lhs, top_racc->offset, > 0, 0, gsi, false, false); > } > > > /* Try to generate statements to load all sub-replacements in an access > (sub)tree (LACC is the first child) from scalar replacements in the TOP_RACC > (sub)tree. If that is not possible, refresh the TOP_RACC base aggregate and > load the accesses from it. LEFT_OFFSET is the offset of the left whole > subtree being copied, RIGHT_OFFSET is the same thing for the right subtree. > GSI is stmt iterator used for statement insertions. *REFRESHED is true iff > the rhs top aggregate has already been refreshed by contents of its scalar > reductions and is set to true if this function has to do it. */ > > static void > load_assign_lhs_subreplacements (struct access *lacc, struct access *top_racc, > HOST_WIDE_INT left_offset, > HOST_WIDE_INT right_offset, > gimple_stmt_iterator *old_gsi, > gimple_stmt_iterator *new_gsi, > bool *refreshed, tree lhs) > { > do > { > if (lacc->grp_to_be_replaced) > { > struct access *racc; > HOST_WIDE_INT offset = lacc->offset - left_offset + right_offset; > > racc = find_access_in_subtree (top_racc, offset, lacc->size); > if (racc && racc->grp_to_be_replaced) > { > gimple stmt; > > if (useless_type_conversion_p (lacc->type, racc->type)) > stmt = gimple_build_assign (get_access_replacement (lacc), > get_access_replacement (racc)); > else > { > tree rhs = fold_build1 (VIEW_CONVERT_EXPR, lacc->type, > get_access_replacement (racc)); > stmt = gimple_build_assign (get_access_replacement (lacc), > rhs); > } > > gsi_insert_after (new_gsi, stmt, GSI_NEW_STMT); > update_stmt (stmt); > } > else > { > tree expr = unshare_expr (top_racc->base); > bool repl_found; > gimple stmt; > > /* No suitable access on the right hand side, need to load from > the aggregate. See if we have to update it first... 
*/ > if (!*refreshed) > { > gcc_assert (top_racc->first_child); > handle_unscalarized_data_in_subtree (top_racc, lhs, old_gsi); > *refreshed = true; > } > > repl_found = build_ref_for_offset (&expr, > TREE_TYPE (top_racc->base), > lacc->offset - left_offset, > lacc->type, false); > gcc_assert (repl_found); > stmt = gimple_build_assign (get_access_replacement (lacc), > expr); > gsi_insert_after (new_gsi, stmt, GSI_NEW_STMT); > update_stmt (stmt); > } > } > else if (lacc->grp_read && !lacc->grp_covered && !*refreshed) > { > handle_unscalarized_data_in_subtree (top_racc, lhs, old_gsi); > *refreshed = true; > } > > if (lacc->first_child) > load_assign_lhs_subreplacements (lacc->first_child, top_racc, > left_offset, right_offset, > old_gsi, new_gsi, refreshed, lhs); > lacc = lacc->next_sibling; > } > while (lacc); > } > > /* Return true iff ACC is non-NULL and has subaccesses. */ > > static inline bool > access_has_children_p (struct access *acc) > { > return acc && acc->first_child; > } > > /* Modify assignments with a CONSTRUCTOR on their RHS. STMT contains a pointer > to the assignment and GSI is the statement iterator pointing at it. Returns > the same values as sra_modify_assign. */ > > static enum scan_assign_result > sra_modify_constructor_assign (gimple *stmt, gimple_stmt_iterator *gsi) > { > tree lhs = gimple_assign_lhs (*stmt); > struct access *acc; > > gcc_assert (TREE_CODE (lhs) != REALPART_EXPR > && TREE_CODE (lhs) != IMAGPART_EXPR); > acc = get_access_for_expr (lhs); > if (!acc) > return SRA_SA_NONE; > > if (VEC_length (constructor_elt, > CONSTRUCTOR_ELTS (gimple_assign_rhs1 (*stmt))) > 0) > { > /* I have never seen this code path trigger but if it can happen the > following should handle it gracefully. */ It can trigger for vector constants. > if (access_has_children_p (acc)) > generate_subtree_copies (acc->first_child, acc->base, 0, 0, 0, gsi, > true, true); > return SRA_SA_PROCESSED; > } > > if (acc->grp_covered) > { > init_subtree_with_zero (acc, gsi, false); > unlink_stmt_vdef (*stmt); > gsi_remove (gsi, true); > return SRA_SA_REMOVED; > } > else > { > init_subtree_with_zero (acc, gsi, true); > return SRA_SA_PROCESSED; > } > } > > > /* Modify statements with IMAGPART_EXPR or REALPART_EXPR on their lhs with > to-be-scalarized expressions with them. STMT is the statement and GSI is > the iterator used to place new helper statements. Returns the same values > as sra_modify_assign. 
*/ > > static enum scan_assign_result > sra_modify_partially_complex_lhs (gimple stmt, gimple_stmt_iterator *gsi) > { > tree lhs, complex, ptype, rp, ip; > struct access *access; > gimple new_stmt, aux_stmt; > > lhs = gimple_assign_lhs (stmt); > complex = TREE_OPERAND (lhs, 0); > > access = get_access_for_expr (complex); > > if (!access || !access->grp_to_be_replaced) > return SRA_SA_NONE; > > ptype = TREE_TYPE (TREE_TYPE (complex)); > rp = create_tmp_var (ptype, "SRr"); > add_referenced_var (rp); > rp = make_ssa_name (rp, NULL); > > ip = create_tmp_var (ptype, "SRp"); > add_referenced_var (ip); > ip = make_ssa_name (ip, NULL); > > if (TREE_CODE (lhs) == IMAGPART_EXPR) > { > aux_stmt = gimple_build_assign (rp, fold_build1 (REALPART_EXPR, ptype, > get_access_replacement (access))); > SSA_NAME_DEF_STMT (rp) = aux_stmt; > gimple_assign_set_lhs (stmt, ip); > SSA_NAME_DEF_STMT (ip) = stmt; > } > else > { > aux_stmt = gimple_build_assign (ip, fold_build1 (IMAGPART_EXPR, ptype, > get_access_replacement (access))); > SSA_NAME_DEF_STMT (ip) = aux_stmt; > gimple_assign_set_lhs (stmt, rp); > SSA_NAME_DEF_STMT (rp) = stmt; > } > > gsi_insert_before (gsi, aux_stmt, GSI_SAME_STMT); > update_stmt (aux_stmt); > new_stmt = gimple_build_assign (get_access_replacement (access), > fold_build2 (COMPLEX_EXPR, access->type, > rp, ip)); > gsi_insert_after (gsi, new_stmt, GSI_NEW_STMT); > update_stmt (new_stmt); Hm. So you do what complex lowering does here. Note that this may create loads from uninitialized memory with all its problems. WRT the complex stuff. If you would do scalarization and analysis just on the components (not special case REAL/IMAGPART_EXPR everywhere) it should work better, correct? You still could handle group scalarization for the case of for example passing a complex argument to a function. void bar(_Complex float); void foo(float x, float y) { _Complex float z = x; __imag z = y; bar(z); } The same applies for vectors - the REAL/IMAGPART_EXPRs equivalent there is BIT_FIELD_REF. > return SRA_SA_PROCESSED; > } > > /* Return true iff T has a VIEW_CONVERT_EXPR among its handled components. */ > > static bool > contains_view_convert_expr_p (tree t) > { > while (1) > { > if (TREE_CODE (t) == VIEW_CONVERT_EXPR) > return true; > if (!handled_component_p (t)) > return false; > t = TREE_OPERAND (t, 0); > } > } Place this in tree-flow-inline.h next to ref_contains_array_ref, also structure the loop in the same way. > /* Change STMT to assign compatible types by means of adding component or array > references or VIEW_CONVERT_EXPRs. All parameters have the same meaning as > variable with the same names in sra_modify_assign. This is done in a > such a complicated way in order to make > testsuite/g++.dg/tree-ssa/ssa-sra-2.C happy and so it helps in at least some > cases. 
*/ > > static void > fix_modified_assign_compatibility (gimple_stmt_iterator *gsi, gimple *stmt, > struct access *lacc, struct access *racc, > tree lhs, tree *rhs, tree ltype, tree rtype) > { > if (racc && racc->grp_to_be_replaced && AGGREGATE_TYPE_P (ltype) > && !access_has_children_p (lacc)) > { > tree expr = unshare_expr (lhs); > bool found = build_ref_for_offset (&expr, ltype, racc->offset, rtype, > false); > if (found) > { > gimple_assign_set_lhs (*stmt, expr); > return; > } > } > > if (lacc && lacc->grp_to_be_replaced && AGGREGATE_TYPE_P (rtype) > && !access_has_children_p (racc)) > { > tree expr = unshare_expr (*rhs); > bool found = build_ref_for_offset (&expr, rtype, lacc->offset, ltype, > false); > if (found) > { > gimple_assign_set_rhs1 (*stmt, expr); > return; > } > } > > *rhs = fold_build1 (VIEW_CONVERT_EXPR, ltype, *rhs); > gimple_assign_set_rhs_from_tree (gsi, *rhs); > *stmt = gsi_stmt (*gsi); Reading this I have a deja-vu - isn't there another function in this file doing the same thing? You are doing much unsharing even though you re-build the access tree from scratch? > } > > /* Callback of scan_function to process assign statements. It examines both > sides of the statement, replaces them with a scalar replacement if there is > one and generates copying of replacements if scalarized aggregates have been > used in the assignment. STMT is a pointer to the assign statement, GSI is > used to hold generated statements for type conversions and subtree > copying. */ > > static enum scan_assign_result > sra_modify_assign (gimple *stmt, gimple_stmt_iterator *gsi, > void *data ATTRIBUTE_UNUSED) > { > struct access *lacc, *racc; > tree ltype, rtype; > tree lhs, rhs; > bool modify_this_stmt; > > if (gimple_assign_rhs2 (*stmt)) > return SRA_SA_NONE; Use !gimple_assign_single_p (*stmt) here instead (the only gimple assign that may access memory). > lhs = gimple_assign_lhs (*stmt); > rhs = gimple_assign_rhs1 (*stmt); > > if (TREE_CODE (rhs) == CONSTRUCTOR) > return sra_modify_constructor_assign (stmt, gsi); > > if (TREE_CODE (lhs) == REALPART_EXPR || TREE_CODE (lhs) == IMAGPART_EXPR) > return sra_modify_partially_complex_lhs (*stmt, gsi); > > if (TREE_CODE (rhs) == REALPART_EXPR || TREE_CODE (rhs) == IMAGPART_EXPR > || TREE_CODE (rhs) == BIT_FIELD_REF || TREE_CODE (lhs) == BIT_FIELD_REF) > { > modify_this_stmt = sra_modify_expr (gimple_assign_rhs1_ptr (*stmt), > gsi, false, data); > modify_this_stmt |= sra_modify_expr (gimple_assign_lhs_ptr (*stmt), > gsi, true, data); > return modify_this_stmt ? SRA_SA_PROCESSED : SRA_SA_NONE; > } > > lacc = get_access_for_expr (lhs); > racc = get_access_for_expr (rhs); > if (!lacc && !racc) > return SRA_SA_NONE; > > modify_this_stmt = ((lacc && lacc->grp_to_be_replaced) > || (racc && racc->grp_to_be_replaced)); > > if (lacc && lacc->grp_to_be_replaced) > { > lhs = get_access_replacement (lacc); > gimple_assign_set_lhs (*stmt, lhs); > ltype = lacc->type; > } > else > ltype = TREE_TYPE (lhs); > > if (racc && racc->grp_to_be_replaced) > { > rhs = get_access_replacement (racc); > gimple_assign_set_rhs1 (*stmt, rhs); > rtype = racc->type; > } > else > rtype = TREE_TYPE (rhs); > > /* The possibility that gimple_assign_set_rhs_from_tree() might reallocate > the statement makes the position of this pop_stmt_changes() a bit awkward > but hopefully makes some sense. */ I don't see pop_stmt_changes().
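Before the rest of sra_modify_assign, a C sketch (mine, not from the patch) of the aggregate-copy case handled below:

struct two { int a; int b; };

int
copy_and_use (struct two s)
{
  struct two d;
  d = s;
  return d.a - d.b;
}

With both sides scalarized, d = s has access children on both sides; load_assign_lhs_subreplacements turns it into copies of the scalar replacements (roughly d$a = s$a; d$b = s$b;) and, when the left side is fully covered and no unscalarized data remains on the right, the original aggregate assignment is removed.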
> if (modify_this_stmt) > { > if (!useless_type_conversion_p (ltype, rtype)) > fix_modified_assign_compatibility (gsi, stmt, lacc, racc, > lhs, &rhs, ltype, rtype); > } > > if (contains_view_convert_expr_p (rhs) || contains_view_convert_expr_p (lhs) > || (access_has_children_p (racc) > && !ref_expr_for_all_replacements_p (racc, lhs, racc->offset)) > || (access_has_children_p (lacc) > && !ref_expr_for_all_replacements_p (lacc, rhs, lacc->offset))) ? A comment is missing what this case is about ... (this smells like fixup that could be avoided by doing things correct in the first place) > { > if (access_has_children_p (racc)) > generate_subtree_copies (racc->first_child, racc->base, 0, 0, 0, > gsi, false, false); > if (access_has_children_p (lacc)) > generate_subtree_copies (lacc->first_child, lacc->base, 0, 0, 0, > gsi, true, true); > } > else > { > if (access_has_children_p (lacc) && access_has_children_p (racc)) > { > gimple_stmt_iterator orig_gsi = *gsi; > bool refreshed; > > if (lacc->grp_read && !lacc->grp_covered) > { > handle_unscalarized_data_in_subtree (racc, lhs, gsi); > refreshed = true; > } > else > refreshed = false; > > load_assign_lhs_subreplacements (lacc->first_child, racc, > lacc->offset, racc->offset, > &orig_gsi, gsi, &refreshed, lhs); > if (!refreshed || !racc->grp_unscalarized_data) > { > if (*stmt == gsi_stmt (*gsi)) > gsi_next (gsi); > > unlink_stmt_vdef (*stmt); > gsi_remove (&orig_gsi, true); > return SRA_SA_REMOVED; > } > } > else > { > if (access_has_children_p (racc)) > { > if (!racc->grp_unscalarized_data) > { > generate_subtree_copies (racc->first_child, lhs, > racc->offset, 0, 0, gsi, > false, false); > gcc_assert (*stmt == gsi_stmt (*gsi)); > unlink_stmt_vdef (*stmt); > gsi_remove (gsi, true); > return SRA_SA_REMOVED; > } > else > generate_subtree_copies (racc->first_child, lhs, > racc->offset, 0, 0, gsi, false, true); > } > else if (access_has_children_p (lacc)) > generate_subtree_copies (lacc->first_child, rhs, lacc->offset, > 0, 0, gsi, true, true); > } > } > > return modify_this_stmt ? SRA_SA_PROCESSED : SRA_SA_NONE; > } > > /* Generate statements initializing scalar replacements of parts of function > parameters. */ > > static void > initialize_parameter_reductions (void) > { > gimple_stmt_iterator gsi; > gimple_seq seq = NULL; > tree parm; > > for (parm = DECL_ARGUMENTS (current_function_decl); > parm; > parm = TREE_CHAIN (parm)) > { > VEC (access_p, heap) *access_vec; > struct access *access; > > if (!bitmap_bit_p (candidate_bitmap, DECL_UID (parm))) > continue; > access_vec = get_base_access_vector (parm); > if (!access_vec) > continue; > > if (!seq) > { > seq = gimple_seq_alloc (); > gsi = gsi_start (seq); > } > > for (access = VEC_index (access_p, access_vec, 0); > access; > access = access->next_grp) > generate_subtree_copies (access, parm, 0, 0, 0, &gsi, true, true); > } > > if (seq) > gsi_insert_seq_on_edge_immediate (single_succ_edge (ENTRY_BLOCK_PTR), seq); > } > > /* The "main" function of intraprocedural SRA passes. Runs the analysis and if > it reveals there are components of some aggregates to be scalarized, it runs > the required transformations. 
*/ > static unsigned int > perform_intra_sra (void) > { > int ret = 0; > sra_initialize (); > > if (!find_var_candidates ()) > goto out; > > if (!scan_function (build_access_from_expr, build_accesses_from_assign, NULL, > true, NULL)) > goto out; > > if (!analyze_all_variable_accesses ()) > goto out; > > scan_function (sra_modify_expr, sra_modify_assign, NULL, > false, NULL); > initialize_parameter_reductions (); > > ret = TODO_update_ssa; redundant set. > if (sra_mode == SRA_MODE_EARLY_INTRA) > ret = TODO_update_ssa; > else > ret = TODO_update_ssa | TODO_rebuild_alias; in fact you shouldn't (need to) rebuild alias. > out: > sra_deinitialize (); > return ret; > } > > /* Perform early intraprocedural SRA. */ > static unsigned int > early_intra_sra (void) > { > sra_mode = SRA_MODE_EARLY_INTRA; > return perform_intra_sra (); > } > > /* Perform "late" intraprocedural SRA. */ > static unsigned int > late_intra_sra (void) > { > sra_mode = SRA_MODE_INTRA; > return perform_intra_sra (); > } > > > static bool > gate_intra_sra (void) > { > return flag_tree_sra != 0; > } > > > struct gimple_opt_pass pass_sra_early = > { > { > GIMPLE_PASS, > "esra", /* name */ > gate_intra_sra, /* gate */ > early_intra_sra, /* execute */ > NULL, /* sub */ > NULL, /* next */ > 0, /* static_pass_number */ > TV_TREE_SRA, /* tv_id */ > PROP_cfg | PROP_ssa, /* properties_required */ > 0, /* properties_provided */ > 0, /* properties_destroyed */ > 0, /* todo_flags_start */ > TODO_dump_func > | TODO_update_ssa > | TODO_ggc_collect > | TODO_verify_ssa /* todo_flags_finish */ > } > }; > > > struct gimple_opt_pass pass_sra = > { > { > GIMPLE_PASS, > "sra", /* name */ > gate_intra_sra, /* gate */ > late_intra_sra, /* execute */ > NULL, /* sub */ > NULL, /* next */ > 0, /* static_pass_number */ > TV_TREE_SRA, /* tv_id */ > PROP_cfg | PROP_ssa, /* properties_required */ > 0, /* properties_provided */ > 0, /* properties_destroyed */ > TODO_update_address_taken, /* todo_flags_start */ > TODO_dump_func > | TODO_update_ssa > | TODO_ggc_collect > | TODO_verify_ssa /* todo_flags_finish */ > } > }; Overall it looks good - I'm still a little bit confused, but that's likely because reading code from top to bottom doesn't make the most sense in all cases ;) Looking forward to a second look on a revised version. Thanks, Richard. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 3/5] New intraprocedural Scalar Reduction of Aggregates. 2009-04-29 12:56 ` Richard Guenther @ 2009-05-10 10:33 ` Martin Jambor 2009-05-10 11:48 ` Richard Guenther 2009-05-10 10:39 ` Martin Jambor 1 sibling, 1 reply; 25+ messages in thread From: Martin Jambor @ 2009-05-10 10:33 UTC (permalink / raw) To: Richard Guenther; +Cc: GCC Patches, Jan Hubicka Hi, thank you very much for the detailed review. You are probably correct that the functions are in a rather strange order for a reader. This is because IPA-SRA was removed and the things it shares with intra-SRA remained in the same order. I will send a new tree-sra.c as a separate reply to your mail in a few moments. A few reactions to your comments are below. On Wed, Apr 29, 2009 at 02:48:12PM +0200, Richard Guenther wrote: > On Tue, 28 Apr 2009, Martin Jambor wrote: > > > On Tue, Apr 28, 2009 at 12:04:32PM +0200, Martin Jambor wrote: > > > This is the new intraprocedural SRA. I have stripped off the > > > interprocedural part and will propose to commit it separately later. > > > I have tried to remove almost every trace of IPA-SRA, however, two > > > provisions for it have remained in the patch. First, an enumeration > > > (rather than a boolean) is used to distinguish between "early" and > > > "late" SRA so that other SRA modes can be added later on. Second, > > > scan_function() has a hook parameter and a void pointer parameter > > > which are not used in this patch but will be by IPA-SRA. > > > > > > Otherwise, the patch is hopefully self-contained and the basis of its > > > operation is described by the initial comment. > > > > > > The patch bootstraps (on x86_64-linux-gnu but I am about to try it on > > > hppa-linux-gnu too) but produces a small number of testsuite failures > > > which are handled by the two following patches. > > > > > > Thanks, > > > > > > Martin > > > > > > > > > 2009-04-27 Martin Jambor <mjambor@suse.cz> > > > > > > * tree-sra.c (enum sra_mode): The whole contents of the file were > > > replaced. > > > > Hm, the patch is quite unreadable, below is the new tree-sra.c file > > which entirely replaces the old one (note that the patch also modifies > > the Makefile though): > > Ah. Here it is ... the comment to the changelog still applies. Sure. > > /* Scalar Replacement of Aggregates (SRA) converts some structure > > references into scalar references, exposing them to the scalar > > optimizers. > > Copyright (C) 2008, 2009 Free Software Foundation, Inc. > > Contributed by Martin Jambor <mjambor@suse.cz> > > > > This file is part of GCC. > > > > GCC is free software; you can redistribute it and/or modify it under > > the terms of the GNU General Public License as published by the Free > > Software Foundation; either version 3, or (at your option) any later > > version. > > > > GCC is distributed in the hope that it will be useful, but WITHOUT ANY > > WARRANTY; without even the implied warranty of MERCHANTABILITY or > > FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License > > for more details. > > > > You should have received a copy of the GNU General Public License > > along with GCC; see the file COPYING3. If not see > > <http://www.gnu.org/licenses/>. */ > > > > /* This file implements Scalar Reduction of Aggregates (SRA). SRA is run > > twice, once in the early stages of compilation (early SRA) and once in the > > late stages (late SRA). The aim of both is to turn references to scalar > > parts of aggregates into uses of independent scalar variables.
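To illustrate that aim on a minimal made-up example: in

struct point { int x; int y; };

int
sum (struct point p)
{
  return p.x + p.y;
}

the references p.x and p.y are turned into uses of two independent artificial scalars (named e.g. p$x and p$y in the dumps), which the SSA optimizers can then track like any other local variable.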
> > > > The two passes are nearly identical, the only difference is that early SRA > > does not scalarize unions which are used as the result in a GIMPLE_RETURN > > statement because together with inlining this can lead to weird type > > conversions. > > Do you happen to have a testcase for this or can you describe that problem > some more? The testcase is gfortran.fortran-torture/execute/entry_4.f90 from our testsuite. IIRC there is a union with a float and a boolean in it which is returned in a number of functions. In different functions I selected one of these fields as scalar replacements (because the other one was not used in the function, I refuse to create replacements of accesses that are scalar but have children and children of scalar accesses because of similar problems) and used a V_C_E to construct a return value of the correct type. However, inlining combined these together and wanted to cast a boolean to a float. That made expander ICE on an assert. > > Both passes operate in four stages: > > > > 1. The declarations that have properties which make them candidates for > > scalarization are identified in function find_var_candidates(). The > > candidates are stored in candidate_bitmap. > > > > 2. The function body is scanned. In the process, declarations which are > > used in a manner that prevent their scalarization are removed from the > > candidate bitmap. More importantly, for every access into an aggregate, > > an access structure (struct access) is created by create_access() and > > stored in a vector associated with the aggregate. Among other > > information, the aggregate declaration, the offset and size of the access > > and its type are stored in the structure. > > > > On a related note, assign_link structures are created for every assign > > statement between candidate aggregates and attached to the related > > accesses. > > > > 3. The vectors of accesses are analyzed. They are first sorted according to > > their offset and size and then scanned for partially overlapping accesses > > (i.e. those which overlap but one is not entirely within another). Such > > an access disqualifies the whole aggregate from being scalarized. > > This happens only when get_ref_base_and_extent punts and returns -1 for > the access size? And of course with struct field accesses of different > structs that are inside the same union. This was specifically aimed at through-union accesses which could not be (both) represented by independent scalars. As you noted below, I bail out if get_ref_base_and_extent returned -1 for max_size. > > > If there is no such inhibiting overlap, a representative access structure > > is chosen for every unique combination of offset and size. Afterwards, > > the pass builds a set of trees from these structures, in which children > > of an access are within their parent (in terms of offset and size). > > > > Then accesses are propagated whenever possible (i.e. in cases when it > > does not create a partially overlapping access) across assign_links from > > the right hand side to the left hand side. > > > > Then the set of trees for each declaration is traversed again and those > > accesses which should be replaced by a scalar are identified. > > > > 4. The function is traversed again, and for every reference into an > > aggregate that has some component which is about to be scalarized, > > statements are amended and new statements are created as necessary. 
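To make the partially overlapping case discussed above concrete, consider a contrived example (the packed attribute is only there to force the misaligned layout; this is my own sketch, not a testcase from the patch):

union U
{
  struct { int x; short y; } s;
  struct { short p; int q; } __attribute__ ((packed)) t;
};

int
foo (int a, int b)
{
  union U u;

  u.s.x = a;   /* access at bits [0, 32)  */
  u.t.q = b;   /* access at bits [16, 48) */
  return u.s.x;
}

The two accesses into u overlap but neither is contained within the other, so the whole of u is disqualified from scalarization.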
> > Finally, if a parameter got scalarized, the scalar replacements are > > initialized with values from respective parameter aggregates. > > */ > > Closing */ goes to the previous line. OK > > #include "config.h" > > #include "system.h" > > #include "coretypes.h" > > #include "alloc-pool.h" > > #include "tm.h" > > #include "tree.h" > > #include "gimple.h" > > #include "tree-flow.h" > > #include "diagnostic.h" > > #include "tree-dump.h" > > #include "timevar.h" > > #include "params.h" > > #include "target.h" > > #include "flags.h" > > > > /* Enumeration of all aggregate reductions we can do. */ > > enum sra_mode {SRA_MODE_EARLY_INTRA, /* early intraprocedural SRA */ > > SRA_MODE_INTRA}; /* late intraprocedural SRA */ > > Spaces after { and before }. > OK > > /* Global variable describing which aggregate reduction we are performing at > > the moment. */ > > static enum sra_mode sra_mode; > > > > struct assign_link; > > > > /* ACCESS represents each access to an aggregate variable (as a whole or a > > part). It can also represent a group of accesses that refer to exactly the > > same fragment of an aggregate (i.e. those that have exactly the same offset > > and size). Such representatives for a single aggregate, once determined, > > are linked in a linked list and have the group fields set. > > > > Moreover, when doing intraprocedural SRA, a tree is built from those > > representatives (by the means of first_child and next_sibling pointers), in > > which all items in a subtree are "within" the root, i.e. their offset is > > greater or equal to offset of the root and offset+size is smaller or equal > > to offset+size of the root. Children of an access are sorted by offset. > > */ > > */ to previous line. OK > > struct access > > { > > /* Values returned by `get_ref_base_and_extent' for each COMPONENT_REF > > If EXPR isn't a COMPONENT_REF just set `BASE = EXPR', `OFFSET = 0', > > `SIZE = TREE_SIZE (TREE_TYPE (expr))'. */ > > s/COMPONENT_REF/component reference/g - it's not only COMPONENT_REF > trees we handle. OK > > HOST_WIDE_INT offset; > > HOST_WIDE_INT size; > > tree base; > > > > /* Expression. */ > > tree expr; > > /* Type. */ > > tree type; > > > > /* Next group representative for this aggregate. */ > > struct access *next_grp; > > > > /* Pointer to the group representative. Pointer to itself if the struct is > > the representative. */ > > struct access *group_representative; > > > > /* If this access has any children (in terms of the definition above), this > > points to the first one. */ > > struct access *first_child; > > > > /* Pointer to the next sibling in the access tree as described above. */ > > struct access *next_sibling; > > > > /* Pointers to the first and last element in the linked list of assign > > links. */ > > struct assign_link *first_link, *last_link; > > vertical space missing. OK > > /* Pointer to the next access in the work queue. */ > > struct access *next_queued; > > > > /* Replacement variable for this access "region." Never to be accessed > > directly, always only by the means of get_access_replacement() and only > > when grp_to_be_replaced flag is set. */ > > tree replacement_decl; > > > > /* Is this particular access write access? */ > > unsigned write : 1; > > > > /* Is this access currently in the work queue? */ > > unsigned grp_queued : 1; > > /* Does this group contain a write access? This flag is propagated down the > > access tree. */ > > unsigned grp_write : 1; > > /* Does this group contain a read access? 
This flag is propagated down the > > access tree. */ > > unsigned grp_read : 1; > > /* Is the subtree rooted in this access fully covered by scalar > > replacements? */ > > unsigned grp_covered : 1; > > /* If set to true, this access and all below it in an access tree must not be > > scalarized. */ > > unsigned grp_unscalarizable_region : 1; > > /* Whether data have been written to parts of the aggregate covered by this > > access which is not to be scalarized. This flag is propagated up in the > > access tree. */ > > unsigned grp_unscalarized_data : 1; > > /* Does this access and/or group contain a write access through a > > BIT_FIELD_REF? */ > > unsigned grp_bfr_lhs : 1; > > > > /* Set when a scalar replacement should be created for this variable. We do > > the decision and creation at different places because create_tmp_var > > cannot be called from within FOR_EACH_REFERENCED_VAR. */ > > unsigned grp_to_be_replaced : 1; > > }; > > > > typedef struct access *access_p; > > > > DEF_VEC_P (access_p); > > DEF_VEC_ALLOC_P (access_p, heap); > > > > /* Alloc pool for allocating access structures. */ > > static alloc_pool access_pool; > > > > /* A structure linking lhs and rhs accesses from an aggregate assignment. They > > are used to propagate subaccesses from rhs to lhs as long as they don't > > conflict with what is already there. */ > > struct assign_link > > { > > struct access *lacc, *racc; > > struct assign_link *next; > > }; > > > > /* Alloc pool for allocating assign link structures. */ > > static alloc_pool link_pool; > > > > /* Base (tree) -> Vector (VEC(access_p,heap) *) map. */ > > static struct pointer_map_t *base_access_vec; > > > > /* Bitmap of bases (candidates). */ > > static bitmap candidate_bitmap; > > /* Bitmap of declarations used in a return statement. */ > > static bitmap retvals_bitmap; > > /* Obstack for creation of fancy names. */ > > static struct obstack name_obstack; > > > > /* Head of a linked list of accesses that need to have its subaccesses > > propagated to their assignment counterparts. */ > > static struct access *work_queue_head; > > > > /* Dump contents of ACCESS to file F in a human friendly way. If GRP is true, > > representative fields are dumped, otherwise those which only describe the > > individual access are. */ > > > > static void > > dump_access (FILE *f, struct access *access, bool grp) > > { > > fprintf (f, "access { "); > > fprintf (f, "base = (%d)'", DECL_UID (access->base)); > > print_generic_expr (f, access->base, 0); > > fprintf (f, "', offset = %d", (int) access->offset); > > fprintf (f, ", size = %d", (int) access->size); > > you can use ", offset = "HOST_WIDE_INT_PRINT_DEC, access->offset here. > OK > > fprintf (f, ", expr = "); > > print_generic_expr (f, access->expr, 0); > > fprintf (f, ", type = "); > > print_generic_expr (f, access->type, 0); > > if (grp) > > fprintf (f, ", grp_write = %d, grp_read = %d, grp_covered = %d, " > > "grp_unscalarizable_region = %d, grp_unscalarized_data = %d, " > > "grp_to_be_replaced = %d\n", > > access->grp_write, access->grp_read, access->grp_covered, > > access->grp_unscalarizable_region, access->grp_unscalarized_data, > > access->grp_to_be_replaced); > > else > > fprintf (f, ", write = %d'\n", access->write); > > } > > > > /* Dump a subtree rooted in ACCESS to file F, indent by LEVEL. 
*/ > > > > static void > > dump_access_tree_1 (FILE *f, struct access *access, int level) > > { > > do > > { > > int i; > > > > for (i = 0; i < level; i++) > > fputs ("* ", dump_file); > > > > dump_access (f, access, true); > > > > if (access->first_child) > > dump_access_tree_1 (f, access->first_child, level + 1); > > > > access = access->next_sibling; > > } > > while (access); > > } > > > > /* Dump all access trees for a variable, given the pointer to the first root in > > ACCESS. */ > > > > static void > > dump_access_tree (FILE *f, struct access *access) > > { > > for (; access; access = access->next_grp) > > dump_access_tree_1 (f, access, 0); > > } > > > > /* Return a vector of pointers to accesses for the variable given in BASE or > > NULL if there is none. */ > > > > static VEC (access_p, heap) * > > get_base_access_vector (tree base) > > { > > void **slot; > > > > slot = pointer_map_contains (base_access_vec, base); > > if (!slot) > > return NULL; > > else > > return *(VEC (access_p, heap) **) slot; > > } > > > > /* Find an access with required OFFSET and SIZE in a subtree of accesses rooted > > in ACCESS. Return NULL if it cannot be found. */ > > > > static struct access * > > find_access_in_subtree (struct access *access, HOST_WIDE_INT offset, > > HOST_WIDE_INT size) > > { > > while (access && (access->offset != offset || access->size != size)) > > { > > struct access *child = access->first_child; > > > > while (child && (child->offset + child->size <= offset)) > > child = child->next_sibling; > > Do you limit the number of siblings? We should keep an eye on this > for potential compile-time issues (not that I expect some). > Not artificially and given how I construct them (from representatives), I doubt that it is easily possible. They are limited by the size and the structure of the aggregate though. If ever we come across a problem like this, we might disable scalarization of grossly huge aggregates. > > access = child; > > } > > > > return access; > > } > > > > /* Return the first group representative for DECL or NULL if none exists. */ > > > > static struct access * > > get_first_repr_for_decl (tree base) > > { > > VEC (access_p, heap) *access_vec; > > > > access_vec = get_base_access_vector (base); > > if (!access_vec) > > return NULL; > > > > return VEC_index (access_p, access_vec, 0); > > } > > > > /* Find an access representative for the variable BASE and given OFFSET and > > SIZE. Requires that access trees have already been built. Return NULL if > > it cannot be found. */ > > > > static struct access * > > get_var_base_offset_size_access (tree base, HOST_WIDE_INT offset, > > HOST_WIDE_INT size) > > { > > struct access *access; > > > > access = get_first_repr_for_decl (base); > > while (access && (access->offset + access->size <= offset)) > > access = access->next_grp; > > if (!access) > > return NULL; > > > > return find_access_in_subtree (access, offset, size); > > } > > > > /* Add LINK to the linked list of assign links of RACC. */ > > static void > > add_link_to_rhs (struct access *racc, struct assign_link *link) > > { > > gcc_assert (link->racc == racc); > > > > if (!racc->first_link) > > { > > gcc_assert (!racc->last_link); > > racc->first_link = link; > > } > > else > > racc->last_link->next = link; > > > > racc->last_link = link; > > link->next = NULL; > > } > > > > /* Move all link structures in their linked list in OLD_RACC to the linked list > > in NEW_RACC. 
*/ > > static void > > relink_to_new_repr (struct access *new_racc, struct access *old_racc) > > { > > if (!old_racc->first_link) > > { > > gcc_assert (!old_racc->last_link); > > return; > > } > > > > if (new_racc->first_link) > > { > > gcc_assert (!new_racc->last_link->next); > > gcc_assert (!old_racc->last_link || !old_racc->last_link->next); > > > > new_racc->last_link->next = old_racc->first_link; > > new_racc->last_link = old_racc->last_link; > > } > > else > > { > > gcc_assert (!new_racc->last_link); > > > > new_racc->first_link = old_racc->first_link; > > new_racc->last_link = old_racc->last_link; > > } > > old_racc->first_link = old_racc->last_link = NULL; > > } > > > > /* Add ACCESS to the work queue (which is actually a stack). */ > > > > static void > > add_access_to_work_queue (struct access *access) > > { > > if (!access->grp_queued) > > { > > gcc_assert (!access->next_queued); > > access->next_queued = work_queue_head; > > access->grp_queued = 1; > > work_queue_head = access; > > } > > } > > > > /* Pop an access from the work queue, and return it, assuming there is one. */ > > > > static struct access * > > pop_access_from_work_queue (void) > > { > > struct access *access = work_queue_head; > > > > work_queue_head = access->next_queued; > > access->next_queued = NULL; > > access->grp_queued = 0; > > return access; > > } > > > > > > /* Allocate necessary structures. */ > > > > static void > > sra_initialize (void) > > { > > candidate_bitmap = BITMAP_ALLOC (NULL); > > retvals_bitmap = BITMAP_ALLOC (NULL); > > gcc_obstack_init (&name_obstack); > > access_pool = create_alloc_pool ("SRA accesses", sizeof (struct access), 16); > > link_pool = create_alloc_pool ("SRA links", sizeof (struct assign_link), 16); > > base_access_vec = pointer_map_create (); > > } > > > > /* Hook fed to pointer_map_traverse, deallocate stored vectors. */ > > > > static bool > > delete_base_accesses (const void *key ATTRIBUTE_UNUSED, void **value, > > void *data ATTRIBUTE_UNUSED) > > { > > VEC (access_p, heap) *access_vec; > > access_vec = (VEC (access_p, heap) *) *value; > > VEC_free (access_p, heap, access_vec); > > > > return true; > > } > > > > /* Deallocate all general structures. */ > > > > static void > > sra_deinitialize (void) > > { > > BITMAP_FREE (candidate_bitmap); > > BITMAP_FREE (retvals_bitmap); > > free_alloc_pool (access_pool); > > free_alloc_pool (link_pool); > > obstack_free (&name_obstack, NULL); > > > > pointer_map_traverse (base_access_vec, delete_base_accesses, NULL); > > pointer_map_destroy (base_access_vec); > > } > > > > /* Remove DECL from candidates for SRA and write REASON to the dump file if > > there is one. */ > > static void > > disqualify_candidate (tree decl, const char *reason) > > { > > bitmap_clear_bit (candidate_bitmap, DECL_UID (decl)); > > > > if (dump_file) > > && (dump_flags & TDF_DETAILS) OK > > { > > fprintf (dump_file, "! Disqualifying "); > > print_generic_expr (dump_file, decl, 0); > > fprintf (dump_file, " - %s\n", reason); > > } > > } > > > > /* Return true iff the type contains a field or an element which does not allow > > scalarization. 
*/ > > > > static bool > > type_internals_preclude_sra_p (tree type) > > { > > tree fld; > > tree et; > > > > switch (TREE_CODE (type)) > > { > > case RECORD_TYPE: > > case UNION_TYPE: > > case QUAL_UNION_TYPE: > > for (fld = TYPE_FIELDS (type); fld; fld = TREE_CHAIN (fld)) > > if (TREE_CODE (fld) == FIELD_DECL) > > { > > tree ft = TREE_TYPE (fld); > > > > if (TREE_THIS_VOLATILE (fld) > > || !DECL_FIELD_OFFSET (fld) || !DECL_SIZE (fld) > > || !host_integerp (DECL_FIELD_OFFSET (fld), 1) > > || !host_integerp (DECL_SIZE (fld), 1)) > > return true; > > > > if (AGGREGATE_TYPE_P (ft) > > && type_internals_preclude_sra_p (ft)) > > return true; > > } > > > > return false; > > > > case ARRAY_TYPE: > > et = TREE_TYPE (type); > > > > if (AGGREGATE_TYPE_P (et)) > > return type_internals_preclude_sra_p (et); > > else > > return false; > > > > default: > > return false; > > } > > } > > > > /* Create and insert access for EXPR. Return created access, or NULL if it is > > not possible. */ > > > > static struct access * > > create_access (tree expr, bool write) > > { > > struct access *access; > > void **slot; > > VEC (access_p,heap) *vec; > > HOST_WIDE_INT offset, size, max_size; > > tree base = expr; > > bool unscalarizable_region = false; > > > > if (handled_component_p (expr)) > > base = get_ref_base_and_extent (expr, &offset, &size, &max_size); > > else > > { > > tree tree_size; > > > > tree_size = TYPE_SIZE (TREE_TYPE (base)); > > if (tree_size && host_integerp (tree_size, 1)) > > size = max_size = tree_low_cst (tree_size, 1); > > else > > size = max_size = -1; > > > > offset = 0; > > } > > get_ref_base_and_extent should now also work on plain DECLs > (non-handled_component_p) and base should never be NULL. Good point. (However, I seem to remember I put the !base check there for a reason... it might have been IPA-SRA though. I'll put an assert there for the next few days at least.) > > if (!base || !DECL_P (base) > > || !bitmap_bit_p (candidate_bitmap, DECL_UID (base))) > > return NULL; > > > > if (size != max_size) > > { > > size = max_size; > > unscalarizable_region = true; > > } > > > > if (size < 0) > > { > > disqualify_candidate (base, "Encountered an ultra variable sized " > > "access."); > > ultra variable sized? I would name it 'unconstrained access'. Well, IPA-SRA does not even like normal variable sized :-) But yes, I'll change the reason string. > Note that there is still useful information in this case if offset > is non-zero (namely accesses before [offset, -1] may still be > scalarized). Maybe something for further improvements. This for > example would happen for structs with trailing arrays. Yes, we could make an unscalarizable region from it. It's now on my TODO list. > > return NULL; > > } > > > > access = (struct access *) pool_alloc (access_pool); > > memset (access, 0, sizeof (struct access)); > > > > access->base = base; > > access->offset = offset; > > access->size = size; > > access->expr = expr; > > access->type = TREE_TYPE (expr); > > access->write = write; > > access->grp_unscalarizable_region = unscalarizable_region; > > > > slot = pointer_map_contains (base_access_vec, base); > > if (slot) > > vec = (VEC (access_p, heap) *) *slot; > > else > > vec = VEC_alloc (access_p, heap, 32); > > > > VEC_safe_push (access_p, heap, vec, access); > > > > *((struct VEC (access_p,heap) **) > > pointer_map_insert (base_access_vec, base)) = vec; > > > > return access; > > } > > > > > > /* Callback of walk_tree. 
Search the given tree for a declaration and exclude > > it from the candidates. */ > > > > static tree > > disqualify_all (tree *tp, int *walk_subtrees, void *data ATTRIBUTE_UNUSED) > > { > > tree base = *tp; > > > > > > if (TREE_CODE (base) == SSA_NAME) > > base = SSA_NAME_VAR (base); > > Err ... for SSA_NAME bases there is nothing to scalarize? So just > bail out in that case? You are right, this is also a leftover from IPA-SRA. > In fact, using walk_tree for disqualify_all looks > a bit expensive (it also walks types). OK, it seems quite possible to get rid of it and use a simple dive through handled_components where still necessary. > > if (DECL_P (base)) > > { > > disqualify_candidate (base, "From within disqualify_all()."); > > *walk_subtrees = 0; > > } > > else > > *walk_subtrees = 1; > > > > > > return NULL_TREE; > > } > > > > /* Scan expression EXPR and create access structures for all accesses to > > candidates for scalarization. Return the created access or NULL if none is > > created. */ > > > > static struct access * > > build_access_from_expr_1 (tree *expr_ptr, > > gimple_stmt_iterator *gsi ATTRIBUTE_UNUSED, bool write) > > { > > struct access *ret = NULL; > > tree expr = *expr_ptr; > > tree safe_expr = expr; > > bool bit_ref; > > > > if (TREE_CODE (expr) == BIT_FIELD_REF) > > { > > expr = TREE_OPERAND (expr, 0); > > bit_ref = true; > > } > > else > > bit_ref = false; > > > > while (TREE_CODE (expr) == NOP_EXPR > > CONVERT_EXPR_P (expr) OK... but at another place in the email you said it might not even appear in a valid gimple statement? Should I remove it altogether? > > || TREE_CODE (expr) == VIEW_CONVERT_EXPR > > || TREE_CODE (expr) == REALPART_EXPR > > || TREE_CODE (expr) == IMAGPART_EXPR) > > expr = TREE_OPERAND (expr, 0); > > Why do this here btw, and not just lump ... > > > switch (TREE_CODE (expr)) > > { > > case ADDR_EXPR: > > case SSA_NAME: > > case INDIRECT_REF: > > break; > > > > case VAR_DECL: > > case PARM_DECL: > > case RESULT_DECL: > > case COMPONENT_REF: > > case ARRAY_REF: > > ret = create_access (expr, write); > > break; > > ... this ... > > > case REALPART_EXPR: > > case IMAGPART_EXPR: > > expr = TREE_OPERAND (expr, 0); > > ret = create_access (expr, write); > > ... and this together? Won't you create bogus accesses if you > strip for example IMAGPART_EXPR (which has non-zero offset)? That would break the complex number into its components. I thought that they are meant to stay together for some reason, otherwise they would not be represented explicitly in gimple... do you think it does not matter? What about vectors then? The access is not bogus because modification functions take care of these statements in a special way. However, if it is indeed OK to split complex numbers into their components, I will gladly simplify this as you suggested. > > break; > > > > case ARRAY_RANGE_REF: > > it should just be handled fine I think. OK, I will try that at some later stage. > > default: > > walk_tree (&safe_expr, disqualify_all, NULL, NULL); > > and if not, this should just disqualify the base of the access, like > get_base_address (safe_expr) (save_expr you mean?) and then if that > is a DECL, disqualify that decl. I'll test just doing nothing, things handled by get_base_address are either fine or already accounted for. > > break; > > } > > > > if (write && bit_ref && ret) > > ret->grp_bfr_lhs = 1; > > > > return ret; > > } > > > > /* Scan expression EXPR and create access structures for all accesses to > > candidates for scalarization. 
Return true if any access has been > > inserted. */ > > > > static bool > > build_access_from_expr (tree *expr_ptr, > > gimple_stmt_iterator *gsi ATTRIBUTE_UNUSED, bool write, > > void *data ATTRIBUTE_UNUSED) > > { > > return build_access_from_expr_1 (expr_ptr, gsi, write) != NULL; > > } > > > > /* Disqualify LHS and RHS for scalarization if STMT must end its basic block in > > modes in which it matters, return true iff they have been disqualified. RHS > > may be NULL, in that case ignore it. If we scalarize an aggregate in > > intra-SRA we may need to add statements after each statement. This is not > > possible if a statement unconditionally has to end the basic block. */ > > static bool > > disqualify_ops_if_throwing_stmt (gimple stmt, tree *lhs, tree *rhs) > > { > > if (stmt_can_throw_internal (stmt) || stmt_ends_bb_p (stmt)) > > { > > walk_tree (lhs, disqualify_all, NULL, NULL); > > if (rhs) > > walk_tree (rhs, disqualify_all, NULL, NULL); > > return true; > > } > > return false; > > } > > > > > > /* Result code for scan_assign callback for scan_function. */ > > enum scan_assign_result {SRA_SA_NONE, /* nothing done for the stmt */ > > SRA_SA_PROCESSED, /* stmt analyzed/changed */ > > SRA_SA_REMOVED}; /* stmt redundant and eliminated */ > > space after { and before }. OK > > > > > > /* Scan expressions occurring in the statement pointed to by STMT_EXPR, create > > access structures for all accesses to candidates for scalarization and > > remove those candidates which occur in statements or expressions that > > prevent them from being split apart. Return true if any access has been > > inserted. */ > > > > static enum scan_assign_result > > build_accesses_from_assign (gimple *stmt_ptr, > > gimple_stmt_iterator *gsi ATTRIBUTE_UNUSED, > > void *data ATTRIBUTE_UNUSED) > > { > > gimple stmt = *stmt_ptr; > > tree *lhs_ptr, *rhs_ptr; > > struct access *lacc, *racc; > > > > if (gimple_assign_rhs2 (stmt)) > > !gimple_assign_single_p (stmt) > > > return SRA_SA_NONE; > > > > lhs_ptr = gimple_assign_lhs_ptr (stmt); > > rhs_ptr = gimple_assign_rhs1_ptr (stmt); > > you probably don't need to pass pointers to trees everywhere as you > are not changing them. Well, this function is a callback called by scan_function which can also call sra_modify_expr in the last stage of the pass when statements are modified. I have considered splitting the function into two but in the end I thought they would be too similar and the overhead is hopefully manageable. > > if (disqualify_ops_if_throwing_stmt (stmt, lhs_ptr, rhs_ptr)) > > return SRA_SA_NONE; > > > > racc = build_access_from_expr_1 (rhs_ptr, gsi, false); > > lacc = build_access_from_expr_1 (lhs_ptr, gsi, true); > > just avoid calling into build_access_from_expr_1 for SSA_NAMEs > or is_gimple_min_invariant lhs/rhs, that should make that > function more regular. In what sense? build_access_from_expr_1 looks at TREE_CODE anyway and can discard the two cases, without for example looking into ADDR_EXPRs like is_gimple_min_invariant(). But if you really think it is indeed beneficial, I can do that, sure - to me it just looks ugly. > > if (lacc && racc > > && !lacc->grp_unscalarizable_region > > && !racc->grp_unscalarizable_region > > && AGGREGATE_TYPE_P (TREE_TYPE (*lhs_ptr)) > > && lacc->size <= racc->size > > && useless_type_conversion_p (lacc->type, racc->type)) > > useless_type_conversion_p should always be true here. I don't think so, build_access_from_expr_1 can look through V_C_Es and the types of accesses are the type of the operand in such cases.
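As an aside, a made-up example of what the links created just below buy us:

struct S { int i; float f; };

extern void consume (struct S);

void
foo (struct S s)
{
  struct S t;

  s.f = 1.0f;   /* direct subaccess of s at offset 32, size 32 */
  t = s;        /* an assign_link with racc == s, lacc == t */
  consume (t);
}

Propagating the s.f subaccess across the link gives t a matching subaccess, so that if s.f is scalarized, the aggregate copy can be accompanied by a direct copy of the replacements (roughly t$f = s$f) instead of forcing the value through memory.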
> > { > > struct assign_link *link; > > > > link = (struct assign_link *) pool_alloc (link_pool); > > memset (link, 0, sizeof (struct assign_link)); > > > > link->lacc = lacc; > > link->racc = racc; > > > > add_link_to_rhs (racc, link); > > } > > > > return (lacc || racc) ? SRA_SA_PROCESSED : SRA_SA_NONE; > > } > > > > /* Scan function and look for interesting statements. Return true if any has > > been found or processed, as indicated by callbacks. SCAN_EXPR is a callback > > called on all expressions within statements except assign statements and > > those deemed entirely unsuitable for some reason (all operands in such > > statements and expression are removed from candidate_bitmap). SCAN_ASSIGN > > is a callback called on all assign statements, HANDLE_SSA_DEFS is a callback > > called on assign statements and those call statements which have a lhs and > > it is the only callback which can be NULL. ANALYSIS_STAGE is true when > > running in the analysis stage of a pass and thus no statement is being > > modified. DATA is a pointer passed to all callbacks. If any single > > callback returns true, this function also returns true, otherwise it returns > > false. */ > > > > static bool > > scan_function (bool (*scan_expr) (tree *, gimple_stmt_iterator *, bool, void *), > > enum scan_assign_result (*scan_assign) (gimple *, > > gimple_stmt_iterator *, > > void *), > > bool (*handle_ssa_defs)(gimple, void *), > > bool analysis_stage, void *data) > > { > > gimple_stmt_iterator gsi; > > basic_block bb; > > unsigned i; > > tree *t; > > bool ret = false; > > > > FOR_EACH_BB (bb) > > { > > bool bb_changed = false; > > > > gsi = gsi_start_bb (bb); > > while (!gsi_end_p (gsi)) > > { > > gimple stmt = gsi_stmt (gsi); > > enum scan_assign_result assign_result; > > bool any = false, deleted = false; > > > > switch (gimple_code (stmt)) > > { > > case GIMPLE_RETURN: > > t = gimple_return_retval_ptr (stmt); > > if (*t != NULL_TREE) > > { > > if (DECL_P (*t)) > > { > > tree ret_type = TREE_TYPE (*t); > > if (sra_mode == SRA_MODE_EARLY_INTRA > > && (TREE_CODE (ret_type) == UNION_TYPE > > || TREE_CODE (ret_type) == QUAL_UNION_TYPE)) > > disqualify_candidate (*t, > > "Union in a return statement."); > > else > > bitmap_set_bit (retvals_bitmap, DECL_UID (*t)); > > } > > any |= scan_expr (t, &gsi, false, data); > > } > > Likewise for passing pointers (why is gsi necessary and passing a stmt > does not work?) So that scan_expr can insert V_C_E statements before or after the statement. I'll explain more when commenting on the fix_incompatible_types_for_expr() function. > > break; > > > > case GIMPLE_ASSIGN: > > assign_result = scan_assign (&stmt, &gsi, data); > > any |= assign_result == SRA_SA_PROCESSED; > > deleted = assign_result == SRA_SA_REMOVED; > > if (handle_ssa_defs && assign_result != SRA_SA_REMOVED) > > any |= handle_ssa_defs (stmt, data); > > break; > > > > case GIMPLE_CALL: > > /* Operands must be processed before the lhs. 
*/ > > for (i = 0; i < gimple_call_num_args (stmt); i++) > > { > > tree *argp = gimple_call_arg_ptr (stmt, i); > > any |= scan_expr (argp, &gsi, false, data); > > } > > > > if (gimple_call_lhs (stmt)) > > { > > tree *lhs_ptr = gimple_call_lhs_ptr (stmt); > > if (!analysis_stage || > > !disqualify_ops_if_throwing_stmt (stmt, lhs_ptr, NULL)) > > { > > any |= scan_expr (lhs_ptr, &gsi, true, data); > > if (handle_ssa_defs) > > any |= handle_ssa_defs (stmt, data); > > } > > } > > break; > > > > case GIMPLE_ASM: > > for (i = 0; i < gimple_asm_ninputs (stmt); i++) > > { > > tree *op = &TREE_VALUE (gimple_asm_input_op (stmt, i)); > > any |= scan_expr (op, &gsi, false, data); > > } > > for (i = 0; i < gimple_asm_noutputs (stmt); i++) > > { > > tree *op = &TREE_VALUE (gimple_asm_output_op (stmt, i)); > > any |= scan_expr (op, &gsi, true, data); > > } > > asm operands with memory constraints should be disqualified from SRA > (see walk_stmt_load_store_addr_ops and/or the operand scanner). Hm, identifying those with that constraint seems quite complicated and scan_function is complex enough as it is. I guess I'll use walk_stmt_load_store_addr_ops to determine this for me then. > > default: > > if (analysis_stage) > > walk_gimple_op (stmt, disqualify_all, NULL); > > You seem to be very eager to disqualify anything unknown ;) (But I > have yet to come across a TREE_ADDRESSABLE check ...) It is in find_var_candidates. > > break; > > } > > > > if (any) > > { > > ret = true; > > bb_changed = true; > > > > if (!analysis_stage) > > Oh. So we reuse this function. Hmm. Yes. And with IPA-SRA it's reused again with another few special cases. As I have written above, I am no longer sure this is a good idea but so far it's been manageable and it has a few advantages too. > > { > > update_stmt (stmt); > > if (!stmt_could_throw_p (stmt)) > > remove_stmt_from_eh_region (stmt); > > Usually > > if (maybe_clean_or_replace_eh_stmt (stmt, stmt) > && gimple_purge_dead_eh_edges (gimple_bb (stmt))) > > is the pattern for this. But then you disqualified all throwing > expressions, no? I'll experiment with removing this. So far it's on my TODO list so that it does not hold me up sending an updated version to you. > > } > > } > > if (deleted) > > bb_changed = true; > > else > > { > > gsi_next (&gsi); > > ret = true; > > } > > } > > if (!analysis_stage && bb_changed) > > gimple_purge_dead_eh_edges (bb); > > } > > > > return ret; > > } > > > > /* Helper of QSORT function. There are pointers to accesses in the array. An > > access is considered smaller than another if it has smaller offset or if the > > offsets are the same but its size is bigger. */ > > > > static int > > compare_access_positions (const void *a, const void *b) > > { > > const access_p *fp1 = (const access_p *) a; > > const access_p *fp2 = (const access_p *) b; > > const access_p f1 = *fp1; > > const access_p f2 = *fp2; > > > > if (f1->offset != f2->offset) > > return f1->offset < f2->offset ? -1 : 1; > > > > if (f1->size == f2->size) > > return 0; > > /* We want the bigger accesses first, thus the opposite operator in the next > > line: */ > > return f1->size > f2->size ? -1 : 1; > > } > > > > > > /* Append a name of the declaration to the name obstack. A helper function for > > make_fancy_name.
*/ > > > > static void > > make_fancy_decl_name (tree decl) > > { > > char buffer[32]; > > > > tree name = DECL_NAME (decl); > > if (name) > > obstack_grow (&name_obstack, IDENTIFIER_POINTER (name), > > IDENTIFIER_LENGTH (name)); > > else > > { > > sprintf (buffer, "D%u", DECL_UID (decl)); > > That would just be useless information. I guess you copied this > from old SRA? Yes. All this fancy naming stuff is quite useless but I find it very handy when debugging SRA issues. > > obstack_grow (&name_obstack, buffer, strlen (buffer)); > > } > > } > > > > /* Helper for make_fancy_name. */ > > > > static void > > make_fancy_name_1 (tree expr) > > { > > char buffer[32]; > > tree index; > > > > if (DECL_P (expr)) > > { > > make_fancy_decl_name (expr); > > return; > > } > > > > switch (TREE_CODE (expr)) > > { > > case COMPONENT_REF: > > make_fancy_name_1 (TREE_OPERAND (expr, 0)); > > obstack_1grow (&name_obstack, '$'); > > make_fancy_decl_name (TREE_OPERAND (expr, 1)); > > break; > > > > case ARRAY_REF: > > make_fancy_name_1 (TREE_OPERAND (expr, 0)); > > obstack_1grow (&name_obstack, '$'); > > /* Arrays with only one element may not have a constant as their > > index. */ > > index = TREE_OPERAND (expr, 1); > > if (TREE_CODE (index) != INTEGER_CST) > > break; > > sprintf (buffer, HOST_WIDE_INT_PRINT_DEC, TREE_INT_CST_LOW (index)); > > obstack_grow (&name_obstack, buffer, strlen (buffer)); > > > > break; > > > > case BIT_FIELD_REF: > > case REALPART_EXPR: > > case IMAGPART_EXPR: > > gcc_unreachable (); /* we treat these as scalars. */ > > break; > > default: > > break; > > } > > } > > > > /* Create a human readable name for replacement variable of ACCESS. */ > > > > static char * > > make_fancy_name (tree expr) > > { > > make_fancy_name_1 (expr); > > obstack_1grow (&name_obstack, '\0'); > > return XOBFINISH (&name_obstack, char *); > > } > > As all new scalars are DECL_ARTIFICIAL anyway why bother to create > a fancy name? .... Well, it helps when reading dumps in order to look into what SRA did. > > /* Helper function for build_ref_for_offset. */ > > > > static bool > > build_ref_for_offset_1 (tree *res, tree type, HOST_WIDE_INT offset, > > tree exp_type) > > { > > while (1) > > { > > tree fld; > > tree tr_size, index; > > HOST_WIDE_INT el_size; > > > > if (offset == 0 && exp_type > > && useless_type_conversion_p (exp_type, type)) > > return true; > > > > switch (TREE_CODE (type)) > > { > > case UNION_TYPE: > > case QUAL_UNION_TYPE: > > case RECORD_TYPE: > > /* Some ADA records are half-unions, treat all of them the same. 
*/ > > for (fld = TYPE_FIELDS (type); fld; fld = TREE_CHAIN (fld)) > > { > > HOST_WIDE_INT pos, size; > > tree expr, *expr_ptr; > > > > if (TREE_CODE (fld) != FIELD_DECL) > > continue; > > > > pos = int_bit_position (fld); > > gcc_assert (TREE_CODE (type) == RECORD_TYPE || pos == 0); > > size = tree_low_cst (DECL_SIZE (fld), 1); > > if (pos > offset || (pos + size) <= offset) > > continue; > > > > if (res) > > { > > expr = build3 (COMPONENT_REF, TREE_TYPE (fld), *res, fld, > > NULL_TREE); > > expr_ptr = &expr; > > } > > else > > expr_ptr = NULL; > > if (build_ref_for_offset_1 (expr_ptr, TREE_TYPE (fld), > > offset - pos, exp_type)) > > { > > if (res) > > *res = expr; > > return true; > > } > > } > > return false; > > > > case ARRAY_TYPE: > > tr_size = TYPE_SIZE (TREE_TYPE (type)); > > if (!tr_size || !host_integerp (tr_size, 1)) > > return false; > > el_size = tree_low_cst (tr_size, 1); > > > > index = build_int_cst (TYPE_DOMAIN (type), offset / el_size); > > if (!integer_zerop (TYPE_MIN_VALUE (TYPE_DOMAIN (type)))) > > index = int_const_binop (PLUS_EXPR, index, > > TYPE_MIN_VALUE (TYPE_DOMAIN (type)), 0); > > if (res) > > *res = build4 (ARRAY_REF, TREE_TYPE (type), *res, index, NULL_TREE, > > NULL_TREE); > > offset = offset % el_size; > > type = TREE_TYPE (type); > > break; > > > > default: > > if (offset != 0) > > return false; > > > > if (exp_type) > > return false; > > else > > return true; > > } > > } > > } > > > > /* Construct an expression that would reference a part of aggregate *EXPR of > > type TYPE at the given OFFSET of the type EXP_TYPE. If EXPR is NULL, the > > function only determines whether it can build such a reference without > > actually doing it. > > > > FIXME: Eventually this should be replaced with > > maybe_fold_offset_to_reference() from tree-ssa-ccp.c but that requires a > > minor rewrite of fold_stmt. > > */ > > > > static bool > > build_ref_for_offset (tree *expr, tree type, HOST_WIDE_INT offset, > > tree exp_type, bool allow_ptr) > > { > > if (allow_ptr && POINTER_TYPE_P (type)) > > { > > type = TREE_TYPE (type); > > if (expr) > > *expr = fold_build1 (INDIRECT_REF, type, *expr); > > } > > > > return build_ref_for_offset_1 (expr, type, offset, exp_type); > > } > > > > /* The very first phase of intraprocedural SRA. It marks in candidate_bitmap > > those with type which is suitable for scalarization. */ > > > > static bool > > find_var_candidates (void) > > { > > tree var, type; > > referenced_var_iterator rvi; > > bool ret = false; > > > > FOR_EACH_REFERENCED_VAR (var, rvi) > > { > > if (TREE_CODE (var) != VAR_DECL && TREE_CODE (var) != PARM_DECL) > > continue; > > type = TREE_TYPE (var); > > > > if (!AGGREGATE_TYPE_P (type) > > || needs_to_live_in_memory (var) > > Ok, here's the TREE_ADDRESSABLE check. I'm finally convinced that > disqualify_all should go ;) OK > > || TREE_THIS_VOLATILE (var) > > || !COMPLETE_TYPE_P (type) > > || !host_integerp (TYPE_SIZE (type), 1) > > || tree_low_cst (TYPE_SIZE (type), 1) == 0 > > || type_internals_preclude_sra_p (type)) > > continue; > > > > bitmap_set_bit (candidate_bitmap, DECL_UID (var)); > > > > if (dump_file) > > && (dump_flags & TDF_DETAILS) OK > > { > > fprintf (dump_file, "Candidate (%d): ", DECL_UID (var)); > > print_generic_expr (dump_file, var, 0); > > fprintf (dump_file, "\n"); > > } > > ret = true; > > } > > > > return ret; > > } > > > > /* Return true if TYPE should be considered a scalar type by SRA. 
*/ > > > > static bool > > is_sra_scalar_type (tree type) > > { > > enum tree_code code = TREE_CODE (type); > > return (INTEGRAL_TYPE_P (type) || SCALAR_FLOAT_TYPE_P (type) > > || FIXED_POINT_TYPE_P (type) || POINTER_TYPE_P (type) > > || code == VECTOR_TYPE || code == COMPLEX_TYPE > > || code == OFFSET_TYPE); > > } > > Why is this any different from is_gimple_reg_type ()? The old SRA had a function like this and I kept it. But it makes no sense, so I've removed it and replaced calls to it with is_gimple_reg_type(). > > /* Sort all accesses for the given variable, check for partial overlaps and > > return NULL if there are any. If there are none, pick a representative for > > each combination of offset and size and create a linked list out of them. > > Return the pointer to the first representative and make sure it is the first > > one in the vector of accesses. */ > > > > static struct access * > > sort_and_splice_var_accesses (tree var) > > { > > int i, j, access_count; > > struct access *res, **prev_acc_ptr = &res; > > VEC (access_p, heap) *access_vec; > > bool first = true; > > HOST_WIDE_INT low = -1, high = 0; > > > > access_vec = get_base_access_vector (var); > > if (!access_vec) > > return NULL; > > access_count = VEC_length (access_p, access_vec); > > > > /* Sort by <OFFSET, SIZE>. */ > > qsort (VEC_address (access_p, access_vec), access_count, sizeof (access_p), > > compare_access_positions); > > > > i = 0; > > while (i < access_count) > > { > > struct access *access = VEC_index (access_p, access_vec, i); > > bool modification = access->write; > > bool grp_read = !access->write; > > bool grp_bfr_lhs = access->grp_bfr_lhs; > > bool first_scalar = is_sra_scalar_type (access->type); > > bool unscalarizable_region = access->grp_unscalarizable_region; > > > > if (first || access->offset >= high) > > { > > first = false; > > low = access->offset; > > high = access->offset + access->size; > > } > > else if (access->offset > low && access->offset + access->size > high) > > return NULL; > > else > > gcc_assert (access->offset >= low > > && access->offset + access->size <= high); > > > > j = i + 1; > > while (j < access_count) > > { > > struct access *ac2 = VEC_index (access_p, access_vec, j); > > if (ac2->offset != access->offset || ac2->size != access->size) > > break; > > modification |= ac2->write; > > grp_read |= !ac2->write; > > grp_bfr_lhs |= ac2->grp_bfr_lhs; > > unscalarizable_region |= ac2->grp_unscalarizable_region; > > relink_to_new_repr (access, ac2); > > > > /* If one of the equivalent accesses is scalar, use it as a > > representative (this happens when there is for example only a > > single scalar field in a structure.
*/ > > if (!first_scalar && is_sra_scalar_type (ac2->type)) > > { > > struct access tmp_acc; > > first_scalar = true; > > > > memcpy (&tmp_acc, ac2, sizeof (struct access)); > > memcpy (ac2, access, sizeof (struct access)); > > memcpy (access, &tmp_acc, sizeof (struct access)); > > } > > ac2->group_representative = access; > > j++; > > } > > > > i = j; > > > > access->group_representative = access; > > access->grp_write = modification; > > access->grp_read = grp_read; > > access->grp_bfr_lhs = grp_bfr_lhs; > > access->grp_unscalarizable_region = unscalarizable_region; > > if (access->first_link) > > add_access_to_work_queue (access); > > > > *prev_acc_ptr = access; > > prev_acc_ptr = &access->next_grp; > > } > > > > gcc_assert (res == VEC_index (access_p, access_vec, 0)); > > return res; > > } > > > > /* Create a variable for the given ACCESS which determines the type, name and a > > few other properties. Return the variable declaration and store it also to > > ACCESS->replacement. */ > > > > static tree > > create_access_replacement (struct access *access) > > { > > tree repl; > > > > repl = make_rename_temp (access->type, "SR"); > > get_var_ann (repl); > > add_referenced_var (repl); > > > > DECL_SOURCE_LOCATION (repl) = DECL_SOURCE_LOCATION (access->base); > > DECL_ARTIFICIAL (repl) = 1; > > > > if (DECL_NAME (access->base) && !DECL_IGNORED_P (access->base)) > > at least && !DECL_ARTIFICIAL (access->base) I think. This part is also largely copied from the old SRA. So far it seems to work nicely, replacements of artificial declarations get SRsome_number fancy names and that makes them easy to distinguish. Nevertheless, I can change the condition if it is somehow wrong. Or do you expect any other problems beside not-so-fancy fancy names? > > { > > char *pretty_name = make_fancy_name (access->expr); > > > > DECL_NAME (repl) = get_identifier (pretty_name); > > obstack_free (&name_obstack, pretty_name); > > > > SET_DECL_DEBUG_EXPR (repl, access->expr); > > DECL_DEBUG_EXPR_IS_FROM (repl) = 1; > > DECL_IGNORED_P (repl) = 0; > > TREE_NO_WARNING (repl) = TREE_NO_WARNING (access->base); > > } > > else > > { > > DECL_IGNORED_P (repl) = 1; > > TREE_NO_WARNING (repl) = 1; > > } > > So just copy DECL_IGNORED_P and TREE_NO_WARNING from access->base > unconditionally? Yeah, that makes sense. > > if (access->grp_bfr_lhs) > > DECL_GIMPLE_REG_P (repl) = 0; > > But you never set it (see update_address_taken for more cases, > most notably VIEW_CONVERT_EXPR on the lhs which need to be taken > care of). You should set it for COMPLEX_TYPE and VECTOR_TYPE > replacements. This function is the only place where I still use make_rename_temp which sets it exactly in these two cases. I did not really know why it is required in these two cases and only in these two cases so I left it there, at least for now. I guess I understand that now after seeing update_address_taken. I can replace this with calling create_tmp_var() and doing all the rest that make_rename_temp does - I believe that you intend to to remove it - I have just not found out why it is so bad. > > if (dump_file) > > { > > fprintf (dump_file, "Created a replacement for "); > > print_generic_expr (dump_file, access->base, 0); > > fprintf (dump_file, " offset: %u, size: %u: ", > > (unsigned) access->offset, (unsigned) access->size); > > print_generic_expr (dump_file, repl, 0); > > fprintf (dump_file, "\n"); > > } > > > > return repl; > > } > > > > /* Return ACCESS scalar replacement, create it if it does not exist yet. 
*/ > > > > static inline tree > > get_access_replacement (struct access *access) > > { > > gcc_assert (access->grp_to_be_replaced); > > > > if (access->replacement_decl) > > return access->replacement_decl; > > > > access->replacement_decl = create_access_replacement (access); > > return access->replacement_decl; > > } > > > > /* Build a subtree of accesses rooted in *ACCESS, and move the pointer in the > > linked list along the way. Stop when *ACCESS is NULL or the access pointed > > to it is not "within" the root. */ > > > > static void > > build_access_subtree (struct access **access) > > { > > struct access *root = *access, *last_child = NULL; > > HOST_WIDE_INT limit = root->offset + root->size; > > > > *access = (*access)->next_grp; > > while (*access && (*access)->offset + (*access)->size <= limit) > > { > > if (!last_child) > > root->first_child = *access; > > else > > last_child->next_sibling = *access; > > last_child = *access; > > > > build_access_subtree (access); > > } > > } > > > > /* Build a tree of access representatives, ACCESS is the pointer to the first > > one, others are linked in a list by the next_grp field. Decide about scalar > > replacements on the way, return true iff any are to be created. */ > > > > static void > > build_access_trees (struct access *access) > > { > > while (access) > > { > > struct access *root = access; > > > > build_access_subtree (&access); > > root->next_grp = access; > > } > > } > > > > /* Analyze the subtree of accesses rooted in ROOT, scheduling replacements when > > both seeming beneficial and when ALLOW_REPLACEMENTS allows it. Also set > > all sorts of access flags appropriately along the way, notably always ser > > grp_read when MARK_READ is true and grp_write when MARK_WRITE is true. */ > > > > static bool > > analyze_access_subtree (struct access *root, bool allow_replacements, > > bool mark_read, bool mark_write) > > { > > struct access *child; > > HOST_WIDE_INT limit = root->offset + root->size; > > HOST_WIDE_INT covered_to = root->offset; > > bool scalar = is_sra_scalar_type (root->type); > > bool hole = false, sth_created = false; > > > > if (mark_read) > > root->grp_read = true; > > else if (root->grp_read) > > mark_read = true; > > > > if (mark_write) > > root->grp_write = true; > > else if (root->grp_write) > > mark_write = true; > > > > if (root->grp_unscalarizable_region) > > allow_replacements = false; > > > > for (child = root->first_child; child; child = child->next_sibling) > > { > > if (!hole && child->offset < covered_to) > > hole = true; > > else > > covered_to += child->size; > > > > sth_created |= analyze_access_subtree (child, > > allow_replacements && !scalar, > > mark_read, mark_write); > > > > root->grp_unscalarized_data |= child->grp_unscalarized_data; > > hole |= !child->grp_covered; > > } > > > > if (allow_replacements && scalar && !root->first_child) > > { > > if (dump_file) > > && (dump_flags & TDF_DETAILS) OK > > { > > fprintf (dump_file, "Marking "); > > print_generic_expr (dump_file, root->base, 0); > > fprintf (dump_file, " offset: %u, size: %u: ", > > (unsigned) root->offset, (unsigned) root->size); > > fprintf (dump_file, " to be replaced.\n"); > > } > > > > root->grp_to_be_replaced = 1; > > sth_created = true; > > hole = false; > > } > > else if (covered_to < limit) > > hole = true; > > > > if (sth_created && !hole) > > { > > root->grp_covered = 1; > > return true; > > } > > if (root->grp_write || TREE_CODE (root->base) == PARM_DECL) > > root->grp_unscalarized_data = 1; /* not covered and written to 
*/ > > if (sth_created) > > return true; > > return false; > > } > > > > /* Analyze all access trees linked by next_grp by means of > > analyze_access_subtree. */ > > static bool > > analyze_access_trees (struct access *access) > > { > > bool ret = false; > > > > while (access) > > { > > if (analyze_access_subtree (access, true, false, false)) > > ret = true; > > access = access->next_grp; > > } > > > > return ret; > > } > > > > /* Return true iff a potential new child of LACC at offset NORM_OFFSET and with > > size SIZE would conflict with an already existing one. If exactly such a > > child already exists in LACC, store a pointer to it in EXACT_MATCH. */ > > > > static bool > > child_would_conflict_in_lacc (struct access *lacc, HOST_WIDE_INT norm_offset, > > HOST_WIDE_INT size, struct access **exact_match) > > { > > struct access *child; > > > > for (child = lacc->first_child; child; child = child->next_sibling) > > { > > if (child->offset == norm_offset && child->size == size) > > { > > *exact_match = child; > > return true; > > } > > > > if (child->offset < norm_offset + size > > && child->offset + child->size > norm_offset) > > return true; > > } > > > > return false; > > } > > > > /* Set the expr of TARGET to one just like MODEL but with its own base at the > > bottom of the handled components. */ > > > > static void > > duplicate_expr_for_different_base (struct access *target, > > struct access *model) > > { > > tree t, expr = unshare_expr (model->expr); > > > > gcc_assert (handled_component_p (expr)); > > t = expr; > > while (handled_component_p (TREE_OPERAND (t, 0))) > > t = TREE_OPERAND (t, 0); > > gcc_assert (TREE_OPERAND (t, 0) == model->base); > > TREE_OPERAND (t, 0) = target->base; > > > > target->expr = expr; > > } > > > > > > /* Create a new child access of PARENT, with all properties just like MODEL > > except for its offset and with its grp_write true and grp_read false. > > Return the new access. Note that this access is created long after all > > splicing and sorting, it's not located in any access vector and is > > automatically a representative of its group. */ > > > > static struct access * > > create_artificial_child_access (struct access *parent, struct access *model, > > HOST_WIDE_INT new_offset) > > { > > struct access *access; > > struct access **child; > > > > gcc_assert (!model->grp_unscalarizable_region); > > > > access = (struct access *) pool_alloc (access_pool); > > memset (access, 0, sizeof (struct access)); > > access->base = parent->base; > > access->offset = new_offset; > > access->size = model->size; > > duplicate_expr_for_different_base (access, model); > > access->type = model->type; > > access->grp_write = true; > > access->grp_read = false; > > > > child = &parent->first_child; > > while (*child && (*child)->offset < new_offset) > > child = &(*child)->next_sibling; > > > > access->next_sibling = *child; > > *child = access; > > > > return access; > > } > > > > > > /* Propagate all subaccesses of RACC across an assignment link to LACC. Return > > true if any new subaccess was created. Additionally, if RACC is a scalar > > access but LACC is not, change the type of the latter.
*/ > > > > static bool > > propagate_subaccesses_across_link (struct access *lacc, struct access *racc) > > { > > struct access *rchild; > > HOST_WIDE_INT norm_delta = lacc->offset - racc->offset; > > bool ret = false; > > > > if (is_sra_scalar_type (lacc->type) > > || lacc->grp_unscalarizable_region > > || racc->grp_unscalarizable_region) > > return false; > > > > if (!lacc->first_child && !racc->first_child > > && is_sra_scalar_type (racc->type) > > && (sra_mode == SRA_MODE_INTRA > > || !bitmap_bit_p (retvals_bitmap, DECL_UID (lacc->base)))) > > { > > duplicate_expr_for_different_base (lacc, racc); > > lacc->type = racc->type; > > return false; > > } > > > > gcc_assert (lacc->size <= racc->size); > > > > for (rchild = racc->first_child; rchild; rchild = rchild->next_sibling) > > { > > struct access *new_acc = NULL; > > HOST_WIDE_INT norm_offset = rchild->offset + norm_delta; > > > > if (rchild->grp_unscalarizable_region) > > continue; > > > > if (child_would_conflict_in_lacc (lacc, norm_offset, rchild->size, > > &new_acc)) > > { > > if (new_acc && rchild->first_child) > > ret |= propagate_subaccesses_across_link (new_acc, rchild); > > continue; > > } > > > > new_acc = create_artificial_child_access (lacc, rchild, norm_offset); > > if (rchild->first_child) > > propagate_subaccesses_across_link (new_acc, rchild); > > > > ret = true; > > } > > > > return ret; > > } > > > > /* Propagate all subaccesses across assignment links. */ > > > > static void > > propagate_all_subaccesses (void) > > { > > while (work_queue_head) > > { > > struct access *racc = pop_access_from_work_queue (); > > struct assign_link *link; > > > > gcc_assert (racc->first_link); > > > > for (link = racc->first_link; link; link = link->next) > > { > > struct access *lacc = link->lacc; > > > > if (!bitmap_bit_p (candidate_bitmap, DECL_UID (lacc->base))) > > continue; > > lacc = lacc->group_representative; > > if (propagate_subaccesses_across_link (lacc, racc) > > && lacc->first_link) > > add_access_to_work_queue (lacc); > > } > > } > > } > > > > /* Go through all accesses collected throughout the (intraprocedural) analysis > > stage, exclude overlapping ones, identify representatives and build trees > > out of them, making decisions about scalarization on the way. Return true > > iff there are any to-be-scalarized variables after this stage.
*/ > > > > static bool > > analyze_all_variable_accesses (void) > > { > > tree var; > > referenced_var_iterator rvi; > > bool res = false; > > > > FOR_EACH_REFERENCED_VAR (var, rvi) > > if (bitmap_bit_p (candidate_bitmap, DECL_UID (var))) > > { > > struct access *access; > > > > access = sort_and_splice_var_accesses (var); > > if (access) > > build_access_trees (access); > > else > > disqualify_candidate (var, > > "No or inhibitingly overlapping accesses."); > > } > > > > propagate_all_subaccesses (); > > > > FOR_EACH_REFERENCED_VAR (var, rvi) > > if (bitmap_bit_p (candidate_bitmap, DECL_UID (var))) > > { > > struct access *access = get_first_repr_for_decl (var); > > > > if (analyze_access_trees (access)) > > { > > res = true; > > if (dump_file) > > && (dump_flags & TDF_DETAILS) OK > > { > > fprintf (dump_file, "\nAccess trees for "); > > print_generic_expr (dump_file, var, 0); > > fprintf (dump_file, " (UID: %u): \n", DECL_UID (var)); > > dump_access_tree (dump_file, access); > > fprintf (dump_file, "\n"); > > } > > } > > else > > disqualify_candidate (var, "No scalar replacements to be created."); > > } > > > > return res; > > } > > > > /* Return true iff a reference expression into aggregate AGG can be built for > > every single to-be-replaced access that is a child of ACCESS, its sibling > > or a child of its sibling. TOP_OFFSET is the offset of the processed > > access subtree that has to be subtracted from the offset of each access. */ > > > > static bool > > ref_expr_for_all_replacements_p (struct access *access, tree agg, > > HOST_WIDE_INT top_offset) > > { > > do > > { > > if (access->grp_to_be_replaced > > && !build_ref_for_offset (NULL, TREE_TYPE (agg), > > access->offset - top_offset, > > access->type, false)) > > return false; > > > > if (access->first_child > > && !ref_expr_for_all_replacements_p (access->first_child, agg, > > top_offset)) > > return false; > > > > access = access->next_sibling; > > } > > while (access); > > > > return true; > > } > > > > > > /* Generate statements copying scalar replacements of accesses within a subtree > > into or out of AGG. ACCESS is the first child of the root of the subtree to > > be processed. AGG is an aggregate type expression (can be a declaration but > > does not have to be, it can for example also be an indirect_ref). > > TOP_OFFSET is the offset of the processed subtree which has to be subtracted > > from offsets of individual accesses to get corresponding offsets for AGG. > > If CHUNK_SIZE is non-zero, copy only replacements in the interval > > <start_offset, start_offset + chunk_size>, otherwise copy all. GSI is a > > statement iterator used to place the new statements. WRITE should be true > > when the statements should write from AGG to the replacement and false if > > vice versa. If INSERT_AFTER is true, new statements will be added after the > > current statement in GSI, they will be added before the statement > > otherwise.
*/ > > > > static void > > generate_subtree_copies (struct access *access, tree agg, > > HOST_WIDE_INT top_offset, > > HOST_WIDE_INT start_offset, HOST_WIDE_INT chunk_size, > > gimple_stmt_iterator *gsi, bool write, > > bool insert_after) > > { > > do > > { > > tree expr = unshare_expr (agg); > > > > if (chunk_size && access->offset >= start_offset + chunk_size) > > return; > > > > if (access->grp_to_be_replaced > > && (chunk_size == 0 > > || access->offset + access->size > start_offset)) > > { > > bool repl_found; > > gimple stmt; > > > > repl_found = build_ref_for_offset (&expr, TREE_TYPE (agg), > > access->offset - top_offset, > > access->type, false); > > gcc_assert (repl_found); > > > > if (write) > > stmt = gimple_build_assign (get_access_replacement (access), expr); > > else > > { > > tree repl = get_access_replacement (access); > > TREE_NO_WARNING (repl) = 1; > > stmt = gimple_build_assign (expr, repl); > > } > > > > if (insert_after) > > gsi_insert_after (gsi, stmt, GSI_NEW_STMT); > > else > > gsi_insert_before (gsi, stmt, GSI_SAME_STMT); > > update_stmt (stmt); > > } > > > > if (access->first_child) > > generate_subtree_copies (access->first_child, agg, top_offset, > > start_offset, chunk_size, gsi, > > write, insert_after); > > > > access = access->next_sibling; > > } > > while (access); > > } > > > > /* Assign zero to all scalar replacements in an access subtree. ACCESS is the > > root of the subtree to be processed. GSI is the statement iterator used > > for inserting statements which are added after the current statement if > > INSERT_AFTER is true or before it otherwise. */ > > > > static void > > init_subtree_with_zero (struct access *access, gimple_stmt_iterator *gsi, > > bool insert_after) > > > > { > > struct access *child; > > > > if (access->grp_to_be_replaced) > > { > > gimple stmt; > > > > stmt = gimple_build_assign (get_access_replacement (access), > > fold_convert (access->type, > > integer_zero_node)); > > if (insert_after) > > gsi_insert_after (gsi, stmt, GSI_NEW_STMT); > > else > > gsi_insert_before (gsi, stmt, GSI_SAME_STMT); > > update_stmt (stmt); > > } > > > > for (child = access->first_child; child; child = child->next_sibling) > > init_subtree_with_zero (child, gsi, insert_after); > > } > > > > /* Search for an access representative for the given expression EXPR and > > return it or NULL if it cannot be found. */ > > > > static struct access * > > get_access_for_expr (tree expr) > > { > > HOST_WIDE_INT offset, size, max_size; > > tree base; > > > > if (TREE_CODE (expr) == NOP_EXPR > > CONVERT_EXPR_P (expr) > > > || TREE_CODE (expr) == VIEW_CONVERT_EXPR) > > VIEW_CONVERT_EXPR is also a handled_component_p. > > Note that NOP_EXPR should never occur here - that would be invalid > gimple. So I think you can (and should) just delete the above. I haven't seen a NOP_EXPR for a while, do they still exist in lower gimple? Thus I have removed their handling. Removing the diving through V_C_Es breaks Ada, though. The reason is that we get a different size (and max_size) when calling get_ref_base_and_extent on the V_C_E and on its argument. However, I believe both should be represented by a single access representative.
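What I have now at that spot is thus essentially the following (just a sketch of the relevant lines in get_access_for_expr, with the NOP_EXPR handling gone and a FIXME for the Ada problem):

    /* FIXME: This should not be necessary.  The Ada frontend emits
       V_C_Es whose size differs from the size of their operand, and
       without this stripping we would get two different access
       representatives for what is really a single access.  */
    if (TREE_CODE (expr) == VIEW_CONVERT_EXPR)
      expr = TREE_OPERAND (expr, 0);
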
> > expr = TREE_OPERAND (expr, 0); > > > > if (handled_component_p (expr)) > > { > > base = get_ref_base_and_extent (expr, &offset, &size, &max_size); > > size = max_size; > > if (size == -1 || !base || !DECL_P (base)) > > return NULL; > > } > > else if (DECL_P (expr)) > > { > > tree tree_size; > > > > base = expr; > > tree_size = TYPE_SIZE (TREE_TYPE (base)); > > if (tree_size && host_integerp (tree_size, 1)) > > size = max_size = tree_low_cst (tree_size, 1); > > else > > return NULL; > > > > offset = 0; > > See above. get_ref_base_and_extent handles plain DECLs just fine. OK > > } > > else > > return NULL; > > > > if (!bitmap_bit_p (candidate_bitmap, DECL_UID (base))) > > return NULL; > > > > return get_var_base_offset_size_access (base, offset, size); > > } > > > > /* Substitute into *EXPR an expression of type TYPE with the value of the > > replacement of ACCESS. This is done either by producing a special V_C_E > > assignment statement converting the replacement to a new temporary of the > > requested type if TYPE is not TREE_ADDRESSABLE or by going through the base > > aggregate if it is. */ > > > > static void > > fix_incompatible_types_for_expr (tree *expr, tree type, struct access *access, > > gimple_stmt_iterator *gsi, bool write) > > { > > tree repl = get_access_replacement (access); > > if (!TREE_ADDRESSABLE (type)) > > { > > tree tmp = create_tmp_var (type, "SRvce"); > > > > add_referenced_var (tmp); > > if (is_gimple_reg_type (type)) > > tmp = make_ssa_name (tmp, NULL); > > Should be always is_gimple_reg_type () if it is a type suitable for > an SRA scalar replacement. No, it is the type suitable for the statement, it can be a union type or a record with only one field. But see the more thorough explanation below... > But you should set DECL_GIMPLE_REG_P for > VECTOR and COMPLEX types here. > > > if (write) > > { > > gimple stmt; > > tree conv = fold_build1 (VIEW_CONVERT_EXPR, TREE_TYPE (repl), tmp); > > This needs to either always fold to plain 'tmp' or tmp has to be a > non-register. Otherwise you will create invalid gimple. > > > *expr = tmp; > > if (is_gimple_reg_type (type)) > > SSA_NAME_DEF_STMT (tmp) = gsi_stmt (*gsi); > > See above. > > > stmt = gimple_build_assign (repl, conv); > > gsi_insert_after (gsi, stmt, GSI_NEW_STMT); > > update_stmt (stmt); > > } > > else > > { > > gimple stmt; > > tree conv = fold_build1 (VIEW_CONVERT_EXPR, type, repl); > > > > stmt = gimple_build_assign (tmp, conv); > > gsi_insert_before (gsi, stmt, GSI_SAME_STMT); > > if (is_gimple_reg_type (type)) > > SSA_NAME_DEF_STMT (tmp) = stmt; > > See above. (I wonder if the patch still passes bootstrap & regtest > after the typechecking patch) > > > *expr = tmp; > > update_stmt (stmt); > > } > > } > > else > > { > > if (write) > > { > > gimple stmt; > > > > stmt = gimple_build_assign (repl, unshare_expr (access->expr)); > > gsi_insert_after (gsi, stmt, GSI_NEW_STMT); > > update_stmt (stmt); > > } > > else > > { > > gimple stmt; > > > > stmt = gimple_build_assign (unshare_expr (access->expr), repl); > > gsi_insert_before (gsi, stmt, GSI_SAME_STMT); > > update_stmt (stmt); > > } > > I don't understand this path. Are the types here always compatible? And I don't really understand the comments. The function is called by sra_modify_expr (the function doing the replacements in all non-assign statements) when it needs to replace a reference by a scalar but the types don't match.
This can happen when replacing a V_C_E, a union access when we picked a different type than the one used in the statement or (and this case can be remarkably irritating) an access to a record with only one (scalar) field. My original idea was to simply put a V_C_E in place. However, I believe there are places where this is not possible - or at least one case, a LHS of a call statement because V_C_Es of gimple_registers (ssa_names) are not allowed on LHSs. My initial idea to handle these cases was to create a new temporary with a matching type and a V_C_E assign statement (with the V_C_E always on the RHS - I believe that works even with gimple registers) that would do the conversion and load/store it to the replacement variable (this is what the !TREE_ADDRESSABLE branch does). The problem with this idea is TREE_ADDRESSABLE types. These types need to be constructed and thus we cannot create temporary variables of these types. On the other hand they absolutely need to be SRAed, not doing so slows down tramp3d by a factor of two (and the current SRA also breaks them up). And quite a few C++ classes are such types that are "non-addressable" and have only one scalar field. Identifying such records is possible, and I soon realized that I can simply leave the statement as it is and produce a new statement to do load/store from the original field (that's what the outermost else branch does). Does this make sense or is there some fundamental flaw in my reasoning about gimple again? Does this explain what the function does? It certainly passes bootstrap and testing, I use --enable-checking=yes. > > } > > } > > > > > > /* Callback for scan_function. Replace the expression EXPR with a scalar > > replacement if there is one and generate other statements to do type > > conversion or subtree copying if necessary. GSI is used to place newly > > created statements, WRITE is true if the expression is being written to (it > > is on a LHS of a statement or output in an assembly statement). */ > > > > static bool > > sra_modify_expr (tree *expr, gimple_stmt_iterator *gsi, bool write, > > void *data ATTRIBUTE_UNUSED) > > { > > struct access *access; > > tree type, bfr; > > > > if (TREE_CODE (*expr) == BIT_FIELD_REF) > > { > > bfr = *expr; > > expr = &TREE_OPERAND (*expr, 0); > > } > > else > > bfr = NULL_TREE; > > > > if (TREE_CODE (*expr) == REALPART_EXPR || TREE_CODE (*expr) == IMAGPART_EXPR) > > expr = &TREE_OPERAND (*expr, 0); > > Why strip these early? I think this is wrong (or do you always want to > produce complex type replacements, even if only the real or imaginary part > is used? If so, a strategic comment somewhere is missing.) As I have already written above, my intention was to keep the complex replacements complex. Again, not doing so would definitely simplify the code so if you think there is no benefit in doing so I may get rid of all their special handling. I also treat vector accesses through BIT_FIELD_REFs specially... ATM, I'll add a comment to the access structure description.
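A tiny made-up example of the kind of partial complex accesses I mean (s.z would get a single complex replacement, and the partial writes then go through sra_modify_partially_complex_lhs):

    struct S { _Complex float z; float other; };

    float
    f (float x, float y)
    {
      struct S s;
      __real__ s.z = x;   /* REALPART_EXPR on a LHS */
      __imag__ s.z = y;   /* IMAGPART_EXPR on a LHS */
      return __imag__ s.z;
    }
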
> > type = TREE_TYPE (*expr); > > > > access = get_access_for_expr (*expr); > > if (!access) > > return false; > > > > if (access->grp_to_be_replaced) > > { > > if (!useless_type_conversion_p (type, access->type)) > > fix_incompatible_types_for_expr (expr, type, access, gsi, write); > > else > > *expr = get_access_replacement (access); > > } > > > > if (access->first_child) > > { > > HOST_WIDE_INT start_offset, chunk_size; > > if (bfr > > && host_integerp (TREE_OPERAND (bfr, 1), 1) > > && host_integerp (TREE_OPERAND (bfr, 2), 1)) > > { > > start_offset = tree_low_cst (TREE_OPERAND (bfr, 2), 1); > > chunk_size = tree_low_cst (TREE_OPERAND (bfr, 1), 1); > > } > > else > > start_offset = chunk_size = 0; > > > > generate_subtree_copies (access->first_child, access->base, 0, > > start_offset, chunk_size, gsi, write, write); > > } > > return true; > > } > > > > /* Store all replacements in the access tree rooted in TOP_RACC either to their > > base aggregate if there are unscalarized data or directly to LHS > > otherwise. */ > > > > static void > > handle_unscalarized_data_in_subtree (struct access *top_racc, tree lhs, > > gimple_stmt_iterator *gsi) > > { > > if (top_racc->grp_unscalarized_data) > > generate_subtree_copies (top_racc->first_child, top_racc->base, 0, 0, 0, > > gsi, false, false); > > else > > generate_subtree_copies (top_racc->first_child, lhs, top_racc->offset, > > 0, 0, gsi, false, false); > > } > > > > > > /* Try to generate statements to load all sub-replacements in an access > > (sub)tree (LACC is the first child) from scalar replacements in the TOP_RACC > > (sub)tree. If that is not possible, refresh the TOP_RACC base aggregate and > > load the accesses from it. LEFT_OFFSET is the offset of the whole left > > subtree being copied, RIGHT_OFFSET is the same thing for the right subtree. > > OLD_GSI is a statement iterator pointing at the original statement, NEW_GSI > > is one used to place newly created statements. *REFRESHED is true iff > > the rhs top aggregate has already been refreshed by contents of its scalar > > replacements and is set to true if this function has to do it. */ > > > > static void > > load_assign_lhs_subreplacements (struct access *lacc, struct access *top_racc, > > HOST_WIDE_INT left_offset, > > HOST_WIDE_INT right_offset, > > gimple_stmt_iterator *old_gsi, > > gimple_stmt_iterator *new_gsi, > > bool *refreshed, tree lhs) > > { > > do > > { > > if (lacc->grp_to_be_replaced) > > { > > struct access *racc; > > HOST_WIDE_INT offset = lacc->offset - left_offset + right_offset; > > > > racc = find_access_in_subtree (top_racc, offset, lacc->size); > > if (racc && racc->grp_to_be_replaced) > > { > > gimple stmt; > > > > if (useless_type_conversion_p (lacc->type, racc->type)) > > stmt = gimple_build_assign (get_access_replacement (lacc), > > get_access_replacement (racc)); > > else > > { > > tree rhs = fold_build1 (VIEW_CONVERT_EXPR, lacc->type, > > get_access_replacement (racc)); > > stmt = gimple_build_assign (get_access_replacement (lacc), > > rhs); > > } > > > > gsi_insert_after (new_gsi, stmt, GSI_NEW_STMT); > > update_stmt (stmt); > > } > > else > > { > > tree expr = unshare_expr (top_racc->base); > > bool repl_found; > > gimple stmt; > > > > /* No suitable access on the right hand side, need to load from > > the aggregate. See if we have to update it first...
*/ > > if (!*refreshed) > > { > > gcc_assert (top_racc->first_child); > > handle_unscalarized_data_in_subtree (top_racc, lhs, old_gsi); > > *refreshed = true; > > } > > > > repl_found = build_ref_for_offset (&expr, > > TREE_TYPE (top_racc->base), > > lacc->offset - left_offset, > > lacc->type, false); > > gcc_assert (repl_found); > > stmt = gimple_build_assign (get_access_replacement (lacc), > > expr); > > gsi_insert_after (new_gsi, stmt, GSI_NEW_STMT); > > update_stmt (stmt); > > } > > } > > else if (lacc->grp_read && !lacc->grp_covered && !*refreshed) > > { > > handle_unscalarized_data_in_subtree (top_racc, lhs, old_gsi); > > *refreshed = true; > > } > > > > if (lacc->first_child) > > load_assign_lhs_subreplacements (lacc->first_child, top_racc, > > left_offset, right_offset, > > old_gsi, new_gsi, refreshed, lhs); > > lacc = lacc->next_sibling; > > } > > while (lacc); > > } > > > > /* Return true iff ACC is non-NULL and has subaccesses. */ > > > > static inline bool > > access_has_children_p (struct access *acc) > > { > > return acc && acc->first_child; > > } > > > > /* Modify assignments with a CONSTRUCTOR on their RHS. STMT contains a pointer > > to the assignment and GSI is the statement iterator pointing at it. Returns > > the same values as sra_modify_assign. */ > > > > static enum scan_assign_result > > sra_modify_constructor_assign (gimple *stmt, gimple_stmt_iterator *gsi) > > { > > tree lhs = gimple_assign_lhs (*stmt); > > struct access *acc; > > > > gcc_assert (TREE_CODE (lhs) != REALPART_EXPR > > && TREE_CODE (lhs) != IMAGPART_EXPR); > > acc = get_access_for_expr (lhs); > > if (!acc) > > return SRA_SA_NONE; > > > > if (VEC_length (constructor_elt, > > CONSTRUCTOR_ELTS (gimple_assign_rhs1 (*stmt))) > 0) > > { > > /* I have never seen this code path trigger but if it can happen the > > following should handle it gracefully. */ > > It can trigger for vector constants. OK, I'll remove the comment. Apparently there are none in the testsuite, I believe I tested with a gcc_unreachable here. > > if (access_has_children_p (acc)) > > generate_subtree_copies (acc->first_child, acc->base, 0, 0, 0, gsi, > > true, true); > > return SRA_SA_PROCESSED; > > } > > > > if (acc->grp_covered) > > { > > init_subtree_with_zero (acc, gsi, false); > > unlink_stmt_vdef (*stmt); > > gsi_remove (gsi, true); > > return SRA_SA_REMOVED; > > } > > else > > { > > init_subtree_with_zero (acc, gsi, true); > > return SRA_SA_PROCESSED; > > } > > } > > > > > > /* Modify statements which have an IMAGPART_EXPR or REALPART_EXPR of a > > to-be-scalarized aggregate on their LHS. STMT is the statement and GSI is > > the iterator used to place new helper statements. Returns the same values > > as sra_modify_assign.
*/ > > > > static enum scan_assign_result > > sra_modify_partially_complex_lhs (gimple stmt, gimple_stmt_iterator *gsi) > > { > > tree lhs, complex, ptype, rp, ip; > > struct access *access; > > gimple new_stmt, aux_stmt; > > > > lhs = gimple_assign_lhs (stmt); > > complex = TREE_OPERAND (lhs, 0); > > > > access = get_access_for_expr (complex); > > > > if (!access || !access->grp_to_be_replaced) > > return SRA_SA_NONE; > > > > ptype = TREE_TYPE (TREE_TYPE (complex)); > > rp = create_tmp_var (ptype, "SRr"); > > add_referenced_var (rp); > > rp = make_ssa_name (rp, NULL); > > > > ip = create_tmp_var (ptype, "SRp"); > > add_referenced_var (ip); > > ip = make_ssa_name (ip, NULL); > > > > if (TREE_CODE (lhs) == IMAGPART_EXPR) > > { > > aux_stmt = gimple_build_assign (rp, fold_build1 (REALPART_EXPR, ptype, > > get_access_replacement (access))); > > SSA_NAME_DEF_STMT (rp) = aux_stmt; > > gimple_assign_set_lhs (stmt, ip); > > SSA_NAME_DEF_STMT (ip) = stmt; > > } > > else > > { > > aux_stmt = gimple_build_assign (ip, fold_build1 (IMAGPART_EXPR, ptype, > > get_access_replacement (access))); > > SSA_NAME_DEF_STMT (ip) = aux_stmt; > > gimple_assign_set_lhs (stmt, rp); > > SSA_NAME_DEF_STMT (rp) = stmt; > > } > > > > gsi_insert_before (gsi, aux_stmt, GSI_SAME_STMT); > > update_stmt (aux_stmt); > > new_stmt = gimple_build_assign (get_access_replacement (access), > > fold_build2 (COMPLEX_EXPR, access->type, > > rp, ip)); > > gsi_insert_after (gsi, new_stmt, GSI_NEW_STMT); > > update_stmt (new_stmt); > > Hm. So you do what complex lowering does here. Note that this may > create loads from uninitialized memory with all its problems. Yes, but I have not had any such problems with complex types (as opposed to simple loads from half-initialized records, for example). OTOH, I have also contemplated setting DECL_GIMPLE_REG_P to zero for complex replacements which appear in an IMAGPART_EXPR or REALPART_EXPR on the LHS of a statement. > > WRT the complex stuff. If you would do scalarization and analysis > just on the components (not special case REAL/IMAGPART_EXPR everywhere) > it should work better, correct? You still could handle group > scalarization for the case of, for example, passing a complex argument > to a function. Well, my reasoning was that if complex types were first-class citizens in gimple (as opposed to a record), there was a reason to keep them together and so I attempted that. But again, if that is a misconception of mine and there is no point in keeping them together, I will gladly remove this. > void bar(_Complex float); > void foo(float x, float y) > { > _Complex float z = x; > __imag z = y; > bar(z); > } > > The same applies for vectors - the REAL/IMAGPART_EXPRs equivalent > there is BIT_FIELD_REF. These are handled by setting DECL_GIMPLE_REG_P to zero if a B_F_R is on a LHS. I believe the current SRA does the same. It works fine and there's a lot less fuss about them. > > return SRA_SA_PROCESSED; > > } > > > > /* Return true iff T has a VIEW_CONVERT_EXPR among its handled components. */ > > > > static bool > > contains_view_convert_expr_p (tree t) > > { > > while (1) > > { > > if (TREE_CODE (t) == VIEW_CONVERT_EXPR) > > return true; > > if (!handled_component_p (t)) > > return false; > > t = TREE_OPERAND (t, 0); > > } > > } > > Place this in tree-flow-inline.h next to ref_contains_array_ref, also > structure the loop in the same way. OK, but I'd like the function to work if passed declarations too. Thus I cannot really use a do-while loop. I'll send it in a separate patch.
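For reference, the tree-flow-inline.h variant I have in mind would be something along these lines (a sketch only, following the ref_contains_array_ref structure while still accepting plain declarations, which never satisfy handled_component_p):

    /* Return true if REF has a VIEW_CONVERT_EXPR somewhere in it.  */

    static inline bool
    contains_view_convert_expr_p (const_tree ref)
    {
      while (handled_component_p (ref))
        {
          if (TREE_CODE (ref) == VIEW_CONVERT_EXPR)
            return true;
          ref = TREE_OPERAND (ref, 0);
        }
      return false;
    }
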
> > /* Change STMT to assign compatible types by means of adding component or array > > references or VIEW_CONVERT_EXPRs. All parameters have the same meaning as > > variables with the same names in sra_modify_assign. This is done in > > such a complicated way in order to make > > testsuite/g++.dg/tree-ssa/ssa-sra-2.C happy and so it helps in at least some > > cases. */ > > > > static void > > fix_modified_assign_compatibility (gimple_stmt_iterator *gsi, gimple *stmt, > > struct access *lacc, struct access *racc, > > tree lhs, tree *rhs, tree ltype, tree rtype) > > { > > if (racc && racc->grp_to_be_replaced && AGGREGATE_TYPE_P (ltype) > > && !access_has_children_p (lacc)) > > { > > tree expr = unshare_expr (lhs); > > bool found = build_ref_for_offset (&expr, ltype, racc->offset, rtype, > > false); > > if (found) > > { > > gimple_assign_set_lhs (*stmt, expr); > > return; > > } > > } > > > > if (lacc && lacc->grp_to_be_replaced && AGGREGATE_TYPE_P (rtype) > > && !access_has_children_p (racc)) > > { > > tree expr = unshare_expr (*rhs); > > bool found = build_ref_for_offset (&expr, rtype, lacc->offset, ltype, > > false); > > if (found) > > { > > gimple_assign_set_rhs1 (*stmt, expr); > > return; > > } > > } > > > > *rhs = fold_build1 (VIEW_CONVERT_EXPR, ltype, *rhs); > > gimple_assign_set_rhs_from_tree (gsi, *rhs); > > *stmt = gsi_stmt (*gsi); > > Reading this I have a deja-vu - isn't there another function in this > file doing the same thing? You are doing much unsharing even though > you re-build the access tree from scratch? This function has a similar purpose as fix_incompatible_types_for_expr but this time only for assign statements. That is easier because we can always put the V_C_E on the RHS and be safe and so no additional statements need to be generated. However, the V_C_Es rather than COMPONENT_REFs and ARRAY_REFs feel unnatural for accessing fields from single-field records and unions and single-element arrays. According to the comment I used to have problems of some sort with that in the ssa-sra-2.C testcase but I can no longer reproduce them (and don't remember them). I call unshare_expr in this context only when one side of an assignment statement is a scalar replacement and the other one is an aggregate (but not necessarily a declaration) which can happen only in the cases listed above. That is not very many calls and chances are good that build_ref_for_offset succeeds. Does that explain what is going on here? > > } > > > > /* Callback of scan_function to process assign statements. It examines both > > sides of the statement, replaces them with a scalar replacement if there is > > one and generates copying of replacements if scalarized aggregates have been > > used in the assignment. STMT is a pointer to the assign statement, GSI is > > used to hold generated statements for type conversions and subtree > > copying.
*/ > > > > static enum scan_assign_result > > sra_modify_assign (gimple *stmt, gimple_stmt_iterator *gsi, > > void *data ATTRIBUTE_UNUSED) > > { > > struct access *lacc, *racc; > > tree ltype, rtype; > > tree lhs, rhs; > > bool modify_this_stmt; > > > > if (gimple_assign_rhs2 (*stmt)) > > !gimple_assign_single_p (*stmt) > > (the only gimple assign that may access memory) OK > > > return SRA_SA_NONE; > > lhs = gimple_assign_lhs (*stmt); > > rhs = gimple_assign_rhs1 (*stmt); > > > > if (TREE_CODE (rhs) == CONSTRUCTOR) > > return sra_modify_constructor_assign (stmt, gsi); > > > > if (TREE_CODE (lhs) == REALPART_EXPR || TREE_CODE (lhs) == IMAGPART_EXPR) > > return sra_modify_partially_complex_lhs (*stmt, gsi); > > > > if (TREE_CODE (rhs) == REALPART_EXPR || TREE_CODE (rhs) == IMAGPART_EXPR > > || TREE_CODE (rhs) == BIT_FIELD_REF || TREE_CODE (lhs) == BIT_FIELD_REF) > > { > > modify_this_stmt = sra_modify_expr (gimple_assign_rhs1_ptr (*stmt), > > gsi, false, data); > > modify_this_stmt |= sra_modify_expr (gimple_assign_lhs_ptr (*stmt), > > gsi, true, data); > > return modify_this_stmt ? SRA_SA_PROCESSED : SRA_SA_NONE; > > } > > > > lacc = get_access_for_expr (lhs); > > racc = get_access_for_expr (rhs); > > if (!lacc && !racc) > > return SRA_SA_NONE; > > > > modify_this_stmt = ((lacc && lacc->grp_to_be_replaced) > > || (racc && racc->grp_to_be_replaced)); > > > > if (lacc && lacc->grp_to_be_replaced) > > { > > lhs = get_access_replacement (lacc); > > gimple_assign_set_lhs (*stmt, lhs); > > ltype = lacc->type; > > } > > else > > ltype = TREE_TYPE (lhs); > > > > if (racc && racc->grp_to_be_replaced) > > { > > rhs = get_access_replacement (racc); > > gimple_assign_set_rhs1 (*stmt, rhs); > > rtype = racc->type; > > } > > else > > rtype = TREE_TYPE (rhs); > > > > /* The possibility that gimple_assign_set_rhs_from_tree() might reallocate > > the statement makes the position of this pop_stmt_changes() a bit awkward > > but hopefully make some sense. */ > > I don't see pop_stmt_changes(). Yeah, the comment is outdated. I've removed it. > > if (modify_this_stmt) > > { > > if (!useless_type_conversion_p (ltype, rtype)) > > fix_modified_assign_compatibility (gsi, stmt, lacc, racc, > > lhs, &rhs, ltype, rtype); > > } > > > > if (contains_view_convert_expr_p (rhs) || contains_view_convert_expr_p (lhs) > > || (access_has_children_p (racc) > > && !ref_expr_for_all_replacements_p (racc, lhs, racc->offset)) > > || (access_has_children_p (lacc) > > && !ref_expr_for_all_replacements_p (lacc, rhs, lacc->offset))) > > ? A comment is missing what this case is about ... > > (this smells like fixup that could be avoided by doing things correct > in the first place) From this point on, the function deals with assignments between aggregates when at least one has scalar reductions of some of its components. There are three possible scenarios: 1) both the LHS and RHS have to-be-scalarized components, 2) only the RHS has, or 3) only the LHS has. In the first case, we would like to load the LHS components from RHS components whenever possible. If that is not possible, we would like to read them directly from the RHS (after updating it by storing in it its own components). If there are some necessary unscalarized data in the LHS, those will be loaded by the original assignment too. If neither of these cases happens, the original statement can be removed. Most of this is done by load_assign_lhs_subreplacements.
In the second case, we would like to store all RHS scalarized components directly into LHS and if they cover the aggregate completely, remove the statement too. In the third case, we want the LHS components to be loaded directly from the RHS (DSE will remove the original statement if it becomes redundant). This is a bit complex but manageable when types match and when unions do not cause confusion in a way that we cannot really load a component of LHS from the RHS or vice versa (the access representing this level can have subaccesses that are accessible only through a different union field at a higher level - different from the one used in the examined expression). Unions are fun. Therefore, I specially handle a fourth case, happening when there is a specific type cast or it is impossible to locate a scalarized subaccess on the other side of the expression. If that happens, I simply "refresh" the RHS by storing its scalarized components back into it, leave the original statement there to do the copying and then load the scalar replacements of the LHS. This is what the first branch does. Is it clearer now? Perhaps I should put these five paragraphs as a comment into the function? > > { > > if (access_has_children_p (racc)) > > generate_subtree_copies (racc->first_child, racc->base, 0, 0, 0, > > gsi, false, false); > > if (access_has_children_p (lacc)) > > generate_subtree_copies (lacc->first_child, lacc->base, 0, 0, 0, > > gsi, true, true); > > } > > else > > { > > if (access_has_children_p (lacc) && access_has_children_p (racc)) > > { > > gimple_stmt_iterator orig_gsi = *gsi; > > bool refreshed; > > > > if (lacc->grp_read && !lacc->grp_covered) > > { > > handle_unscalarized_data_in_subtree (racc, lhs, gsi); > > refreshed = true; > > } > > else > > refreshed = false; > > > > load_assign_lhs_subreplacements (lacc->first_child, racc, > > lacc->offset, racc->offset, > > &orig_gsi, gsi, &refreshed, lhs); > > if (!refreshed || !racc->grp_unscalarized_data) > > { > > if (*stmt == gsi_stmt (*gsi)) > > gsi_next (gsi); > > > > unlink_stmt_vdef (*stmt); > > gsi_remove (&orig_gsi, true); > > return SRA_SA_REMOVED; > > } > > } > > else > > { > > if (access_has_children_p (racc)) > > { > > if (!racc->grp_unscalarized_data) > > { > > generate_subtree_copies (racc->first_child, lhs, > > racc->offset, 0, 0, gsi, > > false, false); > > gcc_assert (*stmt == gsi_stmt (*gsi)); > > unlink_stmt_vdef (*stmt); > > gsi_remove (gsi, true); > > return SRA_SA_REMOVED; > > } > > else > > generate_subtree_copies (racc->first_child, lhs, > > racc->offset, 0, 0, gsi, false, true); > > } > > else if (access_has_children_p (lacc)) > > generate_subtree_copies (lacc->first_child, rhs, lacc->offset, > > 0, 0, gsi, true, true); > > } > > } > > > > return modify_this_stmt ? SRA_SA_PROCESSED : SRA_SA_NONE; > > } > > > > /* Generate statements initializing scalar replacements of parts of function > > parameters.
*/ > > > > static void > > initialize_parameter_reductions (void) > > { > > gimple_stmt_iterator gsi; > > gimple_seq seq = NULL; > > tree parm; > > > > for (parm = DECL_ARGUMENTS (current_function_decl); > > parm; > > parm = TREE_CHAIN (parm)) > > { > > VEC (access_p, heap) *access_vec; > > struct access *access; > > > > if (!bitmap_bit_p (candidate_bitmap, DECL_UID (parm))) > > continue; > > access_vec = get_base_access_vector (parm); > > if (!access_vec) > > continue; > > > > if (!seq) > > { > > seq = gimple_seq_alloc (); > > gsi = gsi_start (seq); > > } > > > > for (access = VEC_index (access_p, access_vec, 0); > > access; > > access = access->next_grp) > > generate_subtree_copies (access, parm, 0, 0, 0, &gsi, true, true); > > } > > > > if (seq) > > gsi_insert_seq_on_edge_immediate (single_succ_edge (ENTRY_BLOCK_PTR), seq); > > } > > > > /* The "main" function of intraprocedural SRA passes. Runs the analysis and if > > it reveals there are components of some aggregates to be scalarized, it runs > > the required transformations. */ > > static unsigned int > > perform_intra_sra (void) > > { > > int ret = 0; > > sra_initialize (); > > > > if (!find_var_candidates ()) > > goto out; > > > > if (!scan_function (build_access_from_expr, build_accesses_from_assign, NULL, > > true, NULL)) > > goto out; > > > > if (!analyze_all_variable_accesses ()) > > goto out; > > > > scan_function (sra_modify_expr, sra_modify_assign, NULL, > > false, NULL); > > initialize_parameter_reductions (); > > > > ret = TODO_update_ssa; > > redundant set. > > > if (sra_mode == SRA_MODE_EARLY_INTRA) > > ret = TODO_update_ssa; > > else > > ret = TODO_update_ssa | TODO_rebuild_alias; > > in fact you shouldn't (need to) rebuild alias. OK > > out: > > sra_deinitialize (); > > return ret; > > } > > > > /* Perform early intraprocedural SRA. */ > > static unsigned int > > early_intra_sra (void) > > { > > sra_mode = SRA_MODE_EARLY_INTRA; > > return perform_intra_sra (); > > } > > > > /* Perform "late" intraprocedural SRA. 
*/ > > static unsigned int > > late_intra_sra (void) > > { > > sra_mode = SRA_MODE_INTRA; > > return perform_intra_sra (); > > } > > > > > > static bool > > gate_intra_sra (void) > > { > > return flag_tree_sra != 0; > > } > > > > > > struct gimple_opt_pass pass_sra_early = > > { > > { > > GIMPLE_PASS, > > "esra", /* name */ > > gate_intra_sra, /* gate */ > > early_intra_sra, /* execute */ > > NULL, /* sub */ > > NULL, /* next */ > > 0, /* static_pass_number */ > > TV_TREE_SRA, /* tv_id */ > > PROP_cfg | PROP_ssa, /* properties_required */ > > 0, /* properties_provided */ > > 0, /* properties_destroyed */ > > 0, /* todo_flags_start */ > > TODO_dump_func > > | TODO_update_ssa > > | TODO_ggc_collect > > | TODO_verify_ssa /* todo_flags_finish */ > > } > > }; > > > > > > struct gimple_opt_pass pass_sra = > > { > > { > > GIMPLE_PASS, > > "sra", /* name */ > > gate_intra_sra, /* gate */ > > late_intra_sra, /* execute */ > > NULL, /* sub */ > > NULL, /* next */ > > 0, /* static_pass_number */ > > TV_TREE_SRA, /* tv_id */ > > PROP_cfg | PROP_ssa, /* properties_required */ > > 0, /* properties_provided */ > > 0, /* properties_destroyed */ > > TODO_update_address_taken, /* todo_flags_start */ > > TODO_dump_func > > | TODO_update_ssa > > | TODO_ggc_collect > > | TODO_verify_ssa /* todo_flags_finish */ > > } > > }; > > > Overall it looks good - I'm still a little bit confused, but that's > likely because reading code from top to bottom doesn't make the > most sense in all cases ;) > > Looking forward to a second look on a revised version. > Thanks again, Martin ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 3/5] New intraprocedural Scalar Reduction of Aggregates. 2009-05-10 10:33 ` Martin Jambor @ 2009-05-10 11:48 ` Richard Guenther 2009-05-12 0:24 ` Martin Jambor 0 siblings, 1 reply; 25+ messages in thread From: Richard Guenther @ 2009-05-10 11:48 UTC (permalink / raw) To: Martin Jambor; +Cc: GCC Patches, Jan Hubicka On Sun, 10 May 2009, Martin Jambor wrote: > > > expr = TREE_OPERAND (expr, 0); > > > bit_ref = true; > > > } > > > else > > > bit_ref = false; > > > > > > while (TREE_CODE (expr) == NOP_EXPR > > > > CONVERT_EXPR_P (expr) > > OK... but at another place in the email you said it might not even > appear in a valid gimple statement? Should I remove it altogether? Indeed. If you do not build trees from tuple stmts then a NOP_EXPR cannot appear as a rhs1 or rhs2 of an assignment (instead it is always the subcode in the gimple stmt and the rhs1 is simply something valid for is_gimple_val). > > > || TREE_CODE (expr) == VIEW_CONVERT_EXPR > > > || TREE_CODE (expr) == REALPART_EXPR > > > || TREE_CODE (expr) == IMAGPART_EXPR) > > > expr = TREE_OPERAND (expr, 0); > > > > Why do this here btw, and not just lump ... > > > > > switch (TREE_CODE (expr)) > > > { > > > case ADDR_EXPR: > > > case SSA_NAME: > > > case INDIRECT_REF: > > > break; > > > > > > case VAR_DECL: > > > case PARM_DECL: > > > case RESULT_DECL: > > > case COMPONENT_REF: > > > case ARRAY_REF: > > > ret = create_access (expr, write); > > > break; > > > > ... this ... > > > > > case REALPART_EXPR: > > > case IMAGPART_EXPR: > > > expr = TREE_OPERAND (expr, 0); > > > ret = create_access (expr, write); > > > > ... and this together? Won't you create bogus accesses if you > > strip for example IMAGPART_EXPR (which has non-zero offset)? > > That would break the complex number into its components. I thought > that they are meant to stay together for some reason, otherwise they > would not be represented explicitly in gimple... do you think it does > not matter? What about vectors then? > > The access is not bogus because modification functions take care of > these statements in a special way. However, if it is indeed OK to > split complex numbers into their components, I will gladly simplify > this as you suggested. Yes, it is valid to split them (and complex lowering indeed does that). It _might_ be useful to keep a complex together in a single SSA_NAME for optimization purposes, but I guess you detect that anyway if there is a read of the whole complex element into a register and keep it that way. I would favor simplifying SRA in this case and just split them if that is valid. > > > break; > > > > > > case ARRAY_RANGE_REF: > > > > it should just be handled fine I think. > > OK, I will try that at some later stage. > > > > default: > > > walk_tree (&safe_expr, disqualify_all, NULL, NULL); > > > > and if not, this should just disqualify the base of the access, like > > get_base_address (safe_expr) (save_expr you mean?) and then if that > > is a DECL, disqualify that decl. > > I'll test just doing nothing, things handled by get_base_address are > either fine or already accounted for. Fine with me. > > > return SRA_SA_NONE; > > > > > > lhs_ptr = gimple_assign_lhs_ptr (stmt); > > > rhs_ptr = gimple_assign_rhs1_ptr (stmt); > > > > you probably don't need to pass pointers to trees everywhere as you > > are not changing them. > > Well, this function is a callback called by scan_function which can > also call sra_modify_expr in the last stage of the pass when > statements are modified.
I have considered splitting the function > into two but in the end I thought they would be too similar and the > overhead is hopefully manageable. Yeah, I noticed this later. It is somewhat confusing at first sight, so maybe just amending the comment before this function could clarify things. > > > if (disqualify_ops_if_throwing_stmt (stmt, lhs_ptr, rhs_ptr)) > > > return SRA_SA_NONE; > > > > > > racc = build_access_from_expr_1 (rhs_ptr, gsi, false); > > > lacc = build_access_from_expr_1 (lhs_ptr, gsi, true); > > > > just avoid calling into build_access_from_expr_1 for SSA_NAMEs > > or is_gimple_min_invariant lhs/rhs, that should make that > > function more regular. > > In what sense? build_access_from_expr_1 looks at TREE_CODE anyway and > can discard the two cases, without for example looking into ADDR_EXPRs > like is_gimple_min_invariant(). > > But if you really think it is indeed beneficial, I can do that, sure > (to me it just looks ugly). Ok, just keep it as is. > > > if (lacc && racc > > > && !lacc->grp_unscalarizable_region > > > && !racc->grp_unscalarizable_region > > > && AGGREGATE_TYPE_P (TREE_TYPE (*lhs_ptr)) > > > && lacc->size <= racc->size > > > && useless_type_conversion_p (lacc->type, racc->type)) > > > > useless_type_conversion_p should be always true here. > > I don't think so, build_access_from_expr_1 can look through V_C_Es and > the types of accesses are the type of the operand in such cases. Ok, but what is the point of looking through V_C_Es there if it makes this test fail? Hmm, IIRC this was only to track struct copies, right? I guess it's ok then. > > That would just be useless information. I guess you copied this > > from old SRA? > > Yes. All this fancy naming stuff is quite useless but I find it very > handy when debugging SRA issues. Yeah, sort of. Still using no name in that case will do exactly the same thing ;) > > > static tree > > > create_access_replacement (struct access *access) > > > { > > > tree repl; > > > > > > repl = make_rename_temp (access->type, "SR"); > > > get_var_ann (repl); > > > add_referenced_var (repl); > > > > > > DECL_SOURCE_LOCATION (repl) = DECL_SOURCE_LOCATION (access->base); > > > DECL_ARTIFICIAL (repl) = 1; > > > > > > if (DECL_NAME (access->base) && !DECL_IGNORED_P (access->base)) > > > > at least && !DECL_ARTIFICIAL (access->base) I think. > > This part is also largely copied from the old SRA. So far it seems to > work nicely, replacements of artificial declarations get SRsome_number > fancy names and that makes them easy to distinguish. Nevertheless, I > can change the condition if it is somehow wrong. Or do you expect any > other problems besides not-so-fancy fancy names? No, it merely uses up memory. Not that other passes do not do this ... Thus, I probably do not care too much. > > > if (access->grp_bfr_lhs) > > > DECL_GIMPLE_REG_P (repl) = 0; > > > > But you never set it (see update_address_taken for more cases, > > most notably VIEW_CONVERT_EXPR on the lhs which need to be taken > > care of). You should set it for COMPLEX_TYPE and VECTOR_TYPE > > replacements. > > This function is the only place where I still use make_rename_temp > which sets it exactly in these two cases. I did not really know why > it is required in these two cases and only in these two cases so I > left it there, at least for now. I guess I understand that now after > seeing update_address_taken.
> > I can replace this with calling create_tmp_var() and doing all the > rest that make_rename_temp does - I believe that you intend to > remove it - I have just not found out why it is so bad. The bad thing about it is that it supports using the SSA renamer to write a single variable into SSA. That is usually more costly than just manually allocating SSA_NAMEs and updating SSA form, which is usually very easy. It's not used much, so the easiest thing might be to fix all remaining uses to manually update SSA form. But yes, I now see why that zeroing is necessary. > > > > CONVERT_EXPR_P (expr) > > > > > || TREE_CODE (expr) == VIEW_CONVERT_EXPR) > > > > VIEW_CONVERT_EXPR is also a handled_component_p. > > > > Note that NOP_EXPR should never occur here - that would be invalid > > gimple. So I think you can (and should) just delete the above. > > I haven't seen a NOP_EXPR for a while, do they still exist in lower > gimple? Thus I have removed their handling. > > Removing the diving through V_C_Es breaks Ada, though. The reason is that > we get a different size (and max_size) when calling > get_ref_base_and_extent on the V_C_E and on its argument. However, I > believe both should be represented by a single access representative. Yeah, I remember this :/ It is technically invalid GIMPLE that the Ada FE generates though. The size of the V_C_E result has to match that of the operand. Please add a FIXME before this stripping referring to the Ada problem. > > > tree repl = get_access_replacement (access); > > > if (!TREE_ADDRESSABLE (type)) > > > { > > > tree tmp = create_tmp_var (type, "SRvce"); > > > > > > add_referenced_var (tmp); > > > if (is_gimple_reg_type (type)) > > > tmp = make_ssa_name (tmp, NULL); > > > > Should be always is_gimple_reg_type () if it is a type suitable for > > an SRA scalar replacement. > > No, it is the type suitable for the statement, it can be a union type > or a record with only one field. But see the more thorough explanation > below... I think it should be always a register type, but see below... ;) > > But you should set DECL_GIMPLE_REG_P for > > VECTOR and COMPLEX types here. > > > > > if (write) > > > { > > > gimple stmt; > > > tree conv = fold_build1 (VIEW_CONVERT_EXPR, TREE_TYPE (repl), tmp); > > > > This needs to either always fold to plain 'tmp' or tmp has to be a > > non-register. Otherwise you will create invalid gimple. > > > > > *expr = tmp; > > > if (is_gimple_reg_type (type)) > > > SSA_NAME_DEF_STMT (tmp) = gsi_stmt (*gsi); > > > > See above. > > > > > stmt = gimple_build_assign (repl, conv); > > > gsi_insert_after (gsi, stmt, GSI_NEW_STMT); > > > update_stmt (stmt); > > > } > > > else > > > { > > > gimple stmt; > > > tree conv = fold_build1 (VIEW_CONVERT_EXPR, type, repl); > > > > > > stmt = gimple_build_assign (tmp, conv); > > > gsi_insert_before (gsi, stmt, GSI_SAME_STMT); > > > if (is_gimple_reg_type (type)) > > > SSA_NAME_DEF_STMT (tmp) = stmt; > > > > See above.
(I wonder if the patch still passes bootstrap & regtest > > after the typechecking patch) > > > > > *expr = tmp; > > > update_stmt (stmt); > > > } > > > } > > > else > > > { > > > if (write) > > > { > > > gimple stmt; > > > > > > stmt = gimple_build_assign (repl, unshare_expr (access->expr)); > > > gsi_insert_after (gsi, stmt, GSI_NEW_STMT); > > > update_stmt (stmt); > > > } > > > else > > > { > > > gimple stmt; > > > > > > stmt = gimple_build_assign (unshare_expr (access->expr), repl); > > > gsi_insert_before (gsi, stmt, GSI_SAME_STMT); > > > update_stmt (stmt); > > > } > > > > I don't understand this path. Are the types here always compatible? > > And I don't really understand the comments. The function is called by > sra_modify_expr (the function doing the replacements in all non-assign > statements) when it needs to replace a reference by a scalar but the > types don't match. This can happen when replacing a V_C_E, a union > access when we picked a different type than the one used in the > statement or (and this case can be remarkably irritating) an access to > a record with only one (scalar) field. > > My original idea was to simply put a V_C_E in place. However, I > believe there are places where this is not possible - or at least one > case, a LHS of a call statement because V_C_Es of gimple_registers > (ssa_names) are not allowed on LHSs. My initial idea to handle these > cases was to create a new temporary with a matching type and a V_C_E > assign statement (with the V_C_E always on the RHS - I believe that > works even with gimple registers) that would do the conversion and > load/store it to the replacement variable (this is what the > !TREE_ADDRESSABLE branch does). > > The problem with this idea is TREE_ADDRESSABLE types. These types > need to be constructed and thus we cannot create temporary variables > of these types. On the other hand they absolutely need to be SRAed, > not doing so slows down tramp3d by a factor of two (and the current > SRA also breaks them up). And quite a few C++ classes are such types > that are "non-addressable" and have only one scalar field. > Identifying such records is possible, and I soon realized that I can > simply leave the statement as it is and produce a new statement to do > load/store from the original field (that's what the outermost else > branch does). > > Does this make sense or is there some fundamental flaw in my reasoning > about gimple again? Does this explain what the function does? Ok, so the case in question is struct X { int i; } x; x = foo (); where you want to scalarize x. Indeed the obvious scalarization would be x = foo (); SR_1 = x.i; For all other LHS cases (not calls) you can move the V_C_E to the RHS and should be fine. I don't understand the TREE_ADDRESSABLE thingy yet. Especially why the type should be a non-register type. My understanding was that for reads the replacement LHS will be a register, for mismatched types you simply put a V_C_E around the RHS. For writes the replacement RHS will be a register, possibly surrounded by a V_C_E. The special case is calls where you cannot put a V_C_E around the "RHS". So, can you explain with an example how it comes that the scalar replacement type is a non-register type? (Whatever our conclusion will be, the function should have a big comment enumerating the cases we have to deal with) > It certainly passes bootstrap and testing, I use --enable-checking=yes. That's good.
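To make that concrete, a made-up C++ sketch of the kind of class in question - the non-trivial copy constructor is what makes the type TREE_ADDRESSABLE (so no temporaries of it can be created), yet it has just one scalar field:

    struct handle
    {
      handle (const handle &);  /* non-trivial, forces TREE_ADDRESSABLE */
      int fd;                   /* the single scalar field */
    };

    handle get_handle ();

    int
    use ()
    {
      handle h = get_handle (); /* LHS of a call, so no V_C_E there */
      return h.fd;
    }
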
> > > { > > > /* I have never seen this code path trigger but if it can happen the > > > following should handle it gracefully. */ > > > > It can trigger for vector constants. > > OK, I'll remove the comment. Apparently there are none in the > testsuite, I believe I tested with a gcc_unreachable here. Err, for vector constants we have VECTOR_CST, so it triggers for non-constant vector constructors like vector int x = { a, b, c, d }; > > > update_stmt (aux_stmt); > > > new_stmt = gimple_build_assign (get_access_replacement (access), > > > fold_build2 (COMPLEX_EXPR, access->type, > > > rp, ip)); > > > gsi_insert_after (gsi, new_stmt, GSI_NEW_STMT); > > > update_stmt (new_stmt); > > > > Hm. So you do what complex lowering does here. Note that this may > > create loads from uninitialized memory with all its problems. > > Yes, but I have not had any such problems with complex types (as > opposed to simple loads from half-initialized records, for example). > OTOH, I have also contemplated setting DECL_GIMPLE_REG_P to zero for > complex replacements which appear in an IMAGPART_EXPR or REALPART_EXPR on > the LHS of a statement. Yes, that's necessary. I still think SRA should not bother about this at all ;) > > WRT the complex stuff. If you would do scalarization and analysis > > just on the components (not special case REAL/IMAGPART_EXPR everywhere) > > it should work better, correct? You still could handle group > > scalarization for the case of for example passing a complex argument > > to a function. > > Well, my reasoning was that if complex types were first-class citizens > in gimple (as opposed to a record), there was a reason to keep them > together and so I attempted that. But again, if that is a > misconception of mine and there is no point in keeping them together, > I will gladly remove this. It's not clear. Complex lowering decomposes all complex variables to components if possible. Again, simplifying SRA is probably better. > > void bar(_Complex float); > > void foo(float x, float y) > > { > > _Complex float z = x; > > __imag z = y; > > bar(z); > > } > > > > The same applies for vectors - the REAL/IMAGPART_EXPRs equivalent > > there is BIT_FIELD_REF. > > These are handled by setting DECL_GIMPLE_REG_P to zero if a B_F_R is > on a LHS. I believe the current SRA does the same. It works fine and > there's a lot less fuss about them. > > > return SRA_SA_PROCESSED; > > > } > > > > > > /* Return true iff T has a VIEW_CONVERT_EXPR among its handled components. */ > > > > > > static bool > > > contains_view_convert_expr_p (tree t) > > > { > > > while (1) > > > { > > > if (TREE_CODE (t) == VIEW_CONVERT_EXPR) > > > return true; > > > if (!handled_component_p (t)) > > > return false; > > > t = TREE_OPERAND (t, 0); > > > } > > > } > > > > Place this in tree-flow-inline.h next to ref_contains_array_ref, also > > structure the loop in the same way. > > OK, but I'd like the function to work if passed declarations too. > Thus I cannot really use a do-while loop. I'll send it in a separate > patch. > > > > /* Change STMT to assign compatible types by means of adding component or array > > > references or VIEW_CONVERT_EXPRs. All parameters have the same meaning as > > > variables with the same names in sra_modify_assign. This is done in > > > such a complicated way in order to make > > > testsuite/g++.dg/tree-ssa/ssa-sra-2.C happy and so it helps in at least some > > > cases.
*/ > > > > > > static void > > > fix_modified_assign_compatibility (gimple_stmt_iterator *gsi, gimple *stmt, > > > struct access *lacc, struct access *racc, > > > tree lhs, tree *rhs, tree ltype, tree rtype) > > > { > > > if (racc && racc->grp_to_be_replaced && AGGREGATE_TYPE_P (ltype) > > > && !access_has_children_p (lacc)) > > > { > > > tree expr = unshare_expr (lhs); > > > bool found = build_ref_for_offset (&expr, ltype, racc->offset, rtype, > > > false); > > > if (found) > > > { > > > gimple_assign_set_lhs (*stmt, expr); > > > return; > > > } > > > } > > > > > > if (lacc && lacc->grp_to_be_replaced && AGGREGATE_TYPE_P (rtype) > > > && !access_has_children_p (racc)) > > > { > > > tree expr = unshare_expr (*rhs); > > > bool found = build_ref_for_offset (&expr, rtype, lacc->offset, ltype, > > > false); > > > if (found) > > > { > > > gimple_assign_set_rhs1 (*stmt, expr); > > > return; > > > } > > > } > > > > > > *rhs = fold_build1 (VIEW_CONVERT_EXPR, ltype, *rhs); > > > gimple_assign_set_rhs_from_tree (gsi, *rhs); > > > *stmt = gsi_stmt (*gsi); > > > > Reading this I have a deja-vu - isn't there another function in this > > file doing the same thing? You are doing much unsharing even though > > you re-build the access tree from scratch? > > This function has a similar purpose as fix_incompatible_types_for_expr > but this time only for assign statements. That is easier because we > can always put the V_C_E on the RHS and be safe and so no additional > statements need to be generated. > > However, the V_C_Es rather than COMPONENT_REFs and ARRAY_REFs feel > unnatural for accessing fields from single field records and unions > and single element arrays. According to the comment I used to have > problems of some sort with that in the ssa-sra-2.C testcase but I can > no longer reproduce them (and don't remember them). > > I call unshare_expr in this context only when one side of an > assignment statement is a scalar replacement and the other one is an > aggregate (but not necessarily a declaration) which can happen only in > the cases listed above. That is not very many calls and chances are > good that build_ref_for_offset succeeds. > > Does that explain what is going on here? Yes. I guess merging the two functions would make sense to me (or putting them next to each other at least). The function signatures also look seemingly weird (that you store into *rhs but still set the stmt rhs, etc.). I have another look here with the new version. > > > } > > > > > > /* Callback of scan_function to process assign statements. It examines both > > > sides of the statement, replaces them with a scalare replacement if there is > > > one and generating copying of replacements if scalarized aggregates have been > > > used in the assignment. STMT is a pointer to the assign statement, GSI is > > > used to hold generated statements for type conversions and subtree > > > copying. 
*/ > > > > > > static enum scan_assign_result > > > sra_modify_assign (gimple *stmt, gimple_stmt_iterator *gsi, > > > void *data ATTRIBUTE_UNUSED) > > > { > > > struct access *lacc, *racc; > > > tree ltype, rtype; > > > tree lhs, rhs; > > > bool modify_this_stmt; > > > > > > if (gimple_assign_rhs2 (*stmt)) > > > > !gimple_assign_single_p (*stmt) > > > > (the only gimple assign that may access memory) > > OK > > > > > > return SRA_SA_NONE; > > > lhs = gimple_assign_lhs (*stmt); > > > rhs = gimple_assign_rhs1 (*stmt); > > > > > > if (TREE_CODE (rhs) == CONSTRUCTOR) > > > return sra_modify_constructor_assign (stmt, gsi); > > > > > > if (TREE_CODE (lhs) == REALPART_EXPR || TREE_CODE (lhs) == IMAGPART_EXPR) > > > return sra_modify_partially_complex_lhs (*stmt, gsi); > > > > > > if (TREE_CODE (rhs) == REALPART_EXPR || TREE_CODE (rhs) == IMAGPART_EXPR > > > || TREE_CODE (rhs) == BIT_FIELD_REF || TREE_CODE (lhs) == BIT_FIELD_REF) > > > { > > > modify_this_stmt = sra_modify_expr (gimple_assign_rhs1_ptr (*stmt), > > > gsi, false, data); > > > modify_this_stmt |= sra_modify_expr (gimple_assign_lhs_ptr (*stmt), > > > gsi, true, data); > > > return modify_this_stmt ? SRA_SA_PROCESSED : SRA_SA_NONE; > > > } > > > > > > lacc = get_access_for_expr (lhs); > > > racc = get_access_for_expr (rhs); > > > if (!lacc && !racc) > > > return SRA_SA_NONE; > > > > > > modify_this_stmt = ((lacc && lacc->grp_to_be_replaced) > > > || (racc && racc->grp_to_be_replaced)); > > > > > > if (lacc && lacc->grp_to_be_replaced) > > > { > > > lhs = get_access_replacement (lacc); > > > gimple_assign_set_lhs (*stmt, lhs); > > > ltype = lacc->type; > > > } > > > else > > > ltype = TREE_TYPE (lhs); > > > > > > if (racc && racc->grp_to_be_replaced) > > > { > > > rhs = get_access_replacement (racc); > > > gimple_assign_set_rhs1 (*stmt, rhs); > > > rtype = racc->type; > > > } > > > else > > > rtype = TREE_TYPE (rhs); > > > > > > /* The possibility that gimple_assign_set_rhs_from_tree() might reallocate > > > the statement makes the position of this pop_stmt_changes() a bit awkward > > > but hopefully make some sense. */ > > > > I don't see pop_stmt_changes(). > > Yeah, the comment is outdated. I've removed it. > > > > if (modify_this_stmt) > > > { > > > if (!useless_type_conversion_p (ltype, rtype)) > > > fix_modified_assign_compatibility (gsi, stmt, lacc, racc, > > > lhs, &rhs, ltype, rtype); > > > } > > > > > > if (contains_view_convert_expr_p (rhs) || contains_view_convert_expr_p (lhs) > > > || (access_has_children_p (racc) > > > && !ref_expr_for_all_replacements_p (racc, lhs, racc->offset)) > > > || (access_has_children_p (lacc) > > > && !ref_expr_for_all_replacements_p (lacc, rhs, lacc->offset))) > > > > ? A comment is missing what this case is about ... > > > > (this smells like fixup that could be avoided by doing things correct > > in the first place) > > From this point on, the function deals with assignments in between > aggregates when at least one has scalar reductions of some of its > components. There are three possible scenarios: Both the LHS and RHS > have to-be-scalarized components, 2) only the RHS has or 3) only the > LHS has. > > In the first case, we would like to load the LHS components from RHS > components whenever possible. If that is not possible, we would like > to read it directly from the RHS (after updating it by storing in it > its own components). If there are some necessary unscalarized data in > the LHS, those will be loaded by the original assignment too. 
If > neither of these cases happens, the original statement can be removed. > Most of this is done by load_assign_lhs_subreplacements. > > In the second case, we would like to store all RHS scalarized > components directly into the LHS and, if they cover the aggregate > completely, remove the statement too. In the third case, we want the > LHS components to be loaded directly from the RHS (DSE will remove the > original statement if it becomes redundant). > > This is a bit complex but manageable when types match and when unions > do not cause confusion in such a way that we cannot really load a component > of the LHS from the RHS or vice versa (the access representing this level > can have subaccesses that are accessible only through a different > union field at a higher level - different from the one used in the > examined expression). Unions are fun. > > Therefore, I specially handle a fourth case, which happens when there is a > specific type cast or it is impossible to locate a scalarized > subaccess on the other side of the expression. If that happens, I > simply "refresh" the RHS by storing its scalarized components into it, > leave the original statement there to do the copying and then load the > scalar replacements of the LHS. This is what the first branch does. > > Is it clearer now? Perhaps I should put these five paragraphs as a > comment into the function? Yes - that would be nice. Thanks, Richard. ^ permalink raw reply [flat|nested] 25+ messages in thread
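To make the scenarios described in the message above concrete, consider a sketch along these lines (again, the SR_* names are illustrative only):

struct S { int a; int b; };

void
baz (struct S *d, struct S s)
{
  struct S t;

  t = s;     /* both sides have scalarized components: the pass can emit
                SR_t_a = SR_s_a; SR_t_b = SR_s_b; and, since the
                replacements cover the whole aggregate, remove the copy */
  t.a++;     /* individual accesses are rewritten to use SR_t_a */
  *d = t;    /* only the RHS is scalarized (*d is not based on a
                candidate declaration): the replacements are stored
                into *d directly */
}
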
* Re: [PATCH 3/5] New intraprocedural Scalar Reduction of Aggregates. 2009-05-10 11:48 ` Richard Guenther @ 2009-05-12 0:24 ` Martin Jambor 2009-05-18 13:26 ` Richard Guenther 0 siblings, 1 reply; 25+ messages in thread From: Martin Jambor @ 2009-05-12 0:24 UTC (permalink / raw) To: Richard Guenther; +Cc: GCC Patches, Jan Hubicka Hi, thanks for a quick reply. Some clarifications below: On Sun, May 10, 2009 at 01:48:01PM +0200, Richard Guenther wrote: > On Sun, 10 May 2009, Martin Jambor wrote: > > > > > expr = TREE_OPERAND (expr, 0); > > > > bit_ref = true; > > > > } > > > > else > > > > bit_ref = false; > > > > > > > > while (TREE_CODE (expr) == NOP_EXPR > > > > > > CONVERT_EXPR_P (expr) > > > > OK... but at another place in the email you said it might not even > > appear in a valid gimple statement? Should I remove it altogether? > > Indeed. If you not build trees from tuple stmts then a > NOP_EXPR cannot appear as a rhs1 or rhs2 of an assignment (instead > it is always the subcode in gimple stmt and the rhs1 is simply sth > valid for is_gimple_val). OK > > > > > || TREE_CODE (expr) == VIEW_CONVERT_EXPR > > > > || TREE_CODE (expr) == REALPART_EXPR > > > > || TREE_CODE (expr) == IMAGPART_EXPR) > > > > expr = TREE_OPERAND (expr, 0); > > > > > > Why do this here btw, and not just lump ... > > > > > > > switch (TREE_CODE (expr)) > > > > { > > > > case ADDR_EXPR: > > > > case SSA_NAME: > > > > case INDIRECT_REF: > > > > break; > > > > > > > > case VAR_DECL: > > > > case PARM_DECL: > > > > case RESULT_DECL: > > > > case COMPONENT_REF: > > > > case ARRAY_REF: > > > > ret = create_access (expr, write); > > > > break; > > > > > > ... this ... > > > > > > > case REALPART_EXPR: > > > > case IMAGPART_EXPR: > > > > expr = TREE_OPERAND (expr, 0); > > > > ret = create_access (expr, write); > > > > > > ... and this together? Won't you create bogus accesses if you > > > strip for example IMAGPART_EXPR (which has non-zero offset)? > > > > That would break the complex number into its components. I thought > > that they are meant to stay together for some reason, otherwise they > > would not be represented explicitly in gimple... do you think it does > > not matter? What about vectors then? > > > > The access is not bogus because modification functions take care of > > these statements in a special way. However, if it is indeed OK to > > split complex numbers into their components, I will gladly simplify > > this as you suggested. > > Yes, it is valid to split them (and complex lowering indeed does that). > It _might_ be useful to keep a complex together in a single SSA_NAME > for optimization purposes, but I guess you detect that anyway if there > is a read of the whole complex element into a register and keep it > that way. > > I would favor simplifying SRA in this case and just split them if > that is valid. OK, I will try to do that and incormporate it into the patch if it indeed simplifies things. At least analysis of access trees will OTOH become more complex. Unfortunately, I need to concentrate on another two things this week so I'll do that early the next one. > > > > return SRA_SA_NONE; > > > > > > > > lhs_ptr = gimple_assign_lhs_ptr (stmt); > > > > rhs_ptr = gimple_assign_rhs1_ptr (stmt); > > > > > > you probably don't need to pass pointers to trees everywhere as you > > > are not changing them. > > > > Well, this function is a callback called by scan_function which can > > also call sra_modify_expr in the last stage of the pass when > > statements are modified. 
I have considered splitting the function > > into two but in the end I thought they would be too similar and the > > overhead is hopefully manageable. > > Yeah, I noticed this later. It is somewhat confusing at first sight, > so maybe just amending the comment before this function could > clarify things. OK, for my part, I've realized that build_access_from_expr_1 indeed does not use the gsi parameter (and is not a callback, unlike build_access_from_expr). > > > > if (disqualify_ops_if_throwing_stmt (stmt, lhs_ptr, rhs_ptr)) > > > > return SRA_SA_NONE; > > > > > > > > racc = build_access_from_expr_1 (rhs_ptr, gsi, false); > > > > lacc = build_access_from_expr_1 (lhs_ptr, gsi, true); > > > > > > just avoid calling into build_access_from_expr_1 for SSA_NAMEs > > > or is_gimple_min_invariant lhs/rhs, that should make that > > > function more regular. > > > > In what sense? build_access_from_expr_1 looks at TREE_CODE anyway and > > can discard the two cases, without for example looking into ADR_EXPRs > > like is_gimple_min_invariant(). > > > > But if you really think it is indeed beneficial, I can do that, sure - > > to me it just looks ugly). > > Ok, just keep it as is. > > > > > if (lacc && racc > > > > && !lacc->grp_unscalarizable_region > > > > && !racc->grp_unscalarizable_region > > > > && AGGREGATE_TYPE_P (TREE_TYPE (*lhs_ptr)) > > > > && lacc->size <= racc->size > > > > && useless_type_conversion_p (lacc->type, racc->type)) > > > > > > useless_type_conversion_p should be always true here. > > > > I don't think so, build_access_from_expr_1 can look through V_C_Es and > > the types of accesses are the type of the operand in such cases.. > > Ok, but what is the point of looking through V_C_Es there if it makes > this test fail? Hmm, IIRC this was only to track struct copies, right? > I guess it's ok then. Yes, only for that. The point of looking through V_C_Es is the size incosistency. > > > > That would just be useless information. I guess you copied this > > > from old SRA? > > > > Yes. All this fancy naming stuff is quite useless but I find it very > > handy when debugging SRA issues. > > Yeah, sort of. Still using no name in that case will do exactly > the same thing ;) > > > > > static tree > > > > create_access_replacement (struct access *access) > > > > { > > > > tree repl; > > > > > > > > repl = make_rename_temp (access->type, "SR"); > > > > get_var_ann (repl); > > > > add_referenced_var (repl); > > > > > > > > DECL_SOURCE_LOCATION (repl) = DECL_SOURCE_LOCATION (access->base); > > > > DECL_ARTIFICIAL (repl) = 1; > > > > > > > > if (DECL_NAME (access->base) && !DECL_IGNORED_P (access->base)) > > > > > > at least && !DECL_ARTIFICIAL (access->base) I think. > > > > This part is also largely copied from the old SRA. So far it seems to > > work nicely, replacements of artificial declarations get SRsome_number > > fancy names and that makes them easy to distinguish. Nevertheless, I > > can change the condition if it is somehow wrong. Or do you expect any > > other problems beside not-so-fancy fancy names? > > No, it merely uses up memory. Not that other passes do not do this ... > > Thus, I probably do not care too much. Well, I'd like to have the fancy names when SRA gets merged and new issues are likely to come up. We can always remove this later on. > > > > if (access->grp_bfr_lhs) > > > > DECL_GIMPLE_REG_P (repl) = 0; > > > > > > But you never set it (see update_address_taken for more cases, > > > most notably VIEW_CONVERT_EXPR on the lhs which need to be taken > > > care of). 
You should set it for COMPLEX_TYPE and VECTOR_TYPE > > > replacements. > > > > This function is the only place where I still use make_rename_temp > > which sets it exactly in these two cases. I did not really know why > > it is required in these two cases and only in these two cases so I > > left it there, at least for now. I guess I understand that now after > > seeing update_address_taken. > > > > I can replace this with calling create_tmp_var() and doing all the > > rest that make_rename_temp does - I believe that you intend to to > > remove it - I have just not found out why it is so bad. > > The bad thing about it is that it supports using the SSA renamer > to write a single variable into SSA. That is usually more costly > than just manually allocating SSA_NAMEs and updating SSA form, > which is usally very easy. > > It's not used much, in which case the easiest thing might be to > fix all remaining uses to manually update SSA form. > > But yes, I now see why that zeroing is necessary. OK, I'll use create_tmp_var_here too. But at this point I cannot create SSA_NAMEs manually and will basically have to do all that make_rename_temp does. > > > > > > CONVERT_EXPR_P (expr) > > > > > > > || TREE_CODE (expr) == VIEW_CONVERT_EXPR) > > > > > > VIEW_CONVERT_EXPR is also a handled_component_p. > > > > > > Note that NOP_EXPR should never occur here - that would be invalid > > > gimple. So I think you can (and should) just delete the above. > > > > I haven't seen a NOP_EXPR for a while, do they still exist in lower > > gimple? Thus I have removed their handling. > > > > Removing diving through V_C_E breaks ADA, though. The reason is that > > we get a different size (and max_size) when calling > > get_ref_base_and_extent on the V_C_E and on its argument. However, I > > believe both should be represented by a single access representative. > > Yeah, I remember this :/ It is technically invalid GIMPLE that the > Ada FE generates though. The size of the V_C_E result has to match > that of the operand. > > Please add a FIXME before this stripping refering to the Ada problem. OK > > > > tree repl = get_access_replacement (access); > > > > if (!TREE_ADDRESSABLE (type)) > > > > { > > > > tree tmp = create_tmp_var (type, "SRvce"); > > > > > > > > add_referenced_var (tmp); > > > > if (is_gimple_reg_type (type)) > > > > tmp = make_ssa_name (tmp, NULL); > > > > > > Should be always is_gimple_reg_type () if it is a type suitable for > > > a SRA scalar replacement. > > > > No, it is the type suitable for the statement, it can be a union type > > or a record with only one field. But see the more thorough explanation > > below... > > I think it should be always a register type, but see below... ;) > > > > But you should set DECL_GIMPLE_REG_P for > > > VECTOR and COMPLEX types here. > > > > > > > if (write) > > > > { > > > > gimple stmt; > > > > tree conv = fold_build1 (VIEW_CONVERT_EXPR, TREE_TYPE (repl), tmp); > > > > > > This needs to either always fold to plain 'tmp' or tmp has to be a > > > non-register. Otherwise you will create invalid gimple. > > > > > > > *expr = tmp; > > > > if (is_gimple_reg_type (type)) > > > > SSA_NAME_DEF_STMT (tmp) = gsi_stmt (*gsi); > > > > > > See above. 
> > > > > > > stmt = gimple_build_assign (repl, conv); > > > > gsi_insert_after (gsi, stmt, GSI_NEW_STMT); > > > > update_stmt (stmt); > > > > } > > > > else > > > > { > > > > gimple stmt; > > > > tree conv = fold_build1 (VIEW_CONVERT_EXPR, type, repl); > > > > > > > > stmt = gimple_build_assign (tmp, conv); > > > > gsi_insert_before (gsi, stmt, GSI_SAME_STMT); > > > > if (is_gimple_reg_type (type)) > > > > SSA_NAME_DEF_STMT (tmp) = stmt; > > > > > > See above. (I wonder if the patch still passes bootstrap & regtest > > > after the typecking patch) > > > > > > > *expr = tmp; > > > > update_stmt (stmt); > > > > } > > > > } > > > > else > > > > { > > > > if (write) > > > > { > > > > gimple stmt; > > > > > > > > stmt = gimple_build_assign (repl, unshare_expr (access->expr)); > > > > gsi_insert_after (gsi, stmt, GSI_NEW_STMT); > > > > update_stmt (stmt); > > > > } > > > > else > > > > { > > > > gimple stmt; > > > > > > > > stmt = gimple_build_assign (unshare_expr (access->expr), repl); > > > > gsi_insert_before (gsi, stmt, GSI_SAME_STMT); > > > > update_stmt (stmt); > > > > } > > > > > > I don't understand this path. Are the types here always compatible? > > > > And I don't really understand the comments. The function is called by > > sra_modify_expr (the function doing the replacements in all non-assign > > statements) when it needs to replace a reference by a scalar but the > > types don't match. This can happen when replacing a V_C_E, a union > > access when we picked a different type that the one used in the > > statement or (and this case can be remarkably irritating) an access to > > a records with only one (scalar) field. > > > > My original idea was to simply put a V_C_E in the place. However, I > > believe there are places where this is not possible - or at least one > > case, a LHS of a call statement because V_C_Es of gimple_registers > > (ssa_names) are not allowed on LHSs. My initial idea to handle these > > cases were to create a new temporary with a matching and a V_C_E > > assign statement (with the V_C_E always on the RHS - I believe that > > works even with gimple registers) that would do the conversion and > > load/store it to the replacement variable (this is what the > > !TREE_ADDRESSABLE branch does). > > > > The problem with this idea are TREE_ADDRESSABLE types. These types > > need to be constructed and thus we cannot create temporary variables > > of these types. On the other hand they absolutely need to be SRAed, > > not doing so slows down tramp3d by a factor of two (and the current > > SRA also breaks them up). And quite a few C++ classes are such types > > that are "non-addressable" and have only one scalar field. > > Identifying such records is possible, I soon realized that I can > > simply leave the statement as it is and produce a new statement to do > > load/store from the original field (that's what the outermost else > > branch does). > > > > Does this make sense or is there some fundamental flaw in my reasoning > > about gimple again? Does this explain what the function does? > > Ok, so the case in question is > > struct X { int i; } x; > x = foo (); > > where you want to scalarize x. Indeed the obvious scalarization would > be > > x = foo (); > SR_1 = x.i; > > For all other LHS cases (not calls) you can move the V_C_E to the RHS > and should be fine. Well, I have tried to remove the function and have sra_modify_expr handle this particular case only to discover another one where it's probably required (given that gimple verification is correct). 
The new problem is that I cannot put a V_C_E as an operand of GIMPLE_RETURN. Specifically I got an "invalid operand in return statement" failure when verifying: return VIEW_CONVERT_EXPR<struct gnat__perfect_hash_generators__key_type>(SR.2108_12); Thus I will probably keep the function to be always safe but change it to the following: /* Substitute into *EXPR an expression of type TYPE with the value of the replacement of ACCESS. This is done either by producing a special V_C_E assignment statement converting the replacement to a new temporary of the requested type if TYPE is is_gimple_reg_type or by going through the base aggregate if it is not. */ static void fix_incompatible_types_for_expr (tree *expr, tree type, struct access *access, gimple_stmt_iterator *gsi, bool write) { tree repl = get_access_replacement (access); if (is_gimple_reg_type (type)) { tree tmp = create_tmp_var (type, "SRvce"); add_referenced_var (tmp); tmp = make_ssa_name (tmp, NULL); if (write) { gimple stmt; tree conv = fold_build1 (VIEW_CONVERT_EXPR, TREE_TYPE (repl), tmp); *expr = tmp; SSA_NAME_DEF_STMT (tmp) = gsi_stmt (*gsi); stmt = gimple_build_assign (repl, conv); gsi_insert_after (gsi, stmt, GSI_NEW_STMT); update_stmt (stmt); } else { gimple stmt; tree conv = fold_build1 (VIEW_CONVERT_EXPR, type, repl); stmt = gimple_build_assign (tmp, conv); gsi_insert_before (gsi, stmt, GSI_SAME_STMT); SSA_NAME_DEF_STMT (tmp) = stmt; *expr = tmp; update_stmt (stmt); } } else { if (write) { gimple stmt; stmt = gimple_build_assign (repl, unshare_expr (access->expr)); gsi_insert_after (gsi, stmt, GSI_NEW_STMT); update_stmt (stmt); } else { gimple stmt; stmt = gimple_build_assign (unshare_expr (access->expr), repl); gsi_insert_before (gsi, stmt, GSI_SAME_STMT); update_stmt (stmt); } } } > I don't understand the TREE_ADDRESSABLE thingy yet. Especially why > the type should be a non-register type. I guess it does not matter now, but in your example above type would be struct x, not int. > > It certainly passes bootstrap and testing, I use --enable-checking=yes. > > That's good. > > > > > { > > > > /* I have never seen this code path trigger but if it can happen the > > > > following should handle it gracefully. */ > > > > > > It can trigger for vector constants. > > > > OK, I'll remove the comment. Apparently there are none in the > > testsuite, I believe I tested with a gcc_unreachable here. > > Err, for vector constants we have VECTOR_CST, so it triggers for > non-constant vector constructors like > > vector int x = { a, b, c, d }; I see. > > > > update_stmt (aux_stmt); > > > > new_stmt = gimple_build_assign (get_access_replacement (access), > > > > fold_build2 (COMPLEX_EXPR, access->type, > > > > rp, ip)); > > > > gsi_insert_after (gsi, new_stmt, GSI_NEW_STMT); > > > > update_stmt (new_stmt); > > > > > > Hm. So you do what complex lowering does here. Note that this may > > > create loads from uninitialized memory with all its problems. > > > > Yes, but I have not had any such problems with complex types (as > > opposed to simple loads from half-initialized records, for example). > > OTOH, I have also contemplated setting DECL_GIMPLE_REG_P to zero of > > complex replacement which appear in IMAG_PART or REAL_PART on a LHS of > > a statement. > > Yes, that's necessary. I still think SRA should not bother about > this at all ;) > > > > WRT the complex stuff. If you would do scalarization and analysis > > > just on the components (not special case REAL/IMAGPART_EXPR everywhere) > > > it should work better, correct? 
You still could handle group > > > scalarization for the case of for example passing a complex argument > > > to a function. > > > > Well, my reasoning was that if complex types were first-class citizens > > in gimple (as opposed to a record), there was a reason to keep them > > together and so I attempted that. But again, if that is a > > misconception of mine and there is no point in keeping them together, > > I will gladly remove this. > > It's not clear. Complex lowering decomposes all complex variables > to components if possible. Again, simplifying SRA is probably better. OK, I will find out how much this would actually simplify things and what new problems might arise. I have already tried relaxing the access tree analysis s that it does not prevent scalarization of subaccesses of scalar accesses which would be necessary for decomposing complex components and uncovered an unrelated problem, again in my favorite testcase entry_4.f90. Can you please check whether the following snippet is a valid gimple? The expander ICEs on an assert when trying to crunch the V_C_E at the end. Looking at it myself, I start to doubt that I can always handle union type-punning with V_C_Es. ---------------------------------------------------------------------- master.2.f3 (integer(kind=8) __entry, integer(kind=4) * b, integer(kind=4) * a) { real(kind=4) __result_master.2.f3$r; union munion.2.f3 __result_master.2.f3; union munion.2.f3 D.1641; complex(kind=4) D.1640; real(kind=4) D.1639; integer(kind=4) D.1638; logical(kind=4) D.1637; integer(kind=4) D.1636; real(kind=4) D.1635; integer(kind=4) D.1634; integer(kind=4) D.1633; <bb 2>: switch (__entry_1(D)) <default: <L3>, case 0: <L3>, case 1: <L10>, case 2: <L9>> <L3>: D.1633_3 = *a_2(D); D.1634_4 = D.1633_3 + 15; D.1635_5 = (real(kind=4)) D.1634_4; __result_master.2.f3$r_15 = D.1635_5; goto <bb 6>; <L10>: D.1636_7 = *b_6(D); D.1637_8 = D.1636_7 == 42; __result_master.2.f3$r_19 = VIEW_CONVERT_EXPR<real(kind=4)>(D.1637_8); ---------------------------------------------------------------------- > > > > void bar(_Complex float); > > > void foo(float x, float y) > > > { > > > _Complex float z = x; > > > __imag z = y; > > > bar(z); > > > } > > > > > > The same applies for vectors - the REAL/IMAGPART_EXPRs equivalent > > > there is BIT_FIELD_REF. > > > > These are handled by setting DECL_GIMPLE_REG_P to zero if a B_F_R is > > on a LHS. I believe the current SRA does the same. It works fine and > > there's a lot less fuss about them. > > > > > > return SRA_SA_PROCESSED; > > > > } > > > > > > > > /* Return true iff T has a VIEW_CONVERT_EXPR among its handled components. */ > > > > > > > > static bool > > > > contains_view_convert_expr_p (tree t) > > > > { > > > > while (1) > > > > { > > > > if (TREE_CODE (t) == VIEW_CONVERT_EXPR) > > > > return true; > > > > if (!handled_component_p (t)) > > > > return false; > > > > t = TREE_OPERAND (t, 0); > > > > } > > > > } > > > > > > Place this in tree-flow-inline.h next to ref_contains_array_ref, also > > > structure the loop in the same way. > > > > OK, but I'd like the function to work if passed declarations too. > > Thus I cannot really use a do-while loop. I'll send it in a separate > > patch. > > > > > > /* Change STMT to assign compatible types by means of adding component or array > > > > references or VIEW_CONVERT_EXPRs. All parameters have the same meaning as > > > > variable with the same names in sra_modify_assign. 
This is done in a > > > > such a complicated way in order to make > > > > testsuite/g++.dg/tree-ssa/ssa-sra-2.C happy and so it helps in at least some > > > > cases. */ > > > > > > > > static void > > > > fix_modified_assign_compatibility (gimple_stmt_iterator *gsi, gimple *stmt, > > > > struct access *lacc, struct access *racc, > > > > tree lhs, tree *rhs, tree ltype, tree rtype) > > > > { > > > > if (racc && racc->grp_to_be_replaced && AGGREGATE_TYPE_P (ltype) > > > > && !access_has_children_p (lacc)) > > > > { > > > > tree expr = unshare_expr (lhs); > > > > bool found = build_ref_for_offset (&expr, ltype, racc->offset, rtype, > > > > false); > > > > if (found) > > > > { > > > > gimple_assign_set_lhs (*stmt, expr); > > > > return; > > > > } > > > > } > > > > > > > > if (lacc && lacc->grp_to_be_replaced && AGGREGATE_TYPE_P (rtype) > > > > && !access_has_children_p (racc)) > > > > { > > > > tree expr = unshare_expr (*rhs); > > > > bool found = build_ref_for_offset (&expr, rtype, lacc->offset, ltype, > > > > false); > > > > if (found) > > > > { > > > > gimple_assign_set_rhs1 (*stmt, expr); > > > > return; > > > > } > > > > } > > > > > > > > *rhs = fold_build1 (VIEW_CONVERT_EXPR, ltype, *rhs); > > > > gimple_assign_set_rhs_from_tree (gsi, *rhs); > > > > *stmt = gsi_stmt (*gsi); > > > > > > Reading this I have a deja-vu - isn't there another function in this > > > file doing the same thing? You are doing much unsharing even though > > > you re-build the access tree from scratch? > > > > This function has a similar purpose as fix_incompatible_types_for_expr > > but this time only for assign statements. That is easier because we > > can always put the V_C_E on the RHS and be safe and so no additional > > statements need to be generated. > > > > However, the V_C_Es rather than COMPONENT_REFs and ARRAY_REFs feel > > unnatural for accessing fields from single field records and unions > > and single element arrays. According to the comment I used to have > > problems of some sort with that in the ssa-sra-2.C testcase but I can > > no longer reproduce them (and don't remember them). > > > > I call unshare_expr in this context only when one side of an > > assignment statement is a scalar replacement and the other one is an > > aggregate (but not necessarily a declaration) which can happen only in > > the cases listed above. That is not very many calls and chances are > > good that build_ref_for_offset succeeds. > > > > Does that explain what is going on here? > > Yes. I guess merging the two functions would make sense to me > (or putting them next to each other at least). The function signatures > also look seemingly weird (that you store into *rhs but still set > the stmt rhs, etc.). I have another look here with the new version. *rhs is a local variable of the caller, not a pointer in the statement. (It has only one callsite and thus this double indirection will be inlined away). I tried using gimple_assign_set_rhs1 instead but it ICEs when passed a V_C_E. The two functions have similar goal but work differently, in particular fix_modified_assign_compatibility never generates new statements while fix_incompatible_types_for_expr always does. So I don't think it makes sense to merge them. But I can move this one upward, sure. > > > > } > > > > > > > > /* Callback of scan_function to process assign statements. 
It examines both > > > > sides of the statement, replaces them with a scalare replacement if there is > > > > one and generating copying of replacements if scalarized aggregates have been > > > > used in the assignment. STMT is a pointer to the assign statement, GSI is > > > > used to hold generated statements for type conversions and subtree > > > > copying. */ > > > > > > > > static enum scan_assign_result > > > > sra_modify_assign (gimple *stmt, gimple_stmt_iterator *gsi, > > > > void *data ATTRIBUTE_UNUSED) > > > > { > > > > struct access *lacc, *racc; > > > > tree ltype, rtype; > > > > tree lhs, rhs; > > > > bool modify_this_stmt; > > > > > > > > if (gimple_assign_rhs2 (*stmt)) > > > > > > !gimple_assign_single_p (*stmt) > > > > > > (the only gimple assign that may access memory) > > > > OK > > > > > > > > > return SRA_SA_NONE; > > > > lhs = gimple_assign_lhs (*stmt); > > > > rhs = gimple_assign_rhs1 (*stmt); > > > > > > > > if (TREE_CODE (rhs) == CONSTRUCTOR) > > > > return sra_modify_constructor_assign (stmt, gsi); > > > > > > > > if (TREE_CODE (lhs) == REALPART_EXPR || TREE_CODE (lhs) == IMAGPART_EXPR) > > > > return sra_modify_partially_complex_lhs (*stmt, gsi); > > > > > > > > if (TREE_CODE (rhs) == REALPART_EXPR || TREE_CODE (rhs) == IMAGPART_EXPR > > > > || TREE_CODE (rhs) == BIT_FIELD_REF || TREE_CODE (lhs) == BIT_FIELD_REF) > > > > { > > > > modify_this_stmt = sra_modify_expr (gimple_assign_rhs1_ptr (*stmt), > > > > gsi, false, data); > > > > modify_this_stmt |= sra_modify_expr (gimple_assign_lhs_ptr (*stmt), > > > > gsi, true, data); > > > > return modify_this_stmt ? SRA_SA_PROCESSED : SRA_SA_NONE; > > > > } > > > > > > > > lacc = get_access_for_expr (lhs); > > > > racc = get_access_for_expr (rhs); > > > > if (!lacc && !racc) > > > > return SRA_SA_NONE; > > > > > > > > modify_this_stmt = ((lacc && lacc->grp_to_be_replaced) > > > > || (racc && racc->grp_to_be_replaced)); > > > > > > > > if (lacc && lacc->grp_to_be_replaced) > > > > { > > > > lhs = get_access_replacement (lacc); > > > > gimple_assign_set_lhs (*stmt, lhs); > > > > ltype = lacc->type; > > > > } > > > > else > > > > ltype = TREE_TYPE (lhs); > > > > > > > > if (racc && racc->grp_to_be_replaced) > > > > { > > > > rhs = get_access_replacement (racc); > > > > gimple_assign_set_rhs1 (*stmt, rhs); > > > > rtype = racc->type; > > > > } > > > > else > > > > rtype = TREE_TYPE (rhs); > > > > > > > > /* The possibility that gimple_assign_set_rhs_from_tree() might reallocate > > > > the statement makes the position of this pop_stmt_changes() a bit awkward > > > > but hopefully make some sense. */ > > > > > > I don't see pop_stmt_changes(). > > > > Yeah, the comment is outdated. I've removed it. > > > > > > if (modify_this_stmt) > > > > { > > > > if (!useless_type_conversion_p (ltype, rtype)) > > > > fix_modified_assign_compatibility (gsi, stmt, lacc, racc, > > > > lhs, &rhs, ltype, rtype); > > > > } > > > > > > > > if (contains_view_convert_expr_p (rhs) || contains_view_convert_expr_p (lhs) > > > > || (access_has_children_p (racc) > > > > && !ref_expr_for_all_replacements_p (racc, lhs, racc->offset)) > > > > || (access_has_children_p (lacc) > > > > && !ref_expr_for_all_replacements_p (lacc, rhs, lacc->offset))) > > > > > > ? A comment is missing what this case is about ... 
> > > > > > (this smells like fixup that could be avoided by doing things correct > > > in the first place) > > > > From this point on, the function deals with assignments in between > > aggregates when at least one has scalar reductions of some of its > > components. There are three possible scenarios: Both the LHS and RHS > > have to-be-scalarized components, 2) only the RHS has or 3) only the > > LHS has. > > > > In the first case, we would like to load the LHS components from RHS > > components whenever possible. If that is not possible, we would like > > to read it directly from the RHS (after updating it by storing in it > > its own components). If there are some necessary unscalarized data in > > the LHS, those will be loaded by the original assignment too. If > > neither of these cases happen, the original statement can be removed. > > Most of this is done by load_assign_lhs_subreplacements. > > > > In the second case, we would like to store all RHS scalarized > > components directly into LHS and if they cover the aggregate > > completely, remove the statement too. In the third case, we want the > > LHS components to be loaded directly from the RHS (DSE will remove the > > original statement if it becomes redundant). > > > > This is a bit complex but manageable when types match and when unions > > do not cause confusion in a way that we cannot really load a component > > of LHS from the RHS or vice versa (the access representing this level > > can have subaccesses that are accessible only through a different > > union field at a higher level - different from the one used in the > > examined expression). Unions are fun. > > > > Therefore, I specially handle a fourth case, happening when there is a > > specific type cast or it is impossible to locate a scalarized > > subaccess on the other side of the expression. If that happens, I > > simply "refresh" the RHS by storing in it is scalarized components > > leave the original statement there to do the copying and then load the > > scalar replacements of the LHS. This is what the first branch does. > > > > Is it clearer now? Perhaps I should put these five paragraphs as a > > comment into the function? > > Yes - that would be nice. > I have just started a bootstrap of my latest version (still keeping complex numbers together). If it passes, I'll mail it tomorrow (otherwise, I'll have to wait for a couple of days). Thanks again, Martin ^ permalink raw reply [flat|nested] 25+ messages in thread
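The union complications mentioned in the message above stem from accesses like the following minimal type-punning sketch (assuming both fields are individually accessed so that the aggregate is scalarized):

union U { int i; float f; };

float
punning (int k)
{
  union U u;

  u.i = k;      /* an access of type int at offset 0 */
  return u.f;   /* an access of type float at the same offset and size;
                   one representative type is picked for the group and
                   uses with the other type have to be fixed up, e.g.
                   by a V_C_E on the RHS */
}
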
* Re: [PATCH 3/5] New intraprocedural Scalar Reduction of Aggregates. 2009-05-12 0:24 ` Martin Jambor @ 2009-05-18 13:26 ` Richard Guenther 0 siblings, 0 replies; 25+ messages in thread From: Richard Guenther @ 2009-05-18 13:26 UTC (permalink / raw) To: Martin Jambor; +Cc: GCC Patches, Jan Hubicka On Tue, 12 May 2009, Martin Jambor wrote: > Hi, > > thanks for a quick reply. Some clarifications below: > > On Sun, May 10, 2009 at 01:48:01PM +0200, Richard Guenther wrote: > > On Sun, 10 May 2009, Martin Jambor wrote: > > > > > > > expr = TREE_OPERAND (expr, 0); > > > > > bit_ref = true; > > > > > } > > > > > else > > > > > bit_ref = false; > > > > > > > > > > while (TREE_CODE (expr) == NOP_EXPR > > > > > > > > CONVERT_EXPR_P (expr) > > > > > > OK... but at another place in the email you said it might not even > > > appear in a valid gimple statement? Should I remove it altogether? > > > > Indeed. If you not build trees from tuple stmts then a > > NOP_EXPR cannot appear as a rhs1 or rhs2 of an assignment (instead > > it is always the subcode in gimple stmt and the rhs1 is simply sth > > valid for is_gimple_val). > > OK > > > > > > > > || TREE_CODE (expr) == VIEW_CONVERT_EXPR > > > > > || TREE_CODE (expr) == REALPART_EXPR > > > > > || TREE_CODE (expr) == IMAGPART_EXPR) > > > > > expr = TREE_OPERAND (expr, 0); > > > > > > > > Why do this here btw, and not just lump ... > > > > > > > > > switch (TREE_CODE (expr)) > > > > > { > > > > > case ADDR_EXPR: > > > > > case SSA_NAME: > > > > > case INDIRECT_REF: > > > > > break; > > > > > > > > > > case VAR_DECL: > > > > > case PARM_DECL: > > > > > case RESULT_DECL: > > > > > case COMPONENT_REF: > > > > > case ARRAY_REF: > > > > > ret = create_access (expr, write); > > > > > break; > > > > > > > > ... this ... > > > > > > > > > case REALPART_EXPR: > > > > > case IMAGPART_EXPR: > > > > > expr = TREE_OPERAND (expr, 0); > > > > > ret = create_access (expr, write); > > > > > > > > ... and this together? Won't you create bogus accesses if you > > > > strip for example IMAGPART_EXPR (which has non-zero offset)? > > > > > > That would break the complex number into its components. I thought > > > that they are meant to stay together for some reason, otherwise they > > > would not be represented explicitly in gimple... do you think it does > > > not matter? What about vectors then? > > > > > > The access is not bogus because modification functions take care of > > > these statements in a special way. However, if it is indeed OK to > > > split complex numbers into their components, I will gladly simplify > > > this as you suggested. > > > > Yes, it is valid to split them (and complex lowering indeed does that). > > It _might_ be useful to keep a complex together in a single SSA_NAME > > for optimization purposes, but I guess you detect that anyway if there > > is a read of the whole complex element into a register and keep it > > that way. > > > > I would favor simplifying SRA in this case and just split them if > > that is valid. > > OK, I will try to do that and incormporate it into the patch if it > indeed simplifies things. At least analysis of access trees will OTOH > become more complex. Unfortunately, I need to concentrate on another > two things this week so I'll do that early the next one. Ok. > > > > > return SRA_SA_NONE; > > > > > > > > > > lhs_ptr = gimple_assign_lhs_ptr (stmt); > > > > > rhs_ptr = gimple_assign_rhs1_ptr (stmt); > > > > > > > > you probably don't need to pass pointers to trees everywhere as you > > > > are not changing them. 
> > > > > > Well, this function is a callback called by scan_function which can > > > also call sra_modify_expr in the last stage of the pass when > > > statements are modified. I have considered splitting the function > > > into two but in the end I thought they would be too similar and the > > > overhead is hopefully manageable. > > > > Yeah, I noticed this later. It is somewhat confusing at first sight, > > so maybe just amending the comment before this function could > > clarify things. > > OK, for my part, I've realized that build_access_from_expr_1 indeed > does not use the gsi parameter (and is not a callback, unlike > build_access_from_expr). > > > > > > if (disqualify_ops_if_throwing_stmt (stmt, lhs_ptr, rhs_ptr)) > > > > > return SRA_SA_NONE; > > > > > > > > > > racc = build_access_from_expr_1 (rhs_ptr, gsi, false); > > > > > lacc = build_access_from_expr_1 (lhs_ptr, gsi, true); > > > > > > > > just avoid calling into build_access_from_expr_1 for SSA_NAMEs > > > > or is_gimple_min_invariant lhs/rhs, that should make that > > > > function more regular. > > > > > > In what sense? build_access_from_expr_1 looks at TREE_CODE anyway and > > > can discard the two cases, without for example looking into ADR_EXPRs > > > like is_gimple_min_invariant(). > > > > > > But if you really think it is indeed beneficial, I can do that, sure - > > > to me it just looks ugly). > > > > Ok, just keep it as is. > > > > > > > if (lacc && racc > > > > > && !lacc->grp_unscalarizable_region > > > > > && !racc->grp_unscalarizable_region > > > > > && AGGREGATE_TYPE_P (TREE_TYPE (*lhs_ptr)) > > > > > && lacc->size <= racc->size > > > > > && useless_type_conversion_p (lacc->type, racc->type)) > > > > > > > > useless_type_conversion_p should be always true here. > > > > > > I don't think so, build_access_from_expr_1 can look through V_C_Es and > > > the types of accesses are the type of the operand in such cases.. > > > > Ok, but what is the point of looking through V_C_Es there if it makes > > this test fail? Hmm, IIRC this was only to track struct copies, right? > > I guess it's ok then. > > Yes, only for that. The point of looking through V_C_Es is the size > incosistency. > > > > > > > That would just be useless information. I guess you copied this > > > > from old SRA? > > > > > > Yes. All this fancy naming stuff is quite useless but I find it very > > > handy when debugging SRA issues. > > > > Yeah, sort of. Still using no name in that case will do exactly > > the same thing ;) > > > > > > > static tree > > > > > create_access_replacement (struct access *access) > > > > > { > > > > > tree repl; > > > > > > > > > > repl = make_rename_temp (access->type, "SR"); > > > > > get_var_ann (repl); > > > > > add_referenced_var (repl); > > > > > > > > > > DECL_SOURCE_LOCATION (repl) = DECL_SOURCE_LOCATION (access->base); > > > > > DECL_ARTIFICIAL (repl) = 1; > > > > > > > > > > if (DECL_NAME (access->base) && !DECL_IGNORED_P (access->base)) > > > > > > > > at least && !DECL_ARTIFICIAL (access->base) I think. > > > > > > This part is also largely copied from the old SRA. So far it seems to > > > work nicely, replacements of artificial declarations get SRsome_number > > > fancy names and that makes them easy to distinguish. Nevertheless, I > > > can change the condition if it is somehow wrong. Or do you expect any > > > other problems beside not-so-fancy fancy names? > > > > No, it merely uses up memory. Not that other passes do not do this ... > > > > Thus, I probably do not care too much. 
> > Well, I'd like to have the fancy names when SRA gets merged and new > issues are likely to come up. We can always remove this later on. Ok. > > > > > > if (access->grp_bfr_lhs) > > > > > DECL_GIMPLE_REG_P (repl) = 0; > > > > > > > > But you never set it (see update_address_taken for more cases, > > > > most notably VIEW_CONVERT_EXPR on the lhs which need to be taken > > > > care of). You should set it for COMPLEX_TYPE and VECTOR_TYPE > > > > replacements. > > > > > > This function is the only place where I still use make_rename_temp > > > which sets it exactly in these two cases. I did not really know why > > > it is required in these two cases and only in these two cases so I > > > left it there, at least for now. I guess I understand that now after > > > seeing update_address_taken. > > > > > > I can replace this with calling create_tmp_var() and doing all the > > > rest that make_rename_temp does - I believe that you intend to to > > > remove it - I have just not found out why it is so bad. > > > > The bad thing about it is that it supports using the SSA renamer > > to write a single variable into SSA. That is usually more costly > > than just manually allocating SSA_NAMEs and updating SSA form, > > which is usally very easy. > > > > It's not used much, in which case the easiest thing might be to > > fix all remaining uses to manually update SSA form. > > > > But yes, I now see why that zeroing is necessary. > > OK, I'll use create_tmp_var_here too. But at this point I cannot > create SSA_NAMEs manually and will basically have to do all that > make_rename_temp does. Ok. > > > > > > > > CONVERT_EXPR_P (expr) > > > > > > > > > || TREE_CODE (expr) == VIEW_CONVERT_EXPR) > > > > > > > > VIEW_CONVERT_EXPR is also a handled_component_p. > > > > > > > > Note that NOP_EXPR should never occur here - that would be invalid > > > > gimple. So I think you can (and should) just delete the above. > > > > > > I haven't seen a NOP_EXPR for a while, do they still exist in lower > > > gimple? Thus I have removed their handling. > > > > > > Removing diving through V_C_E breaks ADA, though. The reason is that > > > we get a different size (and max_size) when calling > > > get_ref_base_and_extent on the V_C_E and on its argument. However, I > > > believe both should be represented by a single access representative. > > > > Yeah, I remember this :/ It is technically invalid GIMPLE that the > > Ada FE generates though. The size of the V_C_E result has to match > > that of the operand. > > > > Please add a FIXME before this stripping refering to the Ada problem. > > OK > > > > > > tree repl = get_access_replacement (access); > > > > > if (!TREE_ADDRESSABLE (type)) > > > > > { > > > > > tree tmp = create_tmp_var (type, "SRvce"); > > > > > > > > > > add_referenced_var (tmp); > > > > > if (is_gimple_reg_type (type)) > > > > > tmp = make_ssa_name (tmp, NULL); > > > > > > > > Should be always is_gimple_reg_type () if it is a type suitable for > > > > a SRA scalar replacement. > > > > > > No, it is the type suitable for the statement, it can be a union type > > > or a record with only one field. But see the more thorough explanation > > > below... > > > > I think it should be always a register type, but see below... ;) > > > > > > But you should set DECL_GIMPLE_REG_P for > > > > VECTOR and COMPLEX types here. 
> > > > > > > > > if (write) > > > > > { > > > > > gimple stmt; > > > > > tree conv = fold_build1 (VIEW_CONVERT_EXPR, TREE_TYPE (repl), tmp); > > > > > > > > This needs to either always fold to plain 'tmp' or tmp has to be a > > > > non-register. Otherwise you will create invalid gimple. > > > > > > > > > *expr = tmp; > > > > > if (is_gimple_reg_type (type)) > > > > > SSA_NAME_DEF_STMT (tmp) = gsi_stmt (*gsi); > > > > > > > > See above. > > > > > > > > > stmt = gimple_build_assign (repl, conv); > > > > > gsi_insert_after (gsi, stmt, GSI_NEW_STMT); > > > > > update_stmt (stmt); > > > > > } > > > > > else > > > > > { > > > > > gimple stmt; > > > > > tree conv = fold_build1 (VIEW_CONVERT_EXPR, type, repl); > > > > > > > > > > stmt = gimple_build_assign (tmp, conv); > > > > > gsi_insert_before (gsi, stmt, GSI_SAME_STMT); > > > > > if (is_gimple_reg_type (type)) > > > > > SSA_NAME_DEF_STMT (tmp) = stmt; > > > > > > > > See above. (I wonder if the patch still passes bootstrap & regtest > > > > after the typecking patch) > > > > > > > > > *expr = tmp; > > > > > update_stmt (stmt); > > > > > } > > > > > } > > > > > else > > > > > { > > > > > if (write) > > > > > { > > > > > gimple stmt; > > > > > > > > > > stmt = gimple_build_assign (repl, unshare_expr (access->expr)); > > > > > gsi_insert_after (gsi, stmt, GSI_NEW_STMT); > > > > > update_stmt (stmt); > > > > > } > > > > > else > > > > > { > > > > > gimple stmt; > > > > > > > > > > stmt = gimple_build_assign (unshare_expr (access->expr), repl); > > > > > gsi_insert_before (gsi, stmt, GSI_SAME_STMT); > > > > > update_stmt (stmt); > > > > > } > > > > > > > > I don't understand this path. Are the types here always compatible? > > > > > > And I don't really understand the comments. The function is called by > > > sra_modify_expr (the function doing the replacements in all non-assign > > > statements) when it needs to replace a reference by a scalar but the > > > types don't match. This can happen when replacing a V_C_E, a union > > > access when we picked a different type that the one used in the > > > statement or (and this case can be remarkably irritating) an access to > > > a records with only one (scalar) field. > > > > > > My original idea was to simply put a V_C_E in the place. However, I > > > believe there are places where this is not possible - or at least one > > > case, a LHS of a call statement because V_C_Es of gimple_registers > > > (ssa_names) are not allowed on LHSs. My initial idea to handle these > > > cases were to create a new temporary with a matching and a V_C_E > > > assign statement (with the V_C_E always on the RHS - I believe that > > > works even with gimple registers) that would do the conversion and > > > load/store it to the replacement variable (this is what the > > > !TREE_ADDRESSABLE branch does). > > > > > > The problem with this idea are TREE_ADDRESSABLE types. These types > > > need to be constructed and thus we cannot create temporary variables > > > of these types. On the other hand they absolutely need to be SRAed, > > > not doing so slows down tramp3d by a factor of two (and the current > > > SRA also breaks them up). And quite a few C++ classes are such types > > > that are "non-addressable" and have only one scalar field. > > > Identifying such records is possible, I soon realized that I can > > > simply leave the statement as it is and produce a new statement to do > > > load/store from the original field (that's what the outermost else > > > branch does). 
> > > > > > Does this make sense or is there some fundamental flaw in my reasoning > > > about gimple again? Does this explain what the function does? > > > > Ok, so the case in question is > > > > struct X { int i; } x; > > x = foo (); > > > > where you want to scalarize x. Indeed the obvious scalarization would > > be > > > > x = foo (); > > SR_1 = x.i; > > > > For all other LHS cases (not calls) you can move the V_C_E to the RHS > > and should be fine. > > Well, I have tried to remove the function and have sra_modify_expr > handle this particular case only to discover another one where it's > probably required (given that gimple verification is correct). The > new problem is that I cannot put a V_C_E as an operand of > GIMPLE_RETURN. Specifically I got an "invalid operand in return > statement" failure when verifying: > > return VIEW_CONVERT_EXPR<struct gnat__perfect_hash_generators__key_type>(SR.2108_12); Indeed. GIMPLE_RETURN needs a register argument, so tmp = VIEW_CONVERT_EXPR<struct gnat__perfect_hash_generators__key_type>(SR.2...; return tmp; > Thus I will probably keep the function to be always safe but change it > to the following: > > /* Substitute into *EXPR an expression of type TYPE with the value of the > replacement of ACCESS. This is done either by producing a special V_C_E > assignment statement converting the replacement to a new temporary of the > requested type if TYPE is is_gimple_reg_type or by going through the base > aggregate if it is not. */ > > static void > fix_incompatible_types_for_expr (tree *expr, tree type, struct access *access, > gimple_stmt_iterator *gsi, bool write) > { > tree repl = get_access_replacement (access); > > if (is_gimple_reg_type (type)) > { > tree tmp = create_tmp_var (type, "SRvce"); > > add_referenced_var (tmp); > tmp = make_ssa_name (tmp, NULL); > > if (write) > { > gimple stmt; > tree conv = fold_build1 (VIEW_CONVERT_EXPR, TREE_TYPE (repl), tmp); > > *expr = tmp; > SSA_NAME_DEF_STMT (tmp) = gsi_stmt (*gsi); > stmt = gimple_build_assign (repl, conv); > gsi_insert_after (gsi, stmt, GSI_NEW_STMT); > update_stmt (stmt); > } > else > { > gimple stmt; > tree conv = fold_build1 (VIEW_CONVERT_EXPR, type, repl); > > stmt = gimple_build_assign (tmp, conv); > gsi_insert_before (gsi, stmt, GSI_SAME_STMT); > SSA_NAME_DEF_STMT (tmp) = stmt; > *expr = tmp; > update_stmt (stmt); > } > } > else > { > if (write) > { > gimple stmt; > > stmt = gimple_build_assign (repl, unshare_expr (access->expr)); > gsi_insert_after (gsi, stmt, GSI_NEW_STMT); > update_stmt (stmt); > } > else > { > gimple stmt; > > stmt = gimple_build_assign (unshare_expr (access->expr), repl); > gsi_insert_before (gsi, stmt, GSI_SAME_STMT); > update_stmt (stmt); > } > } > } I will have a look at the current version you sent me an play with it a bit. > > I don't understand the TREE_ADDRESSABLE thingy yet. Especially why > > the type should be a non-register type. > > I guess it does not matter now, but in your example above type would be > struct x, not int. > > > > It certainly passes bootstrap and testing, I use --enable-checking=yes. > > > > That's good. > > > > > > > { > > > > > /* I have never seen this code path trigger but if it can happen the > > > > > following should handle it gracefully. */ > > > > > > > > It can trigger for vector constants. > > > > > > OK, I'll remove the comment. Apparently there are none in the > > > testsuite, I believe I tested with a gcc_unreachable here. 
> > > > Err, for vector constants we have VECTOR_CST, so it triggers for > > non-constant vector constructors like > > > > vector int x = { a, b, c, d }; > > I see. > > > > > > update_stmt (aux_stmt); > > > > > new_stmt = gimple_build_assign (get_access_replacement (access), > > > > > fold_build2 (COMPLEX_EXPR, access->type, > > > > > rp, ip)); > > > > > gsi_insert_after (gsi, new_stmt, GSI_NEW_STMT); > > > > > update_stmt (new_stmt); > > > > > > > > Hm. So you do what complex lowering does here. Note that this may > > > > create loads from uninitialized memory with all its problems. > > > > > > Yes, but I have not had any such problems with complex types (as > > > opposed to simple loads from half-initialized records, for example). > > > OTOH, I have also contemplated setting DECL_GIMPLE_REG_P to zero of > > > complex replacement which appear in IMAG_PART or REAL_PART on a LHS of > > > a statement. > > > > Yes, that's necessary. I still think SRA should not bother about > > this at all ;) > > > > > > WRT the complex stuff. If you would do scalarization and analysis > > > > just on the components (not special case REAL/IMAGPART_EXPR everywhere) > > > > it should work better, correct? You still could handle group > > > > scalarization for the case of for example passing a complex argument > > > > to a function. > > > > > > Well, my reasoning was that if complex types were first-class citizens > > > in gimple (as opposed to a record), there was a reason to keep them > > > together and so I attempted that. But again, if that is a > > > misconception of mine and there is no point in keeping them together, > > > I will gladly remove this. > > > > It's not clear. Complex lowering decomposes all complex variables > > to components if possible. Again, simplifying SRA is probably better. > > OK, I will find out how much this would actually simplify things and > what new problems might arise. > > I have already tried relaxing the access tree analysis s that it does > not prevent scalarization of subaccesses of scalar accesses which > would be necessary for decomposing complex components and uncovered an > unrelated problem, again in my favorite testcase entry_4.f90. > > Can you please check whether the following snippet is a valid gimple? > The expander ICEs on an assert when trying to crunch the V_C_E at the > end. Looking at it myself, I start to doubt that I can always handle > union type-punning with V_C_Es. That's a bug in the expander. C testcase that ICEs even w/o SRA: float foo(int i) { int j = i == 42; return *(float *)&j; } I have a fix in testing. Richard. ^ permalink raw reply [flat|nested] 25+ messages in thread
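The non-constant vector constructor Richard refers to can be reproduced with the GCC vector extension, roughly as follows (a sketch; use_vec is a made-up consumer that keeps the vector live):

typedef int v4si __attribute__ ((vector_size (16)));

extern void use_vec (v4si);

void
build_vec (int a, int b, int c, int d)
{
  v4si x = { a, b, c, d };  /* a CONSTRUCTOR with non-constant elements;
                               with constants this would be a VECTOR_CST
                               instead */
  use_vec (x);
}
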
* Re: [PATCH 3/5] New intraprocedural Scalar Reduction of Aggregates. 2009-04-29 12:56 ` Richard Guenther 2009-05-10 10:33 ` Martin Jambor @ 2009-05-10 10:39 ` Martin Jambor 2009-05-12 9:49 ` Martin Jambor 1 sibling, 1 reply; 25+ messages in thread From: Martin Jambor @ 2009-05-10 10:39 UTC (permalink / raw) To: Richard Guenther; +Cc: GCC Patches, Jan Hubicka Hi, this is the new tree-sra.c file which incorporates most of your concerns. It passes bootstrap and all tests (including Ada) on x86_64-linux, revision 147317. I'm sorry it took so long; I was on vacation and then I stumbled upon a few new weird regressions (usually the results of new oversights). I'm looking forward to your comments. Martin /* Scalar Replacement of Aggregates (SRA) converts some structure references into scalar references, exposing them to the scalar optimizers. Copyright (C) 2008, 2009 Free Software Foundation, Inc. Contributed by Martin Jambor <mjambor@suse.cz> This file is part of GCC. GCC is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 3, or (at your option) any later version. GCC is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with GCC; see the file COPYING3. If not see <http://www.gnu.org/licenses/>. */ /* This file implements Scalar Replacement of Aggregates (SRA). SRA is run twice, once in the early stages of compilation (early SRA) and once in the late stages (late SRA). The aim of both is to turn references to scalar parts of aggregates into uses of independent scalar variables. The two passes are nearly identical; the only difference is that early SRA does not scalarize unions which are used as the result in a GIMPLE_RETURN statement because together with inlining this can lead to weird type conversions. Both passes operate in four stages: 1. The declarations that have properties which make them candidates for scalarization are identified in function find_var_candidates(). The candidates are stored in candidate_bitmap. 2. The function body is scanned. In the process, declarations which are used in a manner that prevents their scalarization are removed from the candidate bitmap. More importantly, for every access into an aggregate, an access structure (struct access) is created by create_access() and stored in a vector associated with the aggregate. Among other information, the aggregate declaration, the offset and size of the access and its type are stored in the structure. On a related note, assign_link structures are created for every assign statement between candidate aggregates and attached to the related accesses. 3. The vectors of accesses are analyzed. They are first sorted according to their offset and size and then scanned for partially overlapping accesses (i.e. those which overlap but one is not entirely within another). Such an access disqualifies the whole aggregate from being scalarized. If there is no such inhibiting overlap, a representative access structure is chosen for every unique combination of offset and size. Afterwards, the pass builds a set of trees from these structures, in which children of an access are within their parent (in terms of offset and size). Then accesses are propagated whenever possible (i.e.
in cases when it does not create a partially overlapping access) across assign_links from the right hand side to the left hand side. Then the set of trees for each declaration is traversed again and those accesses which should be replaced by a scalar are identified. 4. The function is traversed again, and for every reference into an aggregate that has some component which is about to be scalarized, statements are amended and new statements are created as necessary. Finally, if a parameter got scalarized, the scalar replacements are initialized with values from the respective parameter aggregates. */ #include "config.h" #include "system.h" #include "coretypes.h" #include "alloc-pool.h" #include "tm.h" #include "tree.h" #include "gimple.h" #include "tree-flow.h" #include "diagnostic.h" #include "tree-dump.h" #include "timevar.h" #include "params.h" #include "target.h" #include "flags.h" /* Enumeration of all aggregate reductions we can do. */ enum sra_mode { SRA_MODE_EARLY_INTRA, /* early intraprocedural SRA */ SRA_MODE_INTRA /* late intraprocedural SRA */ }; /* Global variable describing which aggregate reduction we are performing at the moment. */ static enum sra_mode sra_mode; struct assign_link; /* ACCESS represents each access to an aggregate variable (as a whole or a part). It can also represent a group of accesses that refer to exactly the same fragment of an aggregate (i.e. those that have exactly the same offset and size). Such representatives for a single aggregate, once determined, are linked in a linked list and have the group fields set. Moreover, when doing intraprocedural SRA, a tree is built from those representatives (by means of first_child and next_sibling pointers), in which all items in a subtree are "within" the root, i.e. their offset is greater than or equal to the offset of the root and their offset+size is smaller than or equal to the offset+size of the root. Children of an access are sorted by offset. Note that accesses to parts of vector and complex number types are always represented by an access to the whole complex number or vector. It is the duty of the modifying functions to replace them appropriately. */ struct access { /* Values returned by `get_ref_base_and_extent' for each component reference. If EXPR isn't a component reference, just set `BASE = EXPR', `OFFSET = 0', `SIZE = TYPE_SIZE (TREE_TYPE (expr))'. */ HOST_WIDE_INT offset; HOST_WIDE_INT size; tree base; /* Expression. */ tree expr; /* Type. */ tree type; /* Next group representative for this aggregate. */ struct access *next_grp; /* Pointer to the group representative. Pointer to itself if the struct is the representative. */ struct access *group_representative; /* If this access has any children (in terms of the definition above), this points to the first one. */ struct access *first_child; /* Pointer to the next sibling in the access tree as described above. */ struct access *next_sibling; /* Pointers to the first and last element in the linked list of assign links. */ struct assign_link *first_link, *last_link; /* Pointer to the next access in the work queue. */ struct access *next_queued; /* Replacement variable for this access "region". Never to be accessed directly, always only by means of get_access_replacement() and only when the grp_to_be_replaced flag is set. */ tree replacement_decl; /* Is this particular access a write access? */ unsigned write : 1; /* Is this access currently in the work queue? */ unsigned grp_queued : 1; /* Does this group contain a write access? This flag is propagated down the access tree. 
*/ unsigned grp_write : 1; /* Does this group contain a read access? This flag is propagated down the access tree. */ unsigned grp_read : 1; /* Is the subtree rooted in this access fully covered by scalar replacements? */ unsigned grp_covered : 1; /* If set to true, this access and all below it in an access tree must not be scalarized. */ unsigned grp_unscalarizable_region : 1; /* Whether data have been written to parts of the aggregate covered by this access which is not to be scalarized. This flag is propagated up in the access tree. */ unsigned grp_unscalarized_data : 1; /* Does this access and/or group contain a write access through a BIT_FIELD_REF? */ unsigned grp_bfr_lhs : 1; /* Set when a scalar replacement should be created for this variable. We do the decision and creation at different places because create_tmp_var cannot be called from within FOR_EACH_REFERENCED_VAR. */ unsigned grp_to_be_replaced : 1; }; typedef struct access *access_p; DEF_VEC_P (access_p); DEF_VEC_ALLOC_P (access_p, heap); /* Alloc pool for allocating access structures. */ static alloc_pool access_pool; /* A structure linking lhs and rhs accesses from an aggregate assignment. They are used to propagate subaccesses from rhs to lhs as long as they don't conflict with what is already there. */ struct assign_link { struct access *lacc, *racc; struct assign_link *next; }; /* Alloc pool for allocating assign link structures. */ static alloc_pool link_pool; /* Base (tree) -> Vector (VEC(access_p,heap) *) map. */ static struct pointer_map_t *base_access_vec; /* Bitmap of bases (candidates). */ static bitmap candidate_bitmap; /* Bitmap of declarations used in a return statement. */ static bitmap retvals_bitmap; /* Obstack for creation of fancy names. */ static struct obstack name_obstack; /* Head of a linked list of accesses that need to have its subaccesses propagated to their assignment counterparts. */ static struct access *work_queue_head; /* Dump contents of ACCESS to file F in a human friendly way. If GRP is true, representative fields are dumped, otherwise those which only describe the individual access are. */ static void dump_access (FILE *f, struct access *access, bool grp) { fprintf (f, "access { "); fprintf (f, "base = (%d)'", DECL_UID (access->base)); print_generic_expr (f, access->base, 0); fprintf (f, "', offset = " HOST_WIDE_INT_PRINT_DEC, access->offset); fprintf (f, ", size = " HOST_WIDE_INT_PRINT_DEC, access->size); fprintf (f, ", expr = "); print_generic_expr (f, access->expr, 0); fprintf (f, ", type = "); print_generic_expr (f, access->type, 0); if (grp) fprintf (f, ", grp_write = %d, grp_read = %d, grp_covered = %d, " "grp_unscalarizable_region = %d, grp_unscalarized_data = %d, " "grp_to_be_replaced = %d\n", access->grp_write, access->grp_read, access->grp_covered, access->grp_unscalarizable_region, access->grp_unscalarized_data, access->grp_to_be_replaced); else fprintf (f, ", write = %d'\n", access->write); } /* Dump a subtree rooted in ACCESS to file F, indent by LEVEL. */ static void dump_access_tree_1 (FILE *f, struct access *access, int level) { do { int i; for (i = 0; i < level; i++) fputs ("* ", dump_file); dump_access (f, access, true); if (access->first_child) dump_access_tree_1 (f, access->first_child, level + 1); access = access->next_sibling; } while (access); } /* Dump all access trees for a variable, given the pointer to the first root in ACCESS. 
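Each node is printed by dump_access above and indented by one "* " per tree level, so a first-level child appears roughly as "* access { base = (1234)'s', offset = 0, size = 32, ... }" (the UID and numbers in this example are made up, only the format follows the fprintf calls above).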
*/ static void dump_access_tree (FILE *f, struct access *access) { for (; access; access = access->next_grp) dump_access_tree_1 (f, access, 0); } /* Return a vector of pointers to accesses for the variable given in BASE or NULL if there is none. */ static VEC (access_p, heap) * get_base_access_vector (tree base) { void **slot; slot = pointer_map_contains (base_access_vec, base); if (!slot) return NULL; else return *(VEC (access_p, heap) **) slot; } /* Find an access with required OFFSET and SIZE in a subtree of accesses rooted in ACCESS. Return NULL if it cannot be found. */ static struct access * find_access_in_subtree (struct access *access, HOST_WIDE_INT offset, HOST_WIDE_INT size) { while (access && (access->offset != offset || access->size != size)) { struct access *child = access->first_child; while (child && (child->offset + child->size <= offset)) child = child->next_sibling; access = child; } return access; } /* Return the first group representative for DECL or NULL if none exists. */ static struct access * get_first_repr_for_decl (tree base) { VEC (access_p, heap) *access_vec; access_vec = get_base_access_vector (base); if (!access_vec) return NULL; return VEC_index (access_p, access_vec, 0); } /* Find an access representative for the variable BASE and given OFFSET and SIZE. Requires that access trees have already been built. Return NULL if it cannot be found. */ static struct access * get_var_base_offset_size_access (tree base, HOST_WIDE_INT offset, HOST_WIDE_INT size) { struct access *access; access = get_first_repr_for_decl (base); while (access && (access->offset + access->size <= offset)) access = access->next_grp; if (!access) return NULL; return find_access_in_subtree (access, offset, size); } /* Add LINK to the linked list of assign links of RACC. */ static void add_link_to_rhs (struct access *racc, struct assign_link *link) { gcc_assert (link->racc == racc); if (!racc->first_link) { gcc_assert (!racc->last_link); racc->first_link = link; } else racc->last_link->next = link; racc->last_link = link; link->next = NULL; } /* Move all link structures in their linked list in OLD_RACC to the linked list in NEW_RACC. */ static void relink_to_new_repr (struct access *new_racc, struct access *old_racc) { if (!old_racc->first_link) { gcc_assert (!old_racc->last_link); return; } if (new_racc->first_link) { gcc_assert (!new_racc->last_link->next); gcc_assert (!old_racc->last_link || !old_racc->last_link->next); new_racc->last_link->next = old_racc->first_link; new_racc->last_link = old_racc->last_link; } else { gcc_assert (!new_racc->last_link); new_racc->first_link = old_racc->first_link; new_racc->last_link = old_racc->last_link; } old_racc->first_link = old_racc->last_link = NULL; } /* Add ACCESS to the work queue (which is actually a stack). */ static void add_access_to_work_queue (struct access *access) { if (!access->grp_queued) { gcc_assert (!access->next_queued); access->next_queued = work_queue_head; access->grp_queued = 1; work_queue_head = access; } } /* Pop an access from the work queue, and return it, assuming there is one. */ static struct access * pop_access_from_work_queue (void) { struct access *access = work_queue_head; work_queue_head = access->next_queued; access->next_queued = NULL; access->grp_queued = 0; return access; } /* Allocate necessary structures. 
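These are the candidate and return-value bitmaps, the allocation pools for access and assign_link structures, the obstack for fancy names and the base-to-access-vector pointer map.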
*/ static void sra_initialize (void) { candidate_bitmap = BITMAP_ALLOC (NULL); retvals_bitmap = BITMAP_ALLOC (NULL); gcc_obstack_init (&name_obstack); access_pool = create_alloc_pool ("SRA accesses", sizeof (struct access), 16); link_pool = create_alloc_pool ("SRA links", sizeof (struct assign_link), 16); base_access_vec = pointer_map_create (); } /* Hook fed to pointer_map_traverse, deallocate stored vectors. */ static bool delete_base_accesses (const void *key ATTRIBUTE_UNUSED, void **value, void *data ATTRIBUTE_UNUSED) { VEC (access_p, heap) *access_vec; access_vec = (VEC (access_p, heap) *) *value; VEC_free (access_p, heap, access_vec); return true; } /* Deallocate all general structures. */ static void sra_deinitialize (void) { BITMAP_FREE (candidate_bitmap); BITMAP_FREE (retvals_bitmap); free_alloc_pool (access_pool); free_alloc_pool (link_pool); obstack_free (&name_obstack, NULL); pointer_map_traverse (base_access_vec, delete_base_accesses, NULL); pointer_map_destroy (base_access_vec); } /* Remove DECL from candidates for SRA and write REASON to the dump file if there is one. */ static void disqualify_candidate (tree decl, const char *reason) { bitmap_clear_bit (candidate_bitmap, DECL_UID (decl)); if (dump_file && (dump_flags & TDF_DETAILS)) { fprintf (dump_file, "! Disqualifying "); print_generic_expr (dump_file, decl, 0); fprintf (dump_file, " - %s\n", reason); } } /* Return true iff the type contains a field or an element which does not allow scalarization. */ static bool type_internals_preclude_sra_p (tree type) { tree fld; tree et; switch (TREE_CODE (type)) { case RECORD_TYPE: case UNION_TYPE: case QUAL_UNION_TYPE: for (fld = TYPE_FIELDS (type); fld; fld = TREE_CHAIN (fld)) if (TREE_CODE (fld) == FIELD_DECL) { tree ft = TREE_TYPE (fld); if (TREE_THIS_VOLATILE (fld) || !DECL_FIELD_OFFSET (fld) || !DECL_SIZE (fld) || !host_integerp (DECL_FIELD_OFFSET (fld), 1) || !host_integerp (DECL_SIZE (fld), 1)) return true; if (AGGREGATE_TYPE_P (ft) && type_internals_preclude_sra_p (ft)) return true; } return false; case ARRAY_TYPE: et = TREE_TYPE (type); if (AGGREGATE_TYPE_P (et)) return type_internals_preclude_sra_p (et); else return false; default: return false; } } /* Create and insert access for EXPR. Return created access, or NULL if it is not possible. */ static struct access * create_access (tree expr, bool write) { struct access *access; void **slot; VEC (access_p,heap) *vec; HOST_WIDE_INT offset, size, max_size; tree base = expr; bool unscalarizable_region = false; base = get_ref_base_and_extent (expr, &offset, &size, &max_size); /* !!! Assert for testing only, remove after some time. 
*/ gcc_assert (base); if (!DECL_P (base) || !bitmap_bit_p (candidate_bitmap, DECL_UID (base))) return NULL; if (size != max_size) { size = max_size; unscalarizable_region = true; } if (size < 0) { disqualify_candidate (base, "Encountered an unconstrained access."); return NULL; } access = (struct access *) pool_alloc (access_pool); memset (access, 0, sizeof (struct access)); access->base = base; access->offset = offset; access->size = size; access->expr = expr; access->type = TREE_TYPE (expr); access->write = write; access->grp_unscalarizable_region = unscalarizable_region; slot = pointer_map_contains (base_access_vec, base); if (slot) vec = (VEC (access_p, heap) *) *slot; else vec = VEC_alloc (access_p, heap, 32); VEC_safe_push (access_p, heap, vec, access); *((struct VEC (access_p,heap) **) pointer_map_insert (base_access_vec, base)) = vec; return access; } /* Search the given tree for a declaration by skipping handled components and exclude it from the candidates. */ static void disqualify_base_of_expr (tree t, const char *reason) { while (handled_component_p (t)) t = TREE_OPERAND (t, 0); if (DECL_P (t)) disqualify_candidate (t, reason); } /* Scan expression EXPR and create access structures for all accesses to candidates for scalarization. Return the created access or NULL if none is created. */ static struct access * build_access_from_expr_1 (tree *expr_ptr, gimple_stmt_iterator *gsi ATTRIBUTE_UNUSED, bool write) { struct access *ret = NULL; tree expr = *expr_ptr; bool bit_ref; if (TREE_CODE (expr) == BIT_FIELD_REF) { expr = TREE_OPERAND (expr, 0); bit_ref = true; } else bit_ref = false; if (CONVERT_EXPR_P (expr) || TREE_CODE (expr) == VIEW_CONVERT_EXPR) expr = TREE_OPERAND (expr, 0); if (contains_view_convert_expr_p (expr)) { disqualify_base_of_expr (expr, "V_C_E under a different handled " "component."); return NULL; } switch (TREE_CODE (expr)) { case VAR_DECL: case PARM_DECL: case RESULT_DECL: case COMPONENT_REF: case ARRAY_REF: case ARRAY_RANGE_REF: ret = create_access (expr, write); break; case REALPART_EXPR: case IMAGPART_EXPR: expr = TREE_OPERAND (expr, 0); ret = create_access (expr, write); break; default: break; } if (write && bit_ref && ret) ret->grp_bfr_lhs = 1; return ret; } /* Scan expression EXPR and create access structures for all accesses to candidates for scalarization. Return true if any access has been inserted. */ static bool build_access_from_expr (tree *expr_ptr, gimple_stmt_iterator *gsi ATTRIBUTE_UNUSED, bool write, void *data ATTRIBUTE_UNUSED) { return build_access_from_expr_1 (expr_ptr, gsi, write) != NULL; } /* Disqualify LHS and RHS for scalarization if STMT must end its basic block in modes in which it matters, return true iff they have been disqualified. RHS may be NULL, in that case ignore it. If we scalarize an aggregate in intra-SRA we may need to add statements after each statement. This is not possible if a statement unconditionally has to end the basic block. */ static bool disqualify_ops_if_throwing_stmt (gimple stmt, tree lhs, tree rhs) { if (stmt_can_throw_internal (stmt) || stmt_ends_bb_p (stmt)) { disqualify_base_of_expr (lhs, "LHS of a throwing stmt."); if (rhs) disqualify_base_of_expr (rhs, "RHS of a throwing stmt."); return true; } return false; } /* Result code for scan_assign callback for scan_function. 
*/ enum scan_assign_result { SRA_SA_NONE, /* nothing done for the stmt */ SRA_SA_PROCESSED, /* stmt analyzed/changed */ SRA_SA_REMOVED /* stmt redundant and eliminated */ }; /* Scan expressions occurring in the statement pointed to by STMT_PTR, create access structures for all accesses to candidates for scalarization and remove those candidates which occur in statements or expressions that prevent them from being split apart. Return SRA_SA_PROCESSED if any access has been created, SRA_SA_NONE otherwise. */ static enum scan_assign_result build_accesses_from_assign (gimple *stmt_ptr, gimple_stmt_iterator *gsi ATTRIBUTE_UNUSED, void *data ATTRIBUTE_UNUSED) { gimple stmt = *stmt_ptr; tree *lhs_ptr, *rhs_ptr; struct access *lacc, *racc; if (!gimple_assign_single_p (stmt)) return SRA_SA_NONE; lhs_ptr = gimple_assign_lhs_ptr (stmt); rhs_ptr = gimple_assign_rhs1_ptr (stmt); if (disqualify_ops_if_throwing_stmt (stmt, *lhs_ptr, *rhs_ptr)) return SRA_SA_NONE; racc = build_access_from_expr_1 (rhs_ptr, gsi, false); lacc = build_access_from_expr_1 (lhs_ptr, gsi, true); if (lacc && racc && !lacc->grp_unscalarizable_region && !racc->grp_unscalarizable_region && AGGREGATE_TYPE_P (TREE_TYPE (*lhs_ptr)) /* FIXME: Turn the following line into an assert after PR 40058 is fixed. */ && lacc->size == racc->size && useless_type_conversion_p (lacc->type, racc->type)) { struct assign_link *link; link = (struct assign_link *) pool_alloc (link_pool); memset (link, 0, sizeof (struct assign_link)); link->lacc = lacc; link->racc = racc; add_link_to_rhs (racc, link); } return (lacc || racc) ? SRA_SA_PROCESSED : SRA_SA_NONE; } /* Callback of walk_stmt_load_store_addr_ops visit_addr used to determine GIMPLE_ASM operands with memory constraints which cannot be scalarized. */ static bool asm_visit_addr (gimple stmt ATTRIBUTE_UNUSED, tree op, void *data ATTRIBUTE_UNUSED) { if (DECL_P (op)) disqualify_candidate (op, "Non-scalarizable GIMPLE_ASM operand."); return false; } /* Scan function and look for interesting statements. Return true if any has been found or processed, as indicated by the callbacks. SCAN_EXPR is a callback called on all expressions within statements except assign statements and those deemed entirely unsuitable for some reason (all operands in such statements and expressions are removed from candidate_bitmap). SCAN_ASSIGN is a callback called on all assign statements, HANDLE_SSA_DEFS is a callback called on assign statements and those call statements which have a LHS; it is the only callback which can be NULL. ANALYSIS_STAGE is true when running in the analysis stage of a pass and thus no statement is being modified. DATA is a pointer passed to all callbacks. If any single callback returns true, this function also returns true; otherwise it returns false. 
*/ static bool scan_function (bool (*scan_expr) (tree *, gimple_stmt_iterator *, bool, void *), enum scan_assign_result (*scan_assign) (gimple *, gimple_stmt_iterator *, void *), bool (*handle_ssa_defs)(gimple, void *), bool analysis_stage, void *data) { gimple_stmt_iterator gsi; basic_block bb; unsigned i; tree *t; bool ret = false; FOR_EACH_BB (bb) { bool bb_changed = false; gsi = gsi_start_bb (bb); while (!gsi_end_p (gsi)) { gimple stmt = gsi_stmt (gsi); enum scan_assign_result assign_result; bool any = false, deleted = false; switch (gimple_code (stmt)) { case GIMPLE_RETURN: t = gimple_return_retval_ptr (stmt); if (*t != NULL_TREE) { if (DECL_P (*t)) { tree ret_type = TREE_TYPE (*t); if (analysis_stage && sra_mode == SRA_MODE_EARLY_INTRA && (TREE_CODE (ret_type) == UNION_TYPE || TREE_CODE (ret_type) == QUAL_UNION_TYPE)) disqualify_candidate (*t, "Union in a return statement."); else bitmap_set_bit (retvals_bitmap, DECL_UID (*t)); } any |= scan_expr (t, &gsi, false, data); } break; case GIMPLE_ASSIGN: assign_result = scan_assign (&stmt, &gsi, data); any |= assign_result == SRA_SA_PROCESSED; deleted = assign_result == SRA_SA_REMOVED; if (handle_ssa_defs && assign_result != SRA_SA_REMOVED) any |= handle_ssa_defs (stmt, data); break; case GIMPLE_CALL: /* Operands must be processed before the lhs. */ for (i = 0; i < gimple_call_num_args (stmt); i++) { tree *argp = gimple_call_arg_ptr (stmt, i); any |= scan_expr (argp, &gsi, false, data); } if (gimple_call_lhs (stmt)) { tree *lhs_ptr = gimple_call_lhs_ptr (stmt); if (!analysis_stage || !disqualify_ops_if_throwing_stmt (stmt, *lhs_ptr, NULL)) { any |= scan_expr (lhs_ptr, &gsi, true, data); if (handle_ssa_defs) any |= handle_ssa_defs (stmt, data); } } break; case GIMPLE_ASM: if (analysis_stage) walk_stmt_load_store_addr_ops (stmt, NULL, NULL, NULL, asm_visit_addr); for (i = 0; i < gimple_asm_ninputs (stmt); i++) { tree *op = &TREE_VALUE (gimple_asm_input_op (stmt, i)); any |= scan_expr (op, &gsi, false, data); } for (i = 0; i < gimple_asm_noutputs (stmt); i++) { tree *op = &TREE_VALUE (gimple_asm_output_op (stmt, i)); any |= scan_expr (op, &gsi, true, data); } break; default: break; } if (any) { ret = true; bb_changed = true; if (!analysis_stage) { update_stmt (stmt); if (!stmt_could_throw_p (stmt)) remove_stmt_from_eh_region (stmt); } } if (deleted) bb_changed = true; else { gsi_next (&gsi); ret = true; } } if (!analysis_stage && bb_changed) gimple_purge_dead_eh_edges (bb); } return ret; } /* Comparison function for qsort. The array contains pointers to accesses. An access is considered smaller than another if it has a smaller offset or if the offsets are the same but its size is bigger. */ static int compare_access_positions (const void *a, const void *b) { const access_p *fp1 = (const access_p *) a; const access_p *fp2 = (const access_p *) b; const access_p f1 = *fp1; const access_p f2 = *fp2; if (f1->offset != f2->offset) return f1->offset < f2->offset ? -1 : 1; if (f1->size == f2->size) return 0; /* We want the bigger accesses first, thus the opposite operator in the next line: */ return f1->size > f2->size ? -1 : 1; } /* Append the name of the declaration to the name obstack. A helper function for make_fancy_name. */ static void make_fancy_decl_name (tree decl) { char buffer[32]; tree name = DECL_NAME (decl); if (name) obstack_grow (&name_obstack, IDENTIFIER_POINTER (name), IDENTIFIER_LENGTH (name)); else { sprintf (buffer, "D%u", DECL_UID (decl)); obstack_grow (&name_obstack, buffer, strlen (buffer)); } } /* Helper for make_fancy_name. 
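It walks down the handled components of EXPR and separates the individual names with '$' characters so that, for example, an access expression like foo.bar[2].baz yields the replacement name foo$bar$2$baz (a made-up illustration of the scheme implemented below).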
*/ static void make_fancy_name_1 (tree expr) { char buffer[32]; tree index; if (DECL_P (expr)) { make_fancy_decl_name (expr); return; } switch (TREE_CODE (expr)) { case COMPONENT_REF: make_fancy_name_1 (TREE_OPERAND (expr, 0)); obstack_1grow (&name_obstack, '$'); make_fancy_decl_name (TREE_OPERAND (expr, 1)); break; case ARRAY_REF: make_fancy_name_1 (TREE_OPERAND (expr, 0)); obstack_1grow (&name_obstack, '$'); /* Arrays with only one element may not have a constant as their index. */ index = TREE_OPERAND (expr, 1); if (TREE_CODE (index) != INTEGER_CST) break; sprintf (buffer, HOST_WIDE_INT_PRINT_DEC, TREE_INT_CST_LOW (index)); obstack_grow (&name_obstack, buffer, strlen (buffer)); break; case BIT_FIELD_REF: case REALPART_EXPR: case IMAGPART_EXPR: gcc_unreachable (); /* we treat these as scalars. */ break; default: break; } } /* Create a human readable name for a replacement variable based on EXPR. */ static char * make_fancy_name (tree expr) { make_fancy_name_1 (expr); obstack_1grow (&name_obstack, '\0'); return XOBFINISH (&name_obstack, char *); } /* Helper function for build_ref_for_offset. */ static bool build_ref_for_offset_1 (tree *res, tree type, HOST_WIDE_INT offset, tree exp_type) { while (1) { tree fld; tree tr_size, index; HOST_WIDE_INT el_size; if (offset == 0 && exp_type && useless_type_conversion_p (exp_type, type)) return true; switch (TREE_CODE (type)) { case UNION_TYPE: case QUAL_UNION_TYPE: case RECORD_TYPE: /* Some Ada records are half-unions; treat all of them the same. */ for (fld = TYPE_FIELDS (type); fld; fld = TREE_CHAIN (fld)) { HOST_WIDE_INT pos, size; tree expr, *expr_ptr; if (TREE_CODE (fld) != FIELD_DECL) continue; pos = int_bit_position (fld); gcc_assert (TREE_CODE (type) == RECORD_TYPE || pos == 0); size = tree_low_cst (DECL_SIZE (fld), 1); if (pos > offset || (pos + size) <= offset) continue; if (res) { expr = build3 (COMPONENT_REF, TREE_TYPE (fld), *res, fld, NULL_TREE); expr_ptr = &expr; } else expr_ptr = NULL; if (build_ref_for_offset_1 (expr_ptr, TREE_TYPE (fld), offset - pos, exp_type)) { if (res) *res = expr; return true; } } return false; case ARRAY_TYPE: tr_size = TYPE_SIZE (TREE_TYPE (type)); if (!tr_size || !host_integerp (tr_size, 1)) return false; el_size = tree_low_cst (tr_size, 1); index = build_int_cst (TYPE_DOMAIN (type), offset / el_size); if (!integer_zerop (TYPE_MIN_VALUE (TYPE_DOMAIN (type)))) index = int_const_binop (PLUS_EXPR, index, TYPE_MIN_VALUE (TYPE_DOMAIN (type)), 0); if (res) *res = build4 (ARRAY_REF, TREE_TYPE (type), *res, index, NULL_TREE, NULL_TREE); offset = offset % el_size; type = TREE_TYPE (type); break; default: if (offset != 0) return false; if (exp_type) return false; else return true; } } } /* Construct an expression that would reference a part of aggregate *EXPR of type TYPE at the given OFFSET of the type EXP_TYPE. If EXPR is NULL, the function only determines whether it can build such a reference without actually doing it. FIXME: Eventually this should be replaced with maybe_fold_offset_to_reference() from tree-ssa-ccp.c but that requires a minor rewrite of fold_stmt. */ static bool build_ref_for_offset (tree *expr, tree type, HOST_WIDE_INT offset, tree exp_type, bool allow_ptr) { if (allow_ptr && POINTER_TYPE_P (type)) { type = TREE_TYPE (type); if (expr) *expr = fold_build1 (INDIRECT_REF, type, *expr); } return build_ref_for_offset_1 (expr, type, offset, exp_type); } /* The very first phase of intraprocedural SRA. It marks in candidate_bitmap those variables whose type is suitable for scalarization. 
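Only VAR_DECLs and PARM_DECLs with a complete, constant-sized aggregate type qualify; volatile variables and variables which need to live in memory are rejected, as are those whose type internals preclude SRA (see type_internals_preclude_sra_p above).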
*/ static bool find_var_candidates (void) { tree var, type; referenced_var_iterator rvi; bool ret = false; FOR_EACH_REFERENCED_VAR (var, rvi) { if (TREE_CODE (var) != VAR_DECL && TREE_CODE (var) != PARM_DECL) continue; type = TREE_TYPE (var); if (!AGGREGATE_TYPE_P (type) || needs_to_live_in_memory (var) || TREE_THIS_VOLATILE (var) || !COMPLETE_TYPE_P (type) || !host_integerp (TYPE_SIZE (type), 1) || tree_low_cst (TYPE_SIZE (type), 1) == 0 || type_internals_preclude_sra_p (type)) continue; bitmap_set_bit (candidate_bitmap, DECL_UID (var)); if (dump_file && (dump_flags & TDF_DETAILS)) { fprintf (dump_file, "Candidate (%d): ", DECL_UID (var)); print_generic_expr (dump_file, var, 0); fprintf (dump_file, "\n"); } ret = true; } return ret; } /* Sort all accesses for the given variable, check for partial overlaps and return NULL if there are any. If there are none, pick a representative for each combination of offset and size and create a linked list out of them. Return the pointer to the first representative and make sure it is the first one in the vector of accesses. */ static struct access * sort_and_splice_var_accesses (tree var) { int i, j, access_count; struct access *res, **prev_acc_ptr = &res; VEC (access_p, heap) *access_vec; bool first = true; HOST_WIDE_INT low = -1, high = 0; access_vec = get_base_access_vector (var); if (!access_vec) return NULL; access_count = VEC_length (access_p, access_vec); /* Sort by <OFFSET, SIZE>. */ qsort (VEC_address (access_p, access_vec), access_count, sizeof (access_p), compare_access_positions); i = 0; while (i < access_count) { struct access *access = VEC_index (access_p, access_vec, i); bool modification = access->write; bool grp_read = !access->write; bool grp_bfr_lhs = access->grp_bfr_lhs; bool first_scalar = is_gimple_reg_type (access->type); bool unscalarizable_region = access->grp_unscalarizable_region; if (first || access->offset >= high) { first = false; low = access->offset; high = access->offset + access->size; } else if (access->offset > low && access->offset + access->size > high) return NULL; else gcc_assert (access->offset >= low && access->offset + access->size <= high); j = i + 1; while (j < access_count) { struct access *ac2 = VEC_index (access_p, access_vec, j); if (ac2->offset != access->offset || ac2->size != access->size) break; modification |= ac2->write; grp_read |= !ac2->write; grp_bfr_lhs |= ac2->grp_bfr_lhs; unscalarizable_region |= ac2->grp_unscalarizable_region; relink_to_new_repr (access, ac2); /* If one of the equivalent accesses is scalar, use it as a representative (this happens, for example, when there is only a single scalar field in a structure). */ if (!first_scalar && is_gimple_reg_type (ac2->type)) { struct access tmp_acc; first_scalar = true; memcpy (&tmp_acc, ac2, sizeof (struct access)); memcpy (ac2, access, sizeof (struct access)); memcpy (access, &tmp_acc, sizeof (struct access)); } ac2->group_representative = access; j++; } i = j; access->group_representative = access; access->grp_write = modification; access->grp_read = grp_read; access->grp_bfr_lhs = grp_bfr_lhs; access->grp_unscalarizable_region = unscalarizable_region; if (access->first_link) add_access_to_work_queue (access); *prev_acc_ptr = access; prev_acc_ptr = &access->next_grp; } gcc_assert (res == VEC_index (access_p, access_vec, 0)); return res; } /* Create a variable for the given ACCESS which determines the type, name and a few other properties. Return the variable declaration and store it also to ACCESS->replacement_decl. 
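The replacement is an ordinary temporary with the prefix "SR"; when the base has a user-visible name, the temporary additionally gets a human readable name from make_fancy_name and a DECL_DEBUG_EXPR pointing back at the original expression, so that debug information and warnings keep referring to the user variable.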
*/ static tree create_access_replacement (struct access *access) { tree repl; repl = make_rename_temp (access->type, "SR"); get_var_ann (repl); add_referenced_var (repl); DECL_SOURCE_LOCATION (repl) = DECL_SOURCE_LOCATION (access->base); DECL_ARTIFICIAL (repl) = 1; if (DECL_NAME (access->base) && !DECL_IGNORED_P (access->base)) { char *pretty_name = make_fancy_name (access->expr); DECL_NAME (repl) = get_identifier (pretty_name); obstack_free (&name_obstack, pretty_name); SET_DECL_DEBUG_EXPR (repl, access->expr); DECL_DEBUG_EXPR_IS_FROM (repl) = 1; DECL_IGNORED_P (repl) = 0; } DECL_IGNORED_P (repl) = DECL_IGNORED_P (access->base); TREE_NO_WARNING (repl) = TREE_NO_WARNING (access->base); if (access->grp_bfr_lhs) DECL_GIMPLE_REG_P (repl) = 0; if (dump_file) { fprintf (dump_file, "Created a replacement for "); print_generic_expr (dump_file, access->base, 0); fprintf (dump_file, " offset: %u, size: %u: ", (unsigned) access->offset, (unsigned) access->size); print_generic_expr (dump_file, repl, 0); fprintf (dump_file, "\n"); } return repl; } /* Return the scalar replacement of ACCESS; create it if it does not exist yet. */ static inline tree get_access_replacement (struct access *access) { gcc_assert (access->grp_to_be_replaced); if (access->replacement_decl) return access->replacement_decl; access->replacement_decl = create_access_replacement (access); return access->replacement_decl; } /* Build a subtree of accesses rooted in *ACCESS, and move the pointer in the linked list along the way. Stop when *ACCESS is NULL or the access pointed to by it is not "within" the root. */ static void build_access_subtree (struct access **access) { struct access *root = *access, *last_child = NULL; HOST_WIDE_INT limit = root->offset + root->size; *access = (*access)->next_grp; while (*access && (*access)->offset + (*access)->size <= limit) { if (!last_child) root->first_child = *access; else last_child->next_sibling = *access; last_child = *access; build_access_subtree (access); } } /* Build a tree of access representatives; ACCESS is the pointer to the first one, others are linked in a list by the next_grp field. */ static void build_access_trees (struct access *access) { while (access) { struct access *root = access; build_access_subtree (&access); root->next_grp = access; } } /* Analyze the subtree of accesses rooted in ROOT, scheduling replacements when they seem beneficial and ALLOW_REPLACEMENTS allows it. Also set all sorts of access flags appropriately along the way; notably, always set grp_read when MARK_READ is true and grp_write when MARK_WRITE is true. 
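A replacement is only scheduled for accesses which have a register (scalar) type, have no children and do not lie in an unscalarizable region; e.g. in a variable of type struct { int i; float f; } only the accesses corresponding to the two fields themselves can get replacements, never the access covering the whole structure (an illustration, not a reference to a particular testcase).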
*/ static bool analyze_access_subtree (struct access *root, bool allow_replacements, bool mark_read, bool mark_write) { struct access *child; HOST_WIDE_INT limit = root->offset + root->size; HOST_WIDE_INT covered_to = root->offset; bool scalar = is_gimple_reg_type (root->type); bool hole = false, sth_created = false; if (mark_read) root->grp_read = true; else if (root->grp_read) mark_read = true; if (mark_write) root->grp_write = true; else if (root->grp_write) mark_write = true; if (root->grp_unscalarizable_region) allow_replacements = false; for (child = root->first_child; child; child = child->next_sibling) { if (!hole && child->offset < covered_to) hole = true; else covered_to += child->size; sth_created |= analyze_access_subtree (child, allow_replacements && !scalar, mark_read, mark_write); root->grp_unscalarized_data |= child->grp_unscalarized_data; hole |= !child->grp_covered; } if (allow_replacements && scalar && !root->first_child) { if (dump_file && (dump_flags & TDF_DETAILS)) { fprintf (dump_file, "Marking "); print_generic_expr (dump_file, root->base, 0); fprintf (dump_file, " offset: %u, size: %u: ", (unsigned) root->offset, (unsigned) root->size); fprintf (dump_file, " to be replaced.\n"); } root->grp_to_be_replaced = 1; sth_created = true; hole = false; } else if (covered_to < limit) hole = true; if (sth_created && !hole) { root->grp_covered = 1; return true; } if (root->grp_write || TREE_CODE (root->base) == PARM_DECL) root->grp_unscalarized_data = 1; /* not covered and written to */ if (sth_created) return true; return false; } /* Analyze all access trees linked by next_grp by means of analyze_access_subtree. */ static bool analyze_access_trees (struct access *access) { bool ret = false; while (access) { if (analyze_access_subtree (access, true, false, false)) ret = true; access = access->next_grp; } return ret; } /* Return true iff a potential new child of LACC at offset NORM_OFFSET and with size SIZE would conflict with an already existing one. If exactly such a child already exists in LACC, store a pointer to it in EXACT_MATCH. */ static bool child_would_conflict_in_lacc (struct access *lacc, HOST_WIDE_INT norm_offset, HOST_WIDE_INT size, struct access **exact_match) { struct access *child; for (child = lacc->first_child; child; child = child->next_sibling) { if (child->offset == norm_offset && child->size == size) { *exact_match = child; return true; } if (child->offset < norm_offset + size && child->offset + child->size > norm_offset) return true; } return false; } /* Set the expr of TARGET to one just like MODEL but with its own base at the bottom of the handled components. */ static void duplicate_expr_for_different_base (struct access *target, struct access *model) { tree t, expr = unshare_expr (model->expr); gcc_assert (handled_component_p (expr)); t = expr; while (handled_component_p (TREE_OPERAND (t, 0))) t = TREE_OPERAND (t, 0); gcc_assert (TREE_OPERAND (t, 0) == model->base); TREE_OPERAND (t, 0) = target->base; target->expr = expr; } /* Create a new child access of PARENT, with all properties just like MODEL except for its offset and with its grp_write false and grp_read true. Return the new access. Note that this access is created long after all splicing and sorting; it's not located in any access vector and is automatically a representative of its group. 
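For instance, when propagating across 'dst = src' where src.f already has a representative but dst.f does not, an artificial child access of dst with the offset and type of src.f is manufactured here so that the copy can later be done replacement to replacement (names are purely illustrative).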
*/ static struct access * create_artificial_child_access (struct access *parent, struct access *model, HOST_WIDE_INT new_offset) { struct access *access; struct access **child; gcc_assert (!model->grp_unscalarizable_region); access = (struct access *) pool_alloc (access_pool); memset (access, 0, sizeof (struct access)); access->base = parent->base; access->offset = new_offset; access->size = model->size; duplicate_expr_for_different_base (access, model); access->type = model->type; access->grp_write = true; access->grp_read = false; child = &parent->first_child; while (*child && (*child)->offset < new_offset) child = &(*child)->next_sibling; access->next_sibling = *child; *child = access; return access; } /* Propagate all subaccesses of RACC across an assignment link to LACC. Return true if any new subaccess was created. Additionally, if RACC is a scalar access but LACC is not, change the type of the latter. */ static bool propagate_subacesses_accross_link (struct access *lacc, struct access *racc) { struct access *rchild; HOST_WIDE_INT norm_delta = lacc->offset - racc->offset; bool ret = false; if (is_gimple_reg_type (lacc->type) || lacc->grp_unscalarizable_region || racc->grp_unscalarizable_region) return false; if (!lacc->first_child && !racc->first_child && is_gimple_reg_type (racc->type) && (sra_mode == SRA_MODE_INTRA || !bitmap_bit_p (retvals_bitmap, DECL_UID (lacc->base)))) { duplicate_expr_for_different_base (lacc, racc); lacc->type = racc->type; return false; } for (rchild = racc->first_child; rchild; rchild = rchild->next_sibling) { struct access *new_acc = NULL; HOST_WIDE_INT norm_offset = rchild->offset + norm_delta; if (rchild->grp_unscalarizable_region) continue; if (child_would_conflict_in_lacc (lacc, norm_offset, rchild->size, &new_acc)) { if (new_acc && rchild->first_child) ret |= propagate_subacesses_accross_link (new_acc, rchild); continue; } new_acc = create_artificial_child_access (lacc, rchild, norm_offset); if (racc->first_child) propagate_subacesses_accross_link (new_acc, rchild); ret = true; } return ret; } /* Propagate all subaccesses across assignment links. */ static void propagate_all_subaccesses (void) { while (work_queue_head) { struct access *racc = pop_access_from_work_queue (); struct assign_link *link; gcc_assert (racc->first_link); for (link = racc->first_link; link; link = link->next) { struct access *lacc = link->lacc; if (!bitmap_bit_p (candidate_bitmap, DECL_UID (lacc->base))) continue; lacc = lacc->group_representative; if (propagate_subacesses_accross_link (lacc, racc) && lacc->first_link) add_access_to_work_queue (lacc); } } } /* Go through all accesses collected throughout the (intraprocedural) analysis stage, exclude overlapping ones, identify representatives and build trees out of them, making decisions about scalarization on the way. Return true iff there are any to-be-scalarized variables after this stage. 
*/ static bool analyze_all_variable_accesses (void) { tree var; referenced_var_iterator rvi; bool res = false; FOR_EACH_REFERENCED_VAR (var, rvi) if (bitmap_bit_p (candidate_bitmap, DECL_UID (var))) { struct access *access; access = sort_and_splice_var_accesses (var); if (access) build_access_trees (access); else disqualify_candidate (var, "No or inhibitingly overlapping accesses."); } propagate_all_subaccesses (); FOR_EACH_REFERENCED_VAR (var, rvi) if (bitmap_bit_p (candidate_bitmap, DECL_UID (var))) { struct access *access = get_first_repr_for_decl (var); if (analyze_access_trees (access)) { res = true; if (dump_file && (dump_flags & TDF_DETAILS)) { fprintf (dump_file, "\nAccess trees for "); print_generic_expr (dump_file, var, 0); fprintf (dump_file, " (UID: %u): \n", DECL_UID (var)); dump_access_tree (dump_file, access); fprintf (dump_file, "\n"); } } else disqualify_candidate (var, "No scalar replacements to be created."); } return res; } /* Return true iff a reference expression into aggregate AGG can be built for every single to-be-replaced access that is a child of ACCESS, its sibling or a child of its sibling. TOP_OFFSET is the offset of the processed access subtree that has to be subtracted from the offset of each access. */ static bool ref_expr_for_all_replacements_p (struct access *access, tree agg, HOST_WIDE_INT top_offset) { do { if (access->grp_to_be_replaced && !build_ref_for_offset (NULL, TREE_TYPE (agg), access->offset - top_offset, access->type, false)) return false; if (access->first_child && !ref_expr_for_all_replacements_p (access->first_child, agg, top_offset)) return false; access = access->next_sibling; } while (access); return true; } /* Generate statements copying scalar replacements of accesses within a subtree into or out of AGG. ACCESS is the first child of the root of the subtree to be processed. AGG is an aggregate type expression (can be a declaration but does not have to be; it can, for example, also be an INDIRECT_REF). TOP_OFFSET is the offset of the processed subtree which has to be subtracted from offsets of individual accesses to get corresponding offsets for AGG. If CHUNK_SIZE is non-zero, copy only replacements in the interval <start_offset, start_offset + chunk_size>, otherwise copy all. GSI is a statement iterator used to place the new statements. WRITE should be true when the statements should write from AGG to the replacement and false if vice versa. If INSERT_AFTER is true, new statements will be added after the current statement in GSI; otherwise they will be added before it. 
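As an example, scalarizing a parameter p of a type such as struct { int i; float f; } at function entry generates, roughly, the statements 'p$i = p.i; p$f = p.f' right after the entry point, with the replacement names following the make_fancy_name scheme described earlier (a hypothetical illustration).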
*/ static void generate_subtree_copies (struct access *access, tree agg, HOST_WIDE_INT top_offset, HOST_WIDE_INT start_offset, HOST_WIDE_INT chunk_size, gimple_stmt_iterator *gsi, bool write, bool insert_after) { do { tree expr = unshare_expr (agg); if (chunk_size && access->offset >= start_offset + chunk_size) return; if (access->grp_to_be_replaced && (chunk_size == 0 || access->offset + access->size > start_offset)) { bool repl_found; gimple stmt; repl_found = build_ref_for_offset (&expr, TREE_TYPE (agg), access->offset - top_offset, access->type, false); gcc_assert (repl_found); if (write) stmt = gimple_build_assign (get_access_replacement (access), expr); else { tree repl = get_access_replacement (access); TREE_NO_WARNING (repl) = 1; stmt = gimple_build_assign (expr, repl); } if (insert_after) gsi_insert_after (gsi, stmt, GSI_NEW_STMT); else gsi_insert_before (gsi, stmt, GSI_SAME_STMT); update_stmt (stmt); } if (access->first_child) generate_subtree_copies (access->first_child, agg, top_offset, start_offset, chunk_size, gsi, write, insert_after); access = access->next_sibling; } while (access); } /* Assign zero to all scalar replacements in an access subtree. ACCESS is the root of the subtree to be processed. GSI is the statement iterator used for inserting statements which are added after the current statement if INSERT_AFTER is true or before it otherwise. */ static void init_subtree_with_zero (struct access *access, gimple_stmt_iterator *gsi, bool insert_after) { struct access *child; if (access->grp_to_be_replaced) { gimple stmt; stmt = gimple_build_assign (get_access_replacement (access), fold_convert (access->type, integer_zero_node)); if (insert_after) gsi_insert_after (gsi, stmt, GSI_NEW_STMT); else gsi_insert_before (gsi, stmt, GSI_SAME_STMT); update_stmt (stmt); } for (child = access->first_child; child; child = child->next_sibling) init_subtree_with_zero (child, gsi, insert_after); } /* Search for an access representative for the given expression EXPR and return it or NULL if it cannot be found. */ static struct access * get_access_for_expr (tree expr) { HOST_WIDE_INT offset, size, max_size; tree base; if (TREE_CODE (expr) == VIEW_CONVERT_EXPR) expr = TREE_OPERAND (expr, 0); base = get_ref_base_and_extent (expr, &offset, &size, &max_size); /* !!! Assert for testing only, remove after some time. */ gcc_assert (base); if (max_size == -1 || !DECL_P (base)) return NULL; if (!bitmap_bit_p (candidate_bitmap, DECL_UID (base))) return NULL; return get_var_base_offset_size_access (base, offset, max_size); } /* Substitute into *EXPR an expression of type TYPE with the value of the replacement of ACCESS. This is done either by producing a special V_C_E assignment statement converting the replacement to a new temporary of the requested type if TYPE is not TREE_ADDRESSABLE or by going through the base aggregate if it is. 
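In the former case, for a read, a statement like 'SRvce_1 = VIEW_CONVERT_EXPR<TYPE>(repl)' is inserted before the original statement and *EXPR is replaced by the new temporary; for a write the conversion goes in the opposite direction and is inserted after the statement. SRvce is the actual prefix used for these temporaries below; the SSA number is illustrative.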
*/ static void fix_incompatible_types_for_expr (tree *expr, tree type, struct access *access, gimple_stmt_iterator *gsi, bool write) { tree repl = get_access_replacement (access); if (!TREE_ADDRESSABLE (type)) { tree tmp = create_tmp_var (type, "SRvce"); add_referenced_var (tmp); if (is_gimple_reg_type (type)) tmp = make_ssa_name (tmp, NULL); if (write) { gimple stmt; tree conv = fold_build1 (VIEW_CONVERT_EXPR, TREE_TYPE (repl), tmp); *expr = tmp; if (is_gimple_reg_type (type)) SSA_NAME_DEF_STMT (tmp) = gsi_stmt (*gsi); stmt = gimple_build_assign (repl, conv); gsi_insert_after (gsi, stmt, GSI_NEW_STMT); update_stmt (stmt); } else { gimple stmt; tree conv = fold_build1 (VIEW_CONVERT_EXPR, type, repl); stmt = gimple_build_assign (tmp, conv); gsi_insert_before (gsi, stmt, GSI_SAME_STMT); if (is_gimple_reg_type (type)) SSA_NAME_DEF_STMT (tmp) = stmt; *expr = tmp; update_stmt (stmt); } } else { if (write) { gimple stmt; stmt = gimple_build_assign (repl, unshare_expr (access->expr)); gsi_insert_after (gsi, stmt, GSI_NEW_STMT); update_stmt (stmt); } else { gimple stmt; stmt = gimple_build_assign (unshare_expr (access->expr), repl); gsi_insert_before (gsi, stmt, GSI_SAME_STMT); update_stmt (stmt); } } } /* Callback for scan_function. Replace the expression EXPR with a scalar replacement if there is one and generate other statements to do type conversion or subtree copying if necessary. GSI is used to place newly created statements, WRITE is true if the expression is being written to (it is on a LHS of a statement or output in an assembly statement). */ static bool sra_modify_expr (tree *expr, gimple_stmt_iterator *gsi, bool write, void *data ATTRIBUTE_UNUSED) { struct access *access; tree type, bfr; if (TREE_CODE (*expr) == BIT_FIELD_REF) { bfr = *expr; expr = &TREE_OPERAND (*expr, 0); } else bfr = NULL_TREE; if (TREE_CODE (*expr) == REALPART_EXPR || TREE_CODE (*expr) == IMAGPART_EXPR) expr = &TREE_OPERAND (*expr, 0); type = TREE_TYPE (*expr); access = get_access_for_expr (*expr); if (!access) return false; if (access->grp_to_be_replaced) { if (!useless_type_conversion_p (type, access->type)) fix_incompatible_types_for_expr (expr, type, access, gsi, write); else *expr = get_access_replacement (access); } if (access->first_child) { HOST_WIDE_INT start_offset, chunk_size; if (bfr && host_integerp (TREE_OPERAND (bfr, 1), 1) && host_integerp (TREE_OPERAND (bfr, 2), 1)) { start_offset = tree_low_cst (TREE_OPERAND (bfr, 1), 1); chunk_size = tree_low_cst (TREE_OPERAND (bfr, 2), 1); } else start_offset = chunk_size = 0; generate_subtree_copies (access->first_child, access->base, 0, start_offset, chunk_size, gsi, write, write); } return true; } /* Store all replacements in the access tree rooted in TOP_RACC either to their base aggregate if there are unscalarized data or directly to LHS otherwise. */ static void handle_unscalarized_data_in_subtree (struct access *top_racc, tree lhs, gimple_stmt_iterator *gsi) { if (top_racc->grp_unscalarized_data) generate_subtree_copies (top_racc->first_child, top_racc->base, 0, 0, 0, gsi, false, false); else generate_subtree_copies (top_racc->first_child, lhs, top_racc->offset, 0, 0, gsi, false, false); } /* Try to generate statements to load all sub-replacements in an access (sub)tree (LACC is the first child) from scalar replacements in the TOP_RACC (sub)tree. If that is not possible, refresh the TOP_RACC base aggregate and load the accesses from it. LEFT_OFFSET is the offset of the left whole subtree being copied, RIGHT_OFFSET is the same thing for the right subtree. 
GSI is a statement iterator used for statement insertions. *REFRESHED is true iff the RHS top aggregate has already been refreshed with the contents of its scalar replacements; it is set to true if this function has to do so. */ static void load_assign_lhs_subreplacements (struct access *lacc, struct access *top_racc, HOST_WIDE_INT left_offset, HOST_WIDE_INT right_offset, gimple_stmt_iterator *old_gsi, gimple_stmt_iterator *new_gsi, bool *refreshed, tree lhs) { do { if (lacc->grp_to_be_replaced) { struct access *racc; HOST_WIDE_INT offset = lacc->offset - left_offset + right_offset; racc = find_access_in_subtree (top_racc, offset, lacc->size); if (racc && racc->grp_to_be_replaced) { gimple stmt; if (useless_type_conversion_p (lacc->type, racc->type)) stmt = gimple_build_assign (get_access_replacement (lacc), get_access_replacement (racc)); else { tree rhs = fold_build1 (VIEW_CONVERT_EXPR, lacc->type, get_access_replacement (racc)); stmt = gimple_build_assign (get_access_replacement (lacc), rhs); } gsi_insert_after (new_gsi, stmt, GSI_NEW_STMT); update_stmt (stmt); } else { tree expr = unshare_expr (top_racc->base); bool repl_found; gimple stmt; /* No suitable access on the right hand side, need to load from the aggregate. See if we have to update it first... */ if (!*refreshed) { gcc_assert (top_racc->first_child); handle_unscalarized_data_in_subtree (top_racc, lhs, old_gsi); *refreshed = true; } repl_found = build_ref_for_offset (&expr, TREE_TYPE (top_racc->base), lacc->offset - left_offset, lacc->type, false); gcc_assert (repl_found); stmt = gimple_build_assign (get_access_replacement (lacc), expr); gsi_insert_after (new_gsi, stmt, GSI_NEW_STMT); update_stmt (stmt); } } else if (lacc->grp_read && !lacc->grp_covered && !*refreshed) { handle_unscalarized_data_in_subtree (top_racc, lhs, old_gsi); *refreshed = true; } if (lacc->first_child) load_assign_lhs_subreplacements (lacc->first_child, top_racc, left_offset, right_offset, old_gsi, new_gsi, refreshed, lhs); lacc = lacc->next_sibling; } while (lacc); } /* Return true iff ACC is non-NULL and has subaccesses. */ static inline bool access_has_children_p (struct access *acc) { return acc && acc->first_child; } /* Modify assignments with a CONSTRUCTOR on their RHS. STMT points to the assignment and GSI is the statement iterator pointing at it. Returns the same values as sra_modify_assign. */ static enum scan_assign_result sra_modify_constructor_assign (gimple *stmt, gimple_stmt_iterator *gsi) { tree lhs = gimple_assign_lhs (*stmt); struct access *acc; gcc_assert (TREE_CODE (lhs) != REALPART_EXPR && TREE_CODE (lhs) != IMAGPART_EXPR); acc = get_access_for_expr (lhs); if (!acc) return SRA_SA_NONE; if (VEC_length (constructor_elt, CONSTRUCTOR_ELTS (gimple_assign_rhs1 (*stmt))) > 0) { /* This can trigger for non-constant vector constructors; the following handles it gracefully. */ if (access_has_children_p (acc)) generate_subtree_copies (acc->first_child, acc->base, 0, 0, 0, gsi, true, true); return SRA_SA_PROCESSED; } if (acc->grp_covered) { init_subtree_with_zero (acc, gsi, false); unlink_stmt_vdef (*stmt); gsi_remove (gsi, true); return SRA_SA_REMOVED; } else { init_subtree_with_zero (acc, gsi, true); return SRA_SA_PROCESSED; } } /* Modify statements which have an IMAGPART_EXPR or REALPART_EXPR of a to-be-scalarized aggregate on their LHS. STMT is the statement and GSI is the iterator used to place new helper statements. Returns the same values as sra_modify_assign. 
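As an illustration, '__imag__ c = x_1' where c has the scalar replacement SR_c becomes, roughly, 'SRr_2 = __real__ SR_c; SRp_3 = x_1; SR_c = COMPLEX_EXPR <SRr_2, SRp_3>'; the SRr and SRp prefixes match the temporaries created below, the SSA names are made up.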
*/ static enum scan_assign_result sra_modify_partially_complex_lhs (gimple stmt, gimple_stmt_iterator *gsi) { tree lhs, complex, ptype, rp, ip; struct access *access; gimple new_stmt, aux_stmt; lhs = gimple_assign_lhs (stmt); complex = TREE_OPERAND (lhs, 0); access = get_access_for_expr (complex); if (!access || !access->grp_to_be_replaced) return SRA_SA_NONE; ptype = TREE_TYPE (TREE_TYPE (complex)); rp = create_tmp_var (ptype, "SRr"); add_referenced_var (rp); rp = make_ssa_name (rp, NULL); ip = create_tmp_var (ptype, "SRp"); add_referenced_var (ip); ip = make_ssa_name (ip, NULL); if (TREE_CODE (lhs) == IMAGPART_EXPR) { aux_stmt = gimple_build_assign (rp, fold_build1 (REALPART_EXPR, ptype, get_access_replacement (access))); SSA_NAME_DEF_STMT (rp) = aux_stmt; gimple_assign_set_lhs (stmt, ip); SSA_NAME_DEF_STMT (ip) = stmt; } else { aux_stmt = gimple_build_assign (ip, fold_build1 (IMAGPART_EXPR, ptype, get_access_replacement (access))); SSA_NAME_DEF_STMT (ip) = aux_stmt; gimple_assign_set_lhs (stmt, rp); SSA_NAME_DEF_STMT (rp) = stmt; } gsi_insert_before (gsi, aux_stmt, GSI_SAME_STMT); update_stmt (aux_stmt); new_stmt = gimple_build_assign (get_access_replacement (access), fold_build2 (COMPLEX_EXPR, access->type, rp, ip)); gsi_insert_after (gsi, new_stmt, GSI_NEW_STMT); update_stmt (new_stmt); return SRA_SA_PROCESSED; } /* Change STMT to assign compatible types by means of adding component or array references or VIEW_CONVERT_EXPRs. All parameters have the same meaning as variables with the same names in sra_modify_assign. If we can avoid a V_C_E used to load a field from a (single-field) record or a union or an element of a single-element array by producing COMPONENT_REFs and ARRAY_REFs instead, do so. */ static void fix_modified_assign_compatibility (gimple_stmt_iterator *gsi, gimple *stmt, struct access *lacc, struct access *racc, tree lhs, tree *rhs, tree ltype, tree rtype) { if (racc && racc->grp_to_be_replaced && AGGREGATE_TYPE_P (ltype) && !access_has_children_p (lacc)) { tree expr = unshare_expr (lhs); bool found = build_ref_for_offset (&expr, ltype, racc->offset, rtype, false); if (found) { gimple_assign_set_lhs (*stmt, expr); return; } } if (lacc && lacc->grp_to_be_replaced && AGGREGATE_TYPE_P (rtype) && !access_has_children_p (racc)) { tree expr = unshare_expr (*rhs); bool found = build_ref_for_offset (&expr, rtype, lacc->offset, ltype, false); if (found) { gimple_assign_set_rhs1 (*stmt, expr); return; } } *rhs = fold_build1 (VIEW_CONVERT_EXPR, ltype, *rhs); gimple_assign_set_rhs_from_tree (gsi, *rhs); *stmt = gsi_stmt (*gsi); } /* Callback of scan_function to process assign statements. It examines both sides of the statement, replaces them with a scalar replacement if there is one and generates copying of replacements if scalarized aggregates have been used in the assignment. STMT is a pointer to the assign statement; GSI is used to hold generated statements for type conversions and subtree copying. 
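In the simplest case, an aggregate copy 'a = b' in which both sides are fully covered by scalar replacements is turned into direct copies between the corresponding replacements and the original statement is removed; the comment inside the function describes the remaining scenarios.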
*/ static enum scan_assign_result sra_modify_assign (gimple *stmt, gimple_stmt_iterator *gsi, void *data ATTRIBUTE_UNUSED) { struct access *lacc, *racc; tree ltype, rtype; tree lhs, rhs; bool modify_this_stmt; if (!gimple_assign_single_p (*stmt)) return SRA_SA_NONE; lhs = gimple_assign_lhs (*stmt); rhs = gimple_assign_rhs1 (*stmt); if (TREE_CODE (rhs) == CONSTRUCTOR) return sra_modify_constructor_assign (stmt, gsi); if (TREE_CODE (lhs) == REALPART_EXPR || TREE_CODE (lhs) == IMAGPART_EXPR) return sra_modify_partially_complex_lhs (*stmt, gsi); if (TREE_CODE (rhs) == REALPART_EXPR || TREE_CODE (rhs) == IMAGPART_EXPR || TREE_CODE (rhs) == BIT_FIELD_REF || TREE_CODE (lhs) == BIT_FIELD_REF) { modify_this_stmt = sra_modify_expr (gimple_assign_rhs1_ptr (*stmt), gsi, false, data); modify_this_stmt |= sra_modify_expr (gimple_assign_lhs_ptr (*stmt), gsi, true, data); return modify_this_stmt ? SRA_SA_PROCESSED : SRA_SA_NONE; } lacc = get_access_for_expr (lhs); racc = get_access_for_expr (rhs); if (!lacc && !racc) return SRA_SA_NONE; modify_this_stmt = ((lacc && lacc->grp_to_be_replaced) || (racc && racc->grp_to_be_replaced)); if (lacc && lacc->grp_to_be_replaced) { lhs = get_access_replacement (lacc); gimple_assign_set_lhs (*stmt, lhs); ltype = lacc->type; } else ltype = TREE_TYPE (lhs); if (racc && racc->grp_to_be_replaced) { rhs = get_access_replacement (racc); gimple_assign_set_rhs1 (*stmt, rhs); rtype = racc->type; } else rtype = TREE_TYPE (rhs); if (modify_this_stmt) { if (!useless_type_conversion_p (ltype, rtype)) fix_modified_assign_compatibility (gsi, stmt, lacc, racc, lhs, &rhs, ltype, rtype); } /* From this point on, the function deals with assignments in between aggregates when at least one has scalar reductions of some of its components. There are three possible scenarios: 1) both the LHS and RHS have to-be-scalarized components, 2) only the RHS has or 3) only the LHS has. In the first case, we would like to load the LHS components from RHS components whenever possible. If that is not possible, we would like to read it directly from the RHS (after updating it by storing in it its own components). If there are some necessary unscalarized data in the LHS, those will be loaded by the original assignment too. If neither of these cases happens, the original statement can be removed. Most of this is done by load_assign_lhs_subreplacements. In the second case, we would like to store all RHS scalarized components directly into LHS and if they cover the aggregate completely, remove the statement too. In the third case, we want the LHS components to be loaded directly from the RHS (DSE will remove the original statement if it becomes redundant). This is a bit complex but manageable when types match and when unions do not cause confusion in a way that we cannot really load a component of LHS from the RHS or vice versa (the access representing this level can have subaccesses that are accessible only through a different union field at a higher level - different from the one used in the examined expression). Unions are fun. Therefore, I specially handle a fourth case, happening when there is a specific type cast or it is impossible to locate a scalarized subaccess on the other side of the expression. If that happens, I simply "refresh" the RHS by storing in it its scalarized components, leave the original statement there to do the copying and then load the scalar replacements of the LHS. This is what the first branch does.
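   To illustrate the second case with a hypothetical example, assume s is
   completely covered by the replacements SR$s$i and SR$s$f while d is not
   scalarized at all; then

     d = s;

   becomes

     d.i = SR$s$i;
     d.f = SR$s$f;

   and the original aggregate copy is removed because no unscalarized data
   remain in s.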
*/ if (contains_view_convert_expr_p (rhs) || contains_view_convert_expr_p (lhs) || (access_has_children_p (racc) && !ref_expr_for_all_replacements_p (racc, lhs, racc->offset)) || (access_has_children_p (lacc) && !ref_expr_for_all_replacements_p (lacc, rhs, lacc->offset))) { if (access_has_children_p (racc)) generate_subtree_copies (racc->first_child, racc->base, 0, 0, 0, gsi, false, false); if (access_has_children_p (lacc)) generate_subtree_copies (lacc->first_child, lacc->base, 0, 0, 0, gsi, true, true); } else { if (access_has_children_p (lacc) && access_has_children_p (racc)) { gimple_stmt_iterator orig_gsi = *gsi; bool refreshed; if (lacc->grp_read && !lacc->grp_covered) { handle_unscalarized_data_in_subtree (racc, lhs, gsi); refreshed = true; } else refreshed = false; load_assign_lhs_subreplacements (lacc->first_child, racc, lacc->offset, racc->offset, &orig_gsi, gsi, &refreshed, lhs); if (!refreshed || !racc->grp_unscalarized_data) { if (*stmt == gsi_stmt (*gsi)) gsi_next (gsi); unlink_stmt_vdef (*stmt); gsi_remove (&orig_gsi, true); return SRA_SA_REMOVED; } } else { if (access_has_children_p (racc)) { if (!racc->grp_unscalarized_data) { generate_subtree_copies (racc->first_child, lhs, racc->offset, 0, 0, gsi, false, false); gcc_assert (*stmt == gsi_stmt (*gsi)); unlink_stmt_vdef (*stmt); gsi_remove (gsi, true); return SRA_SA_REMOVED; } else generate_subtree_copies (racc->first_child, lhs, racc->offset, 0, 0, gsi, false, true); } else if (access_has_children_p (lacc)) generate_subtree_copies (lacc->first_child, rhs, lacc->offset, 0, 0, gsi, true, true); } } return modify_this_stmt ? SRA_SA_PROCESSED : SRA_SA_NONE; } /* Generate statements initializing scalar replacements of parts of function parameters. */ static void initialize_parameter_reductions (void) { gimple_stmt_iterator gsi; gimple_seq seq = NULL; tree parm; for (parm = DECL_ARGUMENTS (current_function_decl); parm; parm = TREE_CHAIN (parm)) { VEC (access_p, heap) *access_vec; struct access *access; if (!bitmap_bit_p (candidate_bitmap, DECL_UID (parm))) continue; access_vec = get_base_access_vector (parm); if (!access_vec) continue; if (!seq) { seq = gimple_seq_alloc (); gsi = gsi_start (seq); } for (access = VEC_index (access_p, access_vec, 0); access; access = access->next_grp) generate_subtree_copies (access, parm, 0, 0, 0, &gsi, true, true); } if (seq) gsi_insert_seq_on_edge_immediate (single_succ_edge (ENTRY_BLOCK_PTR), seq); } /* The "main" function of intraprocedural SRA passes. Runs the analysis and if it reveals there are components of some aggregates to be scalarized, it runs the required transformations. */ static unsigned int perform_intra_sra (void) { int ret = 0; sra_initialize (); if (!find_var_candidates ()) goto out; if (!scan_function (build_access_from_expr, build_accesses_from_assign, NULL, true, NULL)) goto out; if (!analyze_all_variable_accesses ()) goto out; scan_function (sra_modify_expr, sra_modify_assign, NULL, false, NULL); initialize_parameter_reductions (); ret = TODO_update_ssa; out: sra_deinitialize (); return ret; } /* Perform early intraprocedural SRA. */ static unsigned int early_intra_sra (void) { sra_mode = SRA_MODE_EARLY_INTRA; return perform_intra_sra (); } /* Perform "late" intraprocedural SRA. 
*/ static unsigned int late_intra_sra (void) { sra_mode = SRA_MODE_INTRA; return perform_intra_sra (); } static bool gate_intra_sra (void) { return flag_tree_sra != 0; } struct gimple_opt_pass pass_sra_early = { { GIMPLE_PASS, "esra", /* name */ gate_intra_sra, /* gate */ early_intra_sra, /* execute */ NULL, /* sub */ NULL, /* next */ 0, /* static_pass_number */ TV_TREE_SRA, /* tv_id */ PROP_cfg | PROP_ssa, /* properties_required */ 0, /* properties_provided */ 0, /* properties_destroyed */ 0, /* todo_flags_start */ TODO_dump_func | TODO_update_ssa | TODO_ggc_collect | TODO_verify_ssa /* todo_flags_finish */ } }; struct gimple_opt_pass pass_sra = { { GIMPLE_PASS, "sra", /* name */ gate_intra_sra, /* gate */ late_intra_sra, /* execute */ NULL, /* sub */ NULL, /* next */ 0, /* static_pass_number */ TV_TREE_SRA, /* tv_id */ PROP_cfg | PROP_ssa, /* properties_required */ 0, /* properties_provided */ 0, /* properties_destroyed */ TODO_update_address_taken, /* todo_flags_start */ TODO_dump_func | TODO_update_ssa | TODO_ggc_collect | TODO_verify_ssa /* todo_flags_finish */ } }; ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 3/5] New intraprocedural Scalar Reduction of Aggregates. 2009-05-10 10:39 ` Martin Jambor @ 2009-05-12 9:49 ` Martin Jambor 0 siblings, 0 replies; 25+ messages in thread From: Martin Jambor @ 2009-05-12 9:49 UTC (permalink / raw) To: Richard Guenther; +Cc: GCC Patches, Jan Hubicka Hi, this is a slightly updated version of the new tree-sra.c file, including changes that I did yesterday on a train and just before I went to bed (so no huge changes but a few more-or-less simple cleanups and clarifications). I believe this should address all your concerns except for complex component decoupling - I will experiment with that early next week. Please have a look at my previous mail too (http://gcc.gnu.org/ml/gcc-patches/2009-05/msg00594.html) Thanks a lot, Martin /* Scalar Replacement of Aggregates (SRA) converts some structure references into scalar references, exposing them to the scalar optimizers. Copyright (C) 2008, 2009 Free Software Foundation, Inc. Contributed by Martin Jambor <mjambor@suse.cz> This file is part of GCC. GCC is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 3, or (at your option) any later version. GCC is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with GCC; see the file COPYING3. If not see <http://www.gnu.org/licenses/>. */ /* This file implements Scalar Reduction of Aggregates (SRA). SRA is run twice, once in the early stages of compilation (early SRA) and once in the late stages (late SRA). The aim of both is to turn references to scalar parts of aggregates into uses of independent scalar variables. The two passes are nearly identical; the only difference is that early SRA does not scalarize unions which are used as the result in a GIMPLE_RETURN statement because together with inlining this can lead to weird type conversions. Both passes operate in four stages: 1. The declarations that have properties which make them candidates for scalarization are identified in function find_var_candidates(). The candidates are stored in candidate_bitmap. 2. The function body is scanned. In the process, declarations which are used in a manner that prevents their scalarization are removed from the candidate bitmap. More importantly, for every access into an aggregate, an access structure (struct access) is created by create_access() and stored in a vector associated with the aggregate. Among other information, the aggregate declaration, the offset and size of the access and its type are stored in the structure. On a related note, assign_link structures are created for every assign statement between candidate aggregates and attached to the related accesses. 3. The vectors of accesses are analyzed. They are first sorted according to their offset and size and then scanned for partially overlapping accesses (i.e. those which overlap but one is not entirely within another). Such an access disqualifies the whole aggregate from being scalarized. If there is no such inhibiting overlap, a representative access structure is chosen for every unique combination of offset and size.
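      For example (using hypothetical bit positions), accesses at offset 0
      with size 64 and at offset 32 with size 64 overlap without one being
      entirely within the other, so they would disqualify the whole
      aggregate, whereas accesses at offset 0 with size 64 and at offset 32
      with size 32 nest properly and are acceptable.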
Afterwards, the pass builds a set of trees from these structures, in which children of an access are within their parent (in terms of offset and size). Then accesses are propagated whenever possible (i.e. in cases when it does not create a partially overlapping access) across assign_links from the right hand side to the left hand side. Then the set of trees for each declaration is traversed again and those accesses which should be replaced by a scalar are identified. 4. The function is traversed again, and for every reference into an aggregate that has some component which is about to be scalarized, statements are amended and new statements are created as necessary. Finally, if a parameter got scalarized, the scalar replacements are initialized with values from respective parameter aggregates. */ #include "config.h" #include "system.h" #include "coretypes.h" #include "alloc-pool.h" #include "tm.h" #include "tree.h" #include "gimple.h" #include "tree-flow.h" #include "diagnostic.h" #include "tree-dump.h" #include "timevar.h" #include "params.h" #include "target.h" #include "flags.h" /* Enumeration of all aggregate reductions we can do. */ enum sra_mode { SRA_MODE_EARLY_INTRA, /* early intraprocedural SRA */ SRA_MODE_INTRA }; /* late intraprocedural SRA */ /* Global variable describing which aggregate reduction we are performing at the moment. */ static enum sra_mode sra_mode; struct assign_link; /* ACCESS represents each access to an aggregate variable (as a whole or a part). It can also represent a group of accesses that refer to exactly the same fragment of an aggregate (i.e. those that have exactly the same offset and size). Such representatives for a single aggregate, once determined, are linked in a linked list and have the group fields set. Moreover, when doing intraprocedural SRA, a tree is built from those representatives (by the means of first_child and next_sibling pointers), in which all items in a subtree are "within" the root, i.e. their offset is greater or equal to offset of the root and offset+size is smaller or equal to offset+size of the root. Children of an access are sorted by offset. Note that accesses to parts of vector and complex number types are always represented by an access to the whole complex number or a vector. It is the duty of the modifying functions to replace them appropriately. */ struct access { /* Values returned by `get_ref_base_and_extent' for each component reference. If EXPR isn't a component reference just set `BASE = EXPR', `OFFSET = 0', `SIZE = TREE_SIZE (TREE_TYPE (expr))'. */ HOST_WIDE_INT offset; HOST_WIDE_INT size; tree base; /* Expression. */ tree expr; /* Type. */ tree type; /* Next group representative for this aggregate. */ struct access *next_grp; /* Pointer to the group representative. Pointer to itself if the struct is the representative. */ struct access *group_representative; /* If this access has any children (in terms of the definition above), this points to the first one. */ struct access *first_child; /* Pointer to the next sibling in the access tree as described above. */ struct access *next_sibling; /* Pointers to the first and last element in the linked list of assign links. */ struct assign_link *first_link, *last_link; /* Pointer to the next access in the work queue. */ struct access *next_queued; /* Replacement variable for this access "region." Never to be accessed directly, always only by the means of get_access_replacement() and only when grp_to_be_replaced flag is set.
*/ tree replacement_decl; /* Is this particular access write access? */ unsigned write : 1; /* Is this access currently in the work queue? */ unsigned grp_queued : 1; /* Does this group contain a write access? This flag is propagated down the access tree. */ unsigned grp_write : 1; /* Does this group contain a read access? This flag is propagated down the access tree. */ unsigned grp_read : 1; /* Is the subtree rooted in this access fully covered by scalar replacements? */ unsigned grp_covered : 1; /* If set to true, this access and all below it in an access tree must not be scalarized. */ unsigned grp_unscalarizable_region : 1; /* Whether data have been written to parts of the aggregate covered by this access which is not to be scalarized. This flag is propagated up in the access tree. */ unsigned grp_unscalarized_data : 1; /* Does this access and/or group contain a write access through a BIT_FIELD_REF? */ unsigned grp_bfr_lhs : 1; /* Set when a scalar replacement should be created for this variable. We do the decision and creation at different places because create_tmp_var cannot be called from within FOR_EACH_REFERENCED_VAR. */ unsigned grp_to_be_replaced : 1; }; typedef struct access *access_p; DEF_VEC_P (access_p); DEF_VEC_ALLOC_P (access_p, heap); /* Alloc pool for allocating access structures. */ static alloc_pool access_pool; /* A structure linking lhs and rhs accesses from an aggregate assignment. They are used to propagate subaccesses from rhs to lhs as long as they don't conflict with what is already there. */ struct assign_link { struct access *lacc, *racc; struct assign_link *next; }; /* Alloc pool for allocating assign link structures. */ static alloc_pool link_pool; /* Base (tree) -> Vector (VEC(access_p,heap) *) map. */ static struct pointer_map_t *base_access_vec; /* Bitmap of bases (candidates). */ static bitmap candidate_bitmap; /* Bitmap of declarations used in a return statement. */ static bitmap retvals_bitmap; /* Obstack for creation of fancy names. */ static struct obstack name_obstack; /* Head of a linked list of accesses that need to have its subaccesses propagated to their assignment counterparts. */ static struct access *work_queue_head; /* Dump contents of ACCESS to file F in a human friendly way. If GRP is true, representative fields are dumped, otherwise those which only describe the individual access are. */ static void dump_access (FILE *f, struct access *access, bool grp) { fprintf (f, "access { "); fprintf (f, "base = (%d)'", DECL_UID (access->base)); print_generic_expr (f, access->base, 0); fprintf (f, "', offset = " HOST_WIDE_INT_PRINT_DEC, access->offset); fprintf (f, ", size = " HOST_WIDE_INT_PRINT_DEC, access->size); fprintf (f, ", expr = "); print_generic_expr (f, access->expr, 0); fprintf (f, ", type = "); print_generic_expr (f, access->type, 0); if (grp) fprintf (f, ", grp_write = %d, grp_read = %d, grp_covered = %d, " "grp_unscalarizable_region = %d, grp_unscalarized_data = %d, " "grp_to_be_replaced = %d\n", access->grp_write, access->grp_read, access->grp_covered, access->grp_unscalarizable_region, access->grp_unscalarized_data, access->grp_to_be_replaced); else fprintf (f, ", write = %d'\n", access->write); } /* Dump a subtree rooted in ACCESS to file F, indent by LEVEL. 
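   Each node is printed by dump_access on one line, roughly of the form
   (with hypothetical values)

     access { base = (4231)'s', offset = 0, size = 32, expr = s.i, type = int, grp_write = 1, grp_read = 1, ... }

   prefixed with one "* " per level of nesting in the tree.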
*/ static void dump_access_tree_1 (FILE *f, struct access *access, int level) { do { int i; for (i = 0; i < level; i++) fputs ("* ", f); dump_access (f, access, true); if (access->first_child) dump_access_tree_1 (f, access->first_child, level + 1); access = access->next_sibling; } while (access); } /* Dump all access trees for a variable, given the pointer to the first root in ACCESS. */ static void dump_access_tree (FILE *f, struct access *access) { for (; access; access = access->next_grp) dump_access_tree_1 (f, access, 0); } /* Return true iff ACC is non-NULL and has subaccesses. */ static inline bool access_has_children_p (struct access *acc) { return acc && acc->first_child; } /* Return a vector of pointers to accesses for the variable given in BASE or NULL if there is none. */ static VEC (access_p, heap) * get_base_access_vector (tree base) { void **slot; slot = pointer_map_contains (base_access_vec, base); if (!slot) return NULL; else return *(VEC (access_p, heap) **) slot; } /* Find an access with required OFFSET and SIZE in a subtree of accesses rooted in ACCESS. Return NULL if it cannot be found. */ static struct access * find_access_in_subtree (struct access *access, HOST_WIDE_INT offset, HOST_WIDE_INT size) { while (access && (access->offset != offset || access->size != size)) { struct access *child = access->first_child; while (child && (child->offset + child->size <= offset)) child = child->next_sibling; access = child; } return access; } /* Return the first group representative for DECL or NULL if none exists. */ static struct access * get_first_repr_for_decl (tree base) { VEC (access_p, heap) *access_vec; access_vec = get_base_access_vector (base); if (!access_vec) return NULL; return VEC_index (access_p, access_vec, 0); } /* Find an access representative for the variable BASE and given OFFSET and SIZE. Requires that access trees have already been built. Return NULL if it cannot be found. */ static struct access * get_var_base_offset_size_access (tree base, HOST_WIDE_INT offset, HOST_WIDE_INT size) { struct access *access; access = get_first_repr_for_decl (base); while (access && (access->offset + access->size <= offset)) access = access->next_grp; if (!access) return NULL; return find_access_in_subtree (access, offset, size); } /* Add LINK to the linked list of assign links of RACC. */ static void add_link_to_rhs (struct access *racc, struct assign_link *link) { gcc_assert (link->racc == racc); if (!racc->first_link) { gcc_assert (!racc->last_link); racc->first_link = link; } else racc->last_link->next = link; racc->last_link = link; link->next = NULL; } /* Move all link structures in their linked list in OLD_RACC to the linked list in NEW_RACC. */ static void relink_to_new_repr (struct access *new_racc, struct access *old_racc) { if (!old_racc->first_link) { gcc_assert (!old_racc->last_link); return; } if (new_racc->first_link) { gcc_assert (!new_racc->last_link->next); gcc_assert (!old_racc->last_link || !old_racc->last_link->next); new_racc->last_link->next = old_racc->first_link; new_racc->last_link = old_racc->last_link; } else { gcc_assert (!new_racc->last_link); new_racc->first_link = old_racc->first_link; new_racc->last_link = old_racc->last_link; } old_racc->first_link = old_racc->last_link = NULL; } /* Add ACCESS to the work queue (which is actually a stack).
*/ static void add_access_to_work_queue (struct access *access) { if (!access->grp_queued) { gcc_assert (!access->next_queued); access->next_queued = work_queue_head; access->grp_queued = 1; work_queue_head = access; } } /* Pop an access from the work queue, and return it, assuming there is one. */ static struct access * pop_access_from_work_queue (void) { struct access *access = work_queue_head; work_queue_head = access->next_queued; access->next_queued = NULL; access->grp_queued = 0; return access; } /* Allocate necessary structures. */ static void sra_initialize (void) { candidate_bitmap = BITMAP_ALLOC (NULL); retvals_bitmap = BITMAP_ALLOC (NULL); gcc_obstack_init (&name_obstack); access_pool = create_alloc_pool ("SRA accesses", sizeof (struct access), 16); link_pool = create_alloc_pool ("SRA links", sizeof (struct assign_link), 16); base_access_vec = pointer_map_create (); } /* Hook fed to pointer_map_traverse, deallocate stored vectors. */ static bool delete_base_accesses (const void *key ATTRIBUTE_UNUSED, void **value, void *data ATTRIBUTE_UNUSED) { VEC (access_p, heap) *access_vec; access_vec = (VEC (access_p, heap) *) *value; VEC_free (access_p, heap, access_vec); return true; } /* Deallocate all general structures. */ static void sra_deinitialize (void) { BITMAP_FREE (candidate_bitmap); BITMAP_FREE (retvals_bitmap); free_alloc_pool (access_pool); free_alloc_pool (link_pool); obstack_free (&name_obstack, NULL); pointer_map_traverse (base_access_vec, delete_base_accesses, NULL); pointer_map_destroy (base_access_vec); } /* Remove DECL from candidates for SRA and write REASON to the dump file if there is one. */ static void disqualify_candidate (tree decl, const char *reason) { bitmap_clear_bit (candidate_bitmap, DECL_UID (decl)); if (dump_file && (dump_flags & TDF_DETAILS)) { fprintf (dump_file, "! Disqualifying "); print_generic_expr (dump_file, decl, 0); fprintf (dump_file, " - %s\n", reason); } } /* Return true iff the type contains a field or an element which does not allow scalarization. */ static bool type_internals_preclude_sra_p (tree type) { tree fld; tree et; switch (TREE_CODE (type)) { case RECORD_TYPE: case UNION_TYPE: case QUAL_UNION_TYPE: for (fld = TYPE_FIELDS (type); fld; fld = TREE_CHAIN (fld)) if (TREE_CODE (fld) == FIELD_DECL) { tree ft = TREE_TYPE (fld); if (TREE_THIS_VOLATILE (fld) || !DECL_FIELD_OFFSET (fld) || !DECL_SIZE (fld) || !host_integerp (DECL_FIELD_OFFSET (fld), 1) || !host_integerp (DECL_SIZE (fld), 1)) return true; if (AGGREGATE_TYPE_P (ft) && type_internals_preclude_sra_p (ft)) return true; } return false; case ARRAY_TYPE: et = TREE_TYPE (type); if (AGGREGATE_TYPE_P (et)) return type_internals_preclude_sra_p (et); else return false; default: return false; } } /* Create and insert access for EXPR. Return created access, or NULL if it is not possible. */ static struct access * create_access (tree expr, bool write) { struct access *access; void **slot; VEC (access_p,heap) *vec; HOST_WIDE_INT offset, size, max_size; tree base = expr; bool unscalarizable_region = false; base = get_ref_base_and_extent (expr, &offset, &size, &max_size); /* !!! Assert for testing only, remove after some time. 
*/ gcc_assert (base); if (!DECL_P (base) || !bitmap_bit_p (candidate_bitmap, DECL_UID (base))) return NULL; if (size != max_size) { size = max_size; unscalarizable_region = true; } if (size < 0) { disqualify_candidate (base, "Encountered an unconstrained access."); return NULL; } access = (struct access *) pool_alloc (access_pool); memset (access, 0, sizeof (struct access)); access->base = base; access->offset = offset; access->size = size; access->expr = expr; access->type = TREE_TYPE (expr); access->write = write; access->grp_unscalarizable_region = unscalarizable_region; slot = pointer_map_contains (base_access_vec, base); if (slot) vec = (VEC (access_p, heap) *) *slot; else vec = VEC_alloc (access_p, heap, 32); VEC_safe_push (access_p, heap, vec, access); *((struct VEC (access_p,heap) **) pointer_map_insert (base_access_vec, base)) = vec; return access; } /* Search the given tree for a declaration by skipping handled components and exclude it from the candidates. */ static void disqualify_base_of_expr (tree t, const char *reason) { while (handled_component_p (t)) t = TREE_OPERAND (t, 0); if (DECL_P (t)) disqualify_candidate (t, reason); } /* Scan expression EXPR and create access structures for all accesses to candidates for scalarization. Return the created access or NULL if none is created. */ static struct access * build_access_from_expr_1 (tree *expr_ptr, bool write) { struct access *ret = NULL; tree expr = *expr_ptr; bool bit_ref; if (TREE_CODE (expr) == BIT_FIELD_REF) { expr = TREE_OPERAND (expr, 0); bit_ref = true; } else bit_ref = false; /* We need to dive through V_C_Es in order to get the size of its parameter and not the result type. Ada produces such statements. We are also capable of handling the topmost V_C_E but not any of those buried in other handled components. */ if (TREE_CODE (expr) == VIEW_CONVERT_EXPR) expr = TREE_OPERAND (expr, 0); if (contains_view_convert_expr_p (expr)) { disqualify_base_of_expr (expr, "V_C_E under a different handled " "component."); return NULL; } switch (TREE_CODE (expr)) { case VAR_DECL: case PARM_DECL: case RESULT_DECL: case COMPONENT_REF: case ARRAY_REF: case ARRAY_RANGE_REF: ret = create_access (expr, write); break; case REALPART_EXPR: case IMAGPART_EXPR: expr = TREE_OPERAND (expr, 0); ret = create_access (expr, write); break; default: break; } if (write && bit_ref && ret) ret->grp_bfr_lhs = 1; return ret; } /* Callback of scan_function. Scan expression EXPR and create access structures for all accesses to candidates for scalarization. Return true if any access has been inserted. */ static bool build_access_from_expr (tree *expr_ptr, gimple_stmt_iterator *gsi ATTRIBUTE_UNUSED, bool write, void *data ATTRIBUTE_UNUSED) { return build_access_from_expr_1 (expr_ptr, write) != NULL; } /* Disqualify LHS and RHS for scalarization if STMT must end its basic block in modes in which it matters, return true iff they have been disqualified. RHS may be NULL, in that case ignore it. If we scalarize an aggregate in intra-SRA we may need to add statements after each statement. This is not possible if a statement unconditionally has to end the basic block. */ static bool disqualify_ops_if_throwing_stmt (gimple stmt, tree lhs, tree rhs) { if (stmt_can_throw_internal (stmt) || stmt_ends_bb_p (stmt)) { disqualify_base_of_expr (lhs, "LHS of a throwing stmt."); if (rhs) disqualify_base_of_expr (rhs, "RHS of a throwing stmt."); return true; } return false; } /* Result code for scan_assign callback for scan_function. 
*/ enum scan_assign_result { SRA_SA_NONE, /* nothing done for the stmt */ SRA_SA_PROCESSED, /* stmt analyzed/changed */ SRA_SA_REMOVED }; /* stmt redundant and eliminated */ /* Callback of scan_function. Scan expressions occurring in the statement pointed to by STMT_PTR, create access structures for all accesses to candidates for scalarization and remove those candidates which occur in statements or expressions that prevent them from being split apart. Return SRA_SA_PROCESSED if any access has been inserted, SRA_SA_NONE otherwise. */ static enum scan_assign_result build_accesses_from_assign (gimple *stmt_ptr, gimple_stmt_iterator *gsi ATTRIBUTE_UNUSED, void *data ATTRIBUTE_UNUSED) { gimple stmt = *stmt_ptr; tree *lhs_ptr, *rhs_ptr; struct access *lacc, *racc; if (!gimple_assign_single_p (stmt)) return SRA_SA_NONE; lhs_ptr = gimple_assign_lhs_ptr (stmt); rhs_ptr = gimple_assign_rhs1_ptr (stmt); if (disqualify_ops_if_throwing_stmt (stmt, *lhs_ptr, *rhs_ptr)) return SRA_SA_NONE; racc = build_access_from_expr_1 (rhs_ptr, false); lacc = build_access_from_expr_1 (lhs_ptr, true); if (lacc && racc && !lacc->grp_unscalarizable_region && !racc->grp_unscalarizable_region && AGGREGATE_TYPE_P (TREE_TYPE (*lhs_ptr)) /* FIXME: Turn the following line into an assert after PR 40058 is fixed. */ && lacc->size == racc->size && useless_type_conversion_p (lacc->type, racc->type)) { struct assign_link *link; link = (struct assign_link *) pool_alloc (link_pool); memset (link, 0, sizeof (struct assign_link)); link->lacc = lacc; link->racc = racc; add_link_to_rhs (racc, link); } return (lacc || racc) ? SRA_SA_PROCESSED : SRA_SA_NONE; } /* Callback of walk_stmt_load_store_addr_ops visit_addr used to determine GIMPLE_ASM operands with memory constraints which cannot be scalarized. */ static bool asm_visit_addr (gimple stmt ATTRIBUTE_UNUSED, tree op, void *data ATTRIBUTE_UNUSED) { if (DECL_P (op)) disqualify_candidate (op, "Non-scalarizable GIMPLE_ASM operand."); return false; } /* Scan function and look for interesting statements. Return true if any has been found or processed, as indicated by callbacks. SCAN_EXPR is a callback called on all expressions within statements except assign statements and those deemed entirely unsuitable for some reason (all operands in such statements and expressions are removed from candidate_bitmap). SCAN_ASSIGN is a callback called on all assign statements, HANDLE_SSA_DEFS is a callback called on assign statements and those call statements which have a lhs and it is the only callback which can be NULL. ANALYSIS_STAGE is true when running in the analysis stage of a pass and thus no statement is being modified. DATA is a pointer passed to all callbacks. If any single callback returns true, this function also returns true, otherwise it returns false.
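   For instance, the analysis stage of this pass invokes scan_function
   (build_access_from_expr, build_accesses_from_assign, NULL, true, NULL)
   while the modification stage passes sra_modify_expr and sra_modify_assign
   with ANALYSIS_STAGE false; see perform_intra_sra.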
*/ static bool scan_function (bool (*scan_expr) (tree *, gimple_stmt_iterator *, bool, void *), enum scan_assign_result (*scan_assign) (gimple *, gimple_stmt_iterator *, void *), bool (*handle_ssa_defs)(gimple, void *), bool analysis_stage, void *data) { gimple_stmt_iterator gsi; basic_block bb; unsigned i; tree *t; bool ret = false; FOR_EACH_BB (bb) { bool bb_changed = false; gsi = gsi_start_bb (bb); while (!gsi_end_p (gsi)) { gimple stmt = gsi_stmt (gsi); enum scan_assign_result assign_result; bool any = false, deleted = false; switch (gimple_code (stmt)) { case GIMPLE_RETURN: t = gimple_return_retval_ptr (stmt); if (*t != NULL_TREE) { if (DECL_P (*t)) { tree ret_type = TREE_TYPE (*t); if (analysis_stage && sra_mode == SRA_MODE_EARLY_INTRA && (TREE_CODE (ret_type) == UNION_TYPE || TREE_CODE (ret_type) == QUAL_UNION_TYPE)) disqualify_candidate (*t, "Union in a return statement."); else bitmap_set_bit (retvals_bitmap, DECL_UID (*t)); } any |= scan_expr (t, &gsi, false, data); } break; case GIMPLE_ASSIGN: assign_result = scan_assign (&stmt, &gsi, data); any |= assign_result == SRA_SA_PROCESSED; deleted = assign_result == SRA_SA_REMOVED; if (handle_ssa_defs && assign_result != SRA_SA_REMOVED) any |= handle_ssa_defs (stmt, data); break; case GIMPLE_CALL: /* Operands must be processed before the lhs. */ for (i = 0; i < gimple_call_num_args (stmt); i++) { tree *argp = gimple_call_arg_ptr (stmt, i); any |= scan_expr (argp, &gsi, false, data); } if (gimple_call_lhs (stmt)) { tree *lhs_ptr = gimple_call_lhs_ptr (stmt); if (!analysis_stage || !disqualify_ops_if_throwing_stmt (stmt, *lhs_ptr, NULL)) { any |= scan_expr (lhs_ptr, &gsi, true, data); if (handle_ssa_defs) any |= handle_ssa_defs (stmt, data); } } break; case GIMPLE_ASM: if (analysis_stage) walk_stmt_load_store_addr_ops (stmt, NULL, NULL, NULL, asm_visit_addr); for (i = 0; i < gimple_asm_ninputs (stmt); i++) { tree *op = &TREE_VALUE (gimple_asm_input_op (stmt, i)); any |= scan_expr (op, &gsi, false, data); } for (i = 0; i < gimple_asm_noutputs (stmt); i++) { tree *op = &TREE_VALUE (gimple_asm_output_op (stmt, i)); any |= scan_expr (op, &gsi, true, data); } break; default: break; } if (any) { ret = true; bb_changed = true; if (!analysis_stage) { update_stmt (stmt); if (!stmt_could_throw_p (stmt)) remove_stmt_from_eh_region (stmt); } } if (deleted) bb_changed = true; else { gsi_next (&gsi); ret = true; } } if (!analysis_stage && bb_changed) gimple_purge_dead_eh_edges (bb); } return ret; } /* Helper of the qsort function. The array contains pointers to accesses. An access is considered smaller than another if it has a smaller offset or if the offsets are the same but its size is bigger. */ static int compare_access_positions (const void *a, const void *b) { const access_p *fp1 = (const access_p *) a; const access_p *fp2 = (const access_p *) b; const access_p f1 = *fp1; const access_p f2 = *fp2; if (f1->offset != f2->offset) return f1->offset < f2->offset ? -1 : 1; if (f1->size == f2->size) return 0; /* We want the bigger accesses first, thus the opposite operator in the next line: */ return f1->size > f2->size ? -1 : 1; } /* Append a name of the declaration to the name obstack. A helper function for make_fancy_name. */ static void make_fancy_decl_name (tree decl) { char buffer[32]; tree name = DECL_NAME (decl); if (name) obstack_grow (&name_obstack, IDENTIFIER_POINTER (name), IDENTIFIER_LENGTH (name)); else { sprintf (buffer, "D%u", DECL_UID (decl)); obstack_grow (&name_obstack, buffer, strlen (buffer)); } } /* Helper for make_fancy_name.
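   For instance, for an access expression like s.data[3].x the resulting
   string would be s$data$3$x (a hypothetical example; a non-constant array
   index is simply left out of the name).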
*/ static void make_fancy_name_1 (tree expr) { char buffer[32]; tree index; if (DECL_P (expr)) { make_fancy_decl_name (expr); return; } switch (TREE_CODE (expr)) { case COMPONENT_REF: make_fancy_name_1 (TREE_OPERAND (expr, 0)); obstack_1grow (&name_obstack, '$'); make_fancy_decl_name (TREE_OPERAND (expr, 1)); break; case ARRAY_REF: make_fancy_name_1 (TREE_OPERAND (expr, 0)); obstack_1grow (&name_obstack, '$'); /* Arrays with only one element may not have a constant as their index. */ index = TREE_OPERAND (expr, 1); if (TREE_CODE (index) != INTEGER_CST) break; sprintf (buffer, HOST_WIDE_INT_PRINT_DEC, TREE_INT_CST_LOW (index)); obstack_grow (&name_obstack, buffer, strlen (buffer)); break; case BIT_FIELD_REF: case REALPART_EXPR: case IMAGPART_EXPR: gcc_unreachable (); /* we treat these as scalars. */ break; default: break; } } /* Create a human readable name for a replacement variable based on the access expression EXPR. */ static char * make_fancy_name (tree expr) { make_fancy_name_1 (expr); obstack_1grow (&name_obstack, '\0'); return XOBFINISH (&name_obstack, char *); } /* Helper function for build_ref_for_offset. */ static bool build_ref_for_offset_1 (tree *res, tree type, HOST_WIDE_INT offset, tree exp_type) { while (1) { tree fld; tree tr_size, index; HOST_WIDE_INT el_size; if (offset == 0 && exp_type && useless_type_conversion_p (exp_type, type)) return true; switch (TREE_CODE (type)) { case UNION_TYPE: case QUAL_UNION_TYPE: case RECORD_TYPE: /* Some Ada records are half-unions, treat all of them the same. */ for (fld = TYPE_FIELDS (type); fld; fld = TREE_CHAIN (fld)) { HOST_WIDE_INT pos, size; tree expr, *expr_ptr; if (TREE_CODE (fld) != FIELD_DECL) continue; pos = int_bit_position (fld); gcc_assert (TREE_CODE (type) == RECORD_TYPE || pos == 0); size = tree_low_cst (DECL_SIZE (fld), 1); if (pos > offset || (pos + size) <= offset) continue; if (res) { expr = build3 (COMPONENT_REF, TREE_TYPE (fld), *res, fld, NULL_TREE); expr_ptr = &expr; } else expr_ptr = NULL; if (build_ref_for_offset_1 (expr_ptr, TREE_TYPE (fld), offset - pos, exp_type)) { if (res) *res = expr; return true; } } return false; case ARRAY_TYPE: tr_size = TYPE_SIZE (TREE_TYPE (type)); if (!tr_size || !host_integerp (tr_size, 1)) return false; el_size = tree_low_cst (tr_size, 1); index = build_int_cst (TYPE_DOMAIN (type), offset / el_size); if (!integer_zerop (TYPE_MIN_VALUE (TYPE_DOMAIN (type)))) index = int_const_binop (PLUS_EXPR, index, TYPE_MIN_VALUE (TYPE_DOMAIN (type)), 0); if (res) *res = build4 (ARRAY_REF, TREE_TYPE (type), *res, index, NULL_TREE, NULL_TREE); offset = offset % el_size; type = TREE_TYPE (type); break; default: if (offset != 0) return false; if (exp_type) return false; else return true; } } } /* Construct an expression that would reference a part of aggregate *EXPR of type TYPE at the given OFFSET of the type EXP_TYPE. If EXPR is NULL, the function only determines whether it can build such a reference without actually doing it. FIXME: Eventually this should be replaced with maybe_fold_offset_to_reference() from tree-ssa-ccp.c but that requires a minor rewrite of fold_stmt. */ static bool build_ref_for_offset (tree *expr, tree type, HOST_WIDE_INT offset, tree exp_type, bool allow_ptr) { if (allow_ptr && POINTER_TYPE_P (type)) { type = TREE_TYPE (type); if (expr) *expr = fold_build1 (INDIRECT_REF, type, *expr); } return build_ref_for_offset_1 (expr, type, offset, exp_type); } /* The very first phase of intraprocedural SRA. It marks in candidate_bitmap those variables whose type is suitable for scalarization.
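   A local variable of type struct { int i; float f; } whose address is
   never taken would typically qualify (a hypothetical example), whereas
   volatile variables, variables which need to live in memory and variables
   with incomplete, zero-sized or otherwise problematic types are skipped.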
*/ static bool find_var_candidates (void) { tree var, type; referenced_var_iterator rvi; bool ret = false; FOR_EACH_REFERENCED_VAR (var, rvi) { if (TREE_CODE (var) != VAR_DECL && TREE_CODE (var) != PARM_DECL) continue; type = TREE_TYPE (var); if (!AGGREGATE_TYPE_P (type) || needs_to_live_in_memory (var) || TREE_THIS_VOLATILE (var) || !COMPLETE_TYPE_P (type) || !host_integerp (TYPE_SIZE (type), 1) || tree_low_cst (TYPE_SIZE (type), 1) == 0 || type_internals_preclude_sra_p (type)) continue; bitmap_set_bit (candidate_bitmap, DECL_UID (var)); if (dump_file && (dump_flags & TDF_DETAILS)) { fprintf (dump_file, "Candidate (%d): ", DECL_UID (var)); print_generic_expr (dump_file, var, 0); fprintf (dump_file, "\n"); } ret = true; } return ret; } /* Sort all accesses for the given variable, check for partial overlaps and return NULL if there are any. If there are none, pick a representative for each combination of offset and size and create a linked list out of them. Return the pointer to the first representative and make sure it is the first one in the vector of accesses. */ static struct access * sort_and_splice_var_accesses (tree var) { int i, j, access_count; struct access *res, **prev_acc_ptr = &res; VEC (access_p, heap) *access_vec; bool first = true; HOST_WIDE_INT low = -1, high = 0; access_vec = get_base_access_vector (var); if (!access_vec) return NULL; access_count = VEC_length (access_p, access_vec); /* Sort by <OFFSET, SIZE>. */ qsort (VEC_address (access_p, access_vec), access_count, sizeof (access_p), compare_access_positions); i = 0; while (i < access_count) { struct access *access = VEC_index (access_p, access_vec, i); bool modification = access->write; bool grp_read = !access->write; bool grp_bfr_lhs = access->grp_bfr_lhs; bool first_scalar = is_gimple_reg_type (access->type); bool unscalarizable_region = access->grp_unscalarizable_region; if (first || access->offset >= high) { first = false; low = access->offset; high = access->offset + access->size; } else if (access->offset > low && access->offset + access->size > high) return NULL; else gcc_assert (access->offset >= low && access->offset + access->size <= high); j = i + 1; while (j < access_count) { struct access *ac2 = VEC_index (access_p, access_vec, j); if (ac2->offset != access->offset || ac2->size != access->size) break; modification |= ac2->write; grp_read |= !ac2->write; grp_bfr_lhs |= ac2->grp_bfr_lhs; unscalarizable_region |= ac2->grp_unscalarizable_region; relink_to_new_repr (access, ac2); /* If one of the equivalent accesses is scalar, use it as a representative (this happens, for example, when there is only a single scalar field in a structure). */ if (!first_scalar && is_gimple_reg_type (ac2->type)) { struct access tmp_acc; first_scalar = true; memcpy (&tmp_acc, ac2, sizeof (struct access)); memcpy (ac2, access, sizeof (struct access)); memcpy (access, &tmp_acc, sizeof (struct access)); } ac2->group_representative = access; j++; } i = j; access->group_representative = access; access->grp_write = modification; access->grp_read = grp_read; access->grp_bfr_lhs = grp_bfr_lhs; access->grp_unscalarizable_region = unscalarizable_region; if (access->first_link) add_access_to_work_queue (access); *prev_acc_ptr = access; prev_acc_ptr = &access->next_grp; } gcc_assert (res == VEC_index (access_p, access_vec, 0)); return res; } /* Create a variable for the given ACCESS which determines the type, name and a few other properties. Return the variable declaration and store it also to ACCESS->replacement_decl.
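   For a hypothetical access to s.f of type float, this creates a new float
   temporary with the base name "SR", records s.f as its debug expression
   and, when s is a user variable, gives the temporary the human readable
   name "s$f" produced by make_fancy_name.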
*/ static tree create_access_replacement (struct access *access) { tree repl; repl = create_tmp_var (access->type, "SR"); get_var_ann (repl); add_referenced_var (repl); mark_sym_for_renaming (repl); if (!access->grp_bfr_lhs && (TREE_CODE (access->type) == COMPLEX_TYPE || TREE_CODE (access->type) == VECTOR_TYPE)) DECL_GIMPLE_REG_P (repl) = 1; DECL_SOURCE_LOCATION (repl) = DECL_SOURCE_LOCATION (access->base); DECL_ARTIFICIAL (repl) = 1; if (DECL_NAME (access->base) && !DECL_IGNORED_P (access->base) && !DECL_ARTIFICIAL (access->base)) { char *pretty_name = make_fancy_name (access->expr); DECL_NAME (repl) = get_identifier (pretty_name); obstack_free (&name_obstack, pretty_name); SET_DECL_DEBUG_EXPR (repl, access->expr); DECL_DEBUG_EXPR_IS_FROM (repl) = 1; DECL_IGNORED_P (repl) = 0; } DECL_IGNORED_P (repl) = DECL_IGNORED_P (access->base); TREE_NO_WARNING (repl) = TREE_NO_WARNING (access->base); if (dump_file) { fprintf (dump_file, "Created a replacement for "); print_generic_expr (dump_file, access->base, 0); fprintf (dump_file, " offset: %u, size: %u: ", (unsigned) access->offset, (unsigned) access->size); print_generic_expr (dump_file, repl, 0); fprintf (dump_file, "\n"); } return repl; } /* Return ACCESS scalar replacement, create it if it does not exist yet. */ static inline tree get_access_replacement (struct access *access) { gcc_assert (access->grp_to_be_replaced); if (access->replacement_decl) return access->replacement_decl; access->replacement_decl = create_access_replacement (access); return access->replacement_decl; } /* Build a subtree of accesses rooted in *ACCESS, and move the pointer in the linked list along the way. Stop when *ACCESS is NULL or the access pointed to by it is not "within" the root. */ static void build_access_subtree (struct access **access) { struct access *root = *access, *last_child = NULL; HOST_WIDE_INT limit = root->offset + root->size; *access = (*access)->next_grp; while (*access && (*access)->offset + (*access)->size <= limit) { if (!last_child) root->first_child = *access; else last_child->next_sibling = *access; last_child = *access; build_access_subtree (access); } } /* Build a tree of access representatives, ACCESS is the pointer to the first one, others are linked in a list by the next_grp field. */ static void build_access_trees (struct access *access) { while (access) { struct access *root = access; build_access_subtree (&access); root->next_grp = access; } } /* Analyze the subtree of accesses rooted in ROOT, scheduling replacements when doing so seems beneficial and ALLOW_REPLACEMENTS allows it. Also set all sorts of access flags appropriately along the way, notably always set grp_read when MARK_READ is true and grp_write when MARK_WRITE is true.
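   As a hypothetical illustration, for struct { int i; float f; } s with
   leaf accesses to both fields, both leaves are scalar and childless and
   thus get replacements, and the root access covering the whole of s
   becomes grp_covered; if only s.i were accessed, the rest of s would count
   as a hole and, if written to, as unscalarized data.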
*/ static bool analyze_access_subtree (struct access *root, bool allow_replacements, bool mark_read, bool mark_write) { struct access *child; HOST_WIDE_INT limit = root->offset + root->size; HOST_WIDE_INT covered_to = root->offset; bool scalar = is_gimple_reg_type (root->type); bool hole = false, sth_created = false; if (mark_read) root->grp_read = true; else if (root->grp_read) mark_read = true; if (mark_write) root->grp_write = true; else if (root->grp_write) mark_write = true; if (root->grp_unscalarizable_region) allow_replacements = false; for (child = root->first_child; child; child = child->next_sibling) { if (!hole && child->offset < covered_to) hole = true; else covered_to += child->size; sth_created |= analyze_access_subtree (child, allow_replacements && !scalar, mark_read, mark_write); root->grp_unscalarized_data |= child->grp_unscalarized_data; hole |= !child->grp_covered; } if (allow_replacements && scalar && !root->first_child) { if (dump_file && (dump_flags & TDF_DETAILS)) { fprintf (dump_file, "Marking "); print_generic_expr (dump_file, root->base, 0); fprintf (dump_file, " offset: %u, size: %u: ", (unsigned) root->offset, (unsigned) root->size); fprintf (dump_file, " to be replaced.\n"); } root->grp_to_be_replaced = 1; sth_created = true; hole = false; } else if (covered_to < limit) hole = true; if (sth_created && !hole) { root->grp_covered = 1; return true; } if (root->grp_write || TREE_CODE (root->base) == PARM_DECL) root->grp_unscalarized_data = 1; /* not covered and written to */ if (sth_created) return true; return false; } /* Analyze all access trees linked by next_grp by the means of analyze_access_subtree. */ static bool analyze_access_trees (struct access *access) { bool ret = false; while (access) { if (analyze_access_subtree (access, true, false, false)) ret = true; access = access->next_grp; } return ret; } /* Return true iff a potential new child of LACC at offset NORM_OFFSET and with size SIZE would conflict with an already existing one. If exactly such a child already exists in LACC, store a pointer to it in EXACT_MATCH. */ static bool child_would_conflict_in_lacc (struct access *lacc, HOST_WIDE_INT norm_offset, HOST_WIDE_INT size, struct access **exact_match) { struct access *child; for (child = lacc->first_child; child; child = child->next_sibling) { if (child->offset == norm_offset && child->size == size) { *exact_match = child; return true; } if (child->offset < norm_offset + size && child->offset + child->size > norm_offset) return true; } return false; } /* Set the expr of TARGET to one just like MODEL but with its own base at the bottom of the handled components. */ static void duplicate_expr_for_different_base (struct access *target, struct access *model) { tree t, expr = unshare_expr (model->expr); gcc_assert (handled_component_p (expr)); t = expr; while (handled_component_p (TREE_OPERAND (t, 0))) t = TREE_OPERAND (t, 0); gcc_assert (TREE_OPERAND (t, 0) == model->base); TREE_OPERAND (t, 0) = target->base; target->expr = expr; } /* Create a new child access of PARENT, with all properties just like MODEL except for its offset and with its grp_write false and grp_read true. Return the new access. Note that this access is created long after all splicing and sorting, it's not located in any access vector and is automatically a representative of its group.
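   For example (hypothetically), when propagating across an assignment dst =
   src where src has a child access of type int at offset 32 but dst has no
   matching child, a new artificial child of dst with the same size and type
   is created at the corresponding offset so that the field can be
   scalarized on the left hand side too.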
*/ static struct access * create_artificial_child_access (struct access *parent, struct access *model, HOST_WIDE_INT new_offset) { struct access *access; struct access **child; gcc_assert (!model->grp_unscalarizable_region); access = (struct access *) pool_alloc (access_pool); memset (access, 0, sizeof (struct access)); access->base = parent->base; access->offset = new_offset; access->size = model->size; duplicate_expr_for_different_base (access, model); access->type = model->type; access->grp_write = true; access->grp_read = false; child = &parent->first_child; while (*child && (*child)->offset < new_offset) child = &(*child)->next_sibling; access->next_sibling = *child; *child = access; return access; } /* Propagate all subaccesses of RACC across an assignment link to LACC. Return true if any new subaccess was created. Additionally, if RACC is a scalar access but LACC is not, change the type of the latter. */ static bool propagate_subacesses_accross_link (struct access *lacc, struct access *racc) { struct access *rchild; HOST_WIDE_INT norm_delta = lacc->offset - racc->offset; bool ret = false; if (is_gimple_reg_type (lacc->type) || lacc->grp_unscalarizable_region || racc->grp_unscalarizable_region) return false; if (!lacc->first_child && !racc->first_child && is_gimple_reg_type (racc->type) && (sra_mode == SRA_MODE_INTRA || !bitmap_bit_p (retvals_bitmap, DECL_UID (lacc->base)))) { duplicate_expr_for_different_base (lacc, racc); lacc->type = racc->type; return false; } for (rchild = racc->first_child; rchild; rchild = rchild->next_sibling) { struct access *new_acc = NULL; HOST_WIDE_INT norm_offset = rchild->offset + norm_delta; if (rchild->grp_unscalarizable_region) continue; if (child_would_conflict_in_lacc (lacc, norm_offset, rchild->size, &new_acc)) { if (new_acc && rchild->first_child) ret |= propagate_subacesses_accross_link (new_acc, rchild); continue; } new_acc = create_artificial_child_access (lacc, rchild, norm_offset); if (racc->first_child) propagate_subacesses_accross_link (new_acc, rchild); ret = true; } return ret; } /* Propagate all subaccesses across assignment links. */ static void propagate_all_subaccesses (void) { while (work_queue_head) { struct access *racc = pop_access_from_work_queue (); struct assign_link *link; gcc_assert (racc->first_link); for (link = racc->first_link; link; link = link->next) { struct access *lacc = link->lacc; if (!bitmap_bit_p (candidate_bitmap, DECL_UID (lacc->base))) continue; lacc = lacc->group_representative; if (propagate_subacesses_accross_link (lacc, racc) && lacc->first_link) add_access_to_work_queue (lacc); } } } /* Go through all accesses collected throughout the (intraprocedural) analysis stage, exclude overlapping ones, identify representatives and build trees out of them, making decisions about scalarization on the way. Return true iff there are any to-be-scalarized variables after this stage. 
*/ static bool analyze_all_variable_accesses (void) { tree var; referenced_var_iterator rvi; bool res = false; FOR_EACH_REFERENCED_VAR (var, rvi) if (bitmap_bit_p (candidate_bitmap, DECL_UID (var))) { struct access *access; access = sort_and_splice_var_accesses (var); if (access) build_access_trees (access); else disqualify_candidate (var, "No or inhibitingly overlapping accesses."); } propagate_all_subaccesses (); FOR_EACH_REFERENCED_VAR (var, rvi) if (bitmap_bit_p (candidate_bitmap, DECL_UID (var))) { struct access *access = get_first_repr_for_decl (var); if (analyze_access_trees (access)) { res = true; if (dump_file && (dump_flags & TDF_DETAILS)) { fprintf (dump_file, "\nAccess trees for "); print_generic_expr (dump_file, var, 0); fprintf (dump_file, " (UID: %u): \n", DECL_UID (var)); dump_access_tree (dump_file, access); fprintf (dump_file, "\n"); } } else disqualify_candidate (var, "No scalar replacements to be created."); } return res; } /* Return true iff a reference statement into aggregate AGG can be built for every single to-be-replaced access that is a child of ACCESS, its sibling or a child of its sibling. TOP_OFFSET is the offset from the processed access subtree that has to be subtracted from offset of each access. */ static bool ref_expr_for_all_replacements_p (struct access *access, tree agg, HOST_WIDE_INT top_offset) { do { if (access->grp_to_be_replaced && !build_ref_for_offset (NULL, TREE_TYPE (agg), access->offset - top_offset, access->type, false)) return false; if (access->first_child && !ref_expr_for_all_replacements_p (access->first_child, agg, top_offset)) return false; access = access->next_sibling; } while (access); return true; } /* Generate statements copying scalar replacements of accesses within a subtree into or out of AGG. ACCESS is the first child of the root of the subtree to be processed. AGG is an aggregate type expression (can be a declaration but does not have to be, it can for example also be an indirect_ref). TOP_OFFSET is the offset of the processed subtree which has to be subtracted from offsets of individual accesses to get corresponding offsets for AGG. If CHUNK_SIZE is non-zero, copy only replacements in the interval <start_offset, start_offset + chunk_size>, otherwise copy all. GSI is a statement iterator used to place the new statements. WRITE should be true when the statements should write from AGG to the replacement and false if vice versa. If INSERT_AFTER is true, new statements will be added after the current statement in GSI, they will be added before the statement otherwise.
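   For example, initialize_parameter_reductions calls this function with
   WRITE and INSERT_AFTER true to load the replacements of a scalarized
   parameter p at function entry, emitting statements such as SR$p$i = p.i
   (names hypothetical).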
*/ static void generate_subtree_copies (struct access *access, tree agg, HOST_WIDE_INT top_offset, HOST_WIDE_INT start_offset, HOST_WIDE_INT chunk_size, gimple_stmt_iterator *gsi, bool write, bool insert_after) { do { tree expr = unshare_expr (agg); if (chunk_size && access->offset >= start_offset + chunk_size) return; if (access->grp_to_be_replaced && (chunk_size == 0 || access->offset + access->size > start_offset)) { bool repl_found; gimple stmt; repl_found = build_ref_for_offset (&expr, TREE_TYPE (agg), access->offset - top_offset, access->type, false); gcc_assert (repl_found); if (write) stmt = gimple_build_assign (get_access_replacement (access), expr); else { tree repl = get_access_replacement (access); TREE_NO_WARNING (repl) = 1; stmt = gimple_build_assign (expr, repl); } if (insert_after) gsi_insert_after (gsi, stmt, GSI_NEW_STMT); else gsi_insert_before (gsi, stmt, GSI_SAME_STMT); update_stmt (stmt); } if (access->first_child) generate_subtree_copies (access->first_child, agg, top_offset, start_offset, chunk_size, gsi, write, insert_after); access = access->next_sibling; } while (access); } /* Assign zero to all scalar replacements in an access subtree. ACCESS is the root of the subtree to be processed. GSI is the statement iterator used for inserting statements which are added after the current statement if INSERT_AFTER is true or before it otherwise. */ static void init_subtree_with_zero (struct access *access, gimple_stmt_iterator *gsi, bool insert_after) { struct access *child; if (access->grp_to_be_replaced) { gimple stmt; stmt = gimple_build_assign (get_access_replacement (access), fold_convert (access->type, integer_zero_node)); if (insert_after) gsi_insert_after (gsi, stmt, GSI_NEW_STMT); else gsi_insert_before (gsi, stmt, GSI_SAME_STMT); update_stmt (stmt); } for (child = access->first_child; child; child = child->next_sibling) init_subtree_with_zero (child, gsi, insert_after); } /* Search for an access representative for the given expression EXPR and return it or NULL if it cannot be found. */ static struct access * get_access_for_expr (tree expr) { HOST_WIDE_INT offset, size, max_size; tree base; /* FIXME: This should not be necessary but Ada produces V_C_Es with a type of a different size than the size of its argument and we need the latter one. */ if (TREE_CODE (expr) == VIEW_CONVERT_EXPR) expr = TREE_OPERAND (expr, 0); base = get_ref_base_and_extent (expr, &offset, &size, &max_size); /* !!! Assert for testing only, remove after some time. */ gcc_assert (base); if (max_size == -1 || !DECL_P (base)) return NULL; if (!bitmap_bit_p (candidate_bitmap, DECL_UID (base))) return NULL; return get_var_base_offset_size_access (base, offset, max_size); } /* Substitute into *EXPR an expression of type TYPE with the value of the replacement of ACCESS. This is done either by producing a special V_C_E assignment statement converting the replacement to a new temporary of the requested type if TYPE satisfies is_gimple_reg_type or by going through the base aggregate if it is not.
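   For instance (a hypothetical case), when an int-typed expression reads
   storage whose replacement SR$u$f is a float, the function emits

     SRvce_1 = VIEW_CONVERT_EXPR <int> (SR$u$f);

   and substitutes SRvce_1 for the original expression.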
*/ static void fix_incompatible_types_for_expr (tree *expr, tree type, struct access *access, gimple_stmt_iterator *gsi, bool write) { tree repl = get_access_replacement (access); if (is_gimple_reg_type (type)) { tree tmp = create_tmp_var (type, "SRvce"); add_referenced_var (tmp); tmp = make_ssa_name (tmp, NULL); if (write) { gimple stmt; tree conv = fold_build1 (VIEW_CONVERT_EXPR, TREE_TYPE (repl), tmp); *expr = tmp; SSA_NAME_DEF_STMT (tmp) = gsi_stmt (*gsi); stmt = gimple_build_assign (repl, conv); gsi_insert_after (gsi, stmt, GSI_NEW_STMT); update_stmt (stmt); } else { gimple stmt; tree conv = fold_build1 (VIEW_CONVERT_EXPR, type, repl); stmt = gimple_build_assign (tmp, conv); gsi_insert_before (gsi, stmt, GSI_SAME_STMT); SSA_NAME_DEF_STMT (tmp) = stmt; *expr = tmp; update_stmt (stmt); } } else { if (write) { gimple stmt; stmt = gimple_build_assign (repl, unshare_expr (access->expr)); gsi_insert_after (gsi, stmt, GSI_NEW_STMT); update_stmt (stmt); } else { gimple stmt; stmt = gimple_build_assign (unshare_expr (access->expr), repl); gsi_insert_before (gsi, stmt, GSI_SAME_STMT); update_stmt (stmt); } } } /* Change STMT to assign compatible types by means of adding component or array references or VIEW_CONVERT_EXPRs. All parameters have the same meaning as variables with the same names in sra_modify_assign. If we can avoid a V_C_E used to load a field from a (single field) record or a union or an element of a one-element array by producing COMPONENT_REFs and ARRAY_REFs instead, do so. */ static void fix_modified_assign_compatibility (gimple_stmt_iterator *gsi, gimple stmt, struct access *lacc, struct access *racc, tree lhs, tree *rhs, tree ltype, tree rtype) { if (racc && racc->grp_to_be_replaced && AGGREGATE_TYPE_P (ltype) && !access_has_children_p (lacc)) { tree expr = unshare_expr (lhs); bool found = build_ref_for_offset (&expr, ltype, racc->offset, rtype, false); if (found) { gimple_assign_set_lhs (stmt, expr); return; } } if (lacc && lacc->grp_to_be_replaced && AGGREGATE_TYPE_P (rtype) && !access_has_children_p (racc)) { tree expr = unshare_expr (*rhs); bool found = build_ref_for_offset (&expr, rtype, lacc->offset, ltype, false); if (found) { gimple_assign_set_rhs1 (stmt, expr); return; } } /* Note that *rhs is a local variable of caller, it's not a pointer to the pointer in the gimple statement: */ *rhs = fold_build1 (VIEW_CONVERT_EXPR, ltype, *rhs); /* gimple_assign_set_rhs1 ICEs when passed V_C_E so we have to resort to this: */ gimple_assign_set_rhs_from_tree (gsi, *rhs); gcc_assert (stmt == gsi_stmt (*gsi)); } /* Callback for scan_function. Replace the expression EXPR with a scalar replacement if there is one and generate other statements to do type conversion or subtree copying if necessary. GSI is used to place newly created statements, WRITE is true if the expression is being written to (it is on a LHS of a statement or output in an assembly statement).
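   So, for a hypothetical call foo (s.i) where s.i has a scalar replacement,
   the argument is rewritten to foo (SR$s$i); if a whole scalarized
   aggregate is passed instead, the relevant replacements are first copied
   back into it with generate_subtree_copies.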
*/ static bool sra_modify_expr (tree *expr, gimple_stmt_iterator *gsi, bool write, void *data ATTRIBUTE_UNUSED) { struct access *access; tree type, bfr; if (TREE_CODE (*expr) == BIT_FIELD_REF) { bfr = *expr; expr = &TREE_OPERAND (*expr, 0); } else bfr = NULL_TREE; if (TREE_CODE (*expr) == REALPART_EXPR || TREE_CODE (*expr) == IMAGPART_EXPR) expr = &TREE_OPERAND (*expr, 0); type = TREE_TYPE (*expr); access = get_access_for_expr (*expr); if (!access) return false; if (access->grp_to_be_replaced) { if (!useless_type_conversion_p (type, access->type)) fix_incompatible_types_for_expr (expr, type, access, gsi, write); else *expr = get_access_replacement (access); } if (access->first_child) { HOST_WIDE_INT start_offset, chunk_size; if (bfr && host_integerp (TREE_OPERAND (bfr, 1), 1) && host_integerp (TREE_OPERAND (bfr, 2), 1)) { start_offset = tree_low_cst (TREE_OPERAND (bfr, 1), 1); chunk_size = tree_low_cst (TREE_OPERAND (bfr, 2), 1); } else start_offset = chunk_size = 0; generate_subtree_copies (access->first_child, access->base, 0, start_offset, chunk_size, gsi, write, write); } return true; } /* Store all replacements in the access tree rooted in TOP_RACC either to their base aggregate if there are unscalarized data or directly to LHS otherwise. */ static void handle_unscalarized_data_in_subtree (struct access *top_racc, tree lhs, gimple_stmt_iterator *gsi) { if (top_racc->grp_unscalarized_data) generate_subtree_copies (top_racc->first_child, top_racc->base, 0, 0, 0, gsi, false, false); else generate_subtree_copies (top_racc->first_child, lhs, top_racc->offset, 0, 0, gsi, false, false); } /* Try to generate statements to load all sub-replacements in an access (sub)tree (LACC is the first child) from scalar replacements in the TOP_RACC (sub)tree. If that is not possible, refresh the TOP_RACC base aggregate and load the accesses from it. LEFT_OFFSET is the offset of the left whole subtree being copied, RIGHT_OFFSET is the same thing for the right subtree. GSI is stmt iterator used for statement insertions. *REFRESHED is true iff the rhs top aggregate has already been refreshed by contents of its scalar reductions and is set to true if this function has to do it. */ static void load_assign_lhs_subreplacements (struct access *lacc, struct access *top_racc, HOST_WIDE_INT left_offset, HOST_WIDE_INT right_offset, gimple_stmt_iterator *old_gsi, gimple_stmt_iterator *new_gsi, bool *refreshed, tree lhs) { do { if (lacc->grp_to_be_replaced) { struct access *racc; HOST_WIDE_INT offset = lacc->offset - left_offset + right_offset; racc = find_access_in_subtree (top_racc, offset, lacc->size); if (racc && racc->grp_to_be_replaced) { gimple stmt; if (useless_type_conversion_p (lacc->type, racc->type)) stmt = gimple_build_assign (get_access_replacement (lacc), get_access_replacement (racc)); else { tree rhs = fold_build1 (VIEW_CONVERT_EXPR, lacc->type, get_access_replacement (racc)); stmt = gimple_build_assign (get_access_replacement (lacc), rhs); } gsi_insert_after (new_gsi, stmt, GSI_NEW_STMT); update_stmt (stmt); } else { tree expr = unshare_expr (top_racc->base); bool repl_found; gimple stmt; /* No suitable access on the right hand side, need to load from the aggregate. See if we have to update it first... 
*/ if (!*refreshed) { gcc_assert (top_racc->first_child); handle_unscalarized_data_in_subtree (top_racc, lhs, old_gsi); *refreshed = true; } repl_found = build_ref_for_offset (&expr, TREE_TYPE (top_racc->base), lacc->offset - left_offset, lacc->type, false); gcc_assert (repl_found); stmt = gimple_build_assign (get_access_replacement (lacc), expr); gsi_insert_after (new_gsi, stmt, GSI_NEW_STMT); update_stmt (stmt); } } else if (lacc->grp_read && !lacc->grp_covered && !*refreshed) { handle_unscalarized_data_in_subtree (top_racc, lhs, old_gsi); *refreshed = true; } if (lacc->first_child) load_assign_lhs_subreplacements (lacc->first_child, top_racc, left_offset, right_offset, old_gsi, new_gsi, refreshed, lhs); lacc = lacc->next_sibling; } while (lacc); } /* Modify assignments with a CONSTRUCTOR on their RHS. STMT is a pointer to the assignment and GSI is the statement iterator pointing at it. Returns the same values as sra_modify_assign. */ static enum scan_assign_result sra_modify_constructor_assign (gimple *stmt, gimple_stmt_iterator *gsi) { tree lhs = gimple_assign_lhs (*stmt); struct access *acc; gcc_assert (TREE_CODE (lhs) != REALPART_EXPR && TREE_CODE (lhs) != IMAGPART_EXPR); acc = get_access_for_expr (lhs); if (!acc) return SRA_SA_NONE; if (VEC_length (constructor_elt, CONSTRUCTOR_ELTS (gimple_assign_rhs1 (*stmt))) > 0) { /* I have never seen this code path trigger but if it can happen the following should handle it gracefully. */ if (access_has_children_p (acc)) generate_subtree_copies (acc->first_child, acc->base, 0, 0, 0, gsi, true, true); return SRA_SA_PROCESSED; } if (acc->grp_covered) { init_subtree_with_zero (acc, gsi, false); unlink_stmt_vdef (*stmt); gsi_remove (gsi, true); return SRA_SA_REMOVED; } else { init_subtree_with_zero (acc, gsi, true); return SRA_SA_PROCESSED; } } /* Modify statements that have a REALPART_EXPR or IMAGPART_EXPR of a to-be-scalarized complex variable on their lhs. STMT is the statement and GSI is the iterator used to place new helper statements. Returns the same values as sra_modify_assign. */ static enum scan_assign_result sra_modify_partially_complex_lhs (gimple stmt, gimple_stmt_iterator *gsi) { tree lhs, complex, ptype, rp, ip; struct access *access; gimple new_stmt, aux_stmt; lhs = gimple_assign_lhs (stmt); complex = TREE_OPERAND (lhs, 0); access = get_access_for_expr (complex); if (!access || !access->grp_to_be_replaced) return SRA_SA_NONE; ptype = TREE_TYPE (TREE_TYPE (complex)); rp = create_tmp_var (ptype, "SRr"); add_referenced_var (rp); rp = make_ssa_name (rp, NULL); ip = create_tmp_var (ptype, "SRp"); add_referenced_var (ip); ip = make_ssa_name (ip, NULL); if (TREE_CODE (lhs) == IMAGPART_EXPR) { aux_stmt = gimple_build_assign (rp, fold_build1 (REALPART_EXPR, ptype, get_access_replacement (access))); SSA_NAME_DEF_STMT (rp) = aux_stmt; gimple_assign_set_lhs (stmt, ip); SSA_NAME_DEF_STMT (ip) = stmt; } else { aux_stmt = gimple_build_assign (ip, fold_build1 (IMAGPART_EXPR, ptype, get_access_replacement (access))); SSA_NAME_DEF_STMT (ip) = aux_stmt; gimple_assign_set_lhs (stmt, rp); SSA_NAME_DEF_STMT (rp) = stmt; } gsi_insert_before (gsi, aux_stmt, GSI_SAME_STMT); update_stmt (aux_stmt); new_stmt = gimple_build_assign (get_access_replacement (access), fold_build2 (COMPLEX_EXPR, access->type, rp, ip)); gsi_insert_after (gsi, new_stmt, GSI_NEW_STMT); update_stmt (new_stmt); return SRA_SA_PROCESSED; }
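In GIMPLE terms the transformation is roughly the following (an illustrative sketch; the SSA names and the SR_c replacement variable are made up). A write to one half of a scalarized complex variable c:

  __real__ c = x_1;

becomes

  SRp_2 = __imag__ SR_c;
  SRr_3 = x_1;
  SR_c = COMPLEX_EXPR <SRr_3, SRp_2>;

i.e. the untouched half is saved first, the original statement is redirected to a new temporary, and the full replacement value is rebuilt afterwards.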
/* Callback of scan_function to process assign statements. It examines both sides of the statement, replaces them with a scalar replacement if there is one and generates copying of replacements if scalarized aggregates have been used in the assignment. STMT is a pointer to the assign statement, GSI is used to hold generated statements for type conversions and subtree copying. */ static enum scan_assign_result sra_modify_assign (gimple *stmt, gimple_stmt_iterator *gsi, void *data ATTRIBUTE_UNUSED) { struct access *lacc, *racc; tree ltype, rtype; tree lhs, rhs; bool modify_this_stmt; if (!gimple_assign_single_p (*stmt)) return SRA_SA_NONE; lhs = gimple_assign_lhs (*stmt); rhs = gimple_assign_rhs1 (*stmt); if (TREE_CODE (rhs) == CONSTRUCTOR) return sra_modify_constructor_assign (stmt, gsi); if (TREE_CODE (lhs) == REALPART_EXPR || TREE_CODE (lhs) == IMAGPART_EXPR) return sra_modify_partially_complex_lhs (*stmt, gsi); if (TREE_CODE (rhs) == REALPART_EXPR || TREE_CODE (rhs) == IMAGPART_EXPR || TREE_CODE (rhs) == BIT_FIELD_REF || TREE_CODE (lhs) == BIT_FIELD_REF) { modify_this_stmt = sra_modify_expr (gimple_assign_rhs1_ptr (*stmt), gsi, false, data); modify_this_stmt |= sra_modify_expr (gimple_assign_lhs_ptr (*stmt), gsi, true, data); return modify_this_stmt ? SRA_SA_PROCESSED : SRA_SA_NONE; } lacc = get_access_for_expr (lhs); racc = get_access_for_expr (rhs); if (!lacc && !racc) return SRA_SA_NONE; modify_this_stmt = ((lacc && lacc->grp_to_be_replaced) || (racc && racc->grp_to_be_replaced)); if (lacc && lacc->grp_to_be_replaced) { lhs = get_access_replacement (lacc); gimple_assign_set_lhs (*stmt, lhs); ltype = lacc->type; } else ltype = TREE_TYPE (lhs); if (racc && racc->grp_to_be_replaced) { rhs = get_access_replacement (racc); gimple_assign_set_rhs1 (*stmt, rhs); rtype = racc->type; } else rtype = TREE_TYPE (rhs); if (modify_this_stmt) { if (!useless_type_conversion_p (ltype, rtype)) fix_modified_assign_compatibility (gsi, *stmt, lacc, racc, lhs, &rhs, ltype, rtype); } /* From this point on, the function deals with assignments in between aggregates when at least one has scalar reductions of some of its components. There are three possible scenarios: 1) both the LHS and RHS have to-be-scalarized components, 2) only the RHS has, or 3) only the LHS has. In the first case, we would like to load the LHS components from RHS components whenever possible. If that is not possible, we would like to read it directly from the RHS (after updating it by storing in it its own components). If there are some necessary unscalarized data in the RHS, those will be loaded by the original assignment too. If neither of these cases happen, the original statement can be removed. Most of this is done by load_assign_lhs_subreplacements. In the second case, we would like to store all RHS scalarized components directly into LHS and if they cover the aggregate completely, remove the statement too. In the third case, we want the LHS components to be loaded directly from the RHS (DSE will remove the original statement if it becomes redundant). This is a bit complex but manageable when types match and when unions do not cause confusion in a way that we cannot really load a component of LHS from the RHS or vice versa (the access representing this level can have subaccesses that are accessible only through a different union field at a higher level - different from the one used in the examined expression). Unions are fun. 
Therefore, I specially handle a fourth case, happening when there is a specific type cast or it is impossible to locate a scalarized subaccess on the other side of the expression. If that happens, I simply "refresh" the RHS by storing in it its scalarized components, leave the original statement there to do the copying and then load the scalar replacements of the LHS. This is what the first branch does. */ if (contains_view_convert_expr_p (rhs) || contains_view_convert_expr_p (lhs) || (access_has_children_p (racc) && !ref_expr_for_all_replacements_p (racc, lhs, racc->offset)) || (access_has_children_p (lacc) && !ref_expr_for_all_replacements_p (lacc, rhs, lacc->offset))) { if (access_has_children_p (racc)) generate_subtree_copies (racc->first_child, racc->base, 0, 0, 0, gsi, false, false); if (access_has_children_p (lacc)) generate_subtree_copies (lacc->first_child, lacc->base, 0, 0, 0, gsi, true, true); } else { if (access_has_children_p (lacc) && access_has_children_p (racc)) { gimple_stmt_iterator orig_gsi = *gsi; bool refreshed; if (lacc->grp_read && !lacc->grp_covered) { handle_unscalarized_data_in_subtree (racc, lhs, gsi); refreshed = true; } else refreshed = false; load_assign_lhs_subreplacements (lacc->first_child, racc, lacc->offset, racc->offset, &orig_gsi, gsi, &refreshed, lhs); if (!refreshed || !racc->grp_unscalarized_data) { if (*stmt == gsi_stmt (*gsi)) gsi_next (gsi); unlink_stmt_vdef (*stmt); gsi_remove (&orig_gsi, true); return SRA_SA_REMOVED; } } else { if (access_has_children_p (racc)) { if (!racc->grp_unscalarized_data) { generate_subtree_copies (racc->first_child, lhs, racc->offset, 0, 0, gsi, false, false); gcc_assert (*stmt == gsi_stmt (*gsi)); unlink_stmt_vdef (*stmt); gsi_remove (gsi, true); return SRA_SA_REMOVED; } else generate_subtree_copies (racc->first_child, lhs, racc->offset, 0, 0, gsi, false, true); } else if (access_has_children_p (lacc)) generate_subtree_copies (lacc->first_child, rhs, lacc->offset, 0, 0, gsi, true, true); } } return modify_this_stmt ? SRA_SA_PROCESSED : SRA_SA_NONE; } /* Generate statements initializing scalar replacements of parts of function parameters. */ static void initialize_parameter_reductions (void) { gimple_stmt_iterator gsi; gimple_seq seq = NULL; tree parm; for (parm = DECL_ARGUMENTS (current_function_decl); parm; parm = TREE_CHAIN (parm)) { VEC (access_p, heap) *access_vec; struct access *access; if (!bitmap_bit_p (candidate_bitmap, DECL_UID (parm))) continue; access_vec = get_base_access_vector (parm); if (!access_vec) continue; if (!seq) { seq = gimple_seq_alloc (); gsi = gsi_start (seq); } for (access = VEC_index (access_p, access_vec, 0); access; access = access->next_grp) generate_subtree_copies (access, parm, 0, 0, 0, &gsi, true, true); } if (seq) gsi_insert_seq_on_edge_immediate (single_succ_edge (ENTRY_BLOCK_PTR), seq); } /* The "main" function of intraprocedural SRA passes. Runs the analysis and if it reveals there are components of some aggregates to be scalarized, it runs the required transformations. */ static unsigned int perform_intra_sra (void) { int ret = 0; sra_initialize (); if (!find_var_candidates ()) goto out; if (!scan_function (build_access_from_expr, build_accesses_from_assign, NULL, true, NULL)) goto out; if (!analyze_all_variable_accesses ()) goto out; scan_function (sra_modify_expr, sra_modify_assign, NULL, false, NULL); initialize_parameter_reductions (); ret = TODO_update_ssa; out: sra_deinitialize (); return ret; }
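To summarize the whole pipeline on a trivial example (an editorial sketch; the SR_* names only illustrate the "SR" temporaries the pass creates):

struct pair { int a; int b; };

int
sum_pair (void)
{
  struct pair p;
  p.a = 1;
  p.b = 2;
  return p.a + p.b;
}

After find_var_candidates accepts p, the scan records the accesses to p.a and p.b, the analysis marks both to be replaced, and the modification stage rewrites the body to roughly SR_pa = 1; SR_pb = 2; return SR_pa + SR_pb; with p itself becoming dead.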
/* Perform early intraprocedural SRA. */ static unsigned int early_intra_sra (void) { sra_mode = SRA_MODE_EARLY_INTRA; return perform_intra_sra (); } /* Perform "late" intraprocedural SRA. */ static unsigned int late_intra_sra (void) { sra_mode = SRA_MODE_INTRA; return perform_intra_sra (); } static bool gate_intra_sra (void) { return flag_tree_sra != 0; } struct gimple_opt_pass pass_sra_early = { { GIMPLE_PASS, "esra", /* name */ gate_intra_sra, /* gate */ early_intra_sra, /* execute */ NULL, /* sub */ NULL, /* next */ 0, /* static_pass_number */ TV_TREE_SRA, /* tv_id */ PROP_cfg | PROP_ssa, /* properties_required */ 0, /* properties_provided */ 0, /* properties_destroyed */ 0, /* todo_flags_start */ TODO_dump_func | TODO_update_ssa | TODO_ggc_collect | TODO_verify_ssa /* todo_flags_finish */ } }; struct gimple_opt_pass pass_sra = { { GIMPLE_PASS, "sra", /* name */ gate_intra_sra, /* gate */ late_intra_sra, /* execute */ NULL, /* sub */ NULL, /* next */ 0, /* static_pass_number */ TV_TREE_SRA, /* tv_id */ PROP_cfg | PROP_ssa, /* properties_required */ 0, /* properties_provided */ 0, /* properties_destroyed */ TODO_update_address_taken, /* todo_flags_start */ TODO_dump_func | TODO_update_ssa | TODO_ggc_collect | TODO_verify_ssa /* todo_flags_finish */ } }; ^ permalink raw reply [flat|nested] 25+ messages in thread
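(Since the pass structures above name the passes "esra" and "sra" and gate them on -ftree-sra, their work can be inspected with the usual dump machinery, e.g.

  gcc -O2 -fdump-tree-esra-details -fdump-tree-sra-details test.c

assuming the standard derivation of dump file names from pass names; the -details dumps contain the "Candidate" lines and access trees printed by the dumping routines quoted further down in this thread.)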
* Re: [PATCH 3/5] New intraprocedural Scalar Reduction of Aggregates. 2009-04-28 10:14 ` [PATCH 3/5] New intraprocedural Scalar Reduction of Aggregates Martin Jambor 2009-04-28 10:27 ` Martin Jambor @ 2009-04-29 10:59 ` Richard Guenther 2009-04-29 12:16 ` Martin Jambor 1 sibling, 1 reply; 25+ messages in thread From: Richard Guenther @ 2009-04-29 10:59 UTC (permalink / raw) To: Martin Jambor; +Cc: GCC Patches, Jan Hubicka On Tue, 28 Apr 2009, Martin Jambor wrote: > This is the new intraprocedural SRA. I have stripped off the > interprocedural part and will propose to commit it separately later. > I have tried to remove almost every trace of IPA-SRA, however, two > provisions for it have remained in the patch. First, an enumeration > (rather than a boolean) is used to distuinguish between "early" and > "late" SRA so that other SRA modes can be added later on. Second, > scan_function() has a hook parameter and a void pointer parameter > which are not used in this patch but will be by IPA-SRA. > > Otherwise, the patch is hopefully self-contained and the bases of its > operation is described by the initial comment. > > The patch bootstraps (on x86_64-linux-gnu but I am about to try it on > hppa-linux-gnu too) but produces a small number of testsuite failures > which are handled by the two following patches. > > Thanks, > > Martin > > > 2009-04-27 Martin Jambor <mjambor@suse.cz> > > * tree-sra.c (enum sra_mode): The whole contents of the file was > replaced. That looks odd ;) I would make it * tree-sra.c: New implementation of SRA. and in the usual way list any exported functions that were removed and list new exported functions that were added. Can you instead of posting a diff post the new contents? That would be way easier to review. Thanks, Richard. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 3/5] New intraprocedural Scalar Reduction of Aggregates. 2009-04-29 10:59 ` Richard Guenther @ 2009-04-29 12:16 ` Martin Jambor 0 siblings, 0 replies; 25+ messages in thread From: Martin Jambor @ 2009-04-29 12:16 UTC (permalink / raw) To: Richard Guenther; +Cc: GCC Patches, Jan Hubicka Hi, On Wed, Apr 29, 2009 at 12:27:10PM +0200, Richard Guenther wrote: > On Tue, 28 Apr 2009, Martin Jambor wrote: > > > This is the new intraprocedural SRA. I have stripped off the > > interprocedural part and will propose to commit it separately later. > > I have tried to remove almost every trace of IPA-SRA, however, two > > provisions for it have remained in the patch. First, an enumeration > > (rather than a boolean) is used to distuinguish between "early" and > > "late" SRA so that other SRA modes can be added later on. Second, > > scan_function() has a hook parameter and a void pointer parameter > > which are not used in this patch but will be by IPA-SRA. > > > > Otherwise, the patch is hopefully self-contained and the bases of its > > operation is described by the initial comment. > > > > The patch bootstraps (on x86_64-linux-gnu but I am about to try it on > > hppa-linux-gnu too) but produces a small number of testsuite failures > > which are handled by the two following patches. > > > > Thanks, > > > > Martin > > > > > > 2009-04-27 Martin Jambor <mjambor@suse.cz> > > > > * tree-sra.c (enum sra_mode): The whole contents of the file was > > replaced. > > That looks odd ;) I would make it > > * tree-sra.c: New implementation of SRA. > > and in the usual way list any exported functions that were removed and > list new exported functions that were added. OK. I will use that entry. There are no new exported functions, only two exported gimple_opt_pass structures (which are really the only part of the old file which was not really replaced but only changed). Old exported functions are made static by the first patch in the series. > > Can you instead of posting a diff post the new contents? That would > be way easier to review. I've already done that (http://gcc.gnu.org/ml/gcc-patches/2009-04/msg02218.html). However, when I was writing an email to Honza about the last ICE I fixed I realized that two comparisons had the opposite operators to the ones I originally meant and that could lead to some pessimizations. Thinking about it more, I have also realized that I could be bolder and propagate subaccesses among accesses of different sizes if I limit the propagated children so that they are not outside of the destination size. The new version (already bootstrapped and tested) is below. There are only minor changes in functions build_accesses_from_assign() and propagate_subacesses_accross_link(). I'm about to patch the automatic tester now. Thanks, Martin /* Scalar Replacement of Aggregates (SRA) converts some structure references into scalar references, exposing them to the scalar optimizers. Copyright (C) 2008, 2009 Free Software Foundation, Inc. Contributed by Martin Jambor <mjambor@suse.cz> This file is part of GCC. GCC is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 3, or (at your option) any later version. GCC is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. 
You should have received a copy of the GNU General Public License along with GCC; see the file COPYING3. If not see <http://www.gnu.org/licenses/>. */ /* This file implements Scalar Reduction of Aggregates (SRA). SRA is run twice, once in the early stages of compilation (early SRA) and once in the late stages (late SRA). The aim of both is to turn references to scalar parts of aggregates into uses of independent scalar variables. The two passes are nearly identical, the only difference is that early SRA does not scalarize unions which are used as the result in a GIMPLE_RETURN statement because together with inlining this can lead to weird type conversions. Both passes operate in four stages: 1. The declarations that have properties which make them candidates for scalarization are identified in function find_var_candidates(). The candidates are stored in candidate_bitmap. 2. The function body is scanned. In the process, declarations which are used in a manner that prevent their scalarization are removed from the candidate bitmap. More importantly, for every access into an aggregate, an access structure (struct access) is created by create_access() and stored in a vector associated with the aggregate. Among other information, the aggregate declaration, the offset and size of the access and its type are stored in the structure. On a related note, assign_link structures are created for every assign statement between candidate aggregates and attached to the related accesses. 3. The vectors of accesses are analyzed. They are first sorted according to their offset and size and then scanned for partially overlapping accesses (i.e. those which overlap but one is not entirely within another). Such an access disqualifies the whole aggregate from being scalarized. If there is no such inhibiting overlap, a representative access structure is chosen for every unique combination of offset and size. Afterwards, the pass builds a set of trees from these structures, in which children of an access are within their parent (in terms of offset and size). Then accesses are propagated whenever possible (i.e. in cases when it does not create a partially overlapping access) across assign_links from the right hand side to the left hand side. Then the set of trees for each declaration is traversed again and those accesses which should be replaced by a scalar are identified. 4. The function is traversed again, and for every reference into an aggregate that has some component which is about to be scalarized, statements are amended and new statements are created as necessary. Finally, if a parameter got scalarized, the scalar replacements are initialized with values from respective parameter aggregates. */ #include "config.h" #include "system.h" #include "coretypes.h" #include "alloc-pool.h" #include "tm.h" #include "tree.h" #include "gimple.h" #include "tree-flow.h" #include "diagnostic.h" #include "tree-dump.h" #include "timevar.h" #include "params.h" #include "target.h" #include "flags.h" /* Enumeration of all aggregate reductions we can do. */ enum sra_mode {SRA_MODE_EARLY_INTRA, /* early intraprocedural SRA */ SRA_MODE_INTRA}; /* late intraprocedural SRA */ /* Global variable describing which aggregate reduction we are performing at the moment. */ static enum sra_mode sra_mode; struct assign_link; /* ACCESS represents each access to an aggregate variable (as a whole or a part). It can also represent a group of accesses that refer to exactly the same fragment of an aggregate (i.e. 
those that have exactly the same offset and size). Such representatives for a single aggregate, once determined, are linked in a linked list and have the group fields set. Moreover, when doing intraprocedural SRA, a tree is built from those representatives (by the means of first_child and next_sibling pointers), in which all items in a subtree are "within" the root, i.e. their offset is greater or equal to offset of the root and offset+size is smaller or equal to offset+size of the root. Children of an access are sorted by offset. */ struct access { /* Values returned by `get_ref_base_and_extent' for each COMPONENT_REF If EXPR isn't a COMPONENT_REF just set `BASE = EXPR', `OFFSET = 0', `SIZE = TREE_SIZE (TREE_TYPE (expr))'. */ HOST_WIDE_INT offset; HOST_WIDE_INT size; tree base; /* Expression. */ tree expr; /* Type. */ tree type; /* Next group representative for this aggregate. */ struct access *next_grp; /* Pointer to the group representative. Pointer to itself if the struct is the representative. */ struct access *group_representative; /* If this access has any children (in terms of the definition above), this points to the first one. */ struct access *first_child; /* Pointer to the next sibling in the access tree as described above. */ struct access *next_sibling; /* Pointers to the first and last element in the linked list of assign links. */ struct assign_link *first_link, *last_link; /* Pointer to the next access in the work queue. */ struct access *next_queued; /* Replacement variable for this access "region." Never to be accessed directly, always only by the means of get_access_replacement() and only when grp_to_be_replaced flag is set. */ tree replacement_decl; /* Is this particular access write access? */ unsigned write : 1; /* Is this access currently in the work queue? */ unsigned grp_queued : 1; /* Does this group contain a write access? This flag is propagated down the access tree. */ unsigned grp_write : 1; /* Does this group contain a read access? This flag is propagated down the access tree. */ unsigned grp_read : 1; /* Is the subtree rooted in this access fully covered by scalar replacements? */ unsigned grp_covered : 1; /* If set to true, this access and all below it in an access tree must not be scalarized. */ unsigned grp_unscalarizable_region : 1; /* Whether data have been written to parts of the aggregate covered by this access which is not to be scalarized. This flag is propagated up in the access tree. */ unsigned grp_unscalarized_data : 1; /* Does this access and/or group contain a write access through a BIT_FIELD_REF? */ unsigned grp_bfr_lhs : 1; /* Set when a scalar replacement should be created for this variable. We do the decision and creation at different places because create_tmp_var cannot be called from within FOR_EACH_REFERENCED_VAR. */ unsigned grp_to_be_replaced : 1; }; typedef struct access *access_p; DEF_VEC_P (access_p); DEF_VEC_ALLOC_P (access_p, heap); /* Alloc pool for allocating access structures. */ static alloc_pool access_pool; /* A structure linking lhs and rhs accesses from an aggregate assignment. They are used to propagate subaccesses from rhs to lhs as long as they don't conflict with what is already there. */ struct assign_link { struct access *lacc, *racc; struct assign_link *next; }; /* Alloc pool for allocating assign link structures. */ static alloc_pool link_pool; /* Base (tree) -> Vector (VEC(access_p,heap) *) map. */ static struct pointer_map_t *base_access_vec; /* Bitmap of bases (candidates). 
*/ static bitmap candidate_bitmap; /* Bitmap of declarations used in a return statement. */ static bitmap retvals_bitmap; /* Obstack for creation of fancy names. */ static struct obstack name_obstack; /* Head of a linked list of accesses that need to have its subaccesses propagated to their assignment counterparts. */ static struct access *work_queue_head; /* Dump contents of ACCESS to file F in a human friendly way. If GRP is true, representative fields are dumped, otherwise those which only describe the individual access are. */ static void dump_access (FILE *f, struct access *access, bool grp) { fprintf (f, "access { "); fprintf (f, "base = (%d)'", DECL_UID (access->base)); print_generic_expr (f, access->base, 0); fprintf (f, "', offset = %d", (int) access->offset); fprintf (f, ", size = %d", (int) access->size); fprintf (f, ", expr = "); print_generic_expr (f, access->expr, 0); fprintf (f, ", type = "); print_generic_expr (f, access->type, 0); if (grp) fprintf (f, ", grp_write = %d, grp_read = %d, grp_covered = %d, " "grp_unscalarizable_region = %d, grp_unscalarized_data = %d, " "grp_to_be_replaced = %d\n", access->grp_write, access->grp_read, access->grp_covered, access->grp_unscalarizable_region, access->grp_unscalarized_data, access->grp_to_be_replaced); else fprintf (f, ", write = %d'\n", access->write); } /* Dump a subtree rooted in ACCESS to file F, indent by LEVEL. */ static void dump_access_tree_1 (FILE *f, struct access *access, int level) { do { int i; for (i = 0; i < level; i++) fputs ("* ", dump_file); dump_access (f, access, true); if (access->first_child) dump_access_tree_1 (f, access->first_child, level + 1); access = access->next_sibling; } while (access); } /* Dump all access trees for a variable, given the pointer to the first root in ACCESS. */ static void dump_access_tree (FILE *f, struct access *access) { for (; access; access = access->next_grp) dump_access_tree_1 (f, access, 0); } /* Return a vector of pointers to accesses for the variable given in BASE or NULL if there is none. */ static VEC (access_p, heap) * get_base_access_vector (tree base) { void **slot; slot = pointer_map_contains (base_access_vec, base); if (!slot) return NULL; else return *(VEC (access_p, heap) **) slot; } /* Find an access with required OFFSET and SIZE in a subtree of accesses rooted in ACCESS. Return NULL if it cannot be found. */ static struct access * find_access_in_subtree (struct access *access, HOST_WIDE_INT offset, HOST_WIDE_INT size) { while (access && (access->offset != offset || access->size != size)) { struct access *child = access->first_child; while (child && (child->offset + child->size <= offset)) child = child->next_sibling; access = child; } return access; } /* Return the first group representative for DECL or NULL if none exists. */ static struct access * get_first_repr_for_decl (tree base) { VEC (access_p, heap) *access_vec; access_vec = get_base_access_vector (base); if (!access_vec) return NULL; return VEC_index (access_p, access_vec, 0); } /* Find an access representative for the variable BASE and given OFFSET and SIZE. Requires that access trees have already been built. Return NULL if it cannot be found. 
*/ static struct access * get_var_base_offset_size_access (tree base, HOST_WIDE_INT offset, HOST_WIDE_INT size) { struct access *access; access = get_first_repr_for_decl (base); while (access && (access->offset + access->size <= offset)) access = access->next_grp; if (!access) return NULL; return find_access_in_subtree (access, offset, size); } /* Add LINK to the linked list of assign links of RACC. */ static void add_link_to_rhs (struct access *racc, struct assign_link *link) { gcc_assert (link->racc == racc); if (!racc->first_link) { gcc_assert (!racc->last_link); racc->first_link = link; } else racc->last_link->next = link; racc->last_link = link; link->next = NULL; } /* Move all link structures in their linked list in OLD_RACC to the linked list in NEW_RACC. */ static void relink_to_new_repr (struct access *new_racc, struct access *old_racc) { if (!old_racc->first_link) { gcc_assert (!old_racc->last_link); return; } if (new_racc->first_link) { gcc_assert (!new_racc->last_link->next); gcc_assert (!old_racc->last_link || !old_racc->last_link->next); new_racc->last_link->next = old_racc->first_link; new_racc->last_link = old_racc->last_link; } else { gcc_assert (!new_racc->last_link); new_racc->first_link = old_racc->first_link; new_racc->last_link = old_racc->last_link; } old_racc->first_link = old_racc->last_link = NULL; } /* Add ACCESS to the work queue (which is actually a stack). */ static void add_access_to_work_queue (struct access *access) { if (!access->grp_queued) { gcc_assert (!access->next_queued); access->next_queued = work_queue_head; access->grp_queued = 1; work_queue_head = access; } } /* Pop an access from the work queue, and return it, assuming there is one. */ static struct access * pop_access_from_work_queue (void) { struct access *access = work_queue_head; work_queue_head = access->next_queued; access->next_queued = NULL; access->grp_queued = 0; return access; } /* Allocate necessary structures. */ static void sra_initialize (void) { candidate_bitmap = BITMAP_ALLOC (NULL); retvals_bitmap = BITMAP_ALLOC (NULL); gcc_obstack_init (&name_obstack); access_pool = create_alloc_pool ("SRA accesses", sizeof (struct access), 16); link_pool = create_alloc_pool ("SRA links", sizeof (struct assign_link), 16); base_access_vec = pointer_map_create (); } /* Hook fed to pointer_map_traverse, deallocate stored vectors. */ static bool delete_base_accesses (const void *key ATTRIBUTE_UNUSED, void **value, void *data ATTRIBUTE_UNUSED) { VEC (access_p, heap) *access_vec; access_vec = (VEC (access_p, heap) *) *value; VEC_free (access_p, heap, access_vec); return true; } /* Deallocate all general structures. */ static void sra_deinitialize (void) { BITMAP_FREE (candidate_bitmap); BITMAP_FREE (retvals_bitmap); free_alloc_pool (access_pool); free_alloc_pool (link_pool); obstack_free (&name_obstack, NULL); pointer_map_traverse (base_access_vec, delete_base_accesses, NULL); pointer_map_destroy (base_access_vec); } /* Remove DECL from candidates for SRA and write REASON to the dump file if there is one. */ static void disqualify_candidate (tree decl, const char *reason) { bitmap_clear_bit (candidate_bitmap, DECL_UID (decl)); if (dump_file) { fprintf (dump_file, "! Disqualifying "); print_generic_expr (dump_file, decl, 0); fprintf (dump_file, " - %s\n", reason); } } /* Return true iff the type contains a field or an element which does not allow scalarization. 
*/ static bool type_internals_preclude_sra_p (tree type) { tree fld; tree et; switch (TREE_CODE (type)) { case RECORD_TYPE: case UNION_TYPE: case QUAL_UNION_TYPE: for (fld = TYPE_FIELDS (type); fld; fld = TREE_CHAIN (fld)) if (TREE_CODE (fld) == FIELD_DECL) { tree ft = TREE_TYPE (fld); if (TREE_THIS_VOLATILE (fld) || !DECL_FIELD_OFFSET (fld) || !DECL_SIZE (fld) || !host_integerp (DECL_FIELD_OFFSET (fld), 1) || !host_integerp (DECL_SIZE (fld), 1)) return true; if (AGGREGATE_TYPE_P (ft) && type_internals_preclude_sra_p (ft)) return true; } return false; case ARRAY_TYPE: et = TREE_TYPE (type); if (AGGREGATE_TYPE_P (et)) return type_internals_preclude_sra_p (et); else return false; default: return false; } } /* Create and insert access for EXPR. Return created access, or NULL if it is not possible. */ static struct access * create_access (tree expr, bool write) { struct access *access; void **slot; VEC (access_p,heap) *vec; HOST_WIDE_INT offset, size, max_size; tree base = expr; bool unscalarizable_region = false; if (handled_component_p (expr)) base = get_ref_base_and_extent (expr, &offset, &size, &max_size); else { tree tree_size; tree_size = TYPE_SIZE (TREE_TYPE (base)); if (tree_size && host_integerp (tree_size, 1)) size = max_size = tree_low_cst (tree_size, 1); else size = max_size = -1; offset = 0; } if (!base || !DECL_P (base) || !bitmap_bit_p (candidate_bitmap, DECL_UID (base))) return NULL; if (size != max_size) { size = max_size; unscalarizable_region = true; } if (size < 0) { disqualify_candidate (base, "Encountered an ultra variable sized " "access."); return NULL; } access = (struct access *) pool_alloc (access_pool); memset (access, 0, sizeof (struct access)); access->base = base; access->offset = offset; access->size = size; access->expr = expr; access->type = TREE_TYPE (expr); access->write = write; access->grp_unscalarizable_region = unscalarizable_region; slot = pointer_map_contains (base_access_vec, base); if (slot) vec = (VEC (access_p, heap) *) *slot; else vec = VEC_alloc (access_p, heap, 32); VEC_safe_push (access_p, heap, vec, access); *((struct VEC (access_p,heap) **) pointer_map_insert (base_access_vec, base)) = vec; return access; } /* Callback of walk_tree. Search the given tree for a declaration and exclude it from the candidates. */ static tree disqualify_all (tree *tp, int *walk_subtrees, void *data ATTRIBUTE_UNUSED) { tree base = *tp; if (TREE_CODE (base) == SSA_NAME) base = SSA_NAME_VAR (base); if (DECL_P (base)) { disqualify_candidate (base, "From within disqualify_all()."); *walk_subtrees = 0; } else *walk_subtrees = 1; return NULL_TREE; } /* Scan expression EXPR and create access structures for all accesses to candidates for scalarization. Return the created access or NULL if none is created. 
*/ static struct access * build_access_from_expr_1 (tree *expr_ptr, gimple_stmt_iterator *gsi ATTRIBUTE_UNUSED, bool write) { struct access *ret = NULL; tree expr = *expr_ptr; tree safe_expr = expr; bool bit_ref; if (TREE_CODE (expr) == BIT_FIELD_REF) { expr = TREE_OPERAND (expr, 0); bit_ref = true; } else bit_ref = false; while (TREE_CODE (expr) == NOP_EXPR || TREE_CODE (expr) == VIEW_CONVERT_EXPR || TREE_CODE (expr) == REALPART_EXPR || TREE_CODE (expr) == IMAGPART_EXPR) expr = TREE_OPERAND (expr, 0); switch (TREE_CODE (expr)) { case ADDR_EXPR: case SSA_NAME: case INDIRECT_REF: break; case VAR_DECL: case PARM_DECL: case RESULT_DECL: case COMPONENT_REF: case ARRAY_REF: ret = create_access (expr, write); break; case REALPART_EXPR: case IMAGPART_EXPR: expr = TREE_OPERAND (expr, 0); ret = create_access (expr, write); break; case ARRAY_RANGE_REF: default: walk_tree (&safe_expr, disqualify_all, NULL, NULL); break; } if (write && bit_ref && ret) ret->grp_bfr_lhs = 1; return ret; } /* Scan expression EXPR and create access structures for all accesses to candidates for scalarization. Return true if any access has been inserted. */ static bool build_access_from_expr (tree *expr_ptr, gimple_stmt_iterator *gsi ATTRIBUTE_UNUSED, bool write, void *data ATTRIBUTE_UNUSED) { return build_access_from_expr_1 (expr_ptr, gsi, write) != NULL; } /* Disqualify LHS and RHS for scalarization if STMT must end its basic block in modes in which it matters, return true iff they have been disqualified. RHS may be NULL, in that case ignore it. If we scalarize an aggregate in intra-SRA we may need to add statements after each statement. This is not possible if a statement unconditionally has to end the basic block. */ static bool disqualify_ops_if_throwing_stmt (gimple stmt, tree *lhs, tree *rhs) { if (stmt_can_throw_internal (stmt) || stmt_ends_bb_p (stmt)) { walk_tree (lhs, disqualify_all, NULL, NULL); if (rhs) walk_tree (rhs, disqualify_all, NULL, NULL); return true; } return false; } /* Result code for scan_assign callback for scan_function. */ enum scan_assign_result {SRA_SA_NONE, /* nothing done for the stmt */ SRA_SA_PROCESSED, /* stmt analyzed/changed */ SRA_SA_REMOVED}; /* stmt redundant and eliminated */ /* Scan expressions occurring in the statement pointed to by STMT_PTR, create access structures for all accesses to candidates for scalarization and remove those candidates which occur in statements or expressions that prevent them from being split apart. Return SRA_SA_PROCESSED if any access has been created, SRA_SA_NONE otherwise. */ static enum scan_assign_result build_accesses_from_assign (gimple *stmt_ptr, gimple_stmt_iterator *gsi ATTRIBUTE_UNUSED, void *data ATTRIBUTE_UNUSED) { gimple stmt = *stmt_ptr; tree *lhs_ptr, *rhs_ptr; struct access *lacc, *racc; if (gimple_assign_rhs2 (stmt)) return SRA_SA_NONE; lhs_ptr = gimple_assign_lhs_ptr (stmt); rhs_ptr = gimple_assign_rhs1_ptr (stmt); if (disqualify_ops_if_throwing_stmt (stmt, lhs_ptr, rhs_ptr)) return SRA_SA_NONE; racc = build_access_from_expr_1 (rhs_ptr, gsi, false); lacc = build_access_from_expr_1 (lhs_ptr, gsi, true); if (lacc && racc && !lacc->grp_unscalarizable_region && !racc->grp_unscalarizable_region && AGGREGATE_TYPE_P (TREE_TYPE (*lhs_ptr)) && useless_type_conversion_p (lacc->type, racc->type)) { struct assign_link *link; link = (struct assign_link *) pool_alloc (link_pool); memset (link, 0, sizeof (struct assign_link)); link->lacc = lacc; link->racc = racc; add_link_to_rhs (racc, link); } return (lacc || racc) ? SRA_SA_PROCESSED : SRA_SA_NONE; }
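The effect of the assign links can be seen on a small example (an editorial sketch, not from the patch):

struct S { int i; int j; };

int
f (struct S input)
{
  struct S a, b;
  b = input;
  b.i = 5;
  a = b;        /* aggregate copy: a link from the accesses of b to
                   those of a is recorded here */
  return a.i + a.j;
}

Each such aggregate copy between candidates records a link so that, during analysis, subaccesses discovered on the right hand side can be propagated to the left hand side by propagate_subacesses_accross_link() further below, letting the copy later be rewritten as scalar-to-scalar assignments.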
/* Scan function and look for interesting statements. Return true if any has been found or processed, as indicated by callbacks. SCAN_EXPR is a callback called on all expressions within statements except assign statements and those deemed entirely unsuitable for some reason (all operands in such statements and expressions are removed from candidate_bitmap). SCAN_ASSIGN is a callback called on all assign statements, HANDLE_SSA_DEFS is a callback called on assign statements and those call statements which have a lhs and it is the only callback which can be NULL. ANALYSIS_STAGE is true when running in the analysis stage of a pass and thus no statement is being modified. DATA is a pointer passed to all callbacks. If any single callback returns true, this function also returns true, otherwise it returns false. */ static bool scan_function (bool (*scan_expr) (tree *, gimple_stmt_iterator *, bool, void *), enum scan_assign_result (*scan_assign) (gimple *, gimple_stmt_iterator *, void *), bool (*handle_ssa_defs)(gimple, void *), bool analysis_stage, void *data) { gimple_stmt_iterator gsi; basic_block bb; unsigned i; tree *t; bool ret = false; FOR_EACH_BB (bb) { bool bb_changed = false; gsi = gsi_start_bb (bb); while (!gsi_end_p (gsi)) { gimple stmt = gsi_stmt (gsi); enum scan_assign_result assign_result; bool any = false, deleted = false; switch (gimple_code (stmt)) { case GIMPLE_RETURN: t = gimple_return_retval_ptr (stmt); if (*t != NULL_TREE) { if (DECL_P (*t)) { tree ret_type = TREE_TYPE (*t); if (sra_mode == SRA_MODE_EARLY_INTRA && (TREE_CODE (ret_type) == UNION_TYPE || TREE_CODE (ret_type) == QUAL_UNION_TYPE)) disqualify_candidate (*t, "Union in a return statement."); else bitmap_set_bit (retvals_bitmap, DECL_UID (*t)); } any |= scan_expr (t, &gsi, false, data); } break; case GIMPLE_ASSIGN: assign_result = scan_assign (&stmt, &gsi, data); any |= assign_result == SRA_SA_PROCESSED; deleted = assign_result == SRA_SA_REMOVED; if (handle_ssa_defs && assign_result != SRA_SA_REMOVED) any |= handle_ssa_defs (stmt, data); break; case GIMPLE_CALL: /* Operands must be processed before the lhs. */ for (i = 0; i < gimple_call_num_args (stmt); i++) { tree *argp = gimple_call_arg_ptr (stmt, i); any |= scan_expr (argp, &gsi, false, data); } if (gimple_call_lhs (stmt)) { tree *lhs_ptr = gimple_call_lhs_ptr (stmt); if (!analysis_stage || !disqualify_ops_if_throwing_stmt (stmt, lhs_ptr, NULL)) { any |= scan_expr (lhs_ptr, &gsi, true, data); if (handle_ssa_defs) any |= handle_ssa_defs (stmt, data); } } break; case GIMPLE_ASM: for (i = 0; i < gimple_asm_ninputs (stmt); i++) { tree *op = &TREE_VALUE (gimple_asm_input_op (stmt, i)); any |= scan_expr (op, &gsi, false, data); } for (i = 0; i < gimple_asm_noutputs (stmt); i++) { tree *op = &TREE_VALUE (gimple_asm_output_op (stmt, i)); any |= scan_expr (op, &gsi, true, data); } break; /* the missing break here would have fallen through to the default case and disqualified all asm operands again */ default: if (analysis_stage) walk_gimple_op (stmt, disqualify_all, NULL); break; } if (any) { ret = true; bb_changed = true; if (!analysis_stage) { update_stmt (stmt); if (!stmt_could_throw_p (stmt)) remove_stmt_from_eh_region (stmt); } } if (deleted) bb_changed = true; else { gsi_next (&gsi); ret = true; } } if (!analysis_stage && bb_changed) gimple_purge_dead_eh_edges (bb); } return ret; } /* Helper of QSORT function. There are pointers to accesses in the array. An access is considered smaller than another if it has a smaller offset or if the offsets are the same but its size is bigger. 
*/ static int compare_access_positions (const void *a, const void *b) { const access_p *fp1 = (const access_p *) a; const access_p *fp2 = (const access_p *) b; const access_p f1 = *fp1; const access_p f2 = *fp2; if (f1->offset != f2->offset) return f1->offset < f2->offset ? -1 : 1; if (f1->size == f2->size) return 0; /* We want the bigger accesses first, thus the opposite operator in the next line: */ return f1->size > f2->size ? -1 : 1; } /* Append a name of the declaration to the name obstack. A helper function for make_fancy_name. */ static void make_fancy_decl_name (tree decl) { char buffer[32]; tree name = DECL_NAME (decl); if (name) obstack_grow (&name_obstack, IDENTIFIER_POINTER (name), IDENTIFIER_LENGTH (name)); else { sprintf (buffer, "D%u", DECL_UID (decl)); obstack_grow (&name_obstack, buffer, strlen (buffer)); } } /* Helper for make_fancy_name. */ static void make_fancy_name_1 (tree expr) { char buffer[32]; tree index; if (DECL_P (expr)) { make_fancy_decl_name (expr); return; } switch (TREE_CODE (expr)) { case COMPONENT_REF: make_fancy_name_1 (TREE_OPERAND (expr, 0)); obstack_1grow (&name_obstack, '$'); make_fancy_decl_name (TREE_OPERAND (expr, 1)); break; case ARRAY_REF: make_fancy_name_1 (TREE_OPERAND (expr, 0)); obstack_1grow (&name_obstack, '$'); /* Arrays with only one element may not have a constant as their index. */ index = TREE_OPERAND (expr, 1); if (TREE_CODE (index) != INTEGER_CST) break; sprintf (buffer, HOST_WIDE_INT_PRINT_DEC, TREE_INT_CST_LOW (index)); obstack_grow (&name_obstack, buffer, strlen (buffer)); break; case BIT_FIELD_REF: case REALPART_EXPR: case IMAGPART_EXPR: gcc_unreachable (); /* we treat these as scalars. */ break; default: break; } } /* Create a human readable name for replacement variable of ACCESS. */ static char * make_fancy_name (tree expr) { make_fancy_name_1 (expr); obstack_1grow (&name_obstack, '\0'); return XOBFINISH (&name_obstack, char *); } /* Helper function for build_ref_for_offset. */ static bool build_ref_for_offset_1 (tree *res, tree type, HOST_WIDE_INT offset, tree exp_type) { while (1) { tree fld; tree tr_size, index; HOST_WIDE_INT el_size; if (offset == 0 && exp_type && useless_type_conversion_p (exp_type, type)) return true; switch (TREE_CODE (type)) { case UNION_TYPE: case QUAL_UNION_TYPE: case RECORD_TYPE: /* Some ADA records are half-unions, treat all of them the same. 
*/ for (fld = TYPE_FIELDS (type); fld; fld = TREE_CHAIN (fld)) { HOST_WIDE_INT pos, size; tree expr, *expr_ptr; if (TREE_CODE (fld) != FIELD_DECL) continue; pos = int_bit_position (fld); gcc_assert (TREE_CODE (type) == RECORD_TYPE || pos == 0); size = tree_low_cst (DECL_SIZE (fld), 1); if (pos > offset || (pos + size) <= offset) continue; if (res) { expr = build3 (COMPONENT_REF, TREE_TYPE (fld), *res, fld, NULL_TREE); expr_ptr = &expr; } else expr_ptr = NULL; if (build_ref_for_offset_1 (expr_ptr, TREE_TYPE (fld), offset - pos, exp_type)) { if (res) *res = expr; return true; } } return false; case ARRAY_TYPE: tr_size = TYPE_SIZE (TREE_TYPE (type)); if (!tr_size || !host_integerp (tr_size, 1)) return false; el_size = tree_low_cst (tr_size, 1); index = build_int_cst (TYPE_DOMAIN (type), offset / el_size); if (!integer_zerop (TYPE_MIN_VALUE (TYPE_DOMAIN (type)))) index = int_const_binop (PLUS_EXPR, index, TYPE_MIN_VALUE (TYPE_DOMAIN (type)), 0); if (res) *res = build4 (ARRAY_REF, TREE_TYPE (type), *res, index, NULL_TREE, NULL_TREE); offset = offset % el_size; type = TREE_TYPE (type); break; default: if (offset != 0) return false; if (exp_type) return false; else return true; } } } /* Construct an expression that would reference a part of aggregate *EXPR of type TYPE at the given OFFSET of the type EXP_TYPE. If EXPR is NULL, the function only determines whether it can build such a reference without actually doing it. FIXME: Eventually this should be replaced with maybe_fold_offset_to_reference() from tree-ssa-ccp.c but that requires a minor rewrite of fold_stmt. */ static bool build_ref_for_offset (tree *expr, tree type, HOST_WIDE_INT offset, tree exp_type, bool allow_ptr) { if (allow_ptr && POINTER_TYPE_P (type)) { type = TREE_TYPE (type); if (expr) *expr = fold_build1 (INDIRECT_REF, type, *expr); } return build_ref_for_offset_1 (expr, type, offset, exp_type); } /* The very first phase of intraprocedural SRA. It marks in candidate_bitmap those variables whose type is suitable for scalarization. */ static bool find_var_candidates (void) { tree var, type; referenced_var_iterator rvi; bool ret = false; FOR_EACH_REFERENCED_VAR (var, rvi) { if (TREE_CODE (var) != VAR_DECL && TREE_CODE (var) != PARM_DECL) continue; type = TREE_TYPE (var); if (!AGGREGATE_TYPE_P (type) || needs_to_live_in_memory (var) || TREE_THIS_VOLATILE (var) || !COMPLETE_TYPE_P (type) || !host_integerp (TYPE_SIZE (type), 1) || tree_low_cst (TYPE_SIZE (type), 1) == 0 || type_internals_preclude_sra_p (type)) continue; bitmap_set_bit (candidate_bitmap, DECL_UID (var)); if (dump_file) { fprintf (dump_file, "Candidate (%d): ", DECL_UID (var)); print_generic_expr (dump_file, var, 0); fprintf (dump_file, "\n"); } ret = true; } return ret; } /* Return true if TYPE should be considered a scalar type by SRA. */ static bool is_sra_scalar_type (tree type) { enum tree_code code = TREE_CODE (type); return (INTEGRAL_TYPE_P (type) || SCALAR_FLOAT_TYPE_P (type) || FIXED_POINT_TYPE_P (type) || POINTER_TYPE_P (type) || code == VECTOR_TYPE || code == COMPLEX_TYPE || code == OFFSET_TYPE); }
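In practice the candidate test boils down to cases like these (an illustrative sketch, not from the patch):

struct S { int i; int j; };
extern void escape (struct S *);

int
good (void)
{
  struct S s;   /* complete, fixed-size, not volatile: a candidate */
  s.i = 1;
  return s.i;
}

int
bad (void)
{
  struct S s;
  escape (&s);  /* the address escapes, so needs_to_live_in_memory (s)
                   holds and s never makes it into candidate_bitmap */
  return s.i;
}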
/* Sort all accesses for the given variable, check for partial overlaps and return NULL if there are any. If there are none, pick a representative for each combination of offset and size and create a linked list out of them. Return the pointer to the first representative and make sure it is the first one in the vector of accesses. */ static struct access * sort_and_splice_var_accesses (tree var) { int i, j, access_count; struct access *res, **prev_acc_ptr = &res; VEC (access_p, heap) *access_vec; bool first = true; HOST_WIDE_INT low = -1, high = 0; access_vec = get_base_access_vector (var); if (!access_vec) return NULL; access_count = VEC_length (access_p, access_vec); /* Sort by <OFFSET, SIZE>. */ qsort (VEC_address (access_p, access_vec), access_count, sizeof (access_p), compare_access_positions); i = 0; while (i < access_count) { struct access *access = VEC_index (access_p, access_vec, i); bool modification = access->write; bool grp_read = !access->write; bool grp_bfr_lhs = access->grp_bfr_lhs; bool first_scalar = is_sra_scalar_type (access->type); bool unscalarizable_region = access->grp_unscalarizable_region; if (first || access->offset >= high) { first = false; low = access->offset; high = access->offset + access->size; } else if (access->offset > low && access->offset + access->size > high) return NULL; else gcc_assert (access->offset >= low && access->offset + access->size <= high); j = i + 1; while (j < access_count) { struct access *ac2 = VEC_index (access_p, access_vec, j); if (ac2->offset != access->offset || ac2->size != access->size) break; modification |= ac2->write; grp_read |= !ac2->write; grp_bfr_lhs |= ac2->grp_bfr_lhs; unscalarizable_region |= ac2->grp_unscalarizable_region; relink_to_new_repr (access, ac2); /* If one of the equivalent accesses is scalar, use it as a representative (this happens when there is for example only a single scalar field in a structure). */ if (!first_scalar && is_sra_scalar_type (ac2->type)) { struct access tmp_acc; first_scalar = true; memcpy (&tmp_acc, ac2, sizeof (struct access)); memcpy (ac2, access, sizeof (struct access)); memcpy (access, &tmp_acc, sizeof (struct access)); } ac2->group_representative = access; j++; } i = j; access->group_representative = access; access->grp_write = modification; access->grp_read = grp_read; access->grp_bfr_lhs = grp_bfr_lhs; access->grp_unscalarizable_region = unscalarizable_region; if (access->first_link) add_access_to_work_queue (access); *prev_acc_ptr = access; prev_acc_ptr = &access->next_grp; } gcc_assert (res == VEC_index (access_p, access_vec, 0)); return res; } /* Create a variable for the given ACCESS which determines the type, name and a few other properties. Return the variable declaration and store it also to ACCESS->replacement_decl. 
*/ static tree create_access_replacement (struct access *access) { tree repl; repl = make_rename_temp (access->type, "SR"); get_var_ann (repl); add_referenced_var (repl); DECL_SOURCE_LOCATION (repl) = DECL_SOURCE_LOCATION (access->base); DECL_ARTIFICIAL (repl) = 1; if (DECL_NAME (access->base) && !DECL_IGNORED_P (access->base)) { char *pretty_name = make_fancy_name (access->expr); DECL_NAME (repl) = get_identifier (pretty_name); obstack_free (&name_obstack, pretty_name); SET_DECL_DEBUG_EXPR (repl, access->expr); DECL_DEBUG_EXPR_IS_FROM (repl) = 1; DECL_IGNORED_P (repl) = 0; TREE_NO_WARNING (repl) = TREE_NO_WARNING (access->base); } else { DECL_IGNORED_P (repl) = 1; TREE_NO_WARNING (repl) = 1; } if (access->grp_bfr_lhs) DECL_GIMPLE_REG_P (repl) = 0; if (dump_file) { fprintf (dump_file, "Created a replacement for "); print_generic_expr (dump_file, access->base, 0); fprintf (dump_file, " offset: %u, size: %u: ", (unsigned) access->offset, (unsigned) access->size); print_generic_expr (dump_file, repl, 0); fprintf (dump_file, "\n"); } return repl; } /* Return ACCESS's scalar replacement, create it if it does not exist yet. */ static inline tree get_access_replacement (struct access *access) { gcc_assert (access->grp_to_be_replaced); if (access->replacement_decl) return access->replacement_decl; access->replacement_decl = create_access_replacement (access); return access->replacement_decl; } /* Build a subtree of accesses rooted in *ACCESS, and move the pointer in the linked list along the way. Stop when *ACCESS is NULL or the access pointed to by it is not "within" the root. */ static void build_access_subtree (struct access **access) { struct access *root = *access, *last_child = NULL; HOST_WIDE_INT limit = root->offset + root->size; *access = (*access)->next_grp; while (*access && (*access)->offset + (*access)->size <= limit) { if (!last_child) root->first_child = *access; else last_child->next_sibling = *access; last_child = *access; build_access_subtree (access); } } /* Build a tree of access representatives, ACCESS is the pointer to the first one, others are linked in a list by the next_grp field. Decisions about scalar replacements are made later in analyze_access_subtree. */ static void build_access_trees (struct access *access) { while (access) { struct access *root = access; build_access_subtree (&access); root->next_grp = access; } }
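The resulting shape of the data structure is easiest to see on an example (an editorial illustration, 32-bit ints assumed; compare with the dump_access_tree output format above). For

struct I { int a; int b; };
struct O { struct I in; int c; };

with accesses to o.in as a whole, o.in.a, o.in.b and o.c, the representatives end up as two trees chained by next_grp, roughly:

access { base = 'o', offset = 0, size = 64, type = struct I }
* access { base = 'o', offset = 0, size = 32, type = int }
* access { base = 'o', offset = 32, size = 32, type = int }
access { base = 'o', offset = 64, size = 32, type = int }

where the two indented entries are the first_child/next_sibling children of the o.in representative.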
/* Analyze the subtree of accesses rooted in ROOT, scheduling replacements when it both seems beneficial and ALLOW_REPLACEMENTS allows it. Also set all sorts of access flags appropriately along the way, notably always set grp_read when MARK_READ is true and grp_write when MARK_WRITE is true. */ static bool analyze_access_subtree (struct access *root, bool allow_replacements, bool mark_read, bool mark_write) { struct access *child; HOST_WIDE_INT limit = root->offset + root->size; HOST_WIDE_INT covered_to = root->offset; bool scalar = is_sra_scalar_type (root->type); bool hole = false, sth_created = false; if (mark_read) root->grp_read = true; else if (root->grp_read) mark_read = true; if (mark_write) root->grp_write = true; else if (root->grp_write) mark_write = true; if (root->grp_unscalarizable_region) allow_replacements = false; for (child = root->first_child; child; child = child->next_sibling) { if (!hole && child->offset < covered_to) hole = true; else covered_to += child->size; sth_created |= analyze_access_subtree (child, allow_replacements && !scalar, mark_read, mark_write); root->grp_unscalarized_data |= child->grp_unscalarized_data; hole |= !child->grp_covered; } if (allow_replacements && scalar && !root->first_child) { if (dump_file) { fprintf (dump_file, "Marking "); print_generic_expr (dump_file, root->base, 0); fprintf (dump_file, " offset: %u, size: %u: ", (unsigned) root->offset, (unsigned) root->size); fprintf (dump_file, " to be replaced.\n"); } root->grp_to_be_replaced = 1; sth_created = true; hole = false; } else if (covered_to < limit) hole = true; if (sth_created && !hole) { root->grp_covered = 1; return true; } if (root->grp_write || TREE_CODE (root->base) == PARM_DECL) root->grp_unscalarized_data = 1; /* not covered and written to */ if (sth_created) return true; return false; } /* Analyze all access trees linked by next_grp by the means of analyze_access_subtree. */ static bool analyze_access_trees (struct access *access) { bool ret = false; while (access) { if (analyze_access_subtree (access, true, false, false)) ret = true; access = access->next_grp; } return ret; } /* Return true iff a potential new child of LACC at offset OFFSET and with size SIZE would conflict with an already existing one. If exactly such a child already exists in LACC, store a pointer to it in EXACT_MATCH. */ static bool child_would_conflict_in_lacc (struct access *lacc, HOST_WIDE_INT norm_offset, HOST_WIDE_INT size, struct access **exact_match) { struct access *child; for (child = lacc->first_child; child; child = child->next_sibling) { if (child->offset == norm_offset && child->size == size) { *exact_match = child; return true; } if (child->offset < norm_offset + size && child->offset + child->size > norm_offset) return true; } return false; } /* Set the expr of TARGET to one just like MODEL but with its own base at the bottom of the handled components. */ static void duplicate_expr_for_different_base (struct access *target, struct access *model) { tree t, expr = unshare_expr (model->expr); gcc_assert (handled_component_p (expr)); t = expr; while (handled_component_p (TREE_OPERAND (t, 0))) t = TREE_OPERAND (t, 0); gcc_assert (TREE_OPERAND (t, 0) == model->base); TREE_OPERAND (t, 0) = target->base; target->expr = expr; } /* Create a new child access of PARENT, with all properties just like MODEL except for its offset and with its grp_write false and grp_read true. Return the new access. Note that this access is created long after all splicing and sorting, it's not located in any access vector and is automatically a representative of its group. 
/* Create a new child access of PARENT, with all properties just like MODEL
   except for its offset and with its grp_write false and grp_read true.
   Return the new access.  Note that this access is created long after all
   splicing and sorting, it's not located in any access vector and is
   automatically a representative of its group.  */

static struct access *
create_artificial_child_access (struct access *parent, struct access *model,
                                HOST_WIDE_INT new_offset)
{
  struct access *access;
  struct access **child;

  gcc_assert (!model->grp_unscalarizable_region);

  access = (struct access *) pool_alloc (access_pool);
  memset (access, 0, sizeof (struct access));
  access->base = parent->base;
  access->offset = new_offset;
  access->size = model->size;
  duplicate_expr_for_different_base (access, model);
  access->type = model->type;
  access->grp_write = true;
  access->grp_read = false;

  child = &parent->first_child;
  while (*child && (*child)->offset < new_offset)
    child = &(*child)->next_sibling;

  access->next_sibling = *child;
  *child = access;

  return access;
}

/* Propagate all subaccesses of RACC across an assignment link to LACC.
   Return true if any new subaccess was created.  Additionally, if RACC is a
   scalar access but LACC is not, change the type of the latter.  */

static bool
propagate_subaccesses_across_link (struct access *lacc, struct access *racc)
{
  struct access *rchild;
  HOST_WIDE_INT norm_delta = lacc->offset - racc->offset;
  HOST_WIDE_INT lbase_size = tree_low_cst (DECL_SIZE (lacc->base), 1);
  bool ret = false;

  if (is_sra_scalar_type (lacc->type)
      || lacc->grp_unscalarizable_region
      || racc->grp_unscalarizable_region)
    return false;

  if (!lacc->first_child && !racc->first_child
      && is_sra_scalar_type (racc->type)
      && (sra_mode == SRA_MODE_INTRA
          || !bitmap_bit_p (retvals_bitmap, DECL_UID (lacc->base))))
    {
      duplicate_expr_for_different_base (lacc, racc);
      lacc->type = racc->type;
      return false;
    }

  for (rchild = racc->first_child; rchild; rchild = rchild->next_sibling)
    {
      struct access *new_acc = NULL;
      HOST_WIDE_INT norm_offset = rchild->offset + norm_delta;

      if (rchild->grp_unscalarizable_region
          || norm_offset + rchild->size > lbase_size)
        continue;

      if (child_would_conflict_in_lacc (lacc, norm_offset, rchild->size,
                                        &new_acc))
        {
          if (new_acc && rchild->first_child)
            ret |= propagate_subaccesses_across_link (new_acc, rchild);
          continue;
        }

      new_acc = create_artificial_child_access (lacc, rchild, norm_offset);
      if (racc->first_child)
        propagate_subaccesses_across_link (new_acc, rchild);
      ret = true;
    }

  return ret;
}

/* Propagate all subaccesses across assignment links.  */

static void
propagate_all_subaccesses (void)
{
  while (work_queue_head)
    {
      struct access *racc = pop_access_from_work_queue ();
      struct assign_link *link;

      gcc_assert (racc->first_link);

      for (link = racc->first_link; link; link = link->next)
        {
          struct access *lacc = link->lacc;

          if (!bitmap_bit_p (candidate_bitmap, DECL_UID (lacc->base)))
            continue;
          lacc = lacc->group_representative;
          if (propagate_subaccesses_across_link (lacc, racc)
              && lacc->first_link)
            add_access_to_work_queue (lacc);
        }
    }
}
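/* What propagation across assignment links buys us can be sketched on
   another made-up fragment:

     struct S { int i; float f; };
     extern struct S g;

     int
     linked (struct S b)
     {
       struct S a;
       int i = b.i;
       a = b;
       g = a;
       return i;
     }

   The only real subaccess of a is none at all; the "a = b" statement,
   however, records an assignment link from b to a, and b.i is a real
   subaccess of b.  Propagation creates an artificial child access a.i, so
   the copy can later be performed replacement-by-replacement instead of
   always going through memory.  */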
/* Go through all accesses collected throughout the (intraprocedural)
   analysis stage, exclude overlapping ones, identify representatives and
   build trees out of them, making decisions about scalarization on the way.
   Return true iff there are any to-be-scalarized variables after this
   stage.  */

static bool
analyze_all_variable_accesses (void)
{
  tree var;
  referenced_var_iterator rvi;
  bool res = false;

  FOR_EACH_REFERENCED_VAR (var, rvi)
    if (bitmap_bit_p (candidate_bitmap, DECL_UID (var)))
      {
        struct access *access;

        access = sort_and_splice_var_accesses (var);
        if (access)
          build_access_trees (access);
        else
          disqualify_candidate (var,
                                "No or inhibitingly overlapping accesses.");
      }

  propagate_all_subaccesses ();

  FOR_EACH_REFERENCED_VAR (var, rvi)
    if (bitmap_bit_p (candidate_bitmap, DECL_UID (var)))
      {
        struct access *access = get_first_repr_for_decl (var);

        if (analyze_access_trees (access))
          {
            res = true;
            if (dump_file)
              {
                fprintf (dump_file, "\nAccess trees for ");
                print_generic_expr (dump_file, var, 0);
                fprintf (dump_file, " (UID: %u): \n", DECL_UID (var));
                dump_access_tree (dump_file, access);
                fprintf (dump_file, "\n");
              }
          }
        else
          disqualify_candidate (var,
                                "No scalar replacements to be created.");
      }

  return res;
}

/* Return true iff a reference statement into aggregate AGG can be built for
   every single to-be-replaced access that is a child of ACCESS, its sibling
   or a child of its sibling.  TOP_OFFSET is the offset of the processed
   access subtree that has to be subtracted from the offset of each
   access.  */

static bool
ref_expr_for_all_replacements_p (struct access *access, tree agg,
                                 HOST_WIDE_INT top_offset)
{
  do
    {
      if (access->grp_to_be_replaced
          && !build_ref_for_offset (NULL, TREE_TYPE (agg),
                                    access->offset - top_offset,
                                    access->type, false))
        return false;

      if (access->first_child
          && !ref_expr_for_all_replacements_p (access->first_child, agg,
                                               top_offset))
        return false;

      access = access->next_sibling;
    }
  while (access);

  return true;
}
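/* A sketch of when ref_expr_for_all_replacements_p can fail (assumed
   layout, made-up names):

     struct A { int i; int j; };
     struct B { float f; float g; };
     union U { struct A a; struct B b; };

     struct B
     mismatch (union U u)
     {
       u.a.i = 1;
       u.a.j = 2;
       return u.b;
     }

   The replacements of u have integer types, but no int-typed component can
   be built at the corresponding offsets of struct B, so the predicate
   returns false for the copy out of u.b and the caller has to fall back to
   refreshing the original aggregate instead of copying replacements
   directly.  */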
/* Generate statements copying scalar replacements of accesses within a
   subtree into or out of AGG.  ACCESS is the first child of the root of the
   subtree to be processed.  AGG is an aggregate-type expression (it can be
   a declaration but does not have to be, it can for example also be an
   indirect_ref).  TOP_OFFSET is the offset of the processed subtree which
   has to be subtracted from offsets of individual accesses to get
   corresponding offsets for AGG.  If CHUNK_SIZE is non-null, copy only
   replacements in the interval <start_offset, start_offset + chunk_size>,
   otherwise copy all.  GSI is a statement iterator used to place the new
   statements.  WRITE should be true when the statements should write from
   AGG to the replacements and false vice versa.  If INSERT_AFTER is true,
   new statements will be added after the current statement in GSI,
   otherwise they will be added before it.  */

static void
generate_subtree_copies (struct access *access, tree agg,
                         HOST_WIDE_INT top_offset,
                         HOST_WIDE_INT start_offset, HOST_WIDE_INT chunk_size,
                         gimple_stmt_iterator *gsi, bool write,
                         bool insert_after)
{
  do
    {
      tree expr = unshare_expr (agg);

      if (chunk_size && access->offset >= start_offset + chunk_size)
        return;

      if (access->grp_to_be_replaced
          && (chunk_size == 0
              || access->offset + access->size > start_offset))
        {
          bool repl_found;
          gimple stmt;

          repl_found = build_ref_for_offset (&expr, TREE_TYPE (agg),
                                             access->offset - top_offset,
                                             access->type, false);
          gcc_assert (repl_found);

          if (write)
            stmt = gimple_build_assign (get_access_replacement (access),
                                        expr);
          else
            {
              tree repl = get_access_replacement (access);
              TREE_NO_WARNING (repl) = 1;
              stmt = gimple_build_assign (expr, repl);
            }

          if (insert_after)
            gsi_insert_after (gsi, stmt, GSI_NEW_STMT);
          else
            gsi_insert_before (gsi, stmt, GSI_SAME_STMT);
          update_stmt (stmt);
        }

      if (access->first_child)
        generate_subtree_copies (access->first_child, agg, top_offset,
                                 start_offset, chunk_size, gsi,
                                 write, insert_after);

      access = access->next_sibling;
    }
  while (access);
}

/* Assign zero to all scalar replacements in an access subtree.  ACCESS is
   the root of the subtree to be processed.  GSI is the statement iterator
   used for inserting statements, which are added after the current
   statement if INSERT_AFTER is true or before it otherwise.  */

static void
init_subtree_with_zero (struct access *access, gimple_stmt_iterator *gsi,
                        bool insert_after)
{
  struct access *child;

  if (access->grp_to_be_replaced)
    {
      gimple stmt;

      stmt = gimple_build_assign (get_access_replacement (access),
                                  fold_convert (access->type,
                                                integer_zero_node));
      if (insert_after)
        gsi_insert_after (gsi, stmt, GSI_NEW_STMT);
      else
        gsi_insert_before (gsi, stmt, GSI_SAME_STMT);
      update_stmt (stmt);
    }

  for (child = access->first_child; child; child = child->next_sibling)
    init_subtree_with_zero (child, gsi, insert_after);
}

/* Search for an access representative for the given expression EXPR and
   return it or NULL if it cannot be found.  */

static struct access *
get_access_for_expr (tree expr)
{
  HOST_WIDE_INT offset, size, max_size;
  tree base;

  if (TREE_CODE (expr) == NOP_EXPR
      || TREE_CODE (expr) == VIEW_CONVERT_EXPR)
    expr = TREE_OPERAND (expr, 0);

  if (handled_component_p (expr))
    {
      base = get_ref_base_and_extent (expr, &offset, &size, &max_size);
      size = max_size;
      if (size == -1 || !base || !DECL_P (base))
        return NULL;
    }
  else if (DECL_P (expr))
    {
      tree tree_size;

      base = expr;
      tree_size = TYPE_SIZE (TREE_TYPE (base));
      if (tree_size && host_integerp (tree_size, 1))
        size = max_size = tree_low_cst (tree_size, 1);
      else
        return NULL;
      offset = 0;
    }
  else
    return NULL;

  if (!bitmap_bit_p (candidate_bitmap, DECL_UID (base)))
    return NULL;

  return get_var_base_offset_size_access (base, offset, size);
}
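/* A typical consumer of generate_subtree_copies is an aggregate used as a
   whole, e.g. passed to a call (illustrative example, replacement names
   made up):

     struct S { int i; float f; };
     extern void take (struct S);

     void
     caller (void)
     {
       struct S s;
       s.i = 1;
       s.f = 2.0f;
       take (s);
     }

   Before the call, the pass emits, roughly,

     s.i = SR_s_i;
     s.f = SR_s_f;

   i.e. the replacements are stored back into the aggregate (WRITE false,
   INSERT_AFTER false) so that the callee sees a consistent s.  */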
/* Substitute into *EXPR an expression of type TYPE with the value of the
   replacement of ACCESS.  This is done either by producing a special V_C_E
   assignment statement converting the replacement to a new temporary of the
   requested type if TYPE is not TREE_ADDRESSABLE, or by going through the
   base aggregate if it is.  */

static void
fix_incompatible_types_for_expr (tree *expr, tree type, struct access *access,
                                 gimple_stmt_iterator *gsi, bool write)
{
  tree repl = get_access_replacement (access);

  if (!TREE_ADDRESSABLE (type))
    {
      tree tmp = create_tmp_var (type, "SRvce");

      add_referenced_var (tmp);
      if (is_gimple_reg_type (type))
        tmp = make_ssa_name (tmp, NULL);

      if (write)
        {
          gimple stmt;
          tree conv = fold_build1 (VIEW_CONVERT_EXPR, TREE_TYPE (repl), tmp);

          *expr = tmp;
          if (is_gimple_reg_type (type))
            SSA_NAME_DEF_STMT (tmp) = gsi_stmt (*gsi);
          stmt = gimple_build_assign (repl, conv);
          gsi_insert_after (gsi, stmt, GSI_NEW_STMT);
          update_stmt (stmt);
        }
      else
        {
          gimple stmt;
          tree conv = fold_build1 (VIEW_CONVERT_EXPR, type, repl);

          stmt = gimple_build_assign (tmp, conv);
          gsi_insert_before (gsi, stmt, GSI_SAME_STMT);
          if (is_gimple_reg_type (type))
            SSA_NAME_DEF_STMT (tmp) = stmt;
          *expr = tmp;
          update_stmt (stmt);
        }
    }
  else
    {
      if (write)
        {
          gimple stmt;

          stmt = gimple_build_assign (repl, unshare_expr (access->expr));
          gsi_insert_after (gsi, stmt, GSI_NEW_STMT);
          update_stmt (stmt);
        }
      else
        {
          gimple stmt;

          stmt = gimple_build_assign (unshare_expr (access->expr), repl);
          gsi_insert_before (gsi, stmt, GSI_SAME_STMT);
          update_stmt (stmt);
        }
    }
}

/* Callback for scan_function.  Replace the expression EXPR with a scalar
   replacement if there is one and generate other statements to do type
   conversion or subtree copying if necessary.  GSI is used to place newly
   created statements, WRITE is true if the expression is being written to
   (it is on a LHS of a statement or output in an assembly statement).  */

static bool
sra_modify_expr (tree *expr, gimple_stmt_iterator *gsi, bool write,
                 void *data ATTRIBUTE_UNUSED)
{
  struct access *access;
  tree type, bfr;

  if (TREE_CODE (*expr) == BIT_FIELD_REF)
    {
      bfr = *expr;
      expr = &TREE_OPERAND (*expr, 0);
    }
  else
    bfr = NULL_TREE;

  if (TREE_CODE (*expr) == REALPART_EXPR || TREE_CODE (*expr) == IMAGPART_EXPR)
    expr = &TREE_OPERAND (*expr, 0);
  type = TREE_TYPE (*expr);

  access = get_access_for_expr (*expr);
  if (!access)
    return false;

  if (access->grp_to_be_replaced)
    {
      if (!useless_type_conversion_p (type, access->type))
        fix_incompatible_types_for_expr (expr, type, access, gsi, write);
      else
        *expr = get_access_replacement (access);
    }

  if (access->first_child)
    {
      HOST_WIDE_INT start_offset, chunk_size;

      if (bfr
          && host_integerp (TREE_OPERAND (bfr, 1), 1)
          && host_integerp (TREE_OPERAND (bfr, 2), 1))
        {
          start_offset = tree_low_cst (TREE_OPERAND (bfr, 1), 1);
          chunk_size = tree_low_cst (TREE_OPERAND (bfr, 2), 1);
        }
      else
        start_offset = chunk_size = 0;

      generate_subtree_copies (access->first_child, access->base, 0,
                               start_offset, chunk_size, gsi, write, write);
    }
  return true;
}

/* Store all replacements in the access tree rooted in TOP_RACC either to
   their base aggregate if there are unscalarized data or directly to LHS
   otherwise.  */

static void
handle_unscalarized_data_in_subtree (struct access *top_racc, tree lhs,
                                     gimple_stmt_iterator *gsi)
{
  if (top_racc->grp_unscalarized_data)
    generate_subtree_copies (top_racc->first_child, top_racc->base, 0, 0, 0,
                             gsi, false, false);
  else
    generate_subtree_copies (top_racc->first_child, lhs, top_racc->offset,
                             0, 0, gsi, false, false);
}
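/* The V_C_E path of fix_incompatible_types_for_expr is what makes union
   scalarization work.  A sketch, with made-up replacement and temporary
   names and assuming int and float have the same size:

     union U { int i; float f; };

     float
     punned (int i)
     {
       union U u;
       u.i = i;
       return u.f;
     }

   If the single replacement for u ends up with type int, the read of u.f is
   rewritten approximately as

     SRvce_1 = VIEW_CONVERT_EXPR<float>(SR_u);
     return SRvce_1;

   instead of being rejected for the type mismatch.  */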
/* Try to generate statements that load all sub-replacements in an access
   (sub)tree (LACC is the first child) from scalar replacements in the
   TOP_RACC (sub)tree.  If that is not possible, refresh the TOP_RACC base
   aggregate and load the accesses from it.  LEFT_OFFSET is the offset of
   the left whole subtree being copied, RIGHT_OFFSET is the same thing for
   the right subtree.  OLD_GSI and NEW_GSI are stmt iterators used for
   statement insertions.  *REFRESHED is true iff the rhs top aggregate has
   already been refreshed with the contents of its scalar replacements; it
   is set to true if this function has to do so.  */

static void
load_assign_lhs_subreplacements (struct access *lacc, struct access *top_racc,
                                 HOST_WIDE_INT left_offset,
                                 HOST_WIDE_INT right_offset,
                                 gimple_stmt_iterator *old_gsi,
                                 gimple_stmt_iterator *new_gsi,
                                 bool *refreshed, tree lhs)
{
  do
    {
      if (lacc->grp_to_be_replaced)
        {
          struct access *racc;
          HOST_WIDE_INT offset = lacc->offset - left_offset + right_offset;

          racc = find_access_in_subtree (top_racc, offset, lacc->size);
          if (racc && racc->grp_to_be_replaced)
            {
              gimple stmt;

              if (useless_type_conversion_p (lacc->type, racc->type))
                stmt = gimple_build_assign (get_access_replacement (lacc),
                                            get_access_replacement (racc));
              else
                {
                  tree rhs = fold_build1 (VIEW_CONVERT_EXPR, lacc->type,
                                          get_access_replacement (racc));
                  stmt = gimple_build_assign (get_access_replacement (lacc),
                                              rhs);
                }

              gsi_insert_after (new_gsi, stmt, GSI_NEW_STMT);
              update_stmt (stmt);
            }
          else
            {
              tree expr = unshare_expr (top_racc->base);
              bool repl_found;
              gimple stmt;

              /* No suitable access on the right hand side, need to load from
                 the aggregate.  See if we have to update it first...  */
              if (!*refreshed)
                {
                  gcc_assert (top_racc->first_child);
                  handle_unscalarized_data_in_subtree (top_racc, lhs,
                                                       old_gsi);
                  *refreshed = true;
                }

              repl_found = build_ref_for_offset (&expr,
                                                 TREE_TYPE (top_racc->base),
                                                 lacc->offset - left_offset,
                                                 lacc->type, false);
              gcc_assert (repl_found);

              stmt = gimple_build_assign (get_access_replacement (lacc),
                                          expr);
              gsi_insert_after (new_gsi, stmt, GSI_NEW_STMT);
              update_stmt (stmt);
            }
        }
      else if (lacc->grp_read && !lacc->grp_covered && !*refreshed)
        {
          handle_unscalarized_data_in_subtree (top_racc, lhs, old_gsi);
          *refreshed = true;
        }

      if (lacc->first_child)
        load_assign_lhs_subreplacements (lacc->first_child, top_racc,
                                         left_offset, right_offset,
                                         old_gsi, new_gsi, refreshed, lhs);
      lacc = lacc->next_sibling;
    }
  while (lacc);
}

/* Return true iff ACC is non-NULL and has subaccesses.  */

static inline bool
access_has_children_p (struct access *acc)
{
  return acc && acc->first_child;
}

/* Modify assignments with a CONSTRUCTOR on their RHS.  STMT contains a
   pointer to the assignment and GSI is the statement iterator pointing at
   it.  Returns the same values as sra_modify_assign.  */

static enum scan_assign_result
sra_modify_constructor_assign (gimple *stmt, gimple_stmt_iterator *gsi)
{
  tree lhs = gimple_assign_lhs (*stmt);
  struct access *acc;

  gcc_assert (TREE_CODE (lhs) != REALPART_EXPR
              && TREE_CODE (lhs) != IMAGPART_EXPR);
  acc = get_access_for_expr (lhs);
  if (!acc)
    return SRA_SA_NONE;

  if (VEC_length (constructor_elt,
                  CONSTRUCTOR_ELTS (gimple_assign_rhs1 (*stmt))) > 0)
    {
      /* I have never seen this code path trigger but if it can happen the
         following should handle it gracefully.  */
      if (access_has_children_p (acc))
        generate_subtree_copies (acc->first_child, acc->base, 0, 0, 0,
                                 gsi, true, true);
      return SRA_SA_PROCESSED;
    }

  if (acc->grp_covered)
    {
      init_subtree_with_zero (acc, gsi, false);
      unlink_stmt_vdef (*stmt);
      gsi_remove (gsi, true);
      return SRA_SA_REMOVED;
    }
  else
    {
      init_subtree_with_zero (acc, gsi, true);
      return SRA_SA_PROCESSED;
    }
}
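/* The empty-CONSTRUCTOR case above corresponds to aggregate zeroing such as
   this made-up fragment (replacement names invented):

     struct S { int i; float f; };

     int
     zeroed (struct S *out)
     {
       struct S s = { 0 };
       s.i = 42;
       *out = s;
       return s.i;
     }

   The gimplified "s = {}" statement is, with s fully scalarized, replaced
   by roughly

     SR_s_i = 0;
     SR_s_f = 0.0;

   and the original assignment is removed (SRA_SA_REMOVED).  */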
/* Modify statements that have a REALPART_EXPR or IMAGPART_EXPR of a
   to-be-scalarized expression on their LHS.  STMT is the statement and GSI
   is the iterator used to place new helper statements.  Returns the same
   values as sra_modify_assign.  */

static enum scan_assign_result
sra_modify_partially_complex_lhs (gimple stmt, gimple_stmt_iterator *gsi)
{
  tree lhs, complex, ptype, rp, ip;
  struct access *access;
  gimple new_stmt, aux_stmt;

  lhs = gimple_assign_lhs (stmt);
  complex = TREE_OPERAND (lhs, 0);
  access = get_access_for_expr (complex);
  if (!access || !access->grp_to_be_replaced)
    return SRA_SA_NONE;

  ptype = TREE_TYPE (TREE_TYPE (complex));
  rp = create_tmp_var (ptype, "SRr");
  add_referenced_var (rp);
  rp = make_ssa_name (rp, NULL);

  ip = create_tmp_var (ptype, "SRp");
  add_referenced_var (ip);
  ip = make_ssa_name (ip, NULL);

  if (TREE_CODE (lhs) == IMAGPART_EXPR)
    {
      aux_stmt = gimple_build_assign (rp,
                        fold_build1 (REALPART_EXPR, ptype,
                                     get_access_replacement (access)));
      SSA_NAME_DEF_STMT (rp) = aux_stmt;
      gimple_assign_set_lhs (stmt, ip);
      SSA_NAME_DEF_STMT (ip) = stmt;
    }
  else
    {
      aux_stmt = gimple_build_assign (ip,
                        fold_build1 (IMAGPART_EXPR, ptype,
                                     get_access_replacement (access)));
      SSA_NAME_DEF_STMT (ip) = aux_stmt;
      gimple_assign_set_lhs (stmt, rp);
      SSA_NAME_DEF_STMT (rp) = stmt;
    }

  gsi_insert_before (gsi, aux_stmt, GSI_SAME_STMT);
  update_stmt (aux_stmt);
  new_stmt = gimple_build_assign (get_access_replacement (access),
                                  fold_build2 (COMPLEX_EXPR, access->type,
                                               rp, ip));
  gsi_insert_after (gsi, new_stmt, GSI_NEW_STMT);
  update_stmt (new_stmt);

  return SRA_SA_PROCESSED;
}

/* Return true iff T has a VIEW_CONVERT_EXPR among its handled
   components.  */

static bool
contains_view_convert_expr_p (tree t)
{
  while (1)
    {
      if (TREE_CODE (t) == VIEW_CONVERT_EXPR)
        return true;
      if (!handled_component_p (t))
        return false;
      t = TREE_OPERAND (t, 0);
    }
}

/* Change STMT to assign compatible types by means of adding component or
   array references or VIEW_CONVERT_EXPRs.  All parameters have the same
   meaning as variables with the same names in sra_modify_assign.  This is
   done in such a complicated way in order to make
   testsuite/g++.dg/tree-ssa/ssa-sra-2.C happy and so it helps in at least
   some cases.  */

static void
fix_modified_assign_compatibility (gimple_stmt_iterator *gsi, gimple *stmt,
                                   struct access *lacc, struct access *racc,
                                   tree lhs, tree *rhs, tree ltype,
                                   tree rtype)
{
  if (racc && racc->grp_to_be_replaced && AGGREGATE_TYPE_P (ltype)
      && !access_has_children_p (lacc))
    {
      tree expr = unshare_expr (lhs);
      bool found = build_ref_for_offset (&expr, ltype, racc->offset, rtype,
                                         false);
      if (found)
        {
          gimple_assign_set_lhs (*stmt, expr);
          return;
        }
    }

  if (lacc && lacc->grp_to_be_replaced && AGGREGATE_TYPE_P (rtype)
      && !access_has_children_p (racc))
    {
      tree expr = unshare_expr (*rhs);
      bool found = build_ref_for_offset (&expr, rtype, lacc->offset, ltype,
                                         false);
      if (found)
        {
          gimple_assign_set_rhs1 (*stmt, expr);
          return;
        }
    }

  *rhs = fold_build1 (VIEW_CONVERT_EXPR, ltype, *rhs);
  gimple_assign_set_rhs_from_tree (gsi, *rhs);
  *stmt = gsi_stmt (*gsi);
}
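/* The partially-complex-LHS transformation can be sketched as follows
   (temporary and replacement names are invented, only the shape of the
   generated sequence matters):

     float
     parts (float x, float y)
     {
       _Complex float c;
       __real__ c = x;
       __imag__ c = y;
       return __real__ c;
     }

   For the "__real__ c = x" statement, with c replaced by SR_c, the pass
   emits approximately

     SRp_1 = IMAGPART_EXPR <SR_c>;
     SRr_2 = x;
     SR_c = COMPLEX_EXPR <SRr_2, SRp_1>;

   i.e. the untouched half is preserved and the whole replacement is rebuilt
   with a COMPLEX_EXPR.  */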
/* Callback of scan_function to process assign statements.  It examines both
   sides of the statement, replaces them with a scalar replacement if there
   is one and generates copying of replacements if scalarized aggregates
   have been used in the assignment.  STMT is a pointer to the assign
   statement, GSI is used to hold generated statements for type conversions
   and subtree copying.  */

static enum scan_assign_result
sra_modify_assign (gimple *stmt, gimple_stmt_iterator *gsi,
                   void *data ATTRIBUTE_UNUSED)
{
  struct access *lacc, *racc;
  tree ltype, rtype;
  tree lhs, rhs;
  bool modify_this_stmt;

  if (gimple_assign_rhs2 (*stmt))
    return SRA_SA_NONE;
  lhs = gimple_assign_lhs (*stmt);
  rhs = gimple_assign_rhs1 (*stmt);

  if (TREE_CODE (rhs) == CONSTRUCTOR)
    return sra_modify_constructor_assign (stmt, gsi);

  if (TREE_CODE (lhs) == REALPART_EXPR || TREE_CODE (lhs) == IMAGPART_EXPR)
    return sra_modify_partially_complex_lhs (*stmt, gsi);

  if (TREE_CODE (rhs) == REALPART_EXPR || TREE_CODE (rhs) == IMAGPART_EXPR
      || TREE_CODE (rhs) == BIT_FIELD_REF || TREE_CODE (lhs) == BIT_FIELD_REF)
    {
      modify_this_stmt = sra_modify_expr (gimple_assign_rhs1_ptr (*stmt),
                                          gsi, false, data);
      modify_this_stmt |= sra_modify_expr (gimple_assign_lhs_ptr (*stmt),
                                           gsi, true, data);
      return modify_this_stmt ? SRA_SA_PROCESSED : SRA_SA_NONE;
    }

  lacc = get_access_for_expr (lhs);
  racc = get_access_for_expr (rhs);
  if (!lacc && !racc)
    return SRA_SA_NONE;

  modify_this_stmt = ((lacc && lacc->grp_to_be_replaced)
                      || (racc && racc->grp_to_be_replaced));

  if (lacc && lacc->grp_to_be_replaced)
    {
      lhs = get_access_replacement (lacc);
      gimple_assign_set_lhs (*stmt, lhs);
      ltype = lacc->type;
    }
  else
    ltype = TREE_TYPE (lhs);

  if (racc && racc->grp_to_be_replaced)
    {
      rhs = get_access_replacement (racc);
      gimple_assign_set_rhs1 (*stmt, rhs);
      rtype = racc->type;
    }
  else
    rtype = TREE_TYPE (rhs);

  /* The possibility that gimple_assign_set_rhs_from_tree() might reallocate
     the statement makes the position of this pop_stmt_changes() a bit
     awkward but hopefully makes some sense.  */
  if (modify_this_stmt)
    {
      if (!useless_type_conversion_p (ltype, rtype))
        fix_modified_assign_compatibility (gsi, stmt, lacc, racc, lhs, &rhs,
                                           ltype, rtype);
    }

  if (contains_view_convert_expr_p (rhs) || contains_view_convert_expr_p (lhs)
      || (access_has_children_p (racc)
          && !ref_expr_for_all_replacements_p (racc, lhs, racc->offset))
      || (access_has_children_p (lacc)
          && !ref_expr_for_all_replacements_p (lacc, rhs, lacc->offset)))
    {
      if (access_has_children_p (racc))
        generate_subtree_copies (racc->first_child, racc->base, 0, 0, 0,
                                 gsi, false, false);
      if (access_has_children_p (lacc))
        generate_subtree_copies (lacc->first_child, lacc->base, 0, 0, 0,
                                 gsi, true, true);
    }
  else
    {
      if (access_has_children_p (lacc) && access_has_children_p (racc))
        {
          gimple_stmt_iterator orig_gsi = *gsi;
          bool refreshed;

          if (lacc->grp_read && !lacc->grp_covered)
            {
              handle_unscalarized_data_in_subtree (racc, lhs, gsi);
              refreshed = true;
            }
          else
            refreshed = false;

          load_assign_lhs_subreplacements (lacc->first_child, racc,
                                           lacc->offset, racc->offset,
                                           &orig_gsi, gsi, &refreshed, lhs);
          if (!refreshed || !racc->grp_unscalarized_data)
            {
              if (*stmt == gsi_stmt (*gsi))
                gsi_next (gsi);

              unlink_stmt_vdef (*stmt);
              gsi_remove (&orig_gsi, true);
              return SRA_SA_REMOVED;
            }
        }
      else
        {
          if (access_has_children_p (racc))
            {
              if (!racc->grp_unscalarized_data)
                {
                  generate_subtree_copies (racc->first_child, lhs,
                                           racc->offset, 0, 0, gsi,
                                           false, false);
                  gcc_assert (*stmt == gsi_stmt (*gsi));
                  unlink_stmt_vdef (*stmt);
                  gsi_remove (gsi, true);
                  return SRA_SA_REMOVED;
                }
              else
                generate_subtree_copies (racc->first_child, lhs,
                                         racc->offset, 0, 0, gsi,
                                         false, true);
            }
          else if (access_has_children_p (lacc))
            generate_subtree_copies (lacc->first_child, rhs, lacc->offset,
                                     0, 0, gsi, true, true);
        }
    }

  return modify_this_stmt ? SRA_SA_PROCESSED : SRA_SA_NONE;
}
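/* The best case handled by sra_modify_assign is the complete removal of an
   aggregate copy.  Schematically (invented replacement names):

     struct S { int i; float f; };

     int
     copy (struct S b)
     {
       struct S a;
       a = b;
       return a.i + (int) a.f;
     }

   With both sides fully scalarized and no unscalarized data in b, the
   statement "a = b" is replaced by roughly

     SR_a_i = SR_b_i;
     SR_a_f = SR_b_f;

   and the original aggregate assignment is deleted (SRA_SA_REMOVED).  */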
/* Generate statements initializing scalar replacements of parts of function
   parameters.  */

static void
initialize_parameter_reductions (void)
{
  gimple_stmt_iterator gsi;
  gimple_seq seq = NULL;
  tree parm;

  for (parm = DECL_ARGUMENTS (current_function_decl);
       parm;
       parm = TREE_CHAIN (parm))
    {
      VEC (access_p, heap) *access_vec;
      struct access *access;

      if (!bitmap_bit_p (candidate_bitmap, DECL_UID (parm)))
        continue;
      access_vec = get_base_access_vector (parm);
      if (!access_vec)
        continue;

      if (!seq)
        {
          seq = gimple_seq_alloc ();
          gsi = gsi_start (seq);
        }

      for (access = VEC_index (access_p, access_vec, 0);
           access;
           access = access->next_grp)
        generate_subtree_copies (access, parm, 0, 0, 0, &gsi, true, true);
    }

  if (seq)
    gsi_insert_seq_on_edge_immediate (single_succ_edge (ENTRY_BLOCK_PTR),
                                      seq);
}

/* The "main" function of intraprocedural SRA passes.  Runs the analysis and
   if it reveals there are components of some aggregates to be scalarized,
   it runs the required transformations.  */

static unsigned int
perform_intra_sra (void)
{
  int ret = 0;
  sra_initialize ();

  if (!find_var_candidates ())
    goto out;

  if (!scan_function (build_access_from_expr, build_accesses_from_assign,
                      NULL, true, NULL))
    goto out;

  if (!analyze_all_variable_accesses ())
    goto out;

  scan_function (sra_modify_expr, sra_modify_assign, NULL, false, NULL);
  initialize_parameter_reductions ();

  if (sra_mode == SRA_MODE_EARLY_INTRA)
    ret = TODO_update_ssa;
  else
    ret = TODO_update_ssa | TODO_rebuild_alias;

 out:
  sra_deinitialize ();
  return ret;
}

/* Perform early intraprocedural SRA.  */

static unsigned int
early_intra_sra (void)
{
  sra_mode = SRA_MODE_EARLY_INTRA;
  return perform_intra_sra ();
}

/* Perform "late" intraprocedural SRA.  */

static unsigned int
late_intra_sra (void)
{
  sra_mode = SRA_MODE_INTRA;
  return perform_intra_sra ();
}

static bool
gate_intra_sra (void)
{
  return flag_tree_sra != 0;
}

struct gimple_opt_pass pass_sra_early =
{
 {
  GIMPLE_PASS,
  "esra",                               /* name */
  gate_intra_sra,                       /* gate */
  early_intra_sra,                      /* execute */
  NULL,                                 /* sub */
  NULL,                                 /* next */
  0,                                    /* static_pass_number */
  TV_TREE_SRA,                          /* tv_id */
  PROP_cfg | PROP_ssa,                  /* properties_required */
  0,                                    /* properties_provided */
  0,                                    /* properties_destroyed */
  0,                                    /* todo_flags_start */
  TODO_dump_func
  | TODO_update_ssa
  | TODO_ggc_collect
  | TODO_verify_ssa                     /* todo_flags_finish */
 }
};

struct gimple_opt_pass pass_sra =
{
 {
  GIMPLE_PASS,
  "sra",                                /* name */
  gate_intra_sra,                       /* gate */
  late_intra_sra,                       /* execute */
  NULL,                                 /* sub */
  NULL,                                 /* next */
  0,                                    /* static_pass_number */
  TV_TREE_SRA,                          /* tv_id */
  PROP_cfg | PROP_ssa,                  /* properties_required */
  0,                                    /* properties_provided */
  0,                                    /* properties_destroyed */
  TODO_update_address_taken,            /* todo_flags_start */
  TODO_dump_func
  | TODO_update_ssa
  | TODO_ggc_collect
  | TODO_verify_ssa                     /* todo_flags_finish */
 }
};
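/* Parameter reduction can be pictured on one more invented fragment:

     struct S { int i; float f; };

     int
     by_value (struct S s)
     {
       return s.i + (int) s.f;
     }

   On the single edge out of the entry block, initialize_parameter_reductions
   inserts, roughly,

     SR_s_i = s.i;
     SR_s_f = s.f;

   after which the function body uses only the scalar replacements; the
   incoming PARM_DECL itself is read only at this one point.  */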