public inbox for gcc-patches@gcc.gnu.org
* Re: [PATCH 6/6] Make SRA replace constant-pool loads
       [not found] <F9F317CB-3AA4-4583-95D6-72A3E3ABED2F@gmail.com>
@ 2015-11-12 18:36 ` Alan Lawrence
  2015-11-16 13:44   ` Richard Biener
  0 siblings, 1 reply; 8+ messages in thread
From: Alan Lawrence @ 2015-11-12 18:36 UTC (permalink / raw)
  To: gcc-patches; +Cc: richard.guenther, law

On 06/11/15 16:29, Richard Biener wrote:
>>> 2) You should be able to use fold_ctor_reference directly (in place
>> of
>>> all your code
>>> in case offset and size are readily available - don't remember
>> exactly how
>>> complete scalarization "walks" elements).  Alternatively use
>>> fold_const_aggregate_ref.

Well, yes I was able to remove subst_constant_pool_initial by using
fold_ctor_reference, with some fairly strict checks about the access expressions
found while scanning the input. I still needed to either deep-scan the constant
value itself, or allow completely_scalarize to fail, due to Ada c460010 where
a variable has DECL_INITIAL: {VIEW_CONVERT_EXPR<integer[2:4]>({1, 2, 3})}
(that is, a CONSTRUCTOR whose only element is a V.C.E. of another CONSTRUCTOR).

However:

> I tried to suggest creating piecewise loads from the constant pool as opposed to the aggregate one.  But I suppose the difficulty is to determine 'pieces', not their values (that followup passes and folding can handle quite efficiently).

I've now got this working, and it simplifies the patch massively :).

We now replace a constant-pool load such as 'a = *.LC0' with a series of scalar loads, e.g.
SR_a1 = SRlc0_1
SR_a2 = SRlc0_2
etc. etc. (well those wouldn't be quite the names, but you get the idea.)

And then at the start of the function we initialise
SRlc0_1 = *.LC0[0]
SRlc0_2 = *.LC0[1]
etc. etc.
- I needed to use SRlc0_1 etc. variables because some of the constant-pool loads
coming from Ada are rather more complicated than 'a = *.LC0' and substituting
the access expression (e.g. '*.LC0[0].foo'), rather than the replacement_decl,
into those led to just too many verify_gimple failures.
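As a concrete illustration (a hypothetical sketch, not a testcase from the
patch; the SR names and GIMPLE spellings are made up for exposition),
consider:

```c
/* Hypothetical C example: at -O1 and above the initializer of 'a' is
   typically emitted as a constant-pool entry (call it *.LC0), and the
   whole-aggregate initialization becomes a constant-pool load that the
   patch replaces piecewise, roughly:

     at function entry:        at the original load site:
       SRlc0_1 = *.LC0[0];       SR_a1 = SRlc0_1;
       SRlc0_2 = *.LC0[1];       SR_a2 = SRlc0_2;
       ...                       ...

   after which CCP/FRE can substitute the constant element values.  */
int
sum_ends (void)
{
  int a[4] = { 10, 20, 30, 40 };  /* becomes 'a = *.LC0' in GIMPLE */
  return a[0] + a[3];
}
```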

However, the initialization seems to work well enough, if I put a check into
sra_modify_assign to avoid changing 'SRlc0_1 = *.LC0[0]' into 'SRlc0_1 = SRlc0_1'
(!). Slightly hacky perhaps but harmless as there cannot be a statement writing
to a scalar replacement prior to sra_modify_expr.

Also I had to prevent writing scalar replacements back to the constant pool
(gnat opt31.adb).

Finally - I've left the disqualified_constants code in, but am now only using it
for an assert...so I guess it'd be reasonable to take that out. In an ideal
world we could do those checks only in checking builds but allocating bitmaps
only for checking builds feels like it would make checking vs non-checking
diverge too much.

This is now passing bootstrap+check-{gcc,g++,fortran} on AArch64, and the same
plus Ada on x86_64 and ARM, except for some additional guality failures in
pr54970-1.c on AArch64 (tests which are already failing on ARM) - I haven't had
any success in figuring those out yet, suggestions welcome.

However, without the DOM patches, this does not fix the original ssa-dom-cse-2.c
testcase, even though the load is scalarized in esra and the individual element
loads are resolved in ccp2. I don't think I'll be able to respin those between
now and stage 1 ending on Saturday, hence I've had to drop the testsuite change
and will merely describe the remaining problem in PR 63679. I'm now benchmarking
on AArch64 to see what difference this patch makes on its own.
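For reference, the scenario is of roughly this shape (a reconstructed sketch
in the spirit of ssa-dom-cse-2.c, not the actual testsuite file):

```c
/* Reconstructed sketch (not the exact testsuite file): the point of
   the ssa-dom-cse-2.c scenario is that once SRA scalarizes the
   constant-pool load and CCP resolves the per-element loads, the
   whole function should fold to returning a constant.  */
int
foo (void)
{
  int a[8] = { 1, 2, 3, 4, 5, 6, 7, 8 };
  int i = 2;
  return a[i] + a[i + 4];
}
```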

Thoughts?

gcc/ChangeLog:

	PR target/63679
	* tree-sra.c (disqualified_constants): New.
	(sra_initialize): Allocate disqualified_constants.
	(sra_deinitialize): Free disqualified_constants.
	(disqualify_candidate): Update disqualified_constants when appropriate.
	(create_access): Scan for constant-pool entries as we go along.
	(scalarizable_type_p): Add check against type_contains_placeholder_p.
	(maybe_add_sra_candidate): Allow constant-pool entries.
	(initialize_constant_pool_replacements): New.
	(sra_modify_assign): Avoid mangling assignments created by
	initialize_constant_pool_replacements.
	(sra_modify_function_body): Call initialize_constant_pool_replacements.
---
 gcc/tree-sra.c | 93 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 90 insertions(+), 3 deletions(-)

diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
index 1d4a632..aa60f06 100644
--- a/gcc/tree-sra.c
+++ b/gcc/tree-sra.c
@@ -325,6 +325,10 @@ candidate (unsigned uid)
    those which cannot be (because they are and need be used as a whole).  */
 static bitmap should_scalarize_away_bitmap, cannot_scalarize_away_bitmap;
 
+/* Bitmap of candidates in the constant pool, which cannot be scalarized
+   because this would produce non-constant expressions (e.g. Ada).  */
+static bitmap disqualified_constants;
+
 /* Obstack for creation of fancy names.  */
 static struct obstack name_obstack;
 
@@ -647,6 +651,7 @@ sra_initialize (void)
     (vec_safe_length (cfun->local_decls) / 2);
   should_scalarize_away_bitmap = BITMAP_ALLOC (NULL);
   cannot_scalarize_away_bitmap = BITMAP_ALLOC (NULL);
+  disqualified_constants = BITMAP_ALLOC (NULL);
   gcc_obstack_init (&name_obstack);
   base_access_vec = new hash_map<tree, auto_vec<access_p> >;
   memset (&sra_stats, 0, sizeof (sra_stats));
@@ -665,6 +670,7 @@ sra_deinitialize (void)
   candidates = NULL;
   BITMAP_FREE (should_scalarize_away_bitmap);
   BITMAP_FREE (cannot_scalarize_away_bitmap);
+  BITMAP_FREE (disqualified_constants);
   access_pool.release ();
   assign_link_pool.release ();
   obstack_free (&name_obstack, NULL);
@@ -679,6 +685,8 @@ disqualify_candidate (tree decl, const char *reason)
 {
   if (bitmap_clear_bit (candidate_bitmap, DECL_UID (decl)))
     candidates->remove_elt_with_hash (decl, DECL_UID (decl));
+  if (TREE_CODE (decl) == VAR_DECL && DECL_IN_CONSTANT_POOL (decl))
+    bitmap_set_bit (disqualified_constants, DECL_UID (decl));
 
   if (dump_file && (dump_flags & TDF_DETAILS))
     {
@@ -830,8 +838,11 @@ create_access_1 (tree base, HOST_WIDE_INT offset, HOST_WIDE_INT size)
   return access;
 }
 
+static bool maybe_add_sra_candidate (tree);
+
 /* Create and insert access for EXPR. Return created access, or NULL if it is
-   not possible.  */
+   not possible.  Also scan for uses of constant pool as we go along and add
+   to candidates.  */
 
 static struct access *
 create_access (tree expr, gimple *stmt, bool write)
@@ -854,6 +865,25 @@ create_access (tree expr, gimple *stmt, bool write)
   else
     ptr = false;
 
+  /* For constant-pool entries, check we can substitute the constant value.  */
+  if (TREE_CODE (base) == VAR_DECL && DECL_IN_CONSTANT_POOL (base)
+      && (sra_mode == SRA_MODE_EARLY_INTRA || sra_mode == SRA_MODE_INTRA))
+    {
+      gcc_assert (!bitmap_bit_p (disqualified_constants, DECL_UID (base)));
+      if (expr != base
+	  && !is_gimple_reg_type (TREE_TYPE (expr))
+	  && dump_file && (dump_flags & TDF_DETAILS))
+	{
+	  /* This occurs in Ada with accesses to ARRAY_RANGE_REFs,
+	     and elements of multidimensional arrays (which are
+	     multi-element arrays in their own right).  */
+	  fprintf (dump_file, "Allowing non-reg-type load of part"
+			      " of constant-pool entry: ");
+	  print_generic_expr (dump_file, expr, 0);
+	}
+      maybe_add_sra_candidate (base);
+    }
+
   if (!DECL_P (base) || !bitmap_bit_p (candidate_bitmap, DECL_UID (base)))
     return NULL;
 
@@ -912,6 +942,8 @@ static bool
 scalarizable_type_p (tree type)
 {
   gcc_assert (!is_gimple_reg_type (type));
+  if (type_contains_placeholder_p (type))
+    return false;
 
   switch (TREE_CODE (type))
   {
@@ -1832,7 +1864,12 @@ maybe_add_sra_candidate (tree var)
       reject (var, "not aggregate");
       return false;
     }
-  if (needs_to_live_in_memory (var))
+  /* Allow constant-pool entries (that "need to live in memory")
+     unless we are doing IPA SRA.  */
+  if (needs_to_live_in_memory (var)
+      && (sra_mode == SRA_MODE_EARLY_IPA
+	  || TREE_CODE (var) != VAR_DECL
+	  || !DECL_IN_CONSTANT_POOL (var)))
     {
       reject (var, "needs to live in memory");
       return false;
@@ -3250,6 +3287,9 @@ sra_modify_assign (gimple *stmt, gimple_stmt_iterator *gsi)
   racc = get_access_for_expr (rhs);
   if (!lacc && !racc)
     return SRA_AM_NONE;
+  /* Avoid modifying initializations of constant-pool replacements.  */
+  if (racc && (racc->replacement_decl == lhs))
+    return SRA_AM_NONE;
 
   loc = gimple_location (stmt);
   if (lacc && lacc->grp_to_be_replaced)
@@ -3366,7 +3406,10 @@ sra_modify_assign (gimple *stmt, gimple_stmt_iterator *gsi)
       || contains_vce_or_bfcref_p (lhs)
       || stmt_ends_bb_p (stmt))
     {
-      if (access_has_children_p (racc))
+      /* No need to copy into a constant-pool, it comes pre-initialized.  */
+      if (access_has_children_p (racc)
+	  && (TREE_CODE (racc->base) != VAR_DECL
+	      || !DECL_IN_CONSTANT_POOL (racc->base)))
 	generate_subtree_copies (racc->first_child, rhs, racc->offset, 0, 0,
 				 gsi, false, false, loc);
       if (access_has_children_p (lacc))
@@ -3469,6 +3512,48 @@ sra_modify_assign (gimple *stmt, gimple_stmt_iterator *gsi)
     }
 }
 
+static void
+initialize_constant_pool_replacements (void)
+{
+  gimple_seq seq = NULL;
+  gimple_stmt_iterator gsi = gsi_start (seq);
+  bitmap_iterator bi;
+  unsigned i;
+
+  EXECUTE_IF_SET_IN_BITMAP (candidate_bitmap, 0, i, bi)
+    if (bitmap_bit_p (should_scalarize_away_bitmap, i)
+	&& !bitmap_bit_p (cannot_scalarize_away_bitmap, i))
+      {
+	tree var = candidate (i);
+	if (TREE_CODE (var) != VAR_DECL || !DECL_IN_CONSTANT_POOL (var))
+	  continue;
+	vec<access_p> *access_vec = get_base_access_vector (var);
+	if (!access_vec)
+	  continue;
+	for (unsigned i = 0; i < access_vec->length (); i++)
+	  {
+	    struct access *access = (*access_vec)[i];
+	    if (!access->replacement_decl)
+	      continue;
+	    gassign *stmt = gimple_build_assign (
+	      get_access_replacement (access), unshare_expr (access->expr));
+	    if (dump_file && (dump_flags & TDF_DETAILS))
+	      {
+		fprintf (dump_file, "Generating constant initializer: ");
+		print_gimple_stmt (dump_file, stmt, 0, 1);
+		fprintf (dump_file, "\n");
+	      }
+	    gsi_insert_after (&gsi, stmt, GSI_NEW_STMT);
+	    update_stmt (stmt);
+	  }
+      }
+
+  seq = gsi_seq (gsi);
+  if (seq)
+    gsi_insert_seq_on_edge_immediate (
+      single_succ_edge (ENTRY_BLOCK_PTR_FOR_FN (cfun)), seq);
+}
+
 /* Traverse the function body and all modifications as decided in
    analyze_all_variable_accesses.  Return true iff the CFG has been
    changed.  */
@@ -3479,6 +3564,8 @@ sra_modify_function_body (void)
   bool cfg_changed = false;
   basic_block bb;
 
+  initialize_constant_pool_replacements ();
+
   FOR_EACH_BB_FN (bb, cfun)
     {
       gimple_stmt_iterator gsi = gsi_start_bb (bb);
-- 
1.9.1


* Re: [PATCH 6/6] Make SRA replace constant-pool loads
  2015-11-12 18:36 ` [PATCH 6/6] Make SRA replace constant-pool loads Alan Lawrence
@ 2015-11-16 13:44   ` Richard Biener
  0 siblings, 0 replies; 8+ messages in thread
From: Richard Biener @ 2015-11-16 13:44 UTC (permalink / raw)
  To: Alan Lawrence; +Cc: GCC Patches, Jeff Law

On Thu, Nov 12, 2015 at 7:35 PM, Alan Lawrence <alan.lawrence@arm.com> wrote:
> On 06/11/15 16:29, Richard Biener wrote:
>>>> 2) You should be able to use fold_ctor_reference directly (in place
>>> of
>>>> all your code
>>>> in case offset and size are readily available - don't remember
>>> exactly how
>>>> complete scalarization "walks" elements).  Alternatively use
>>>> fold_const_aggregate_ref.
>
> Well, yes I was able to remove subst_constant_pool_initial by using
> fold_ctor_reference, with some fairly strict checks about the access expressions
> found while scanning the input. I still needed to either deep-scan the constant
> value itself, or allow completely_scalarize to fail, due to Ada c460010 where
> a variable has DECL_INITIAL: {VIEW_CONVERT_EXPR<integer[2:4]>({1, 2, 3})}
> (that is, a CONSTRUCTOR whose only element is a V.C.E. of another CONSTRUCTOR).
>
> However:
>
>> I tried to suggest creating piecewise loads from the constant pool as opposed to the aggregate one.  But I suppose the difficulty is to determine 'pieces', not their values (that followup passes and folding can handle quite efficiently).
>
> I've now got this working, and it simplifies the patch massively :).
>
> We now replace a constant-pool load such as 'a = *.LC0' with a series of scalar loads, e.g.
> SR_a1 = SRlc0_1
> SR_a2 = SRlc0_2
> etc. etc. (well those wouldn't be quite the names, but you get the idea.)
>
> And then at the start of the function we initialise
> SRlc0_1 = *.LC0[0]
> SRlc0_2 = *.LC0[1]
> etc. etc.
> - I needed to use SRlc0_1 etc. variables because some of the constant-pool loads
> coming from Ada are rather more complicated than 'a = *.LC0' and substituting
> the access expression (e.g. '*.LC0[0].foo'), rather than the replacement_decl,
> into those led to just too many verify_gimple failures.
>
> However, the initialization seems to work well enough, if I put a check into
> sra_modify_assign to avoid changing 'SRlc0_1 = *.LC0[0]' into 'SRlc0_1 = SRlc0_1'
> (!). Slightly hacky perhaps but harmless as there cannot be a statement writing
> to a scalar replacement prior to sra_modify_expr.
>
> Also I had to prevent writing scalar replacements back to the constant pool
> (gnat opt31.adb).
>
> Finally - I've left the disqualified_constants code in, but am now only using it
> for an assert...so I guess it'd be reasonable to take that out. In an ideal
> world we could do those checks only in checking builds but allocating bitmaps
> only for checking builds feels like it would make checking vs non-checking
> diverge too much.
>
> This is now passing bootstrap+check-{gcc,g++,fortran} on AArch64, and the same
> plus Ada on x86_64 and ARM, except for some additional guality failures in
> pr54970-1.c on AArch64 (tests which are already failing on ARM) - I haven't had
> any success in figuring those out yet, suggestions welcome.
>
> However, without the DOM patches, this does not fix the original ssa-dom-cse-2.c
> testcase, even though the load is scalarized in esra and the individual element
> loads are resolved in ccp2. I don't think I'll be able to respin those between
> now and stage 1 ending on Saturday, hence I've had to drop the testsuite change
> and will merely describe the remaining problem in PR 63679. I'm now benchmarking
> on AArch64 to see what difference this patch makes on its own.
>
> Thoughts?

The new function needs a comment.  Otherwise, given there are no testcases,
I wonder what this does to nested constructs like multi-dimensional arrays
in a struct.

The patch looks _much_ better now though.  But I also wonder what the effects
on code-generation are for the non-reg-type load of part case.

I'd say the patch is ok with the comment added and a testcase (or two) verifying
it works as desired.

Thanks,
Richard.

> gcc/ChangeLog:
>
>         PR target/63679
>         * tree-sra.c (disqualified_constants): New.
>         (sra_initialize): Allocate disqualified_constants.
>         (sra_deinitialize): Free disqualified_constants.
>         (disqualify_candidate): Update disqualified_constants when appropriate.
>         (create_access): Scan for constant-pool entries as we go along.
>         (scalarizable_type_p): Add check against type_contains_placeholder_p.
>         (maybe_add_sra_candidate): Allow constant-pool entries.
>         (initialize_constant_pool_replacements): New.
>         (sra_modify_assign): Avoid mangling assignments created by
>         initialize_constant_pool_replacements.
>         (sra_modify_function_body): Call initialize_constant_pool_replacements.
> ---
>  gcc/tree-sra.c | 93 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++--
>  1 file changed, 90 insertions(+), 3 deletions(-)
>
> diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
> index 1d4a632..aa60f06 100644
> --- a/gcc/tree-sra.c
> +++ b/gcc/tree-sra.c
> @@ -325,6 +325,10 @@ candidate (unsigned uid)
>     those which cannot be (because they are and need be used as a whole).  */
>  static bitmap should_scalarize_away_bitmap, cannot_scalarize_away_bitmap;
>
> +/* Bitmap of candidates in the constant pool, which cannot be scalarized
> +   because this would produce non-constant expressions (e.g. Ada).  */
> +static bitmap disqualified_constants;
> +
>  /* Obstack for creation of fancy names.  */
>  static struct obstack name_obstack;
>
> @@ -647,6 +651,7 @@ sra_initialize (void)
>      (vec_safe_length (cfun->local_decls) / 2);
>    should_scalarize_away_bitmap = BITMAP_ALLOC (NULL);
>    cannot_scalarize_away_bitmap = BITMAP_ALLOC (NULL);
> +  disqualified_constants = BITMAP_ALLOC (NULL);
>    gcc_obstack_init (&name_obstack);
>    base_access_vec = new hash_map<tree, auto_vec<access_p> >;
>    memset (&sra_stats, 0, sizeof (sra_stats));
> @@ -665,6 +670,7 @@ sra_deinitialize (void)
>    candidates = NULL;
>    BITMAP_FREE (should_scalarize_away_bitmap);
>    BITMAP_FREE (cannot_scalarize_away_bitmap);
> +  BITMAP_FREE (disqualified_constants);
>    access_pool.release ();
>    assign_link_pool.release ();
>    obstack_free (&name_obstack, NULL);
> @@ -679,6 +685,8 @@ disqualify_candidate (tree decl, const char *reason)
>  {
>    if (bitmap_clear_bit (candidate_bitmap, DECL_UID (decl)))
>      candidates->remove_elt_with_hash (decl, DECL_UID (decl));
> +  if (TREE_CODE (decl) == VAR_DECL && DECL_IN_CONSTANT_POOL (decl))
> +    bitmap_set_bit (disqualified_constants, DECL_UID (decl));
>
>    if (dump_file && (dump_flags & TDF_DETAILS))
>      {
> @@ -830,8 +838,11 @@ create_access_1 (tree base, HOST_WIDE_INT offset, HOST_WIDE_INT size)
>    return access;
>  }
>
> +static bool maybe_add_sra_candidate (tree);
> +
>  /* Create and insert access for EXPR. Return created access, or NULL if it is
> -   not possible.  */
> +   not possible.  Also scan for uses of constant pool as we go along and add
> +   to candidates.  */
>
>  static struct access *
>  create_access (tree expr, gimple *stmt, bool write)
> @@ -854,6 +865,25 @@ create_access (tree expr, gimple *stmt, bool write)
>    else
>      ptr = false;
>
> +  /* For constant-pool entries, check we can substitute the constant value.  */
> +  if (TREE_CODE (base) == VAR_DECL && DECL_IN_CONSTANT_POOL (base)
> +      && (sra_mode == SRA_MODE_EARLY_INTRA || sra_mode == SRA_MODE_INTRA))
> +    {
> +      gcc_assert (!bitmap_bit_p (disqualified_constants, DECL_UID (base)));
> +      if (expr != base
> +         && !is_gimple_reg_type (TREE_TYPE (expr))
> +         && dump_file && (dump_flags & TDF_DETAILS))
> +       {
> +         /* This occurs in Ada with accesses to ARRAY_RANGE_REFs,
> +            and elements of multidimensional arrays (which are
> +            multi-element arrays in their own right).  */
> +         fprintf (dump_file, "Allowing non-reg-type load of part"
> +                             " of constant-pool entry: ");
> +         print_generic_expr (dump_file, expr, 0);
> +       }
> +      maybe_add_sra_candidate (base);
> +    }
> +
>    if (!DECL_P (base) || !bitmap_bit_p (candidate_bitmap, DECL_UID (base)))
>      return NULL;
>
> @@ -912,6 +942,8 @@ static bool
>  scalarizable_type_p (tree type)
>  {
>    gcc_assert (!is_gimple_reg_type (type));
> +  if (type_contains_placeholder_p (type))
> +    return false;
>
>    switch (TREE_CODE (type))
>    {
> @@ -1832,7 +1864,12 @@ maybe_add_sra_candidate (tree var)
>        reject (var, "not aggregate");
>        return false;
>      }
> -  if (needs_to_live_in_memory (var))
> +  /* Allow constant-pool entries (that "need to live in memory")
> +     unless we are doing IPA SRA.  */
> +  if (needs_to_live_in_memory (var)
> +      && (sra_mode == SRA_MODE_EARLY_IPA
> +         || TREE_CODE (var) != VAR_DECL
> +         || !DECL_IN_CONSTANT_POOL (var)))
>      {
>        reject (var, "needs to live in memory");
>        return false;
> @@ -3250,6 +3287,9 @@ sra_modify_assign (gimple *stmt, gimple_stmt_iterator *gsi)
>    racc = get_access_for_expr (rhs);
>    if (!lacc && !racc)
>      return SRA_AM_NONE;
> +  /* Avoid modifying initializations of constant-pool replacements.  */
> +  if (racc && (racc->replacement_decl == lhs))
> +    return SRA_AM_NONE;
>
>    loc = gimple_location (stmt);
>    if (lacc && lacc->grp_to_be_replaced)
> @@ -3366,7 +3406,10 @@ sra_modify_assign (gimple *stmt, gimple_stmt_iterator *gsi)
>        || contains_vce_or_bfcref_p (lhs)
>        || stmt_ends_bb_p (stmt))
>      {
> -      if (access_has_children_p (racc))
> +      /* No need to copy into a constant-pool, it comes pre-initialized.  */
> +      if (access_has_children_p (racc)
> +         && (TREE_CODE (racc->base) != VAR_DECL
> +             || !DECL_IN_CONSTANT_POOL (racc->base)))
>         generate_subtree_copies (racc->first_child, rhs, racc->offset, 0, 0,
>                                  gsi, false, false, loc);
>        if (access_has_children_p (lacc))
> @@ -3469,6 +3512,48 @@ sra_modify_assign (gimple *stmt, gimple_stmt_iterator *gsi)
>      }
>  }
>
> +static void
> +initialize_constant_pool_replacements (void)
> +{
> +  gimple_seq seq = NULL;
> +  gimple_stmt_iterator gsi = gsi_start (seq);
> +  bitmap_iterator bi;
> +  unsigned i;
> +
> +  EXECUTE_IF_SET_IN_BITMAP (candidate_bitmap, 0, i, bi)
> +    if (bitmap_bit_p (should_scalarize_away_bitmap, i)
> +       && !bitmap_bit_p (cannot_scalarize_away_bitmap, i))
> +      {
> +       tree var = candidate (i);
> +       if (TREE_CODE (var) != VAR_DECL || !DECL_IN_CONSTANT_POOL (var))
> +         continue;
> +       vec<access_p> *access_vec = get_base_access_vector (var);
> +       if (!access_vec)
> +         continue;
> +       for (unsigned i = 0; i < access_vec->length (); i++)
> +         {
> +           struct access *access = (*access_vec)[i];
> +           if (!access->replacement_decl)
> +             continue;
> +           gassign *stmt = gimple_build_assign (
> +             get_access_replacement (access), unshare_expr (access->expr));
> +           if (dump_file && (dump_flags & TDF_DETAILS))
> +             {
> +               fprintf (dump_file, "Generating constant initializer: ");
> +               print_gimple_stmt (dump_file, stmt, 0, 1);
> +               fprintf (dump_file, "\n");
> +             }
> +           gsi_insert_after (&gsi, stmt, GSI_NEW_STMT);
> +           update_stmt (stmt);
> +         }
> +      }
> +
> +  seq = gsi_seq (gsi);
> +  if (seq)
> +    gsi_insert_seq_on_edge_immediate (
> +      single_succ_edge (ENTRY_BLOCK_PTR_FOR_FN (cfun)), seq);
> +}
> +
>  /* Traverse the function body and all modifications as decided in
>     analyze_all_variable_accesses.  Return true iff the CFG has been
>     changed.  */
> @@ -3479,6 +3564,8 @@ sra_modify_function_body (void)
>    bool cfg_changed = false;
>    basic_block bb;
>
> +  initialize_constant_pool_replacements ();
> +
>    FOR_EACH_BB_FN (bb, cfun)
>      {
>        gimple_stmt_iterator gsi = gsi_start_bb (bb);
> --
> 1.9.1
>


* Re: [PATCH 6/6] Make SRA replace constant-pool loads
  2015-11-05 18:55     ` Alan Lawrence
  2015-11-06 10:25       ` Eric Botcazou
@ 2015-11-06 16:29       ` Richard Biener
  1 sibling, 0 replies; 8+ messages in thread
From: Richard Biener @ 2015-11-06 16:29 UTC (permalink / raw)
  To: Alan Lawrence; +Cc: GCC Patches

On November 5, 2015 7:55:12 PM GMT+01:00, Alan Lawrence <alan.lawrence@arm.com> wrote:
>On 3 November 2015 at 14:01, Richard Biener
><richard.guenther@gmail.com> wrote:
>>
>> Hum.  I still wonder why we need all this complication ...
>
>Well, certainly I'd love to make it simpler, and if the complication
>is because I've gone about trying to deal with especially Ada in the
>wrong way...
>
>>  I would
>> expect that if
>> we simply decided to completely scalarize a constant pool aggregate
>load then
>> we can always do that.  SRA should then simply emit the _same_ IL as
>it does
>> for other complete scalarization element accesses.  The only
>difference is
>> that the result should be simplifiable to omit the element load from
>> the constant pool.
>>
>> 1) The FRE pass following SRA should do that for you.
>...
>> That is, I'd like to see the patch greatly simplified to just
>consider
>> constant pool
>> complete scalarization as if it were a regular variable.
>
>Hmm, can you clarify, do you mean I should *not* replace constant pool
>values with their DECL_INITIAL? The attempt to substitute in the
>initial value is what leads to most of the problems. For example, in
>gnat/opt31.adb, create_access finds this expression accessing *.LC0:
>
>MEM[(interfaces__unsigned_8[(sizetype) <PLACEHOLDER_EXPR struct
>opt31__messages_t___XUP>.P_BOUNDS->LB0:<PLACEHOLDER_EXPR struct
>opt31__messages_t___XUP>.P_BOUNDS->UB0 >= <PLACEHOLDER_EXPR struct
>opt31__messages_t___XUP>.P_BOUNDS->LB0 ? (sizetype) <PLACEHOLDER_EXPR
>struct opt31__messages_t___XUP>.P_BOUNDS->UB0 : (sizetype)
><PLACEHOLDER_EXPR struct opt31__messages_t___XUP>.P_BOUNDS->LB0 +
>4294967295] *)&*.LC0][1 ...]{lb: 1 sz: 1}
>
>this is an ARRAY_RANGE_REF of a MEM_REF of an ADDR_EXPR of *.LC0. So
>far I haven't extended subst_constant_pool_initial to handle
>ARRAY_RANGE_REFs, as it can't even handle this MEM_REF:
>
>MEM[(interfaces__unsigned_8[(sizetype) <PLACEHOLDER_EXPR struct
>opt31__messages_t___XUP>.P_BOUNDS->LB0:<PLACEHOLDER_EXPR struct
>opt31__messages_t___XUP>.P_BOUNDS->UB0 >= <PLACEHOLDER_EXPR struct
>opt31__messages_t___XUP>.P_BOUNDS->LB0 ? (sizetype) <PLACEHOLDER_EXPR
>struct opt31__messages_t___XUP>.P_BOUNDS->UB0 : (sizetype)
><PLACEHOLDER_EXPR struct opt31__messages_t___XUP>.P_BOUNDS->LB0 +
>4294967295] *)&*.LC0]
>
>because the type here has size:
>
>MIN_EXPR <_GLOBAL.SZ2.ada_opt31 (<PLACEHOLDER_EXPR struct
>opt31__messages_t___XUP>.P_BOUNDS->UB0, <PLACEHOLDER_EXPR struct
>opt31__messages_t___XUP>.P_BOUNDS->LB0), 17179869176>
>
>inside the MEM_REF of the ADDR_EXPR is *.LC0, whose DECL_INITIAL is a
>4-element array (fine). Sadly while the MEM_REF
>type_contains_placeholder_p, the type of the outer ARRAY_RANGE_REF
>does not....
>
>One possibility is that this whole construct, ARRAY_RANGE_REF that it
>is, should mark *.LC0 in cannot_scalarize_away_bitmap.
>
>However, disqualified_constants is also necessary if I want a
>meaningful assertion that we do not re-add constant pool entries as
>candidates after we've discovered them - or should I try to rely on
>there only being one expression that accesses each constant pool
>entry? (I don't think that's guaranteed as tree_output_constant_def
>does hashing, I admit I haven't really tried to break that assumption)

I see.

>> 2) You should be able to use fold_ctor_reference directly (in place
>of
>> all your code
>> in case offset and size are readily available - don't remember
>exactly how
>> complete scalarization "walks" elements).  Alternatively use
>> fold_const_aggregate_ref.
>
>That should work for completely_scalarize, yes, i.e. if I can remove
>the other route in create_access.
>
>> 3) You can simplify the stmt SRA generated by simply calling
>fold_stmt on it,
>> that will do a bit more (wasted) work compared to 2) but may be
>easier.
>>
>> I wouldn't bother with the case where we for some reason do not
>simplify
>> the constant pool load.
>
>Hmmm. As above, I'm not quite sure what you mean by "the constant pool
>load" - if that means, substituting the DECL_INITIAL in place of the
>VAR_DECL that is DECL_IN_CONSTANT_POOL, then I didn't find a general
>tree substitution routine, hence writing subst_constant_pool_initial.
>It might be possible to make that simpler/more generic (e.g. just call
>copy_node, recurse on each operand, return the original if nothing
>changed) and then fold at the end.

I tried to suggest creating piecewise loads from the constant pool as opposed to the aggregate one.  But I suppose the difficulty is to determine 'pieces', not their values (that followup passes and folding can handle quite efficiently).

As Eric I suggest you simply disqualify the candidate based on its type.

Richard.

>Thanks,
>Alan



* Re: [PATCH 6/6] Make SRA replace constant-pool loads
  2015-11-05 18:55     ` Alan Lawrence
@ 2015-11-06 10:25       ` Eric Botcazou
  2015-11-06 16:29       ` Richard Biener
  1 sibling, 0 replies; 8+ messages in thread
From: Eric Botcazou @ 2015-11-06 10:25 UTC (permalink / raw)
  To: Alan Lawrence; +Cc: gcc-patches, Richard Biener

> Hmm, can you clarify, do you mean I should *not* replace constant pool
> values with their DECL_INITIAL? The attempt to substitute in the
> initial value is what leads to most of the problems. For example, in
> gnat/opt31.adb, create_access finds this expression accessing *.LC0:
> 
> MEM[(interfaces__unsigned_8[(sizetype) <PLACEHOLDER_EXPR struct
> opt31__messages_t___XUP>.P_BOUNDS->LB0:<PLACEHOLDER_EXPR struct
> opt31__messages_t___XUP>.P_BOUNDS->UB0 >= <PLACEHOLDER_EXPR struct
> opt31__messages_t___XUP>.P_BOUNDS->LB0 ? (sizetype) <PLACEHOLDER_EXPR
> struct opt31__messages_t___XUP>.P_BOUNDS->UB0 : (sizetype)
> <PLACEHOLDER_EXPR struct opt31__messages_t___XUP>.P_BOUNDS->LB0 +
> 4294967295] *)&*.LC0][1 ...]{lb: 1 sz: 1}
> 
> this is an ARRAY_RANGE_REF of a MEM_REF of an ADDR_EXPR of *.LC0. So
> far I haven't extended subst_constant_pool_initial to handle
> ARRAY_RANGE_REFs, as it can't even handle this MEM_REF:
> 
> MEM[(interfaces__unsigned_8[(sizetype) <PLACEHOLDER_EXPR struct
> opt31__messages_t___XUP>.P_BOUNDS->LB0:<PLACEHOLDER_EXPR struct
> opt31__messages_t___XUP>.P_BOUNDS->UB0 >= <PLACEHOLDER_EXPR struct
> opt31__messages_t___XUP>.P_BOUNDS->LB0 ? (sizetype) <PLACEHOLDER_EXPR
> struct opt31__messages_t___XUP>.P_BOUNDS->UB0 : (sizetype)
> <PLACEHOLDER_EXPR struct opt31__messages_t___XUP>.P_BOUNDS->LB0 +
> 4294967295] *)&*.LC0]
> 
> because the type here has size:
> 
> MIN_EXPR <_GLOBAL.SZ2.ada_opt31 (<PLACEHOLDER_EXPR struct
> opt31__messages_t___XUP>.P_BOUNDS->UB0, <PLACEHOLDER_EXPR struct
> opt31__messages_t___XUP>.P_BOUNDS->LB0), 17179869176>
> 
> inside the MEM_REF of the ADDR_EXPR is *.LC0, whose DECL_INITIAL is a
> 4-element array (fine). Sadly while the MEM_REF
> type_contains_placeholder_p, the type of the outer ARRAY_RANGE_REF
> does not....

FWIW you are allowed to punt on this kind of complex expression, which appears
only in Ada.  New optimizations are sort of allowed to work on the C family of
languages first, and be extended (or not) to the rest of the languages afterwards.

> One possibility is that this whole construct, ARRAY_RANGE_REF that it
> is, should mark *.LC0 in cannot_scalarize_away_bitmap.

ARRAY_RANGE_REF is only used in Ada so you can do that for now (unless this 
introduces regressions in the gnat.dg testsuite but I doubt it).

-- 
Eric Botcazou


* Re: [PATCH 6/6] Make SRA replace constant-pool loads
  2015-11-03 14:01   ` Richard Biener
@ 2015-11-05 18:55     ` Alan Lawrence
  2015-11-06 10:25       ` Eric Botcazou
  2015-11-06 16:29       ` Richard Biener
  0 siblings, 2 replies; 8+ messages in thread
From: Alan Lawrence @ 2015-11-05 18:55 UTC (permalink / raw)
  To: Richard Biener; +Cc: GCC Patches

On 3 November 2015 at 14:01, Richard Biener <richard.guenther@gmail.com> wrote:
>
> Hum.  I still wonder why we need all this complication ...

Well, certainly I'd love to make it simpler, and if the complication
is because I've gone about trying to deal with especially Ada in the
wrong way...

>  I would
> expect that if
> we simply decided to completely scalarize a constant pool aggregate load then
> we can always do that.  SRA should then simply emit the _same_ IL as it does
> for other complete scalarization element accesses.  The only difference is
> that the result should be simplifiable to omit the element load from
> the constant pool.
>
> 1) The FRE pass following SRA should do that for you.
...
> That is, I'd like to see the patch greatly simplified to just consider
> constant pool
> complete scalarization as if it were a regular variable.

Hmm, can you clarify, do you mean I should *not* replace constant pool
values with their DECL_INITIAL? The attempt to substitute in the
initial value is what leads to most of the problems. For example, in
gnat/opt31.adb, create_access finds this expression accessing *.LC0:

MEM[(interfaces__unsigned_8[(sizetype) <PLACEHOLDER_EXPR struct
opt31__messages_t___XUP>.P_BOUNDS->LB0:<PLACEHOLDER_EXPR struct
opt31__messages_t___XUP>.P_BOUNDS->UB0 >= <PLACEHOLDER_EXPR struct
opt31__messages_t___XUP>.P_BOUNDS->LB0 ? (sizetype) <PLACEHOLDER_EXPR
struct opt31__messages_t___XUP>.P_BOUNDS->UB0 : (sizetype)
<PLACEHOLDER_EXPR struct opt31__messages_t___XUP>.P_BOUNDS->LB0 +
4294967295] *)&*.LC0][1 ...]{lb: 1 sz: 1}

this is an ARRAY_RANGE_REF of a MEM_REF of an ADDR_EXPR of *.LC0. So
far I haven't extended subst_constant_pool_initial to handle
ARRAY_RANGE_REFs, as it can't even handle this MEM_REF:

MEM[(interfaces__unsigned_8[(sizetype) <PLACEHOLDER_EXPR struct
opt31__messages_t___XUP>.P_BOUNDS->LB0:<PLACEHOLDER_EXPR struct
opt31__messages_t___XUP>.P_BOUNDS->UB0 >= <PLACEHOLDER_EXPR struct
opt31__messages_t___XUP>.P_BOUNDS->LB0 ? (sizetype) <PLACEHOLDER_EXPR
struct opt31__messages_t___XUP>.P_BOUNDS->UB0 : (sizetype)
<PLACEHOLDER_EXPR struct opt31__messages_t___XUP>.P_BOUNDS->LB0 +
4294967295] *)&*.LC0]

because the type here has size:

MIN_EXPR <_GLOBAL.SZ2.ada_opt31 (<PLACEHOLDER_EXPR struct
opt31__messages_t___XUP>.P_BOUNDS->UB0, <PLACEHOLDER_EXPR struct
opt31__messages_t___XUP>.P_BOUNDS->LB0), 17179869176>

inside the MEM_REF of the ADDR_EXPR is *.LC0, whose DECL_INITIAL is a
4-element array (fine). Sadly while the MEM_REF
type_contains_placeholder_p, the type of the outer ARRAY_RANGE_REF
does not....

One possibility is that this whole construct, ARRAY_RANGE_REF that it
is, should mark *.LC0 in cannot_scalarize_away_bitmap.

However, disqualified_constants is also necessary if I want a
meaningful assertion that we do not re-add constant pool entries as
candidates after we've discovered them - or should I try to rely on
there only being one expression that accesses each constant pool
entry? (I don't think that's guaranteed, as tree_output_constant_def
does hashing; I admit I haven't really tried to break that assumption.)
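
tree_output_constant_def's hashing means equal initializers are interned to a
single pool decl, so distinct expressions can legitimately end up sharing one
entry; the mechanism is essentially hash-consing, sketched here in miniature
(an illustrative standalone sketch, not GCC code):

```c
#include <stdlib.h>
#include <string.h>

/* Tiny intern table: identical constants map to one canonical entry,
   the way equal initializers map to one constant-pool decl.
   Fixed capacity, purely for illustration.  */
#define TABLE_SIZE 16
static const char *table[TABLE_SIZE];
static size_t nentries;

static const char *
intern (const char *s)
{
  for (size_t i = 0; i < nentries; i++)
    if (strcmp (table[i], s) == 0)
      return table[i];          /* same constant: same entry */
  if (nentries == TABLE_SIZE)
    abort ();                   /* sketch only: no growth */
  char *copy = malloc (strlen (s) + 1);
  strcpy (copy, s);
  return table[nentries++] = copy;
}
```

Two lookups with equal contents return the same pointer, which is why one
cannot assume a unique accessing expression per pool entry.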

> 2) You should be able to use fold_ctor_reference directly (in place of
> all your code
> in case offset and size are readily available - don't remember exactly how
> complete scalarization "walks" elements).  Alternatively use
> fold_const_aggregate_ref.

That should work for completely_scalarize, yes, i.e. if I can remove
the other route in create_access.
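
The offset/size-driven folding that fold_ctor_reference performs can be
pictured with a toy flat-array version; this is a simplified sketch under the
assumption of whole-element, aligned reads, not GCC's actual implementation:

```c
#include <stddef.h>

/* Toy "constructor": a flat array of ints, standing in for the
   DECL_INITIAL of a constant-pool entry.  */
struct ctor { const int *elts; size_t nelts; };

/* Fold a read of SIZE_BITS bits at bit OFFSET_BITS from the
   constructor.  Returns 1 and stores the value on success, 0 if the
   access does not line up with exactly one whole element (a real
   folder handles many more cases).  */
static int
fold_array_ctor_ref (const struct ctor *c,
                     size_t offset_bits, size_t size_bits, int *out)
{
  const size_t elt_bits = sizeof (int) * 8;
  if (size_bits != elt_bits || offset_bits % elt_bits != 0)
    return 0;                   /* partial or misaligned read: punt */
  size_t idx = offset_bits / elt_bits;
  if (idx >= c->nelts)
    return 0;                   /* out of range of the initializer */
  *out = c->elts[idx];
  return 1;
}
```

Complete scalarization visits elements at known offset/size pairs, which is
exactly the shape of query this kind of folder answers.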

> 3) You can simplify the stmt SRA generated by simply calling fold_stmt on it,
> that will do a bit more (wasted) work compared to 2) but may be easier.
>
> I wouldn't bother with the case where we for some reason do not simplify
> the constant pool load.

Hmmm. As above, I'm not quite sure what you mean by "the constant pool
load" - if that means substituting the DECL_INITIAL in place of the
VAR_DECL that is DECL_IN_CONSTANT_POOL, then I didn't find a general
tree substitution routine, hence writing subst_constant_pool_initial.
It might be possible to make that simpler/more generic (e.g. just call
copy_node, recurse on each operand, return the original if nothing
changed) and then fold at the end.
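
That copy-and-recurse shape - return the original node when no operand
changed, otherwise copy the node with the new operands and fold at the end -
can be sketched on a toy expression tree; this is a standalone illustration
of the pattern, not GCC's tree API:

```c
#include <stdlib.h>
#include <string.h>

/* Toy expression tree: a LEAF holds a value or marks the "variable"
   being substituted; an ADD node has two operands.  */
enum kind { LEAF, ADD };

struct node {
  enum kind k;
  int value;              /* meaningful for LEAF */
  int is_var;             /* LEAF standing for the pool variable */
  struct node *op[2];     /* meaningful for ADD */
};

/* Substitute INIT for the variable leaf, sharing any subtree that did
   not change: copy a node only when one of its operands changed.  */
static struct node *
subst (struct node *e, struct node *init)
{
  if (e->k == LEAF)
    return e->is_var ? init : e;
  struct node *l = subst (e->op[0], init);
  struct node *r = subst (e->op[1], init);
  if (l == e->op[0] && r == e->op[1])
    return e;                       /* nothing changed: return original */
  struct node *c = malloc (sizeof *c);
  memcpy (c, e, sizeof *c);
  c->op[0] = l;
  c->op[1] = r;
  return c;
}

/* "Fold" at the end: evaluate the substituted tree to a constant.  */
static int
fold_value (struct node *e)
{
  if (e->k == LEAF)
    return e->value;
  return fold_value (e->op[0]) + fold_value (e->op[1]);
}
```

The sharing test (returning the original when nothing changed) is what keeps
such a substitution cheap on large expressions.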

Thanks,
Alan

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 6/6] Make SRA replace constant-pool loads
  2015-10-29 19:32 ` [PATCH 6/6] Make SRA replace " Alan Lawrence
  2015-10-31 23:48   ` Bernhard Reutner-Fischer
@ 2015-11-03 14:01   ` Richard Biener
  2015-11-05 18:55     ` Alan Lawrence
  1 sibling, 1 reply; 8+ messages in thread
From: Richard Biener @ 2015-11-03 14:01 UTC (permalink / raw)
  To: Alan Lawrence; +Cc: GCC Patches

On Thu, Oct 29, 2015 at 8:18 PM, Alan Lawrence <alan.lawrence@arm.com> wrote:
> This has changed quite a bit since the previous revision
> (https://gcc.gnu.org/ml/gcc-patches/2015-08/msg01484.html), mostly due to Ada
> and specifically Ada on ARM.
>
> I didn't find a good alternative to scanning for constant-pool accesses "as we
> go" through the function, and although I didn't find any of these accesses
> being disqualified after we'd found them in C, some awkward constant-pool
> expressions in Ada forced me to disqualify them for a new reason (inability to
> constant-propagate). Hence, I introduced a new bitmap of
> disqualified_constants, rather than an additional pass through the function.
>
> Next, the approach of using a replacement_expr (with the constant value) in
> place of the old replacement_decl, also failed (the decl could be used in an
> expression where substituting in the expr produced ill-formed gimple). Hence,
> constant-pool entries now use replacement_decls just like everything else, for
> which I generate initializers in initialize_constant_pool_replacements.
>
> However, I was able to avoid generating the replacement constants early, and to
> do so late in initialize_constant_pool_replacements; the main trick here was to
> use fold_array_ctor_reference, kindly pointed out by Richie :), to fold the
> MEM_REFs built in analyze_access_subtree.
>
> Finally, I found completely_scalarize was still failing to fold all the
> constant expressions, because Ada was putting VIEW_CONVERT_EXPRs in the
> constant pool, and fold-const.c did not deal with ARRAY_REFs of these. However,
> requiring completely_scalarize to be able to fold everything seemed fragile,
> so instead I decoupled from fold-const by allowing it to fail.
>
> With these changes, ssa-dom-cse-2.c is fixed on all platforms with appropriate
> --param.

Hum.  I still wonder why we need all this complication ...  I would
expect that if
we simply decided to completely scalarize a constant pool aggregate load then
we can always do that.  SRA should then simply emit the _same_ IL as it does
for other complete scalarization element accesses.  The only difference is
that the result should be simplifiable to omit the element load from
the constant pool.

1) The FRE pass following SRA should do that for you.

2) You should be able to use fold_ctor_reference directly (in place of
all your code
in case offset and size are readily available - don't remember exactly how
complete scalarization "walks" elements).  Alternatively use
fold_const_aggregate_ref.

3) You can simplify the stmt SRA generated by simply calling fold_stmt on it,
that will do a bit more (wasted) work compared to 2) but may be easier.

I wouldn't bother with the case where we for some reason do not simplify
the constant pool load.

That is, I'd like to see the patch greatly simplified to just consider
constant pool
complete scalarization as if it were a regular variable.
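
At the source level, the intended transformation might be pictured like this
(an illustrative sketch only: SR_p_x/SR_p_y are made-up replacement names,
and whether a given initializer actually lands in the constant pool is
target-dependent):

```c
/* Before SRA: an aggregate local whose initializer the target may
   force into the constant pool.  */
struct point { int x, y; };

int
before (void)
{
  struct point p = { 3, 4 };
  return p.x + p.y;
}

/* After complete scalarization, conceptually: each element access is
   rewritten to a scalar replacement, and each replacement is
   initialized from an element load of the pool entry, which the
   following FRE pass (or folding) turns into the constant itself.  */
int
after (void)
{
  int SR_p_x = 3;              /* was an element load of *.LC0, folded */
  int SR_p_y = 4;              /* likewise */
  return SR_p_x + SR_p_y;
}
```

The point of the suggestion is that SRA emits the same IL in both the
constant-pool and ordinary cases; only the later simplification differs.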

Richard.

>
> There are still a few remaining test failures, on AArch64:
> gcc.dg/guality/pr54970.c   -O1  line 15 a[0] == 1
> gcc.dg/guality/pr54970.c   -O1  line 20 a[0] == 1
> gcc.dg/guality/pr54970.c   -O1  line 25 a[0] == 1
> (also at -O2, -O2 -flto with/without plugin, -O3 -g and -Os).
> ...which I'm working on. I also tested with the hack at
> https://gcc.gnu.org/ml/gcc-patches/2015-08/msg01483.html .
> This revealed one new failure in advsimd-intrinsics/vldX.c on AArch64 (fixed
> by https://gcc.gnu.org/ml/gcc-patches/2015-10/msg02777.html), and fixed a
> bunch of guality failures on ARM:
> gcc.dg/guality/pr54970.c   -O1  line 31 a[0] == 4
> gcc.dg/guality/pr54970.c   -O1  line 36 a[0] == 4
> gcc.dg/guality/pr54970.c   -O1  line 45 a[0] == 4
> gcc.dg/guality/pr54970.c   -O1  line 45 p[-2] == 4
> gcc.dg/guality/pr54970.c   -O1  line 45 q[-1] == 4
> (also at -O2, -O2 -flto with/without plugin, -O3 -g and -Os)
>
> --Alan
>
> gcc/ChangeLog:
>
>         * tree-sra.c (disqualified_constants): New.
>         (sra_initialize): Initialize it.
>         (sra_deinitialize): Deallocate it.
>         (disqualify_candidate): Set bit in disqualified_constants.
>         (subst_constant_pool_initial): New.
>         (create_access): Scan for constant-pool entries as we go.
>         (scalarizable_type_p): Disallow types containing placeholders.
>         (completely_scalarize): Return bool to allow failure.
>         (scalarize_elem): Likewise; check we can generate constant replacements.
>         (maybe_add_sra_candidate): Allow constant-pool entries.
>         (analyze_access_subtree): Checking-assert that we can fold any refs
>         built for constant-pool entries.
>         (analyze_all_variable_accesses): Deal with completely_scalarize failing.
>         (initialize_constant_pool_replacements): New.
>         (sra_modify_function_body): Call previous.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.dg/tree-ssa/ssa-dom-cse-2.c: Add sra-max-scalarization-size param,
>         remove xfail.
> ---
>  gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c |   7 +-
>  gcc/tree-sra.c                                | 221 ++++++++++++++++++++++++--
>  2 files changed, 208 insertions(+), 20 deletions(-)
>
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c
> index 9eccdc9..b13d583 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O3 -fno-tree-fre -fno-tree-pre -fdump-tree-optimized" } */
> +/* { dg-options "-O3 -fno-tree-fre -fno-tree-pre -fdump-tree-optimized --param sra-max-scalarization-size-Ospeed=32" } */
>
>  int
>  foo ()
> @@ -17,7 +17,4 @@ foo ()
>  /* After late unrolling the above loop completely DOM should be
>     able to optimize this to return 28.  */
>
> -/* See PR63679 and PR64159, if the target forces the initializer to memory then
> -   DOM is not able to perform this optimization.  */
> -
> -/* { dg-final { scan-tree-dump "return 28;" "optimized" { xfail aarch64*-*-* alpha*-*-* hppa*-*-* powerpc*-*-* sparc*-*-* s390*-*-* } } } */
> +/* { dg-final { scan-tree-dump "return 28;" "optimized" } } */
> diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
> index 358db79..1920265 100644
> --- a/gcc/tree-sra.c
> +++ b/gcc/tree-sra.c
> @@ -90,6 +90,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "gimple-iterator.h"
>  #include "gimplify-me.h"
>  #include "gimple-walk.h"
> +#include "gimple-fold.h"
>  #include "tree-cfg.h"
>  #include "flags.h"
>  #include "insn-config.h"
> @@ -336,6 +337,10 @@ candidate (unsigned uid)
>     those which cannot be (because they are and need be used as a whole).  */
>  static bitmap should_scalarize_away_bitmap, cannot_scalarize_away_bitmap;
>
> +/* Bitmap of candidates in the constant pool, which cannot be scalarized
> +   because this would produce non-constant expressions (e.g. Ada).  */
> +static bitmap disqualified_constants;
> +
>  /* Obstack for creation of fancy names.  */
>  static struct obstack name_obstack;
>
> @@ -658,6 +663,7 @@ sra_initialize (void)
>      (vec_safe_length (cfun->local_decls) / 2);
>    should_scalarize_away_bitmap = BITMAP_ALLOC (NULL);
>    cannot_scalarize_away_bitmap = BITMAP_ALLOC (NULL);
> +  disqualified_constants = BITMAP_ALLOC (NULL);
>    gcc_obstack_init (&name_obstack);
>    base_access_vec = new hash_map<tree, auto_vec<access_p> >;
>    memset (&sra_stats, 0, sizeof (sra_stats));
> @@ -676,6 +682,7 @@ sra_deinitialize (void)
>    candidates = NULL;
>    BITMAP_FREE (should_scalarize_away_bitmap);
>    BITMAP_FREE (cannot_scalarize_away_bitmap);
> +  BITMAP_FREE (disqualified_constants);
>    access_pool.release ();
>    assign_link_pool.release ();
>    obstack_free (&name_obstack, NULL);
> @@ -690,6 +697,8 @@ disqualify_candidate (tree decl, const char *reason)
>  {
>    if (bitmap_clear_bit (candidate_bitmap, DECL_UID (decl)))
>      candidates->remove_elt_with_hash (decl, DECL_UID (decl));
> +  if (TREE_CODE (decl) == VAR_DECL && DECL_IN_CONSTANT_POOL (decl))
> +    bitmap_set_bit (disqualified_constants, DECL_UID (decl));
>
>    if (dump_file && (dump_flags & TDF_DETAILS))
>      {
> @@ -841,8 +850,76 @@ create_access_1 (tree base, HOST_WIDE_INT offset, HOST_WIDE_INT size)
>    return access;
>  }
>
> +/* Given EXPR a chain of COMPONENT_REFs and/or ARRAY_REFs built around a
> +   constant-pool declaration VAR, builds a tree like EXPR but with VAR replaced
> +   by its DECL_INITIAL, and tries to fold it to a constant.  If folding succeeds
> +   then return the folded tree, else NULL_TREE.  */
> +
> +static tree
> +subst_constant_pool_initial (tree expr, tree var)
> +{
> +  if (TREE_CODE (expr) == VAR_DECL)
> +    {
> +      gcc_assert (DECL_IN_CONSTANT_POOL (expr));
> +      gcc_assert (expr == var);
> +      return DECL_INITIAL (expr);
> +    }
> +  if (TREE_CODE (expr) == COMPONENT_REF)
> +    {
> +      tree field = TREE_OPERAND (expr, 1);
> +      gcc_assert (TREE_CODE (field) == FIELD_DECL);
> +      gcc_assert (TREE_OPERAND (expr, 2) == NULL_TREE);
> +      tree record = subst_constant_pool_initial (TREE_OPERAND (expr, 0), var);
> +      if (!record)
> +       return NULL_TREE;
> +      tree ref = build3 (COMPONENT_REF, TREE_TYPE (expr),
> +                        record, field, NULL_TREE);
> +      tree res = fold (ref);
> +      if (res != ref)
> +       return res;
> +    }
> +  else if (TREE_CODE (expr) == ARRAY_REF)
> +    {
> +      tree array = subst_constant_pool_initial (TREE_OPERAND (expr, 0), var);
> +      if (!array)
> +       return NULL_TREE;
> +      tree ref = build4 (ARRAY_REF, TREE_TYPE (expr),
> +                        array,
> +                        TREE_OPERAND (expr, 1),
> +                        TREE_OPERAND (expr, 2),
> +                        TREE_OPERAND (expr, 3));
> +      /* Ada (at least) builds ARRAY_REFs around strings.  */
> +      if (tree folded = fold_read_from_constant_string (ref))
> +       return folded;
> +      tree res = fold (ref);
> +      if (res != ref)
> +       return res;
> +    }
> +  else if (TREE_CODE (expr) == MEM_REF)
> +    {
> +      gcc_assert (TREE_CODE (TREE_OPERAND (expr, 0)) == ADDR_EXPR);
> +      tree type = TREE_TYPE (expr);
> +      tree sz = TYPE_SIZE (TREE_TYPE (expr));
> +      tree val = TREE_OPERAND (TREE_OPERAND (expr, 0), 0);
> +      val = subst_constant_pool_initial (val, var);
> +      if (TREE_CODE (val) != CONSTRUCTOR
> +         || !tree_fits_uhwi_p (TREE_OPERAND (expr, 1))
> +         || !tree_fits_uhwi_p (sz))
> +       return NULL_TREE;
> +      unsigned HOST_WIDE_INT offset =
> +         tree_to_uhwi (TREE_OPERAND (expr, 1)) * BITS_PER_UNIT;
> +      return fold_ctor_reference (type, val, offset, tree_to_uhwi (sz), var);
> +    }
> +
> +  /* Unknown expression, or could not constant-fold.  */
> +  return NULL_TREE;
> +}
> +
> +static bool maybe_add_sra_candidate (tree var);
> +
>  /* Create and insert access for EXPR. Return created access, or NULL if it is
> -   not possible.  */
> +   not possible.  Also scan for uses of constant pool as we go along and add
> +   to candidates.  */
>
>  static struct access *
>  create_access (tree expr, gimple *stmt, bool write)
> @@ -865,6 +942,32 @@ create_access (tree expr, gimple *stmt, bool write)
>    else
>      ptr = false;
>
> +  /* For constant-pool entries, check we can substitute the constant value.  */
> +  if (TREE_CODE (base) == VAR_DECL && DECL_IN_CONSTANT_POOL (base)
> +      && (sra_mode == SRA_MODE_EARLY_INTRA || sra_mode == SRA_MODE_INTRA))
> +    {
> +      gcc_assert (!write);
> +      if (bitmap_bit_p (disqualified_constants, DECL_UID (base)))
> +       return NULL;
> +      tree repl = subst_constant_pool_initial (expr, base);
> +      if (dump_file && (dump_flags & TDF_DETAILS))
> +      {
> +       fprintf (dump_file, "Trying to scalarize constant-pool ");
> +       print_generic_expr (dump_file, base, 0);
> +       fprintf (dump_file, "\n In expression: ");
> +       print_generic_expr (dump_file, expr, 0);
> +       fprintf (dump_file, "\n Constant replacement: ");
> +       print_generic_expr (dump_file, repl, 0);
> +       fprintf (dump_file, "\n");
> +      }
> +      /* If we can't resolve the expression to a constant, we risk creating
> +         a malformed expression (e.g. Ada testsuite, ACATS chapter C3).  */
> +      if (!repl || !TREE_CONSTANT (repl))
> +       disqualify_candidate (base, "Cannot constant-propagate out of pool.");
> +      else
> +       maybe_add_sra_candidate (base);
> +    }
> +
>    if (!DECL_P (base) || !bitmap_bit_p (candidate_bitmap, DECL_UID (base)))
>      return NULL;
>
> @@ -923,6 +1026,8 @@ static bool
>  scalarizable_type_p (tree type)
>  {
>    gcc_assert (!is_gimple_reg_type (type));
> +  if (type_contains_placeholder_p (type))
> +    return false;
>
>    switch (TREE_CODE (type))
>    {
> @@ -970,15 +1075,19 @@ scalarizable_type_p (tree type)
>    }
>  }
>
> -static void scalarize_elem (tree, HOST_WIDE_INT, HOST_WIDE_INT, tree, tree);
> +static bool scalarize_elem (tree, HOST_WIDE_INT, HOST_WIDE_INT, tree, tree);
>
>  /* Create total_scalarization accesses for all scalar fields of a member
>     of type DECL_TYPE conforming to scalarizable_type_p.  BASE
>     must be the top-most VAR_DECL representing the variable; within that,
>     OFFSET locates the member and REF must be the memory reference expression for
> -   the member.  */
> +   the member.
>
> -static void
> +   Return TRUE if successful; may fail in the case of constant-pool entry BASE
> +   whose DECL_INITIAL contains values to which we cannot fold ARRAY_REF or
> +   COMPONENT_REF expressions.  */
> +
> +static bool
>  completely_scalarize (tree base, tree decl_type, HOST_WIDE_INT offset, tree ref)
>  {
>    switch (TREE_CODE (decl_type))
> @@ -991,10 +1100,11 @@ completely_scalarize (tree base, tree decl_type, HOST_WIDE_INT offset, tree ref)
>             tree ft = TREE_TYPE (fld);
>             tree nref = build3 (COMPONENT_REF, ft, ref, fld, NULL_TREE);
>
> -           scalarize_elem (base, pos, tree_to_uhwi (DECL_SIZE (fld)), nref,
> -                           ft);
> +           if (!scalarize_elem (base, pos, tree_to_uhwi (DECL_SIZE (fld)),
> +                                nref, ft))
> +             return false;
>           }
> -      break;
> +      return true;
>      case ARRAY_TYPE:
>        {
>         tree elemtype = TREE_TYPE (decl_type);
> @@ -1026,12 +1136,13 @@ completely_scalarize (tree base, tree decl_type, HOST_WIDE_INT offset, tree ref)
>                                     ref, size_int (idx + min),
>                                     NULL_TREE, NULL_TREE);
>                 int el_off = offset + idx * el_size;
> -               scalarize_elem (base, el_off, el_size, nref, elemtype);
> +               if (!scalarize_elem (base, el_off, el_size, nref, elemtype))
> +                 return false;
>               }
>             while (++idx <= lenlessone);
>           }
>        }
> -      break;
> +      return true;
>      default:
>        gcc_unreachable ();
>      }
> @@ -1040,22 +1151,32 @@ completely_scalarize (tree base, tree decl_type, HOST_WIDE_INT offset, tree ref)
>  /* Create total_scalarization accesses for a member of type TYPE, which must
>     satisfy either is_gimple_reg_type or scalarizable_type_p.  BASE must be the
>     top-most VAR_DECL representing the variable; within that, POS and SIZE locate
> -   the member and REF must be the reference expression for it.  */
> +   the member and REF must be the reference expression for it.
>
> -static void
> +   Return TRUE if successful; may fail in the case of constant-pool entry BASE
> +   whose DECL_INITIAL contains values to which we cannot fold ARRAY_REF or
> +   COMPONENT_REF expressions.  */
> +
> +static bool
>  scalarize_elem (tree base, HOST_WIDE_INT pos, HOST_WIDE_INT size,
>                  tree ref, tree type)
>  {
>    if (is_gimple_reg_type (type))
>    {
> +    if (TREE_CODE (base) == VAR_DECL
> +       && DECL_IN_CONSTANT_POOL (base)
> +       && subst_constant_pool_initial (ref, base) == NULL_TREE)
> +      return false;
>      struct access *access = create_access_1 (base, pos, size);
>      access->expr = ref;
>      access->type = type;
>      access->grp_total_scalarization = 1;
>      /* Accesses for intraprocedural SRA can have their stmt NULL.  */
> +
> +    return true;
>    }
> -  else
> -    completely_scalarize (base, type, pos, ref);
> +
> +  return completely_scalarize (base, type, pos, ref);
>  }
>
>  /* Create a total_scalarization access for VAR as a whole.  VAR must be of a
> @@ -1843,7 +1964,12 @@ maybe_add_sra_candidate (tree var)
>        reject (var, "not aggregate");
>        return false;
>      }
> -  if (needs_to_live_in_memory (var))
> +  /* Allow constant-pool entries (that "need to live in memory")
> +     unless we are doing IPA SRA.  */
> +  if (needs_to_live_in_memory (var)
> +      && (sra_mode == SRA_MODE_EARLY_IPA
> +         || TREE_CODE (var) != VAR_DECL
> +         || !DECL_IN_CONSTANT_POOL (var)))
>      {
>        reject (var, "needs to live in memory");
>        return false;
> @@ -2343,6 +2469,10 @@ analyze_access_subtree (struct access *root, struct access *parent,
>                        (unsigned) root->offset, (unsigned) root->size);
>               fprintf (dump_file, " to an integer.\n");
>             }
> +         gcc_checking_assert (TREE_CODE (root->base) != VAR_DECL
> +                              || !DECL_IN_CONSTANT_POOL (root->base)
> +                              || subst_constant_pool_initial (root->expr,
> +                                                              root->base));
>         }
>
>        root->grp_to_be_replaced = 1;
> @@ -2604,8 +2734,17 @@ analyze_all_variable_accesses (void)
>             if (tree_to_uhwi (TYPE_SIZE (TREE_TYPE (var)))
>                 <= max_scalarization_size)
>               {
> +               vec<access_p> &accesses = base_access_vec->get_or_insert (var);
> +               unsigned orig_count = accesses.length ();
> +               if (!completely_scalarize (var, TREE_TYPE (var), 0, var))
> +                 {
> +                   /* Failed.  Discard any accesses created during attempt.  */
> +                   for (unsigned i = orig_count; i < accesses.length (); i++)
> +                     access_pool.remove (accesses[i]);
> +                   accesses.truncate (orig_count);
> +                   continue;
> +                 }
>                 create_total_scalarization_access (var);
> -               completely_scalarize (var, TREE_TYPE (var), 0, var);
>                 if (dump_file && (dump_flags & TDF_DETAILS))
>                   {
>                     fprintf (dump_file, "Will attempt to totally scalarize ");
> @@ -3480,6 +3619,56 @@ sra_modify_assign (gimple *stmt, gimple_stmt_iterator *gsi)
>      }
>  }
>
> +static void
> +initialize_constant_pool_replacements (void)
> +{
> +  gimple_seq seq = NULL;
> +  gimple_stmt_iterator gsi = gsi_start (seq);
> +  bitmap_iterator bi;
> +  unsigned i;
> +
> +  EXECUTE_IF_SET_IN_BITMAP (candidate_bitmap, 0, i, bi)
> +    if (bitmap_bit_p (should_scalarize_away_bitmap, i)
> +       && !bitmap_bit_p (cannot_scalarize_away_bitmap, i))
> +      {
> +       tree var = candidate (i);
> +       if (TREE_CODE (var) != VAR_DECL || !DECL_IN_CONSTANT_POOL (var))
> +         continue;
> +       vec<access_p> *access_vec = get_base_access_vector (var);
> +       if (!access_vec)
> +         continue;
> +       for (unsigned i = 0; i < access_vec->length (); i++)
> +         {
> +           struct access *access = (*access_vec)[i];
> +           tree val = subst_constant_pool_initial (access->expr, access->base);
> +           if (dump_file && (dump_flags & TDF_DETAILS))
> +             {
> +               fprintf (dump_file, "Expression ");
> +               print_generic_expr (dump_file, access->expr, 0);
> +               fprintf (dump_file, " replaced with ");
> +               print_generic_expr (dump_file, access->replacement_decl, 0);
> +               fprintf (dump_file, " initialized to ");
> +               print_generic_expr (dump_file, val, 0);
> +               fprintf (dump_file, "\n");
> +               if (!access->replacement_decl)
> +                 dump_access (dump_file, access, true);
> +             }
> +           if (!access->replacement_decl)
> +             continue;
> +           gcc_assert (val);
> +           gassign *stmt = gimple_build_assign (
> +             get_access_replacement (access), val);
> +           gsi_insert_after (&gsi, stmt, GSI_NEW_STMT);
> +           update_stmt (stmt);
> +         }
> +      }
> +
> +  seq = gsi_seq (gsi);
> +  if (seq)
> +    gsi_insert_seq_on_edge_immediate (
> +      single_succ_edge (ENTRY_BLOCK_PTR_FOR_FN (cfun)), seq);
> +}
> +
>  /* Traverse the function body and all modifications as decided in
>     analyze_all_variable_accesses.  Return true iff the CFG has been
>     changed.  */
> @@ -3490,6 +3679,8 @@ sra_modify_function_body (void)
>    bool cfg_changed = false;
>    basic_block bb;
>
> +  initialize_constant_pool_replacements ();
> +
>    FOR_EACH_BB_FN (bb, cfun)
>      {
>        gimple_stmt_iterator gsi = gsi_start_bb (bb);
> --
> 1.9.1
>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 6/6] Make SRA replace constant-pool loads
  2015-10-29 19:32 ` [PATCH 6/6] Make SRA replace " Alan Lawrence
@ 2015-10-31 23:48   ` Bernhard Reutner-Fischer
  2015-11-03 14:01   ` Richard Biener
  1 sibling, 0 replies; 8+ messages in thread
From: Bernhard Reutner-Fischer @ 2015-10-31 23:48 UTC (permalink / raw)
  To: Alan Lawrence, gcc-patches

On October 29, 2015 8:18:22 PM GMT+01:00, Alan Lawrence <alan.lawrence@arm.com> wrote:

>+
>+static tree
>+subst_constant_pool_initial (tree expr, tree var)
>+{
>+  if (TREE_CODE (expr) == VAR_DECL)

Just a nit, but I thought we had VAR_DECL_P or VAR_P for the TREE_CODE (NODE) == VAR_DECL predicate?

Thanks,

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH 6/6] Make SRA replace constant-pool loads
  2015-10-29 19:22 [PATCH 0/6 v2] PR/63679 Make SRA scalarize " Alan Lawrence
@ 2015-10-29 19:32 ` Alan Lawrence
  2015-10-31 23:48   ` Bernhard Reutner-Fischer
  2015-11-03 14:01   ` Richard Biener
  0 siblings, 2 replies; 8+ messages in thread
From: Alan Lawrence @ 2015-10-29 19:32 UTC (permalink / raw)
  To: gcc-patches

This has changed quite a bit since the previous revision
(https://gcc.gnu.org/ml/gcc-patches/2015-08/msg01484.html), mostly due to Ada
and specifically Ada on ARM.

I didn't find a good alternative to scanning for constant-pool accesses "as we
go" through the function, and although I didn't find any of these accesses
being disqualified after we'd found them in C, some awkward constant-pool
expressions in Ada forced me to disqualify them for a new reason (inability to
constant-propagate). Hence, I introduced a new bitmap of
disqualified_constants, rather than an additional pass through the function.

Next, the approach of using a replacement_expr (with the constant value) in
place of the old replacement_decl, also failed (the decl could be used in an
expression where substituting in the expr produced ill-formed gimple). Hence,
constant-pool entries now use replacement_decls just like everything else, for
which I generate initializers in initialize_constant_pool_replacements.

However, I was able to avoid generating the replacement constants early, and to
do so late in initialize_constant_pool_replacements; the main trick here was to
use fold_array_ctor_reference, kindly pointed out by Richie :), to fold the
MEM_REFs built in analyze_access_subtree.

Finally, I found completely_scalarize was still failing to fold all the
constant expressions, because Ada was putting VIEW_CONVERT_EXPRs in the
constant pool, and fold-const.c did not deal with ARRAY_REFs of these. However,
requiring completely_scalarize to be able to fold everything seemed fragile,
so instead I decoupled from fold-const by allowing it to fail.

With these changes, ssa-dom-cse-2.c is fixed on all platforms with appropriate
--param.
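
For reference, the testcase is roughly of this shape (reconstructed for
illustration; the committed file may differ in detail):

```c
/* Sketch of gcc.dg/tree-ssa/ssa-dom-cse-2.c: after complete loop
   unrolling, SRA scalarizes the constant-pool-backed array so that
   DOM can fold the whole function down to "return 28".  */
int
foo (void)
{
  const int a[8] = { 0, 1, 2, 3, 4, 5, 6, 7 };
  int i, sum;

  for (sum = 0, i = 0; i < 8; i++)
    sum += a[i];

  return sum;
}
```

On targets that force the initializer of `a` into memory, the xfail removed by
this patch was needed precisely because nothing scalarized the pool entry.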

There are still a few remaining test failures, on AArch64:
gcc.dg/guality/pr54970.c   -O1  line 15 a[0] == 1
gcc.dg/guality/pr54970.c   -O1  line 20 a[0] == 1
gcc.dg/guality/pr54970.c   -O1  line 25 a[0] == 1
(also at -O2, -O2 -flto with/without plugin, -O3 -g and -Os).
...which I'm working on. I also tested with the hack at
https://gcc.gnu.org/ml/gcc-patches/2015-08/msg01483.html .
This revealed one new failure in advsimd-intrinsics/vldX.c on AArch64 (fixed
by https://gcc.gnu.org/ml/gcc-patches/2015-10/msg02777.html), and fixed a
bunch of guality failures on ARM:
gcc.dg/guality/pr54970.c   -O1  line 31 a[0] == 4
gcc.dg/guality/pr54970.c   -O1  line 36 a[0] == 4
gcc.dg/guality/pr54970.c   -O1  line 45 a[0] == 4
gcc.dg/guality/pr54970.c   -O1  line 45 p[-2] == 4
gcc.dg/guality/pr54970.c   -O1  line 45 q[-1] == 4
(also at -O2, -O2 -flto with/without plugin, -O3 -g and -Os)

--Alan

gcc/ChangeLog:

	* tree-sra.c (disqualified_constants): New.
	(sra_initialize): Initialize it.
	(sra_deinitialize): Deallocate it.
	(disqualify_candidate): Set bit in disqualified_constants.
	(subst_constant_pool_initial): New.
	(create_access): Scan for constant-pool entries as we go.
	(scalarizable_type_p): Disallow types containing placeholders.
	(completely_scalarize): Return bool to allow failure.
	(scalarize_elem): Likewise; check we can generate constant replacements.
	(maybe_add_sra_candidate): Allow constant-pool entries.
	(analyze_access_subtree): Checking-assert that we can fold any refs
	built for constant-pool entries.
	(analyze_all_variable_accesses): Deal with completely_scalarize failing.
	(initialize_constant_pool_replacements): New.
	(sra_modify_function_body): Call previous.

gcc/testsuite/ChangeLog:

	* gcc.dg/tree-ssa/ssa-dom-cse-2.c: Add sra-max-scalarization-size param,
	remove xfail.
---
 gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c |   7 +-
 gcc/tree-sra.c                                | 221 ++++++++++++++++++++++++--
 2 files changed, 208 insertions(+), 20 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c
index 9eccdc9..b13d583 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O3 -fno-tree-fre -fno-tree-pre -fdump-tree-optimized" } */
+/* { dg-options "-O3 -fno-tree-fre -fno-tree-pre -fdump-tree-optimized --param sra-max-scalarization-size-Ospeed=32" } */
 
 int
 foo ()
@@ -17,7 +17,4 @@ foo ()
 /* After late unrolling the above loop completely DOM should be
    able to optimize this to return 28.  */
 
-/* See PR63679 and PR64159, if the target forces the initializer to memory then
-   DOM is not able to perform this optimization.  */
-
-/* { dg-final { scan-tree-dump "return 28;" "optimized" { xfail aarch64*-*-* alpha*-*-* hppa*-*-* powerpc*-*-* sparc*-*-* s390*-*-* } } } */
+/* { dg-final { scan-tree-dump "return 28;" "optimized" } } */
diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
index 358db79..1920265 100644
--- a/gcc/tree-sra.c
+++ b/gcc/tree-sra.c
@@ -90,6 +90,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "gimple-iterator.h"
 #include "gimplify-me.h"
 #include "gimple-walk.h"
+#include "gimple-fold.h"
 #include "tree-cfg.h"
 #include "flags.h"
 #include "insn-config.h"
@@ -336,6 +337,10 @@ candidate (unsigned uid)
    those which cannot be (because they are and need be used as a whole).  */
 static bitmap should_scalarize_away_bitmap, cannot_scalarize_away_bitmap;
 
+/* Bitmap of candidates in the constant pool, which cannot be scalarized
+   because this would produce non-constant expressions (e.g. Ada).  */
+static bitmap disqualified_constants;
+
 /* Obstack for creation of fancy names.  */
 static struct obstack name_obstack;
 
@@ -658,6 +663,7 @@ sra_initialize (void)
     (vec_safe_length (cfun->local_decls) / 2);
   should_scalarize_away_bitmap = BITMAP_ALLOC (NULL);
   cannot_scalarize_away_bitmap = BITMAP_ALLOC (NULL);
+  disqualified_constants = BITMAP_ALLOC (NULL);
   gcc_obstack_init (&name_obstack);
   base_access_vec = new hash_map<tree, auto_vec<access_p> >;
   memset (&sra_stats, 0, sizeof (sra_stats));
@@ -676,6 +682,7 @@ sra_deinitialize (void)
   candidates = NULL;
   BITMAP_FREE (should_scalarize_away_bitmap);
   BITMAP_FREE (cannot_scalarize_away_bitmap);
+  BITMAP_FREE (disqualified_constants);
   access_pool.release ();
   assign_link_pool.release ();
   obstack_free (&name_obstack, NULL);
@@ -690,6 +697,8 @@ disqualify_candidate (tree decl, const char *reason)
 {
   if (bitmap_clear_bit (candidate_bitmap, DECL_UID (decl)))
     candidates->remove_elt_with_hash (decl, DECL_UID (decl));
+  if (TREE_CODE (decl) == VAR_DECL && DECL_IN_CONSTANT_POOL (decl))
+    bitmap_set_bit (disqualified_constants, DECL_UID (decl));
 
   if (dump_file && (dump_flags & TDF_DETAILS))
     {
@@ -841,8 +850,76 @@ create_access_1 (tree base, HOST_WIDE_INT offset, HOST_WIDE_INT size)
   return access;
 }
 
+/* Given EXPR, a chain of COMPONENT_REFs and/or ARRAY_REFs built around the
+   constant-pool declaration VAR, build a tree like EXPR but with VAR replaced
+   by its DECL_INITIAL, and try to fold it to a constant.  Return the folded
+   tree if folding succeeds, otherwise NULL_TREE.  */
+
+static tree
+subst_constant_pool_initial (tree expr, tree var)
+{
+  if (TREE_CODE (expr) == VAR_DECL)
+    {
+      gcc_assert (DECL_IN_CONSTANT_POOL (expr));
+      gcc_assert (expr == var);
+      return DECL_INITIAL (expr);
+    }
+  if (TREE_CODE (expr) == COMPONENT_REF)
+    {
+      tree field = TREE_OPERAND (expr, 1);
+      gcc_assert (TREE_CODE (field) == FIELD_DECL);
+      gcc_assert (TREE_OPERAND (expr, 2) == NULL_TREE);
+      tree record = subst_constant_pool_initial (TREE_OPERAND (expr, 0), var);
+      if (!record)
+	return NULL_TREE;
+      tree ref = build3 (COMPONENT_REF, TREE_TYPE (expr),
+			 record, field, NULL_TREE);
+      tree res = fold (ref);
+      if (res != ref)
+	return res;
+    }
+  else if (TREE_CODE (expr) == ARRAY_REF)
+    {
+      tree array = subst_constant_pool_initial (TREE_OPERAND (expr, 0), var);
+      if (!array)
+	return NULL_TREE;
+      tree ref = build4 (ARRAY_REF, TREE_TYPE (expr),
+			 array,
+			 TREE_OPERAND (expr, 1),
+			 TREE_OPERAND (expr, 2),
+			 TREE_OPERAND (expr, 3));
+      /* Ada (at least) builds ARRAY_REFs around strings.  */
+      if (tree folded = fold_read_from_constant_string (ref))
+	return folded;
+      tree res = fold (ref);
+      if (res != ref)
+	return res;
+    }
+  else if (TREE_CODE (expr) == MEM_REF)
+    {
+      gcc_assert (TREE_CODE (TREE_OPERAND (expr, 0)) == ADDR_EXPR);
+      tree type = TREE_TYPE (expr);
+      tree sz = TYPE_SIZE (TREE_TYPE (expr));
+      tree val = TREE_OPERAND (TREE_OPERAND (expr, 0), 0);
+      val = subst_constant_pool_initial (val, var);
+      if (!val || TREE_CODE (val) != CONSTRUCTOR
+	  || !tree_fits_uhwi_p (TREE_OPERAND (expr, 1))
+	  || !tree_fits_uhwi_p (sz))
+	return NULL_TREE;
+      unsigned HOST_WIDE_INT offset
+	= tree_to_uhwi (TREE_OPERAND (expr, 1)) * BITS_PER_UNIT;
+      return fold_ctor_reference (type, val, offset, tree_to_uhwi (sz), var);
+    }
+
+  /* Unknown expression, or could not constant-fold.  */
+  return NULL_TREE;
+}
+
+static bool maybe_add_sra_candidate (tree var);
+
 /* Create and insert access for EXPR. Return created access, or NULL if it is
-   not possible.  */
+   not possible.  Also scan for uses of constant-pool entries as we go
+   along, adding them to the candidate set.  */
 
 static struct access *
 create_access (tree expr, gimple *stmt, bool write)
@@ -865,6 +942,32 @@ create_access (tree expr, gimple *stmt, bool write)
   else
     ptr = false;
 
+  /* For constant-pool entries, check we can substitute the constant value.  */
+  if (TREE_CODE (base) == VAR_DECL && DECL_IN_CONSTANT_POOL (base)
+      && (sra_mode == SRA_MODE_EARLY_INTRA || sra_mode == SRA_MODE_INTRA))
+    {
+      gcc_assert (!write);
+      if (bitmap_bit_p (disqualified_constants, DECL_UID (base)))
+	return NULL;
+      tree repl = subst_constant_pool_initial (expr, base);
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	{
+	  fprintf (dump_file, "Trying to scalarize constant-pool ");
+	  print_generic_expr (dump_file, base, 0);
+	  fprintf (dump_file, "\n In expression: ");
+	  print_generic_expr (dump_file, expr, 0);
+	  fprintf (dump_file, "\n Constant replacement: ");
+	  print_generic_expr (dump_file, repl, 0);
+	  fprintf (dump_file, "\n");
+	}
+      /* If we can't resolve the expression to a constant, we risk creating
+	  a malformed expression (e.g. Ada testsuite, ACATS chapter C3).  */
+      if (!repl || !TREE_CONSTANT (repl))
+	disqualify_candidate (base, "Cannot constant-propagate out of pool.");
+      else
+	maybe_add_sra_candidate (base);
+    }
+
   if (!DECL_P (base) || !bitmap_bit_p (candidate_bitmap, DECL_UID (base)))
     return NULL;
 
@@ -923,6 +1026,8 @@ static bool
 scalarizable_type_p (tree type)
 {
   gcc_assert (!is_gimple_reg_type (type));
+  if (type_contains_placeholder_p (type))
+    return false;
 
   switch (TREE_CODE (type))
   {
@@ -970,15 +1075,19 @@ scalarizable_type_p (tree type)
   }
 }
 
-static void scalarize_elem (tree, HOST_WIDE_INT, HOST_WIDE_INT, tree, tree);
+static bool scalarize_elem (tree, HOST_WIDE_INT, HOST_WIDE_INT, tree, tree);
 
 /* Create total_scalarization accesses for all scalar fields of a member
    of type DECL_TYPE conforming to scalarizable_type_p.  BASE
    must be the top-most VAR_DECL representing the variable; within that,
    OFFSET locates the member and REF must be the memory reference expression for
-   the member.  */
+   the member.
 
-static void
+   Return TRUE if successful; may fail in the case of constant-pool entry BASE
+   whose DECL_INITIAL contains values to which we cannot fold ARRAY_REF or
+   COMPONENT_REF expressions.  */
+
+static bool
 completely_scalarize (tree base, tree decl_type, HOST_WIDE_INT offset, tree ref)
 {
   switch (TREE_CODE (decl_type))
@@ -991,10 +1100,11 @@ completely_scalarize (tree base, tree decl_type, HOST_WIDE_INT offset, tree ref)
 	    tree ft = TREE_TYPE (fld);
 	    tree nref = build3 (COMPONENT_REF, ft, ref, fld, NULL_TREE);
 
-	    scalarize_elem (base, pos, tree_to_uhwi (DECL_SIZE (fld)), nref,
-			    ft);
+	    if (!scalarize_elem (base, pos, tree_to_uhwi (DECL_SIZE (fld)),
+				 nref, ft))
+	      return false;
 	  }
-      break;
+      return true;
     case ARRAY_TYPE:
       {
 	tree elemtype = TREE_TYPE (decl_type);
@@ -1026,12 +1136,13 @@ completely_scalarize (tree base, tree decl_type, HOST_WIDE_INT offset, tree ref)
 				    ref, size_int (idx + min),
 				    NULL_TREE, NULL_TREE);
 		int el_off = offset + idx * el_size;
-		scalarize_elem (base, el_off, el_size, nref, elemtype);
+		if (!scalarize_elem (base, el_off, el_size, nref, elemtype))
+		  return false;
 	      }
 	    while (++idx <= lenlessone);
 	  }
       }
-      break;
+      return true;
     default:
       gcc_unreachable ();
     }
@@ -1040,22 +1151,32 @@ completely_scalarize (tree base, tree decl_type, HOST_WIDE_INT offset, tree ref)
 /* Create total_scalarization accesses for a member of type TYPE, which must
    satisfy either is_gimple_reg_type or scalarizable_type_p.  BASE must be the
    top-most VAR_DECL representing the variable; within that, POS and SIZE locate
-   the member and REF must be the reference expression for it.  */
+   the member and REF must be the reference expression for it.
 
-static void
+   Return TRUE if successful; may fail in the case of constant-pool entry BASE
+   whose DECL_INITIAL contains values to which we cannot fold ARRAY_REF or
+   COMPONENT_REF expressions.  */
+
+static bool
 scalarize_elem (tree base, HOST_WIDE_INT pos, HOST_WIDE_INT size,
 		 tree ref, tree type)
 {
   if (is_gimple_reg_type (type))
   {
+    if (TREE_CODE (base) == VAR_DECL
+	&& DECL_IN_CONSTANT_POOL (base)
+	&& subst_constant_pool_initial (ref, base) == NULL_TREE)
+      return false;
     struct access *access = create_access_1 (base, pos, size);
     access->expr = ref;
     access->type = type;
     access->grp_total_scalarization = 1;
     /* Accesses for intraprocedural SRA can have their stmt NULL.  */
+
+    return true;
   }
-  else
-    completely_scalarize (base, type, pos, ref);
+
+  return completely_scalarize (base, type, pos, ref);
 }
 
 /* Create a total_scalarization access for VAR as a whole.  VAR must be of a
@@ -1843,7 +1964,12 @@ maybe_add_sra_candidate (tree var)
       reject (var, "not aggregate");
       return false;
     }
-  if (needs_to_live_in_memory (var))
+  /* Allow constant-pool entries (that "need to live in memory")
+     unless we are doing IPA SRA.  */
+  if (needs_to_live_in_memory (var)
+      && (sra_mode == SRA_MODE_EARLY_IPA
+	  || TREE_CODE (var) != VAR_DECL
+	  || !DECL_IN_CONSTANT_POOL (var)))
     {
       reject (var, "needs to live in memory");
       return false;
@@ -2343,6 +2469,10 @@ analyze_access_subtree (struct access *root, struct access *parent,
 		       (unsigned) root->offset, (unsigned) root->size);
 	      fprintf (dump_file, " to an integer.\n");
 	    }
+	  gcc_checking_assert (TREE_CODE (root->base) != VAR_DECL
+			       || !DECL_IN_CONSTANT_POOL (root->base)
+			       || subst_constant_pool_initial (root->expr,
+							       root->base));
 	}
 
       root->grp_to_be_replaced = 1;
@@ -2604,8 +2734,17 @@ analyze_all_variable_accesses (void)
 	    if (tree_to_uhwi (TYPE_SIZE (TREE_TYPE (var)))
 		<= max_scalarization_size)
 	      {
+		vec<access_p> &accesses = base_access_vec->get_or_insert (var);
+		unsigned orig_count = accesses.length ();
+		if (!completely_scalarize (var, TREE_TYPE (var), 0, var))
+		  {
+		    /* Failed.  Discard any accesses created during attempt.  */
+		    for (unsigned i = orig_count; i < accesses.length (); i++)
+		      access_pool.remove (accesses[i]);
+		    accesses.truncate (orig_count);
+		    continue;
+		  }
 		create_total_scalarization_access (var);
-		completely_scalarize (var, TREE_TYPE (var), 0, var);
 		if (dump_file && (dump_flags & TDF_DETAILS))
 		  {
 		    fprintf (dump_file, "Will attempt to totally scalarize ");
@@ -3480,6 +3619,60 @@ sra_modify_assign (gimple *stmt, gimple_stmt_iterator *gsi)
     }
 }

+/* For each to-be-scalarized constant-pool entry, build statements that
+   initialize its scalar replacements with the corresponding constant
+   values, and insert them on the edge out of the entry block.  */
+
+static void
+initialize_constant_pool_replacements (void)
+{
+  gimple_seq seq = NULL;
+  gimple_stmt_iterator gsi = gsi_start (seq);
+  bitmap_iterator bi;
+  unsigned i;
+
+  EXECUTE_IF_SET_IN_BITMAP (candidate_bitmap, 0, i, bi)
+    if (bitmap_bit_p (should_scalarize_away_bitmap, i)
+	&& !bitmap_bit_p (cannot_scalarize_away_bitmap, i))
+      {
+	tree var = candidate (i);
+	if (TREE_CODE (var) != VAR_DECL || !DECL_IN_CONSTANT_POOL (var))
+	  continue;
+	vec<access_p> *access_vec = get_base_access_vector (var);
+	if (!access_vec)
+	  continue;
+	for (unsigned j = 0; j < access_vec->length (); j++)
+	  {
+	    struct access *access = (*access_vec)[j];
+	    tree val = subst_constant_pool_initial (access->expr, access->base);
+	    if (dump_file && (dump_flags & TDF_DETAILS))
+	      {
+		fprintf (dump_file, "Expression ");
+		print_generic_expr (dump_file, access->expr, 0);
+		fprintf (dump_file, " replaced with ");
+		print_generic_expr (dump_file, access->replacement_decl, 0);
+		fprintf (dump_file, " initialized to ");
+		print_generic_expr (dump_file, val, 0);
+		fprintf (dump_file, "\n");
+		if (!access->replacement_decl)
+		  dump_access (dump_file, access, true);
+	      }
+	    if (!access->replacement_decl)
+	      continue;
+	    gcc_assert (val);
+	    gassign *stmt
+	      = gimple_build_assign (get_access_replacement (access), val);
+	    gsi_insert_after (&gsi, stmt, GSI_NEW_STMT);
+	    update_stmt (stmt);
+	  }
+      }
+
+  seq = gsi_seq (gsi);
+  if (seq)
+    gsi_insert_seq_on_edge_immediate
+      (single_succ_edge (ENTRY_BLOCK_PTR_FOR_FN (cfun)), seq);
+}
+
 /* Traverse the function body and all modifications as decided in
    analyze_all_variable_accesses.  Return true iff the CFG has been
    changed.  */
@@ -3490,6 +3679,8 @@ sra_modify_function_body (void)
   bool cfg_changed = false;
   basic_block bb;
 
+  initialize_constant_pool_replacements ();
+
   FOR_EACH_BB_FN (bb, cfun)
     {
       gimple_stmt_iterator gsi = gsi_start_bb (bb);
-- 
1.9.1

