public inbox for gcc-patches@gcc.gnu.org
* [PATCH 1/6]tree-ssa-dom.c: Normalize exprs, starting with ARRAY_REF to MEM_REF
  2015-10-29 19:22 [PATCH 0/6 v2] PR/63679 Make SRA scalarize constant-pool loads Alan Lawrence
@ 2015-10-29 19:20 ` Alan Lawrence
  2015-10-30  5:39   ` Jeff Law
  2015-10-29 19:21 ` [PATCH 2/6] tree-ssa-dom.c: Normalize data types in MEM_REFs Alan Lawrence
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 28+ messages in thread
From: Alan Lawrence @ 2015-10-29 19:20 UTC (permalink / raw)
  To: gcc-patches

This patch just teaches DOM that an ARRAY_REF can be equivalent to a MEM_REF
(with a pointer type to the array element type).

gcc/ChangeLog:

	* tree-ssa-dom.c (dom_normalize_single_rhs): New.
	(dom_normalize_gimple_stmt): New.
	(lookup_avail_expr): Call dom_normalize_gimple_stmt.
---
 gcc/tree-ssa-dom.c | 68 +++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 67 insertions(+), 1 deletion(-)

diff --git a/gcc/tree-ssa-dom.c b/gcc/tree-ssa-dom.c
index 38cceff..fca8a80 100644
--- a/gcc/tree-ssa-dom.c
+++ b/gcc/tree-ssa-dom.c
@@ -1897,6 +1897,72 @@ vuse_eq (ao_ref *, tree vuse1, unsigned int cnt, void *data)
   return NULL;
 }
 
+/* Return the normal form of EXPR, the gimple_assign_rhs1 of a GIMPLE_SINGLE_RHS
+   assignment, or NULL_TREE if EXPR is already in normal form.
+   The normal form is such that if two expressions normalize to trees that are
+   operand_equal_p, then dominator optimizations can replace one by the value
+   produced by the other.
+   It is not necessary that expressions with equal canonical forms are
+   equivalent in all other situations, e.g. aliasing.  */
+
+static tree
+dom_normalize_single_rhs (tree expr)
+{
+  switch (TREE_CODE (expr))
+  {
+    case ARRAY_REF:
+    {
+      tree index = TREE_OPERAND (expr, 1);
+      tree low_bound = array_ref_low_bound (expr);
+      if (!TREE_CONSTANT (index) || !TREE_CONSTANT (low_bound))
+	return NULL_TREE;
+
+      if (! integer_zerop (low_bound))
+	index = fold_build2 (MINUS_EXPR, TREE_TYPE (index), index, low_bound);
+
+      tree esz = array_ref_element_size (expr);
+      gcc_assert (TREE_CONSTANT (esz));
+      tree offset = fold_build2 (MULT_EXPR, sizetype, index, esz);
+
+      offset = fold_convert (build_pointer_type (TREE_TYPE (expr)), offset);
+
+      tree base = TREE_OPERAND (expr, 0);
+      base = build_fold_addr_expr (base);
+
+      return fold_build2 (MEM_REF, TREE_TYPE (expr), base, offset);
+    }
+    default:
+      return NULL_TREE;
+  }
+}
+
+/* Return the normal form of STMT, or STMT if it is already normalized.  */
+
+static gimple *
+dom_normalize_gimple_stmt (gimple *stmt)
+{
+  if (gimple_code (stmt) == GIMPLE_ASSIGN)
+    {
+      enum tree_code rhs_code = gimple_assign_rhs_code (stmt);
+      if (get_gimple_rhs_class (rhs_code) == GIMPLE_SINGLE_RHS)
+	if (tree nrhs = dom_normalize_single_rhs (gimple_assign_rhs1 (stmt)))
+	  {
+	    tree lhs = gimple_assign_lhs (stmt);
+	    /* Exchanged statements, which are never part of the CFG,
+	       can have invalid LHS.  */
+	    gimple *defstmt = 0;
+	    if (TREE_CODE (lhs) == SSA_NAME)
+	      defstmt = SSA_NAME_DEF_STMT (lhs);
+	    gassign *new_stmt = gimple_build_assign (lhs, nrhs);
+	    if (TREE_CODE (lhs) == SSA_NAME)
+	      SSA_NAME_DEF_STMT (lhs) = defstmt;
+	    gimple_set_vuse (new_stmt, gimple_vuse (stmt));
+	    return new_stmt;
+	  }
+    }
+  return stmt;
+}
+
 /* Search for an existing instance of STMT in the AVAIL_EXPRS_STACK table.
    If found, return its LHS. Otherwise insert STMT in the table and
    return NULL_TREE.
@@ -1918,7 +1984,7 @@ lookup_avail_expr (gimple *stmt, bool insert,
   else
     lhs = gimple_get_lhs (stmt);
 
-  class expr_hash_elt element (stmt, lhs);
+  class expr_hash_elt element (dom_normalize_gimple_stmt (stmt), lhs);
 
   if (dump_file && (dump_flags & TDF_DETAILS))
     {
-- 
1.9.1

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH 2/6] tree-ssa-dom.c: Normalize data types in MEM_REFs.
  2015-10-29 19:22 [PATCH 0/6 v2] PR/63679 Make SRA scalarize constant-pool loads Alan Lawrence
  2015-10-29 19:20 ` [PATCH 1/6]tree-ssa-dom.c: Normalize exprs, starting with ARRAY_REF to MEM_REF Alan Lawrence
@ 2015-10-29 19:21 ` Alan Lawrence
  2015-10-29 19:22 ` [PATCH 3/6] Share code from fold_array_ctor_reference with fold Alan Lawrence
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 28+ messages in thread
From: Alan Lawrence @ 2015-10-29 19:21 UTC (permalink / raw)
  To: gcc-patches

This makes dom2 identify e.g. MEM[(int[8] *)...] with MEM[(int *)...].
These are not generally equivalent as they have different aliasing behaviour
but they have the same value as far as dom is concerned and so this helps
find more equivalences.

There is some question over the best policy here, but using the result type of
the MEM_REF seemed the most normalizing. Of course this will not catch e.g.
reading a V2SI vector from a block of memory that was written as two separate
SImode writes; fixing that would require a much heavier-weight approach.

gcc/ChangeLog:

	* tree-ssa-dom.c (dom_normalize_single_rhs): Add case for MEM_REFs.
---
 gcc/tree-ssa-dom.c | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/gcc/tree-ssa-dom.c b/gcc/tree-ssa-dom.c
index fca8a80..d74fb70 100644
--- a/gcc/tree-ssa-dom.c
+++ b/gcc/tree-ssa-dom.c
@@ -1931,6 +1931,28 @@ dom_normalize_single_rhs (tree expr)
 
       return fold_build2 (MEM_REF, TREE_TYPE (expr), base, offset);
     }
+    case MEM_REF:
+    {
+      /* Make the target of the pointer equal to the result type of the MEM_REF.
+	 (This is not generally equivalent as it has different aliasing
+	 properties but has the same value and so finds more equivalences.)  */
+
+      tree off = TREE_OPERAND (expr, 1);
+
+      gcc_assert (POINTER_TYPE_P (TREE_TYPE (off)));
+      tree ptr_target_type = TREE_TYPE (TREE_TYPE (off));
+
+      if (ptr_target_type == TREE_TYPE (expr))
+	return NULL_TREE;
+
+      tree nref = copy_node (expr);
+      TREE_OPERAND (nref, 1) = copy_node (off);
+
+      TREE_TYPE (TREE_OPERAND (nref, 1)) =
+	build_pointer_type (TREE_TYPE (expr));
+
+      return nref;
+    }
     default:
       return NULL_TREE;
   }
-- 
1.9.1


* [PATCH 4/6][Trivial] tree-sra.c: A few comment fixes/additions.
  2015-10-29 19:22 [PATCH 0/6 v2] PR/63679 Make SRA scalarize constant-pool loads Alan Lawrence
                   ` (2 preceding siblings ...)
  2015-10-29 19:22 ` [PATCH 3/6] Share code from fold_array_ctor_reference with fold Alan Lawrence
@ 2015-10-29 19:22 ` Alan Lawrence
  2015-10-30  5:35   ` Jeff Law
  2015-10-29 19:22 ` [PATCH 5/6]tree-sra.c: Fix completely_scalarize for negative array indices Alan Lawrence
  2015-10-29 19:32 ` [PATCH 6/6] Make SRA replace constant-pool loads Alan Lawrence
  5 siblings, 1 reply; 28+ messages in thread
From: Alan Lawrence @ 2015-10-29 19:22 UTC (permalink / raw)
  To: gcc-patches

gcc/ChangeLog:

	* tree-sra.c (scalarizable_type_p): Comment variable-length arrays.
	(completely_scalarize): Comment zero-length arrays.
	(get_access_replacement): Correct comment re. precondition.
---
 gcc/tree-sra.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
index f851758..e15df1f 100644
--- a/gcc/tree-sra.c
+++ b/gcc/tree-sra.c
@@ -956,6 +956,7 @@ scalarizable_type_p (tree type)
 	;
       else if ((tree_to_shwi (TYPE_SIZE (type)) <= 0)
 	       || !tree_fits_shwi_p (TYPE_MAX_VALUE (TYPE_DOMAIN (type))))
+	/* Variable-length array, do not allow scalarization.  */
 	return false;
 
       tree elem = TREE_TYPE (type);
@@ -1005,6 +1006,7 @@ completely_scalarize (tree base, tree decl_type, HOST_WIDE_INT offset, tree ref)
 	tree minidx = TYPE_MIN_VALUE (TYPE_DOMAIN (decl_type));
 	gcc_assert (TREE_CODE (minidx) == INTEGER_CST);
 	tree maxidx = TYPE_MAX_VALUE (TYPE_DOMAIN (decl_type));
+	/* Skip (some) zero-length arrays; others have MAXIDX == MINIDX - 1.  */
 	if (maxidx)
 	  {
 	    gcc_assert (TREE_CODE (maxidx) == INTEGER_CST);
@@ -2146,7 +2148,7 @@ create_access_replacement (struct access *access)
   return repl;
 }
 
-/* Return ACCESS scalar replacement, create it if it does not exist yet.  */
+/* Return ACCESS scalar replacement, which must exist.  */
 
 static inline tree
 get_access_replacement (struct access *access)
-- 
1.9.1


* [PATCH 0/6 v2] PR/63679 Make SRA scalarize constant-pool loads
@ 2015-10-29 19:22 Alan Lawrence
  2015-10-29 19:20 ` [PATCH 1/6]tree-ssa-dom.c: Normalize exprs, starting with ARRAY_REF to MEM_REF Alan Lawrence
                   ` (5 more replies)
  0 siblings, 6 replies; 28+ messages in thread
From: Alan Lawrence @ 2015-10-29 19:22 UTC (permalink / raw)
  To: gcc-patches

This is a revision of previous series at
https://gcc.gnu.org/ml/gcc-patches/2015-08/msg01485.html , and follows on from
the first two patches of that series, which have been pushed already.

A few things have happened since. The previous patch 3, making SRA generate
ARRAY_REFS, is removed. As Martin comments, it seems unfair to make SRA work
around this limitation of DOM, so instead the first two patches extend DOM to
understand this equivalence. Richi suggested that we might want a normalization
pass to convert all ARRAY_REFs to MEM_REFS, and while that would have
advantages, this simpler approach is enough to persuade DOM to optimize the
MEM_REFS produced by SRA with array accesses, even when the vectorizer
intervenes. Moreover we can apply more normalizations in DOM than would be
permissible elsewhere, even if these two patches are only a small step towards
that.

A number of other changes followed, with most problems coming from Ada,
specifically on ARM.

I intend to follow up with changes to the heuristics and gimplification code,
moving heuristics from gimplify_init_constructor to SRA, but this series as it
is fixes PR/63679 (with the --param), and it is not certain I will be able to
tackle said heuristics in time for gcc 6.

I've tested each patch (in sequence supplied) with

Bootstrap + check-gcc,g++,ada,fortran on x86_64 and ARM;
Bootstrap + check-gcc,g++,fortran on AArch64.

And also tested that the ssa-dom-cse-2.c scan-tree-dump test is fixed on
hppa, ppc64le, sparc, alpha, s390.

Are these OK for trunk?

Cheers, Alan


* [PATCH 3/6] Share code from fold_array_ctor_reference with fold.
  2015-10-29 19:22 [PATCH 0/6 v2] PR/63679 Make SRA scalarize constant-pool loads Alan Lawrence
  2015-10-29 19:20 ` [PATCH 1/6]tree-ssa-dom.c: Normalize exprs, starting with ARRAY_REF to MEM_REF Alan Lawrence
  2015-10-29 19:21 ` [PATCH 2/6] tree-ssa-dom.c: Normalize data types in MEM_REFs Alan Lawrence
@ 2015-10-29 19:22 ` Alan Lawrence
  2015-10-30  9:25   ` Richard Biener
  2015-11-03  3:20   ` Jeff Law
  2015-10-29 19:22 ` [PATCH 4/6][Trivial] tree-sra.c: A few comment fixes/additions Alan Lawrence
                   ` (2 subsequent siblings)
  5 siblings, 2 replies; 28+ messages in thread
From: Alan Lawrence @ 2015-10-29 19:22 UTC (permalink / raw)
  To: gcc-patches

This is in response to https://gcc.gnu.org/ml/gcc/2015-10/msg00097.html, where
Richi points out that CONSTRUCTOR elements are not necessarily ordered.

I wasn't sure of a good naming convention for the new get_ctor_element_at_index,
other suggestions welcome.

gcc/ChangeLog:

	* gimple-fold.c (fold_array_ctor_reference): Move searching code to:
	* fold-const.c (get_ctor_element_at_index): New.
	(fold): Remove binary-search through CONSTRUCTOR, call previous.

	* fold-const.h (get_ctor_element_at_index): New.
---
 gcc/fold-const.c  | 93 ++++++++++++++++++++++++++++++++++++++++---------------
 gcc/fold-const.h  |  1 +
 gcc/gimple-fold.c | 47 ++--------------------------
 3 files changed, 72 insertions(+), 69 deletions(-)

diff --git a/gcc/fold-const.c b/gcc/fold-const.c
index de45a2c..5d27b91 100644
--- a/gcc/fold-const.c
+++ b/gcc/fold-const.c
@@ -12018,6 +12018,72 @@ fold_ternary_loc (location_t loc, enum tree_code code, tree type,
     } /* switch (code) */
 }
 
+/* Gets the element ACCESS_INDEX from CTOR, which must be a CONSTRUCTOR.  */
+
+tree
+get_ctor_element_at_index (tree ctor, offset_int access_index)
+{
+  tree index_type = NULL_TREE;
+  offset_int low_bound = 0;
+
+  if (TREE_CODE (TREE_TYPE (ctor)) == ARRAY_TYPE)
+  {
+    tree domain_type = TYPE_DOMAIN (TREE_TYPE (ctor));
+    if (domain_type && TYPE_MIN_VALUE (domain_type))
+    {
+      /* Static constructors for variably sized objects makes no sense.  */
+      gcc_assert (TREE_CODE (TYPE_MIN_VALUE (domain_type)) == INTEGER_CST);
+      index_type = TREE_TYPE (TYPE_MIN_VALUE (domain_type));
+      low_bound = wi::to_offset (TYPE_MIN_VALUE (domain_type));
+    }
+  }
+
+  if (index_type)
+    access_index = wi::ext (access_index, TYPE_PRECISION (index_type),
+			    TYPE_SIGN (index_type));
+
+  offset_int index = low_bound - 1;
+  if (index_type)
+    index = wi::ext (index, TYPE_PRECISION (index_type),
+		     TYPE_SIGN (index_type));
+
+  offset_int max_index;
+  unsigned HOST_WIDE_INT cnt;
+  tree cfield, cval;
+
+  FOR_EACH_CONSTRUCTOR_ELT (CONSTRUCTOR_ELTS (ctor), cnt, cfield, cval)
+  {
+    /* Array constructor might explicitly set index, or specify range
+       or leave index NULL meaning that it is next index after previous
+       one.  */
+    if (cfield)
+    {
+      if (TREE_CODE (cfield) == INTEGER_CST)
+	max_index = index = wi::to_offset (cfield);
+      else
+      {
+	gcc_assert (TREE_CODE (cfield) == RANGE_EXPR);
+	index = wi::to_offset (TREE_OPERAND (cfield, 0));
+	max_index = wi::to_offset (TREE_OPERAND (cfield, 1));
+      }
+    }
+    else
+    {
+      index += 1;
+      if (index_type)
+	index = wi::ext (index, TYPE_PRECISION (index_type),
+			 TYPE_SIGN (index_type));
+      max_index = index;
+    }
+
+    /* Do we have match?  */
+    if (wi::cmpu (access_index, index) >= 0
+	&& wi::cmpu (access_index, max_index) <= 0)
+      return cval;
+  }
+  return NULL_TREE;
+}
+
 /* Perform constant folding and related simplification of EXPR.
    The related simplifications include x*1 => x, x*0 => 0, etc.,
    and application of the associative law.
@@ -12094,31 +12160,8 @@ fold (tree expr)
 	    && TREE_CODE (op0) == CONSTRUCTOR
 	    && ! type_contains_placeholder_p (TREE_TYPE (op0)))
 	  {
-	    vec<constructor_elt, va_gc> *elts = CONSTRUCTOR_ELTS (op0);
-	    unsigned HOST_WIDE_INT end = vec_safe_length (elts);
-	    unsigned HOST_WIDE_INT begin = 0;
-
-	    /* Find a matching index by means of a binary search.  */
-	    while (begin != end)
-	      {
-		unsigned HOST_WIDE_INT middle = (begin + end) / 2;
-		tree index = (*elts)[middle].index;
-
-		if (TREE_CODE (index) == INTEGER_CST
-		    && tree_int_cst_lt (index, op1))
-		  begin = middle + 1;
-		else if (TREE_CODE (index) == INTEGER_CST
-			 && tree_int_cst_lt (op1, index))
-		  end = middle;
-		else if (TREE_CODE (index) == RANGE_EXPR
-			 && tree_int_cst_lt (TREE_OPERAND (index, 1), op1))
-		  begin = middle + 1;
-		else if (TREE_CODE (index) == RANGE_EXPR
-			 && tree_int_cst_lt (op1, TREE_OPERAND (index, 0)))
-		  end = middle;
-		else
-		  return (*elts)[middle].value;
-	      }
+	    if (tree val = get_ctor_element_at_index (op0, wi::to_offset (op1)))
+	      return val;
 	  }
 
 	return t;
diff --git a/gcc/fold-const.h b/gcc/fold-const.h
index ee74dc8..1fa6ee0 100644
--- a/gcc/fold-const.h
+++ b/gcc/fold-const.h
@@ -73,6 +73,7 @@ extern tree fold_build_call_array_loc (location_t, tree, tree, int, tree *);
 #define fold_build_call_array_initializer(T1,T2,N,T4)\
    fold_build_call_array_initializer_loc (UNKNOWN_LOCATION, T1, T2, N, T4)
 extern tree fold_build_call_array_initializer_loc (location_t, tree, tree, int, tree *);
+extern tree get_ctor_element_at_index (tree, offset_int);
 extern bool fold_convertible_p (const_tree, const_tree);
 #define fold_convert(T1,T2)\
    fold_convert_loc (UNKNOWN_LOCATION, T1, T2)
diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
index 85ff018..823d952 100644
--- a/gcc/gimple-fold.c
+++ b/gcc/gimple-fold.c
@@ -5308,13 +5308,10 @@ fold_array_ctor_reference (tree type, tree ctor,
 			   unsigned HOST_WIDE_INT size,
 			   tree from_decl)
 {
-  unsigned HOST_WIDE_INT cnt;
-  tree cfield, cval;
   offset_int low_bound;
   offset_int elt_size;
-  offset_int index, max_index;
   offset_int access_index;
-  tree domain_type = NULL_TREE, index_type = NULL_TREE;
+  tree domain_type = NULL_TREE;
   HOST_WIDE_INT inner_offset;
 
   /* Compute low bound and elt size.  */
@@ -5324,7 +5321,6 @@ fold_array_ctor_reference (tree type, tree ctor,
     {
       /* Static constructors for variably sized objects makes no sense.  */
       gcc_assert (TREE_CODE (TYPE_MIN_VALUE (domain_type)) == INTEGER_CST);
-      index_type = TREE_TYPE (TYPE_MIN_VALUE (domain_type));
       low_bound = wi::to_offset (TYPE_MIN_VALUE (domain_type));
     }
   else
@@ -5346,9 +5342,6 @@ fold_array_ctor_reference (tree type, tree ctor,
   access_index = wi::udiv_trunc (offset_int (offset / BITS_PER_UNIT),
 				 elt_size);
   access_index += low_bound;
-  if (index_type)
-    access_index = wi::ext (access_index, TYPE_PRECISION (index_type),
-			    TYPE_SIGN (index_type));
 
   /* And offset within the access.  */
   inner_offset = offset % (elt_size.to_uhwi () * BITS_PER_UNIT);
@@ -5357,43 +5350,9 @@ fold_array_ctor_reference (tree type, tree ctor,
      care to fold accesses spanning multiple array indexes.  */
   if (inner_offset + size > elt_size.to_uhwi () * BITS_PER_UNIT)
     return NULL_TREE;
+  if (tree val = get_ctor_element_at_index (ctor, access_index))
+    return fold_ctor_reference (type, val, inner_offset, size, from_decl);
 
-  index = low_bound - 1;
-  if (index_type)
-    index = wi::ext (index, TYPE_PRECISION (index_type),
-		     TYPE_SIGN (index_type));
-
-  FOR_EACH_CONSTRUCTOR_ELT (CONSTRUCTOR_ELTS (ctor), cnt, cfield, cval)
-    {
-      /* Array constructor might explicitely set index, or specify range
-	 or leave index NULL meaning that it is next index after previous
-	 one.  */
-      if (cfield)
-	{
-	  if (TREE_CODE (cfield) == INTEGER_CST)
-	    max_index = index = wi::to_offset (cfield);
-	  else
-	    {
-	      gcc_assert (TREE_CODE (cfield) == RANGE_EXPR);
-	      index = wi::to_offset (TREE_OPERAND (cfield, 0));
-	      max_index = wi::to_offset (TREE_OPERAND (cfield, 1));
-	    }
-	}
-      else
-	{
-	  index += 1;
-	  if (index_type)
-	    index = wi::ext (index, TYPE_PRECISION (index_type),
-			     TYPE_SIGN (index_type));
-	  max_index = index;
-	}
-
-      /* Do we have match?  */
-      if (wi::cmpu (access_index, index) >= 0
-	  && wi::cmpu (access_index, max_index) <= 0)
-	return fold_ctor_reference (type, cval, inner_offset, size,
-				    from_decl);
-    }
   /* When memory is not explicitely mentioned in constructor,
      it is 0 (or out of range).  */
   return build_zero_cst (type);
-- 
1.9.1


* [PATCH 5/6]tree-sra.c: Fix completely_scalarize for negative array indices
  2015-10-29 19:22 [PATCH 0/6 v2] PR/63679 Make SRA scalarize constant-pool loads Alan Lawrence
                   ` (3 preceding siblings ...)
  2015-10-29 19:22 ` [PATCH 4/6][Trivial] tree-sra.c: A few comment fixes/additions Alan Lawrence
@ 2015-10-29 19:22 ` Alan Lawrence
  2015-10-30 10:54   ` Richard Biener
  2015-10-29 19:32 ` [PATCH 6/6] Make SRA replace constant-pool loads Alan Lawrence
  5 siblings, 1 reply; 28+ messages in thread
From: Alan Lawrence @ 2015-10-29 19:22 UTC (permalink / raw)
  To: gcc-patches

The code I added to completely_scalarize for arrays isn't right in some cases
of negative array indices (e.g. arrays with indices from -1 to 1 in the Ada
testsuite). On ARM, this prevents a failure bootstrapping Ada with the next
patch, as well as a few ACATS tests (e.g. c64106a).

Some discussion here: https://gcc.gnu.org/ml/gcc/2015-10/msg00096.html .

gcc/ChangeLog:

	* tree-sra.c (completely_scalarize): Deal properly with negative array
	indices.
---
 gcc/tree-sra.c | 17 ++++++++++++-----
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
index e15df1f..358db79 100644
--- a/gcc/tree-sra.c
+++ b/gcc/tree-sra.c
@@ -1010,18 +1010,25 @@ completely_scalarize (tree base, tree decl_type, HOST_WIDE_INT offset, tree ref)
 	if (maxidx)
 	  {
 	    gcc_assert (TREE_CODE (maxidx) == INTEGER_CST);
-	    /* MINIDX and MAXIDX are inclusive.  Try to avoid overflow.  */
-	    unsigned HOST_WIDE_INT lenp1 = tree_to_shwi (maxidx)
-					- tree_to_shwi (minidx);
+	    /* MINIDX and MAXIDX are inclusive, and must be interpreted in the
+	       TYPE_DOMAIN (e.g. signed int, whereas min/max may be size_int).
+	       Try also to avoid overflow.  */
+	    minidx = build_int_cst (TYPE_DOMAIN (decl_type),
+				    tree_to_shwi (minidx));
+	    maxidx = build_int_cst (TYPE_DOMAIN (decl_type),
+				    tree_to_shwi (maxidx));
+	    HOST_WIDE_INT min = tree_to_shwi (minidx);
+	    unsigned HOST_WIDE_INT lenlessone = tree_to_shwi (maxidx) - min;
 	    unsigned HOST_WIDE_INT idx = 0;
 	    do
 	      {
-		tree nref = build4 (ARRAY_REF, elemtype, ref, size_int (idx),
+		tree nref = build4 (ARRAY_REF, elemtype,
+				    ref, size_int (idx + min),
 				    NULL_TREE, NULL_TREE);
 		int el_off = offset + idx * el_size;
 		scalarize_elem (base, el_off, el_size, nref, elemtype);
 	      }
-	    while (++idx <= lenp1);
+	    while (++idx <= lenlessone);
 	  }
       }
       break;
-- 
1.9.1


* [PATCH 6/6] Make SRA replace constant-pool loads
  2015-10-29 19:22 [PATCH 0/6 v2] PR/63679 Make SRA scalarize constant-pool loads Alan Lawrence
                   ` (4 preceding siblings ...)
  2015-10-29 19:22 ` [PATCH 5/6]tree-sra.c: Fix completely_scalarize for negative array indices Alan Lawrence
@ 2015-10-29 19:32 ` Alan Lawrence
  2015-10-31 23:48   ` Bernhard Reutner-Fischer
  2015-11-03 14:01   ` Richard Biener
  5 siblings, 2 replies; 28+ messages in thread
From: Alan Lawrence @ 2015-10-29 19:32 UTC (permalink / raw)
  To: gcc-patches

This has changed quite a bit since the previous revision
(https://gcc.gnu.org/ml/gcc-patches/2015-08/msg01484.html), mostly due to Ada
and specifically Ada on ARM.

I didn't find a good alternative to scanning for constant-pool accesses "as we
go" through the function, and although I didn't find any of these accesses
being disqualified after we'd found them in C, some awkward constant-pool
expressions in Ada forced me to disqualify them for a new reason (inability to
constant-propagate). Hence, I introduced a new bitmap of
disqualified_constants, rather than an additional pass through the function.

Next, the approach of using a replacement_expr (with the constant value) in
place of the old replacement_decl also failed (the decl could be used in an
expression where substituting in the expr produced ill-formed gimple). Hence,
constant-pool entries now use replacement_decls just like everything else, for
which I generate initializers in initialize_constant_pool_replacements.

However, I was able to avoid generating the replacement constants early,
instead generating them late in initialize_constant_pool_replacements; the
main trick here was to use fold_array_ctor_reference, kindly pointed out by
Richie :), to fold the MEM_REFs built in analyze_access_subtree.

Finally, I found completely_scalarize was still failing to fold all the
constant expressions, because Ada was putting VIEW_CONVERT_EXPRs in the
constant pool, and fold-const.c did not deal with ARRAY_REFs of these. However,
requiring completely_scalarize to be able to fold everything seemed fragile,
so instead I decoupled it from fold-const by allowing it to fail.

With these changes, ssa-dom-cse-2.c is fixed on all platforms with appropriate
--param.

There are still a few remaining test failures, on AArch64:
gcc.dg/guality/pr54970.c   -O1  line 15 a[0] == 1
gcc.dg/guality/pr54970.c   -O1  line 20 a[0] == 1
gcc.dg/guality/pr54970.c   -O1  line 25 a[0] == 1
(also at -O2, -O2 -flto with/without plugin, -O3 -g and -Os).
...which I'm working on. I also tested with the hack at
https://gcc.gnu.org/ml/gcc-patches/2015-08/msg01483.html . This revealed one
new failure in advsimd-intrinsics/vldX.c on AArch64 (fixed by
https://gcc.gnu.org/ml/gcc-patches/2015-10/msg02777.html), and fixed a bunch
of guality failures on ARM:
gcc.dg/guality/pr54970.c   -O1  line 31 a[0] == 4
gcc.dg/guality/pr54970.c   -O1  line 36 a[0] == 4
gcc.dg/guality/pr54970.c   -O1  line 45 a[0] == 4
gcc.dg/guality/pr54970.c   -O1  line 45 p[-2] == 4
gcc.dg/guality/pr54970.c   -O1  line 45 q[-1] == 4
(also at -O2, -O2 -flto with/without plugin, -O3 -g and -Os)

--Alan

gcc/ChangeLog:

	* tree-sra.c (disqualified_constants): New.
	(sra_initialize): Initialize it.
	(sra_deinitialize): Deallocate it.
	(disqualify_candidate): Set bit in disqualified_constants.
	(subst_constant_pool_initial): New.
	(create_access): Scan for constant-pool entries as we go.
	(scalarizable_type_p): Disallow types containing placeholders.
	(completely_scalarize): Return bool to allow failure.
	(scalarize_elem): Likewise; check we can generate constant replacements.
	(maybe_add_sra_candidate): Allow constant-pool entries.
	(analyze_access_subtree): Checking-assert that we can fold any refs
	built for constant-pool entries.
	(analyze_all_variable_accesses): Deal with completely_scalarize failing.
	(initialize_constant_pool_replacements): New.
	(sra_modify_function_body): Call previous.

gcc/testsuite/ChangeLog:

	* gcc.dg/tree-ssa/ssa-dom-cse-2.c: Add sra-max-scalarization-size param,
	remove xfail.
---
 gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c |   7 +-
 gcc/tree-sra.c                                | 221 ++++++++++++++++++++++++--
 2 files changed, 208 insertions(+), 20 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c
index 9eccdc9..b13d583 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O3 -fno-tree-fre -fno-tree-pre -fdump-tree-optimized" } */
+/* { dg-options "-O3 -fno-tree-fre -fno-tree-pre -fdump-tree-optimized --param sra-max-scalarization-size-Ospeed=32" } */
 
 int
 foo ()
@@ -17,7 +17,4 @@ foo ()
 /* After late unrolling the above loop completely DOM should be
    able to optimize this to return 28.  */
 
-/* See PR63679 and PR64159, if the target forces the initializer to memory then
-   DOM is not able to perform this optimization.  */
-
-/* { dg-final { scan-tree-dump "return 28;" "optimized" { xfail aarch64*-*-* alpha*-*-* hppa*-*-* powerpc*-*-* sparc*-*-* s390*-*-* } } } */
+/* { dg-final { scan-tree-dump "return 28;" "optimized" } } */
diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
index 358db79..1920265 100644
--- a/gcc/tree-sra.c
+++ b/gcc/tree-sra.c
@@ -90,6 +90,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "gimple-iterator.h"
 #include "gimplify-me.h"
 #include "gimple-walk.h"
+#include "gimple-fold.h"
 #include "tree-cfg.h"
 #include "flags.h"
 #include "insn-config.h"
@@ -336,6 +337,10 @@ candidate (unsigned uid)
    those which cannot be (because they are and need be used as a whole).  */
 static bitmap should_scalarize_away_bitmap, cannot_scalarize_away_bitmap;
 
+/* Bitmap of candidates in the constant pool, which cannot be scalarized
+   because this would produce non-constant expressions (e.g. Ada).  */
+static bitmap disqualified_constants;
+
 /* Obstack for creation of fancy names.  */
 static struct obstack name_obstack;
 
@@ -658,6 +663,7 @@ sra_initialize (void)
     (vec_safe_length (cfun->local_decls) / 2);
   should_scalarize_away_bitmap = BITMAP_ALLOC (NULL);
   cannot_scalarize_away_bitmap = BITMAP_ALLOC (NULL);
+  disqualified_constants = BITMAP_ALLOC (NULL);
   gcc_obstack_init (&name_obstack);
   base_access_vec = new hash_map<tree, auto_vec<access_p> >;
   memset (&sra_stats, 0, sizeof (sra_stats));
@@ -676,6 +682,7 @@ sra_deinitialize (void)
   candidates = NULL;
   BITMAP_FREE (should_scalarize_away_bitmap);
   BITMAP_FREE (cannot_scalarize_away_bitmap);
+  BITMAP_FREE (disqualified_constants);
   access_pool.release ();
   assign_link_pool.release ();
   obstack_free (&name_obstack, NULL);
@@ -690,6 +697,8 @@ disqualify_candidate (tree decl, const char *reason)
 {
   if (bitmap_clear_bit (candidate_bitmap, DECL_UID (decl)))
     candidates->remove_elt_with_hash (decl, DECL_UID (decl));
+  if (TREE_CODE (decl) == VAR_DECL && DECL_IN_CONSTANT_POOL (decl))
+    bitmap_set_bit (disqualified_constants, DECL_UID (decl));
 
   if (dump_file && (dump_flags & TDF_DETAILS))
     {
@@ -841,8 +850,76 @@ create_access_1 (tree base, HOST_WIDE_INT offset, HOST_WIDE_INT size)
   return access;
 }
 
+/* Given EXPR a chain of COMPONENT_REFs and/or ARRAY_REFs built around a
+   constant-pool declaration VAR, builds a tree like EXPR but with VAR replaced
+   by its DECL_INITIAL, and tries to fold it to a constant.  If folding succeeds
+   then return the folded tree, else NULL_TREE.  */
+
+static tree
+subst_constant_pool_initial (tree expr, tree var)
+{
+  if (TREE_CODE (expr) == VAR_DECL)
+    {
+      gcc_assert (DECL_IN_CONSTANT_POOL (expr));
+      gcc_assert (expr == var);
+      return DECL_INITIAL (expr);
+    }
+  if (TREE_CODE (expr) == COMPONENT_REF)
+    {
+      tree field = TREE_OPERAND (expr, 1);
+      gcc_assert (TREE_CODE (field) == FIELD_DECL);
+      gcc_assert (TREE_OPERAND (expr, 2) == NULL_TREE);
+      tree record = subst_constant_pool_initial (TREE_OPERAND (expr, 0), var);
+      if (!record)
+	return NULL_TREE;
+      tree ref = build3 (COMPONENT_REF, TREE_TYPE (expr),
+			 record, field, NULL_TREE);
+      tree res = fold (ref);
+      if (res != ref)
+	return res;
+    }
+  else if (TREE_CODE (expr) == ARRAY_REF)
+    {
+      tree array = subst_constant_pool_initial (TREE_OPERAND (expr, 0), var);
+      if (!array)
+	return NULL_TREE;
+      tree ref = build4 (ARRAY_REF, TREE_TYPE (expr),
+			 array,
+			 TREE_OPERAND (expr, 1),
+			 TREE_OPERAND (expr, 2),
+			 TREE_OPERAND (expr, 3));
+      /* Ada (at least) builds ARRAY_REFs around strings.  */
+      if (tree folded = fold_read_from_constant_string (ref))
+	return folded;
+      tree res = fold (ref);
+      if (res != ref)
+	return res;
+    }
+  else if (TREE_CODE (expr) == MEM_REF)
+    {
+      gcc_assert (TREE_CODE (TREE_OPERAND (expr, 0)) == ADDR_EXPR);
+      tree type = TREE_TYPE (expr);
+      tree sz = TYPE_SIZE (TREE_TYPE (expr));
+      tree val = TREE_OPERAND (TREE_OPERAND (expr, 0), 0);
+      val = subst_constant_pool_initial (val, var);
+      if (TREE_CODE (val) != CONSTRUCTOR
+	  || !tree_fits_uhwi_p (TREE_OPERAND (expr, 1))
+	  || !tree_fits_uhwi_p (sz))
+	return NULL_TREE;
+      unsigned HOST_WIDE_INT offset =
+	  tree_to_uhwi (TREE_OPERAND (expr, 1)) * BITS_PER_UNIT;
+      return fold_ctor_reference (type, val, offset, tree_to_uhwi (sz), var);
+    }
+
+  /* Unknown expression, or could not constant-fold.  */
+  return NULL_TREE;
+}
+
+static bool maybe_add_sra_candidate (tree var);
+
 /* Create and insert access for EXPR. Return created access, or NULL if it is
-   not possible.  */
+   not possible.  Also scan for uses of constant pool as we go along and add
+   to candidates.  */
 
 static struct access *
 create_access (tree expr, gimple *stmt, bool write)
@@ -865,6 +942,32 @@ create_access (tree expr, gimple *stmt, bool write)
   else
     ptr = false;
 
+  /* For constant-pool entries, check we can substitute the constant value.  */
+  if (TREE_CODE (base) == VAR_DECL && DECL_IN_CONSTANT_POOL (base)
+      && (sra_mode == SRA_MODE_EARLY_INTRA || sra_mode == SRA_MODE_INTRA))
+    {
+      gcc_assert (!write);
+      if (bitmap_bit_p (disqualified_constants, DECL_UID (base)))
+	return NULL;
+      tree repl = subst_constant_pool_initial (expr, base);
+      if (dump_file && (dump_flags & TDF_DETAILS))
+      {
+	fprintf (dump_file, "Trying to scalarize constant-pool ");
+	print_generic_expr (dump_file, base, 0);
+	fprintf (dump_file, "\n In expression: ");
+	print_generic_expr (dump_file, expr, 0);
+	fprintf (dump_file, "\n Constant replacement: ");
+	print_generic_expr (dump_file, repl, 0);
+	fprintf (dump_file, "\n");
+      }
+      /* If we can't resolve the expression to a constant, we risk creating
+	  a malformed expression (e.g. Ada testsuite, ACATS chapter C3).  */
+      if (!repl || !TREE_CONSTANT (repl))
+	disqualify_candidate (base, "Cannot constant-propagate out of pool.");
+      else
+	maybe_add_sra_candidate (base);
+    }
+
   if (!DECL_P (base) || !bitmap_bit_p (candidate_bitmap, DECL_UID (base)))
     return NULL;
 
@@ -923,6 +1026,8 @@ static bool
 scalarizable_type_p (tree type)
 {
   gcc_assert (!is_gimple_reg_type (type));
+  if (type_contains_placeholder_p (type))
+    return false;
 
   switch (TREE_CODE (type))
   {
@@ -970,15 +1075,19 @@ scalarizable_type_p (tree type)
   }
 }
 
-static void scalarize_elem (tree, HOST_WIDE_INT, HOST_WIDE_INT, tree, tree);
+static bool scalarize_elem (tree, HOST_WIDE_INT, HOST_WIDE_INT, tree, tree);
 
 /* Create total_scalarization accesses for all scalar fields of a member
    of type DECL_TYPE conforming to scalarizable_type_p.  BASE
    must be the top-most VAR_DECL representing the variable; within that,
    OFFSET locates the member and REF must be the memory reference expression for
-   the member.  */
+   the member.
 
-static void
+   Return TRUE if successful; may fail in the case of constant-pool entry BASE
+   whose DECL_INITIAL contains values to which we cannot fold ARRAY_REF or
+   COMPONENT_REF expressions.  */
+
+static bool
 completely_scalarize (tree base, tree decl_type, HOST_WIDE_INT offset, tree ref)
 {
   switch (TREE_CODE (decl_type))
@@ -991,10 +1100,11 @@ completely_scalarize (tree base, tree decl_type, HOST_WIDE_INT offset, tree ref)
 	    tree ft = TREE_TYPE (fld);
 	    tree nref = build3 (COMPONENT_REF, ft, ref, fld, NULL_TREE);
 
-	    scalarize_elem (base, pos, tree_to_uhwi (DECL_SIZE (fld)), nref,
-			    ft);
+	    if (!scalarize_elem (base, pos, tree_to_uhwi (DECL_SIZE (fld)),
+				 nref, ft))
+	      return false;
 	  }
-      break;
+      return true;
     case ARRAY_TYPE:
       {
 	tree elemtype = TREE_TYPE (decl_type);
@@ -1026,12 +1136,13 @@ completely_scalarize (tree base, tree decl_type, HOST_WIDE_INT offset, tree ref)
 				    ref, size_int (idx + min),
 				    NULL_TREE, NULL_TREE);
 		int el_off = offset + idx * el_size;
-		scalarize_elem (base, el_off, el_size, nref, elemtype);
+		if (!scalarize_elem (base, el_off, el_size, nref, elemtype))
+		  return false;
 	      }
 	    while (++idx <= lenlessone);
 	  }
       }
-      break;
+      return true;
     default:
       gcc_unreachable ();
     }
@@ -1040,22 +1151,32 @@ completely_scalarize (tree base, tree decl_type, HOST_WIDE_INT offset, tree ref)
 /* Create total_scalarization accesses for a member of type TYPE, which must
    satisfy either is_gimple_reg_type or scalarizable_type_p.  BASE must be the
    top-most VAR_DECL representing the variable; within that, POS and SIZE locate
-   the member and REF must be the reference expression for it.  */
+   the member and REF must be the reference expression for it.
 
-static void
+   Return TRUE if successful; may fail in the case of constant-pool entry BASE
+   whose DECL_INITIAL contains values to which we cannot fold ARRAY_REF or
+   COMPONENT_REF expressions.  */
+
+static bool
 scalarize_elem (tree base, HOST_WIDE_INT pos, HOST_WIDE_INT size,
 		 tree ref, tree type)
 {
   if (is_gimple_reg_type (type))
   {
+    if (TREE_CODE (base) == VAR_DECL
+	&& DECL_IN_CONSTANT_POOL (base)
+	&& subst_constant_pool_initial (ref, base) == NULL_TREE)
+      return false;
     struct access *access = create_access_1 (base, pos, size);
     access->expr = ref;
     access->type = type;
     access->grp_total_scalarization = 1;
     /* Accesses for intraprocedural SRA can have their stmt NULL.  */
+
+    return true;
   }
-  else
-    completely_scalarize (base, type, pos, ref);
+
+  return completely_scalarize (base, type, pos, ref);
 }
 
 /* Create a total_scalarization access for VAR as a whole.  VAR must be of a
@@ -1843,7 +1964,12 @@ maybe_add_sra_candidate (tree var)
       reject (var, "not aggregate");
       return false;
     }
-  if (needs_to_live_in_memory (var))
+  /* Allow constant-pool entries (that "need to live in memory")
+     unless we are doing IPA SRA.  */
+  if (needs_to_live_in_memory (var)
+      && (sra_mode == SRA_MODE_EARLY_IPA
+	  || TREE_CODE (var) != VAR_DECL
+	  || !DECL_IN_CONSTANT_POOL (var)))
     {
       reject (var, "needs to live in memory");
       return false;
@@ -2343,6 +2469,10 @@ analyze_access_subtree (struct access *root, struct access *parent,
 		       (unsigned) root->offset, (unsigned) root->size);
 	      fprintf (dump_file, " to an integer.\n");
 	    }
+	  gcc_checking_assert (TREE_CODE (root->base) != VAR_DECL
+			       || !DECL_IN_CONSTANT_POOL (root->base)
+			       || subst_constant_pool_initial (root->expr,
+							       root->base));
 	}
 
       root->grp_to_be_replaced = 1;
@@ -2604,8 +2734,17 @@ analyze_all_variable_accesses (void)
 	    if (tree_to_uhwi (TYPE_SIZE (TREE_TYPE (var)))
 		<= max_scalarization_size)
 	      {
+		vec<access_p> &accesses = base_access_vec->get_or_insert (var);
+		unsigned orig_count = accesses.length ();
+		if (!completely_scalarize (var, TREE_TYPE (var), 0, var))
+		  {
+		    /* Failed.  Discard any accesses created during attempt.  */
+		    for (unsigned i = orig_count; i < accesses.length (); i++)
+		      access_pool.remove (accesses[i]);
+		    accesses.truncate (orig_count);
+		    continue;
+		  }
 		create_total_scalarization_access (var);
-		completely_scalarize (var, TREE_TYPE (var), 0, var);
 		if (dump_file && (dump_flags & TDF_DETAILS))
 		  {
 		    fprintf (dump_file, "Will attempt to totally scalarize ");
@@ -3480,6 +3619,56 @@ sra_modify_assign (gimple *stmt, gimple_stmt_iterator *gsi)
     }
 }
 
+static void
+initialize_constant_pool_replacements (void)
+{
+  gimple_seq seq = NULL;
+  gimple_stmt_iterator gsi = gsi_start (seq);
+  bitmap_iterator bi;
+  unsigned i;
+
+  EXECUTE_IF_SET_IN_BITMAP (candidate_bitmap, 0, i, bi)
+    if (bitmap_bit_p (should_scalarize_away_bitmap, i)
+	&& !bitmap_bit_p (cannot_scalarize_away_bitmap, i))
+      {
+	tree var = candidate (i);
+	if (TREE_CODE (var) != VAR_DECL || !DECL_IN_CONSTANT_POOL (var))
+	  continue;
+	vec<access_p> *access_vec = get_base_access_vector (var);
+	if (!access_vec)
+	  continue;
+	for (unsigned i = 0; i < access_vec->length (); i++)
+	  {
+	    struct access *access = (*access_vec)[i];
+	    tree val = subst_constant_pool_initial (access->expr, access->base);
+	    if (dump_file && (dump_flags & TDF_DETAILS))
+	      {
+		fprintf (dump_file, "Expression ");
+		print_generic_expr (dump_file, access->expr, 0);
+		fprintf (dump_file, " replaced with ");
+		print_generic_expr (dump_file, access->replacement_decl, 0);
+		fprintf (dump_file, " initialized to ");
+		print_generic_expr (dump_file, val, 0);
+		fprintf (dump_file, "\n");
+		if (!access->replacement_decl)
+		  dump_access (dump_file, access, true);
+	      }
+	    if (!access->replacement_decl)
+	      continue;
+	    gcc_assert (val);
+	    gassign *stmt = gimple_build_assign (
+	      get_access_replacement (access), val);
+	    gsi_insert_after (&gsi, stmt, GSI_NEW_STMT);
+	    update_stmt (stmt);
+	  }
+      }
+
+  seq = gsi_seq (gsi);
+  if (seq)
+    gsi_insert_seq_on_edge_immediate (
+      single_succ_edge (ENTRY_BLOCK_PTR_FOR_FN (cfun)), seq);
+}
+
 /* Traverse the function body and all modifications as decided in
    analyze_all_variable_accesses.  Return true iff the CFG has been
    changed.  */
@@ -3490,6 +3679,8 @@ sra_modify_function_body (void)
   bool cfg_changed = false;
   basic_block bb;
 
+  initialize_constant_pool_replacements ();
+
   FOR_EACH_BB_FN (bb, cfun)
     {
       gimple_stmt_iterator gsi = gsi_start_bb (bb);
-- 
1.9.1

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 4/6][Trivial] tree-sra.c: A few comment fixes/additions.
  2015-10-29 19:22 ` [PATCH 4/6][Trivial] tree-sra.c: A few comment fixes/additions Alan Lawrence
@ 2015-10-30  5:35   ` Jeff Law
  0 siblings, 0 replies; 28+ messages in thread
From: Jeff Law @ 2015-10-30  5:35 UTC (permalink / raw)
  To: Alan Lawrence, gcc-patches

On 10/29/2015 01:18 PM, Alan Lawrence wrote:
> gcc/ChangeLog:
>
> 	* tree-sra.c (scalarizable_type_p): Comment variable-length arrays.
> 	(completely_scalarize): Comment zero-length arrays.
> 	(get_access_replacement): Correct comment re. precondition.
Trivially OK :-)

jeff

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/6]tree-ssa-dom.c: Normalize exprs, starting with ARRAY_REF to MEM_REF
  2015-10-29 19:20 ` [PATCH 1/6]tree-ssa-dom.c: Normalize exprs, starting with ARRAY_REF to MEM_REF Alan Lawrence
@ 2015-10-30  5:39   ` Jeff Law
  2015-10-30  9:21     ` Richard Biener
  2015-11-03 10:27     ` Alan Lawrence
  0 siblings, 2 replies; 28+ messages in thread
From: Jeff Law @ 2015-10-30  5:39 UTC (permalink / raw)
  To: Alan Lawrence, gcc-patches

On 10/29/2015 01:18 PM, Alan Lawrence wrote:
> This patch just teaches DOM that ARRAY_REFs can be equivalent to MEM_REFs (with
> pointer type to the array element type).
>
> gcc/ChangeLog:
>
> 	* tree-ssa-dom.c (dom_normalize_single_rhs): New.
> 	(dom_normalize_gimple_stmt): New.
> 	(lookup_avail_expr): Call dom_normalize_gimple_stmt.
Has this been tested?  Do you have tests where it can be shown to make a 
difference independent of the changes to tree-sra.c?

The implementation looks fine, I just want to have some basic tests in 
the testsuite that show the benefit of this normalization.

Similarly for patch #2. Interestingly enough we had code that made that 
kind of transformation during gimplification eons ago.  Presumably it 
was ripped out at some point because of the differences in aliasing 
properties.



Jeff

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/6]tree-ssa-dom.c: Normalize exprs, starting with ARRAY_REF to MEM_REF
  2015-10-30  5:39   ` Jeff Law
@ 2015-10-30  9:21     ` Richard Biener
  2015-11-03 10:27     ` Alan Lawrence
  1 sibling, 0 replies; 28+ messages in thread
From: Richard Biener @ 2015-10-30  9:21 UTC (permalink / raw)
  To: Jeff Law; +Cc: Alan Lawrence, GCC Patches

On Fri, Oct 30, 2015 at 6:35 AM, Jeff Law <law@redhat.com> wrote:
> On 10/29/2015 01:18 PM, Alan Lawrence wrote:
>>
>> This patch just teaches DOM that ARRAY_REFs can be equivalent to MEM_REFs
>> (with
>> pointer type to the array element type).
>>
>> gcc/ChangeLog:
>>
>>         * tree-ssa-dom.c (dom_normalize_single_rhs): New.
>>         (dom_normalize_gimple_stmt): New.
>>         (lookup_avail_expr): Call dom_normalize_gimple_stmt.
>
> Has this been tested?  Do you have tests where it can be shown to make a
> difference independent of the changes to tree-sra.c?
>
> The implementation looks fine, I just want to have some basic tests in the
> testsuite that show the benefit of this normalization.

Err, I think the implementation is extremely wasteful ...

> Similarly for patch #2. Interestingly enough we had code that made that kind
> of transformation during gimplification eons ago.  Presumably it was ripped
> out at some point because of the differences in aliasing properties.

Please have a look at how SCCVN handles variants of memory references.
You might even want to re-use its copy_reference_ops_from_ref implementation
(and reference hashing).  Note that SCCVN needs to be able to reconstruct a
tree expression from its representation (for PRE), which DOM need not, so
DOM might use a simpler form.  The basic idea is to accumulate constant
offset handled_components into an "offset component".  Thus

  a[2].i                     ->  (&a)->offset(8 + offsetof(i))
  MEM_REF[&a, 8].i ->  (&a)->offset(8 + offsetof(i))

For DOM you can probably do the copy_reference_ops_from_ref work
on-the-fly for hashing and comparing.  The main point will be to do away
with the DOM way of hashing/comparing for memory references.

Richard.

>
>
> Jeff

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 3/6] Share code from fold_array_ctor_reference with fold.
  2015-10-29 19:22 ` [PATCH 3/6] Share code from fold_array_ctor_reference with fold Alan Lawrence
@ 2015-10-30  9:25   ` Richard Biener
  2015-11-03  3:20   ` Jeff Law
  1 sibling, 0 replies; 28+ messages in thread
From: Richard Biener @ 2015-10-30  9:25 UTC (permalink / raw)
  To: Alan Lawrence; +Cc: GCC Patches

On Thu, Oct 29, 2015 at 8:18 PM, Alan Lawrence <alan.lawrence@arm.com> wrote:
> This is in response to https://gcc.gnu.org/ml/gcc/2015-10/msg00097.html, where
> Richi points out that CONSTRUCTOR elements are not necessarily ordered.
>
> I wasn't sure of a good naming convention for the new get_ctor_element_at_index,
> other suggestions welcome.

get_array_ctor_element_at_index

(ok it also handles vectors).

Ok with that change.

Richard.

> gcc/ChangeLog:
>
>         * gimple-fold.c (fold_array_ctor_reference): Move searching code to:
>         * fold-const.c (get_ctor_element_at_index): New.
>         (fold): Remove binary-search through CONSTRUCTOR, call previous.
>
>         * fold-const.h (get_ctor_element_at_index): New.
> ---
>  gcc/fold-const.c  | 93 ++++++++++++++++++++++++++++++++++++++++---------------
>  gcc/fold-const.h  |  1 +
>  gcc/gimple-fold.c | 47 ++--------------------------
>  3 files changed, 72 insertions(+), 69 deletions(-)
>
> diff --git a/gcc/fold-const.c b/gcc/fold-const.c
> index de45a2c..5d27b91 100644
> --- a/gcc/fold-const.c
> +++ b/gcc/fold-const.c
> @@ -12018,6 +12018,72 @@ fold_ternary_loc (location_t loc, enum tree_code code, tree type,
>      } /* switch (code) */
>  }
>
> +/* Gets the element ACCESS_INDEX from CTOR, which must be a CONSTRUCTOR.  */
> +
> +tree
> +get_ctor_element_at_index (tree ctor, offset_int access_index)
> +{
> +  tree index_type = NULL_TREE;
> +  offset_int low_bound = 0;
> +
> +  if (TREE_CODE (TREE_TYPE (ctor)) == ARRAY_TYPE)
> +  {
> +    tree domain_type = TYPE_DOMAIN (TREE_TYPE (ctor));
> +    if (domain_type && TYPE_MIN_VALUE (domain_type))
> +    {
> +      /* Static constructors for variably sized objects makes no sense.  */
> +      gcc_assert (TREE_CODE (TYPE_MIN_VALUE (domain_type)) == INTEGER_CST);
> +      index_type = TREE_TYPE (TYPE_MIN_VALUE (domain_type));
> +      low_bound = wi::to_offset (TYPE_MIN_VALUE (domain_type));
> +    }
> +  }
> +
> +  if (index_type)
> +    access_index = wi::ext (access_index, TYPE_PRECISION (index_type),
> +                           TYPE_SIGN (index_type));
> +
> +  offset_int index = low_bound - 1;
> +  if (index_type)
> +    index = wi::ext (index, TYPE_PRECISION (index_type),
> +                    TYPE_SIGN (index_type));
> +
> +  offset_int max_index;
> +  unsigned HOST_WIDE_INT cnt;
> +  tree cfield, cval;
> +
> +  FOR_EACH_CONSTRUCTOR_ELT (CONSTRUCTOR_ELTS (ctor), cnt, cfield, cval)
> +  {
> +    /* Array constructor might explicitely set index, or specify range
> +     * or leave index NULL meaning that it is next index after previous
> +     * one.  */
> +    if (cfield)
> +    {
> +      if (TREE_CODE (cfield) == INTEGER_CST)
> +       max_index = index = wi::to_offset (cfield);
> +      else
> +      {
> +       gcc_assert (TREE_CODE (cfield) == RANGE_EXPR);
> +       index = wi::to_offset (TREE_OPERAND (cfield, 0));
> +       max_index = wi::to_offset (TREE_OPERAND (cfield, 1));
> +      }
> +    }
> +    else
> +    {
> +      index += 1;
> +      if (index_type)
> +       index = wi::ext (index, TYPE_PRECISION (index_type),
> +                        TYPE_SIGN (index_type));
> +       max_index = index;
> +    }
> +
> +    /* Do we have match?  */
> +    if (wi::cmpu (access_index, index) >= 0
> +       && wi::cmpu (access_index, max_index) <= 0)
> +      return cval;
> +  }
> +  return NULL_TREE;
> +}
> +
>  /* Perform constant folding and related simplification of EXPR.
>     The related simplifications include x*1 => x, x*0 => 0, etc.,
>     and application of the associative law.
> @@ -12094,31 +12160,8 @@ fold (tree expr)
>             && TREE_CODE (op0) == CONSTRUCTOR
>             && ! type_contains_placeholder_p (TREE_TYPE (op0)))
>           {
> -           vec<constructor_elt, va_gc> *elts = CONSTRUCTOR_ELTS (op0);
> -           unsigned HOST_WIDE_INT end = vec_safe_length (elts);
> -           unsigned HOST_WIDE_INT begin = 0;
> -
> -           /* Find a matching index by means of a binary search.  */
> -           while (begin != end)
> -             {
> -               unsigned HOST_WIDE_INT middle = (begin + end) / 2;
> -               tree index = (*elts)[middle].index;
> -
> -               if (TREE_CODE (index) == INTEGER_CST
> -                   && tree_int_cst_lt (index, op1))
> -                 begin = middle + 1;
> -               else if (TREE_CODE (index) == INTEGER_CST
> -                        && tree_int_cst_lt (op1, index))
> -                 end = middle;
> -               else if (TREE_CODE (index) == RANGE_EXPR
> -                        && tree_int_cst_lt (TREE_OPERAND (index, 1), op1))
> -                 begin = middle + 1;
> -               else if (TREE_CODE (index) == RANGE_EXPR
> -                        && tree_int_cst_lt (op1, TREE_OPERAND (index, 0)))
> -                 end = middle;
> -               else
> -                 return (*elts)[middle].value;
> -             }
> +           if (tree val = get_ctor_element_at_index (op0, wi::to_offset (op1)))
> +             return val;
>           }
>
>         return t;
> diff --git a/gcc/fold-const.h b/gcc/fold-const.h
> index ee74dc8..1fa6ee0 100644
> --- a/gcc/fold-const.h
> +++ b/gcc/fold-const.h
> @@ -73,6 +73,7 @@ extern tree fold_build_call_array_loc (location_t, tree, tree, int, tree *);
>  #define fold_build_call_array_initializer(T1,T2,N,T4)\
>     fold_build_call_array_initializer_loc (UNKNOWN_LOCATION, T1, T2, N, T4)
>  extern tree fold_build_call_array_initializer_loc (location_t, tree, tree, int, tree *);
> +extern tree get_ctor_element_at_index (tree, offset_int);
>  extern bool fold_convertible_p (const_tree, const_tree);
>  #define fold_convert(T1,T2)\
>     fold_convert_loc (UNKNOWN_LOCATION, T1, T2)
> diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
> index 85ff018..823d952 100644
> --- a/gcc/gimple-fold.c
> +++ b/gcc/gimple-fold.c
> @@ -5308,13 +5308,10 @@ fold_array_ctor_reference (tree type, tree ctor,
>                            unsigned HOST_WIDE_INT size,
>                            tree from_decl)
>  {
> -  unsigned HOST_WIDE_INT cnt;
> -  tree cfield, cval;
>    offset_int low_bound;
>    offset_int elt_size;
> -  offset_int index, max_index;
>    offset_int access_index;
> -  tree domain_type = NULL_TREE, index_type = NULL_TREE;
> +  tree domain_type = NULL_TREE;
>    HOST_WIDE_INT inner_offset;
>
>    /* Compute low bound and elt size.  */
> @@ -5324,7 +5321,6 @@ fold_array_ctor_reference (tree type, tree ctor,
>      {
>        /* Static constructors for variably sized objects makes no sense.  */
>        gcc_assert (TREE_CODE (TYPE_MIN_VALUE (domain_type)) == INTEGER_CST);
> -      index_type = TREE_TYPE (TYPE_MIN_VALUE (domain_type));
>        low_bound = wi::to_offset (TYPE_MIN_VALUE (domain_type));
>      }
>    else
> @@ -5346,9 +5342,6 @@ fold_array_ctor_reference (tree type, tree ctor,
>    access_index = wi::udiv_trunc (offset_int (offset / BITS_PER_UNIT),
>                                  elt_size);
>    access_index += low_bound;
> -  if (index_type)
> -    access_index = wi::ext (access_index, TYPE_PRECISION (index_type),
> -                           TYPE_SIGN (index_type));
>
>    /* And offset within the access.  */
>    inner_offset = offset % (elt_size.to_uhwi () * BITS_PER_UNIT);
> @@ -5357,43 +5350,9 @@ fold_array_ctor_reference (tree type, tree ctor,
>       care to fold accesses spanning multiple array indexes.  */
>    if (inner_offset + size > elt_size.to_uhwi () * BITS_PER_UNIT)
>      return NULL_TREE;
> +  if (tree val = get_ctor_element_at_index (ctor, access_index))
> +    return fold_ctor_reference (type, val, inner_offset, size, from_decl);
>
> -  index = low_bound - 1;
> -  if (index_type)
> -    index = wi::ext (index, TYPE_PRECISION (index_type),
> -                    TYPE_SIGN (index_type));
> -
> -  FOR_EACH_CONSTRUCTOR_ELT (CONSTRUCTOR_ELTS (ctor), cnt, cfield, cval)
> -    {
> -      /* Array constructor might explicitely set index, or specify range
> -        or leave index NULL meaning that it is next index after previous
> -        one.  */
> -      if (cfield)
> -       {
> -         if (TREE_CODE (cfield) == INTEGER_CST)
> -           max_index = index = wi::to_offset (cfield);
> -         else
> -           {
> -             gcc_assert (TREE_CODE (cfield) == RANGE_EXPR);
> -             index = wi::to_offset (TREE_OPERAND (cfield, 0));
> -             max_index = wi::to_offset (TREE_OPERAND (cfield, 1));
> -           }
> -       }
> -      else
> -       {
> -         index += 1;
> -         if (index_type)
> -           index = wi::ext (index, TYPE_PRECISION (index_type),
> -                            TYPE_SIGN (index_type));
> -         max_index = index;
> -       }
> -
> -      /* Do we have match?  */
> -      if (wi::cmpu (access_index, index) >= 0
> -         && wi::cmpu (access_index, max_index) <= 0)
> -       return fold_ctor_reference (type, cval, inner_offset, size,
> -                                   from_decl);
> -    }
>    /* When memory is not explicitely mentioned in constructor,
>       it is 0 (or out of range).  */
>    return build_zero_cst (type);
> --
> 1.9.1
>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 5/6]tree-sra.c: Fix completely_scalarize for negative array indices
  2015-10-29 19:22 ` [PATCH 5/6]tree-sra.c: Fix completely_scalarize for negative array indices Alan Lawrence
@ 2015-10-30 10:54   ` Richard Biener
  2015-10-30 10:55     ` Eric Botcazou
  0 siblings, 1 reply; 28+ messages in thread
From: Richard Biener @ 2015-10-30 10:54 UTC (permalink / raw)
  To: Alan Lawrence; +Cc: GCC Patches

On Thu, Oct 29, 2015 at 8:18 PM, Alan Lawrence <alan.lawrence@arm.com> wrote:
> The code I added to completely_scalarize for arrays isn't right in some cases
> of negative array indices (e.g. arrays with indices from -1 to 1 in the Ada
> testsuite). On ARM, this prevents a failure bootstrapping Ada with the next
> patch, as well as a few ACATS tests (e.g. c64106a).
>
> Some discussion here: https://gcc.gnu.org/ml/gcc/2015-10/msg00096.html .
>
> gcc/ChangeLog:
>
>         * tree-sra.c (completely_scalarize): Deal properly with negative array
>         indices.
> ---
>  gcc/tree-sra.c | 17 ++++++++++++-----
>  1 file changed, 12 insertions(+), 5 deletions(-)
>
> diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
> index e15df1f..358db79 100644
> --- a/gcc/tree-sra.c
> +++ b/gcc/tree-sra.c
> @@ -1010,18 +1010,25 @@ completely_scalarize (tree base, tree decl_type, HOST_WIDE_INT offset, tree ref)
>         if (maxidx)
>           {
>             gcc_assert (TREE_CODE (maxidx) == INTEGER_CST);
> -           /* MINIDX and MAXIDX are inclusive.  Try to avoid overflow.  */
> -           unsigned HOST_WIDE_INT lenp1 = tree_to_shwi (maxidx)
> -                                       - tree_to_shwi (minidx);
> +           /* MINIDX and MAXIDX are inclusive, and must be interpreted in the
> +              TYPE_DOMAIN (e.g. signed int, whereas min/max may be size_int).
> +              Try also to avoid overflow.  */
> +           minidx = build_int_cst (TYPE_DOMAIN (decl_type),
> +                                   tree_to_shwi (minidx));
> +           maxidx = build_int_cst (TYPE_DOMAIN (decl_type),
> +                                   tree_to_shwi (maxidx));

I think you want to use wide-ints here and

   wide_int idx = wi::from (minidx, TYPE_PRECISION (TYPE_DOMAIN
(...)), TYPE_SIGN (TYPE_DOMAIN (..)));
   wide_int maxidx = ...

you can then simply iterate minidx with ++ and do the final compare
against maxidx
with while (++idx <= maxidx).  For the array ref index we want to use
TYPE_DOMAIN
as type as well, not size_int.  Thus wide_int_to_tree (TYPE_DOMAIN (...)..idx).
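In case it helps, the loop being suggested might look something like this (an uncompiled sketch only; the exact wide-int entry points such as wide_int::from and wi::eq_p are from memory and may need adjusting):

```c
/* Uncompiled sketch: iterate the array domain in its own precision and
   signedness instead of doing HOST_WIDE_INT arithmetic.  */
tree domain = TYPE_DOMAIN (decl_type);
unsigned prec = TYPE_PRECISION (domain);
signop sgn = TYPE_SIGN (domain);
wide_int idx = wide_int::from (minidx, prec, sgn);
wide_int max = wide_int::from (maxidx, prec, sgn);
HOST_WIDE_INT el_off = offset;
for (;;)
  {
    tree nref = build4 (ARRAY_REF, elemtype, ref,
			wide_int_to_tree (domain, idx),
			NULL_TREE, NULL_TREE);
    scalarize_elem (base, el_off, el_size, nref, elemtype);
    if (wi::eq_p (idx, max))
      break;
    idx += 1;
    el_off += el_size;
  }
```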

Richard.

> +           HOST_WIDE_INT min = tree_to_shwi (minidx);
> +           unsigned HOST_WIDE_INT lenlessone = tree_to_shwi (maxidx) - min;
>             unsigned HOST_WIDE_INT idx = 0;
>             do
>               {
> -               tree nref = build4 (ARRAY_REF, elemtype, ref, size_int (idx),
> +               tree nref = build4 (ARRAY_REF, elemtype,
> +                                   ref, size_int (idx + min),
>                                     NULL_TREE, NULL_TREE);
>                 int el_off = offset + idx * el_size;
>                 scalarize_elem (base, el_off, el_size, nref, elemtype);
>               }
> -           while (++idx <= lenp1);
> +           while (++idx <= lenlessone);
>           }
>        }
>        break;
> --
> 1.9.1
>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 5/6]tree-sra.c: Fix completely_scalarize for negative array indices
  2015-10-30 10:54   ` Richard Biener
@ 2015-10-30 10:55     ` Eric Botcazou
  2015-11-05 13:22       ` Alan Lawrence
  0 siblings, 1 reply; 28+ messages in thread
From: Eric Botcazou @ 2015-10-30 10:55 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, Alan Lawrence

> I think you want to use wide-ints here and
> 
>    wide_int idx = wi::from (minidx, TYPE_PRECISION (TYPE_DOMAIN
> (...)), TYPE_SIGN (TYPE_DOMAIN (..)));
>    wide_int maxidx = ...
> 
> you can then simply iterate minidx with ++ and do the final compare
> against maxidx
> with while (++idx <= maxidx).  For the array ref index we want to use
> TYPE_DOMAIN
> as type as well, not size_int.  Thus wide_int_to_tree (TYPE_DOMAIN
> (...)..idx).

Yes, you generally cannot use HOST_WIDE_INT to avoid overflow because this 
will break for 64-bit HOST_WIDE_INT and 32-bit sizetype in corner cases.

But using offset_int should be OK, see for example get_ref_base_and_extent.

-- 
Eric Botcazou

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 6/6] Make SRA replace constant-pool loads
  2015-10-29 19:32 ` [PATCH 6/6] Make SRA replace constant-pool loads Alan Lawrence
@ 2015-10-31 23:48   ` Bernhard Reutner-Fischer
  2015-11-03 14:01   ` Richard Biener
  1 sibling, 0 replies; 28+ messages in thread
From: Bernhard Reutner-Fischer @ 2015-10-31 23:48 UTC (permalink / raw)
  To: Alan Lawrence, gcc-patches

On October 29, 2015 8:18:22 PM GMT+01:00, Alan Lawrence <alan.lawrence@arm.com> wrote:

>+
>+static tree
>+subst_constant_pool_initial (tree expr, tree var)
>+{
>+  if (TREE_CODE (expr) == VAR_DECL)

Just a nit, but I thought we had VAR_DECL_P or VAR_P for the TREE_CODE (NODE) == VAR_DECL predicate?

Thanks,

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 3/6] Share code from fold_array_ctor_reference with fold.
  2015-10-29 19:22 ` [PATCH 3/6] Share code from fold_array_ctor_reference with fold Alan Lawrence
  2015-10-30  9:25   ` Richard Biener
@ 2015-11-03  3:20   ` Jeff Law
  2015-11-04 14:35     ` Alan Lawrence
  1 sibling, 1 reply; 28+ messages in thread
From: Jeff Law @ 2015-11-03  3:20 UTC (permalink / raw)
  To: Alan Lawrence, gcc-patches

On 10/29/2015 01:18 PM, Alan Lawrence wrote:
> This is in response to https://gcc.gnu.org/ml/gcc/2015-10/msg00097.html, where
> Richi points out that CONSTRUCTOR elements are not necessarily ordered.
>
> I wasn't sure of a good naming convention for the new get_ctor_element_at_index,
> other suggestions welcome.
>
> gcc/ChangeLog:
>
> 	* gimple-fold.c (fold_array_ctor_reference): Move searching code to:
> 	* fold-const.c (get_ctor_element_at_index): New.
> 	(fold): Remove binary-search through CONSTRUCTOR, call previous.
>
> 	* fold-const.h (get_ctor_element_at_index): New.
Code is fine.  Some formatting nits though.


> ---
>   gcc/fold-const.c  | 93 ++++++++++++++++++++++++++++++++++++++++---------------
>   gcc/fold-const.h  |  1 +
>   gcc/gimple-fold.c | 47 ++--------------------------
>   3 files changed, 72 insertions(+), 69 deletions(-)
>
> diff --git a/gcc/fold-const.c b/gcc/fold-const.c
> index de45a2c..5d27b91 100644
> --- a/gcc/fold-const.c
> +++ b/gcc/fold-const.c
> +
> +  FOR_EACH_CONSTRUCTOR_ELT (CONSTRUCTOR_ELTS (ctor), cnt, cfield, cval)
> +  {
> +    /* Array constructor might explicitely set index, or specify range
> +     * or leave index NULL meaning that it is next index after previous
> +     * one.  */
s/explicitely/explicitly/  And remove the '*' from the 2nd and 3rd lines 
of the comment.

It looks like get_ctor_element_at_index has numerous formatting 
problems.  In particular you didn't indent the braces across the board 
properly.  Also check for tabs vs spaces issues please.

Can you please fix the formatting and comments and repost for a final check?

Thanks,
Jeff



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/6]tree-ssa-dom.c: Normalize exprs, starting with ARRAY_REF to MEM_REF
  2015-10-30  5:39   ` Jeff Law
  2015-10-30  9:21     ` Richard Biener
@ 2015-11-03 10:27     ` Alan Lawrence
  2015-11-03 11:13       ` Alan Lawrence
  1 sibling, 1 reply; 28+ messages in thread
From: Alan Lawrence @ 2015-11-03 10:27 UTC (permalink / raw)
  To: gcc-patches; +Cc: law

On 30/10/15 05:35, Jeff Law wrote:
> On 10/29/2015 01:18 PM, Alan Lawrence wrote:
>> This patch just teaches DOM that ARRAY_REFs can be equivalent to MEM_REFs (with
>> pointer type to the array element type).
>>
>> gcc/ChangeLog:
>>
>>     * tree-ssa-dom.c (dom_normalize_single_rhs): New.
>>     (dom_normalize_gimple_stmt): New.
>>     (lookup_avail_expr): Call dom_normalize_gimple_stmt.
> Has this been tested?  Do you have tests where it can be shown to make a
> difference independent of the changes to tree-sra.c?
>
> The implementation looks fine, I just want to have some basic tests in the
> testsuite that show the benefit of this normalization.

I'll look at the implementation and Richard's comments soon, but as to tests -
well I had tried before and not had much luck but OK you make me try harder ;).

I'll put these in with the appropriate patches, but ssa-dom-cse-{5,6}.c test
the ARRAY_REF -> MEM_REF part (one of these has to disable SRA from optimizing
the whole test away, the other...ends up waiting until dom2 for the whole loop
to have been unrolled, so I'd not be surprised if this proved a bit fragile,
as in not spotting when some other phase starts doing the optimization instead).
ssa-dom-cse-7.c tests the normalization-of-MEM_REFs part; but AFAICT, the
*only* place making exprs like the problematic MEM[(int[8] *)&a ...] is SRA.
That is, ssa-dom-cse-7.c passes (and the patch series solves PR/63679) if
instead of my patch 2 (normalization of MEM_REFs) we have this:

diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
index 4327990..2889a96 100644
--- a/gcc/tree-sra.c
+++ b/gcc/tree-sra.c
@@ -1697,7 +1697,7 @@ build_ref_for_offset (location_t loc, tree base, HOST_WIDE_INT offset,
     }
   else
     {
-      off = build_int_cst (reference_alias_ptr_type (base),
+      off = build_int_cst (build_pointer_type (exp_type),
                           base_offset + offset / BITS_PER_UNIT);
       base = build_fold_addr_expr (unshare_expr (base));
     }

...I'll test that fully but I have to wonder what the right path is here!

Cheers,
Alan
---
 gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-5.c | 17 +++++++++++++++++
 gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-6.c | 16 ++++++++++++++++
 gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-7.c | 18 ++++++++++++++++++
 3 files changed, 51 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-5.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-6.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-7.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-5.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-5.c
new file mode 100644
index 0000000..cfbb85f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-5.c
@@ -0,0 +1,17 @@
+/* Test normalization of ARRAY_REF expressions to MEM_REFs in dom.  */
+/* { dg-do compile } */
+/* { dg-options "-O1 -fdump-tree-dom2" } */
+
+#define N 8
+
+int
+main (int argc, char **argv)
+{
+  int a[N];
+  for (int i = 0; i < N; i++)
+    a[i] = 2*i + 1;
+  int *p = &a[0];
+  return *(++p);
+}
+
+/* { dg-final { scan-tree-dump-times "return 3;" 1 "dom2"} } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-6.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-6.c
new file mode 100644
index 0000000..c387fa3
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-6.c
@@ -0,0 +1,16 @@
+/* Test normalization of ARRAY_REF expressions to MEM_REFs in dom.  */
+/* { dg-do compile } */
+/* { dg-options "-O1 -fno-tree-sra -fno-tree-fre -fdump-tree-dom1" } */
+
+int
+main (int argc, char **argv)
+{
+  union {
+    int a[8];
+    int b[2];
+  } u = { .a = { 1, 42, 3, 4, 5, 6, 7, 8 } };
+
+  return u.b[1];
+}
+
+/* { dg-final { scan-tree-dump-times "return 42;" 1 "dom1" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-7.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-7.c
new file mode 100644
index 0000000..3f4ca17
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-7.c
@@ -0,0 +1,18 @@
+/* Test normalization of MEM_REF expressions in dom.  */
+/* { dg-do compile } */
+/* { dg-options "-O3 -fno-tree-fre -fno-tree-pre -fdump-tree-optimized" } */
+
+typedef struct {
+  int a[8];
+} foo;
+
+foo f;
+
+int test ()
+{
+  foo g = { { 1, 2, 3, 4, 5, 6, 7, 8 } };
+  f=g;
+  return f.a[2];
+}
+
+/* { dg-final { scan-tree-dump-times "return 3;" 1 "optimized" } } */
-- 
1.9.1

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/6]tree-ssa-dom.c: Normalize exprs, starting with ARRAY_REF to MEM_REF
  2015-11-03 10:27     ` Alan Lawrence
@ 2015-11-03 11:13       ` Alan Lawrence
  2015-11-03 11:49         ` Richard Biener
  0 siblings, 1 reply; 28+ messages in thread
From: Alan Lawrence @ 2015-11-03 11:13 UTC (permalink / raw)
  To: gcc-patches; +Cc: law

On 3 November 2015 at 10:27, Alan Lawrence <alan.lawrence@arm.com> wrote:
> That is, ssa-dom-cse-7.c passes (and the patch series solves PR/63679) if
> instead of my patch 2 (normalization of MEM_REFs) we have this:
>
> diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
> index 4327990..2889a96 100644
> --- a/gcc/tree-sra.c
> +++ b/gcc/tree-sra.c
> @@ -1697,7 +1697,7 @@ build_ref_for_offset (location_t loc, tree base, HOST_WIDE_INT offset,
>      }
>    else
>      {
> -      off = build_int_cst (reference_alias_ptr_type (base),
> +      off = build_int_cst (build_pointer_type (exp_type),
>                            base_offset + offset / BITS_PER_UNIT);
>        base = build_fold_addr_expr (unshare_expr (base));
>      }
>
> ...I'll test that fully but I have to wonder what the right path is here!

So with also changing the other reference_alias_ptr_type in the first
case of build_ref_for_offset, it breaks Ada ACATS (on x86):

c52101a "CHECK THAT ARRAY SUBTYPE CONVERSION IS APPLIED AFTER AN ARRAY
VALUE IS DETERMINED"
cc70003
cxac004 (stream access, stream functions)

....I'll not dig any further unless you think that change to SRA is
the right avenue to investigate!

Cheers, Alan


* Re: [PATCH 1/6]tree-ssa-dom.c: Normalize exprs, starting with ARRAY_REF to MEM_REF
  2015-11-03 11:13       ` Alan Lawrence
@ 2015-11-03 11:49         ` Richard Biener
  0 siblings, 0 replies; 28+ messages in thread
From: Richard Biener @ 2015-11-03 11:49 UTC (permalink / raw)
  To: Alan Lawrence; +Cc: gcc-patches, Jeff Law

On Tue, Nov 3, 2015 at 12:13 PM, Alan Lawrence <alan.lawrence@arm.com> wrote:
> On 3 November 2015 at 10:27, Alan Lawrence <alan.lawrence@arm.com> wrote:
>> That is, ssa-dom-cse-7.c passes (and the patch series solves PR/63679) if
>> instead of my patch 2 (normalization of MEM_REFs) we have this:
>>
>> diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
>> index 4327990..2889a96 100644
>> --- a/gcc/tree-sra.c
>> +++ b/gcc/tree-sra.c
>> @@ -1697,7 +1697,7 @@ build_ref_for_offset (location_t loc, tree base, HOST_WIDE_INT offset,
>>      }
>>    else
>>      {
>> -      off = build_int_cst (reference_alias_ptr_type (base),
>> +      off = build_int_cst (build_pointer_type (exp_type),
>>                            base_offset + offset / BITS_PER_UNIT);
>>        base = build_fold_addr_expr (unshare_expr (base));
>>      }
>>
>> ...I'll test that fully but I have to wonder what the right path is here!
>
> So with also changing the other reference_alias_ptr_type in the first
> case of build_ref_for_offset, it breaks Ada ACATS (on x86):
>
> c52101a "CHECK THAT ARRAY SUBTYPE CONVERSION IS APPLIED AFTER AN ARRAY
> VALUE IS DETERMINED"
> cc70003
> cxac004 (stream access, stream functions)
>
> ....I'll not dig any further unless you think that change to SRA is
> the right avenue to investigate!

Nope, that change looks wrong to me.

Richard.

> Cheers, Alan


* Re: [PATCH 6/6] Make SRA replace constant-pool loads
  2015-10-29 19:32 ` [PATCH 6/6] Make SRA replace constant-pool loads Alan Lawrence
  2015-10-31 23:48   ` Bernhard Reutner-Fischer
@ 2015-11-03 14:01   ` Richard Biener
  2015-11-05 18:55     ` Alan Lawrence
  1 sibling, 1 reply; 28+ messages in thread
From: Richard Biener @ 2015-11-03 14:01 UTC (permalink / raw)
  To: Alan Lawrence; +Cc: GCC Patches

On Thu, Oct 29, 2015 at 8:18 PM, Alan Lawrence <alan.lawrence@arm.com> wrote:
> This has changed quite a bit since the previous revision
> (https://gcc.gnu.org/ml/gcc-patches/2015-08/msg01484.html), mostly due to Ada
> and specifically Ada on ARM.
>
> I didn't find a good alternative to scanning for constant-pool accesses "as we
> go" through the function, and although I didn't find any of these accesses
> being disqualified after we'd found them in C, some awkward constant-pool
> expressions in Ada forced me to disqualify them for a new reason (inability to
> constant-propagate). Hence, I introduced a new bitmap of
> disqualified_constants, rather than an additional pass through the function.
>
> Next, the approach of using a replacement_expr (with the constant value) in
> place of the old replacement_decl, also failed (the decl could be used in an
> expression where substituting in the expr produced ill-formed gimple). Hence,
> constant-pool entries now use replacement_decls just like everything else, for
> which I generate initializers in initialize_constant_pool_replacements.
>
> However, I was able to avoid generating the replacement constants early, and to
> do so late in initialize_constant_pool_replacements; the main trick here was to
> use fold_array_ctor_reference, kindly pointed out by Richie :), to fold the
> MEM_REFs built in analyze_access_subtree.
>
> Finally, I found completely_scalarize was still failing to fold all the
> constant expressions, because Ada was putting VIEW_CONVERT_EXPRs in the
> constant pool, and fold-const.c did not deal with ARRAY_REFs of these. However,
> requiring completely_scalarize to be able to fold everything seemed fragile;
> instead I decoupled from fold-const by allowing it to fail.
>
> With these changes, ssa-dom-cse-2.c is fixed on all platforms with appropriate
> --param.

Hum.  I still wonder why we need all this complication ...  I would
expect that if
we simply decided to completely scalarize a constant pool aggregate load then
we can always do that.  SRA should then simply emit the _same_ IL as it does
for other complete scalarization element accesses.  The only difference is
that the result should be simplifiable to omit the element load from
the constant pool.

1) The FRE pass following SRA should do that for you.

2) You should be able to use fold_ctor_reference directly (in place of
all your code
in case offset and size are readily available - don't remember exactly how
complete scalarization "walks" elements).  Alternatively use
fold_const_aggregate_ref.

3) You can simplify the stmt SRA generated by simply calling fold_stmt on it,
that will do a bit more (wasted) work compared to 2) but may be easier.

I wouldn't bother with the case where we for some reason do not simplify
the constant pool load.

That is, I'd like to see the patch greatly simplified to just consider
constant pool
complete scalarization as if it were a regular variable.

Richard.

>
> There are still a few remaining test failures, on AArch64:
> gcc.dg/guality/pr54970.c   -O1  line 15 a[0] == 1
> gcc.dg/guality/pr54970.c   -O1  line 20 a[0] == 1
> gcc.dg/guality/pr54970.c   -O1  line 25 a[0] == 1
> (also at -O2, -O2 -flto with/without plugin, -O3 -g and -Os).
> ...which I'm working on. I also tested with the hack at https://gcc.gnu.org/ml/gcc-patches/2015-08/msg01483.html . This revealed one new failure in advsimd-intrinsics/vldX.c on AArch64 (fixed by https://gcc.gnu.org/ml/gcc-patches/2015-10/msg02777.html), and fixed a bunch of guality failures on ARM:
> gcc.dg/guality/pr54970.c   -O1  line 31 a[0] == 4
> gcc.dg/guality/pr54970.c   -O1  line 36 a[0] == 4
> gcc.dg/guality/pr54970.c   -O1  line 45 a[0] == 4
> gcc.dg/guality/pr54970.c   -O1  line 45 p[-2] == 4
> gcc.dg/guality/pr54970.c   -O1  line 45 q[-1] == 4
> (also at -O2, -O2 -flto with/without plugin, -O3 -g and -Os)
>
> --Alan
>
> gcc/ChangeLog:
>
>         * tree-sra.c (disqualified_constants): New.
>         (sra_initialize): Initialize it.
>         (sra_deinitialize): Deallocate it.
>         (disqualify_candidate): Set bit in disqualified_constants.
>         (subst_constant_pool_initial): New.
>         (create_access): Scan for constant-pool entries as we go.
>         (scalarizable_type_p): Disallow types containing placeholders.
>         (completely_scalarize): Return bool to allow failure.
>         (scalarize_elem): Likewise; check we can generate constant replacements.
>         (maybe_add_sra_candidate): Allow constant-pool entries.
>         (analyze_access_subtree): Checking-assert that we can fold any refs
>         built for constant-pool entries.
>         (analyze_all_variable_accesses): Deal with completely_scalarize failing.
>         (initialize_constant_pool_replacements): New.
>         (sra_modify_function_body): Call previous.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.dg/tree-ssa/ssa-dom-cse-2.c: Add sra-max-scalarization-size param,
>         remove xfail.
> ---
>  gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c |   7 +-
>  gcc/tree-sra.c                                | 221 ++++++++++++++++++++++++--
>  2 files changed, 208 insertions(+), 20 deletions(-)
>
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c
> index 9eccdc9..b13d583 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O3 -fno-tree-fre -fno-tree-pre -fdump-tree-optimized" } */
> +/* { dg-options "-O3 -fno-tree-fre -fno-tree-pre -fdump-tree-optimized --param sra-max-scalarization-size-Ospeed=32" } */
>
>  int
>  foo ()
> @@ -17,7 +17,4 @@ foo ()
>  /* After late unrolling the above loop completely DOM should be
>     able to optimize this to return 28.  */
>
> -/* See PR63679 and PR64159, if the target forces the initializer to memory then
> -   DOM is not able to perform this optimization.  */
> -
> -/* { dg-final { scan-tree-dump "return 28;" "optimized" { xfail aarch64*-*-* alpha*-*-* hppa*-*-* powerpc*-*-* sparc*-*-* s390*-*-* } } } */
> +/* { dg-final { scan-tree-dump "return 28;" "optimized" } } */
> diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
> index 358db79..1920265 100644
> --- a/gcc/tree-sra.c
> +++ b/gcc/tree-sra.c
> @@ -90,6 +90,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "gimple-iterator.h"
>  #include "gimplify-me.h"
>  #include "gimple-walk.h"
> +#include "gimple-fold.h"
>  #include "tree-cfg.h"
>  #include "flags.h"
>  #include "insn-config.h"
> @@ -336,6 +337,10 @@ candidate (unsigned uid)
>     those which cannot be (because they are and need be used as a whole).  */
>  static bitmap should_scalarize_away_bitmap, cannot_scalarize_away_bitmap;
>
> +/* Bitmap of candidates in the constant pool, which cannot be scalarized
> +   because this would produce non-constant expressions (e.g. Ada).  */
> +static bitmap disqualified_constants;
> +
>  /* Obstack for creation of fancy names.  */
>  static struct obstack name_obstack;
>
> @@ -658,6 +663,7 @@ sra_initialize (void)
>      (vec_safe_length (cfun->local_decls) / 2);
>    should_scalarize_away_bitmap = BITMAP_ALLOC (NULL);
>    cannot_scalarize_away_bitmap = BITMAP_ALLOC (NULL);
> +  disqualified_constants = BITMAP_ALLOC (NULL);
>    gcc_obstack_init (&name_obstack);
>    base_access_vec = new hash_map<tree, auto_vec<access_p> >;
>    memset (&sra_stats, 0, sizeof (sra_stats));
> @@ -676,6 +682,7 @@ sra_deinitialize (void)
>    candidates = NULL;
>    BITMAP_FREE (should_scalarize_away_bitmap);
>    BITMAP_FREE (cannot_scalarize_away_bitmap);
> +  BITMAP_FREE (disqualified_constants);
>    access_pool.release ();
>    assign_link_pool.release ();
>    obstack_free (&name_obstack, NULL);
> @@ -690,6 +697,8 @@ disqualify_candidate (tree decl, const char *reason)
>  {
>    if (bitmap_clear_bit (candidate_bitmap, DECL_UID (decl)))
>      candidates->remove_elt_with_hash (decl, DECL_UID (decl));
> +  if (TREE_CODE (decl) == VAR_DECL && DECL_IN_CONSTANT_POOL (decl))
> +    bitmap_set_bit (disqualified_constants, DECL_UID (decl));
>
>    if (dump_file && (dump_flags & TDF_DETAILS))
>      {
> @@ -841,8 +850,76 @@ create_access_1 (tree base, HOST_WIDE_INT offset, HOST_WIDE_INT size)
>    return access;
>  }
>
> +/* Given EXPR a chain of COMPONENT_REFs and/or ARRAY_REFs built around a
> +   constant-pool declaration VAR, builds a tree like EXPR but with VAR replaced
> +   by its DECL_INITIAL, and tries to fold it to a constant.  If folding succeeds
> +   then return the folded tree, else NULL_TREE.  */
> +
> +static tree
> +subst_constant_pool_initial (tree expr, tree var)
> +{
> +  if (TREE_CODE (expr) == VAR_DECL)
> +    {
> +      gcc_assert (DECL_IN_CONSTANT_POOL (expr));
> +      gcc_assert (expr == var);
> +      return DECL_INITIAL (expr);
> +    }
> +  if (TREE_CODE (expr) == COMPONENT_REF)
> +    {
> +      tree field = TREE_OPERAND (expr, 1);
> +      gcc_assert (TREE_CODE (field) == FIELD_DECL);
> +      gcc_assert (TREE_OPERAND (expr, 2) == NULL_TREE);
> +      tree record = subst_constant_pool_initial (TREE_OPERAND (expr, 0), var);
> +      if (!record)
> +       return NULL_TREE;
> +      tree ref = build3 (COMPONENT_REF, TREE_TYPE (expr),
> +                        record, field, NULL_TREE);
> +      tree res = fold (ref);
> +      if (res != ref)
> +       return res;
> +    }
> +  else if (TREE_CODE (expr) == ARRAY_REF)
> +    {
> +      tree array = subst_constant_pool_initial (TREE_OPERAND (expr, 0), var);
> +      if (!array)
> +       return NULL_TREE;
> +      tree ref = build4 (ARRAY_REF, TREE_TYPE (expr),
> +                        array,
> +                        TREE_OPERAND (expr, 1),
> +                        TREE_OPERAND (expr, 2),
> +                        TREE_OPERAND (expr, 3));
> +      /* Ada (at least) builds ARRAY_REFs around strings.  */
> +      if (tree folded = fold_read_from_constant_string (ref))
> +       return folded;
> +      tree res = fold (ref);
> +      if (res != ref)
> +       return res;
> +    }
> +  else if (TREE_CODE (expr) == MEM_REF)
> +    {
> +      gcc_assert (TREE_CODE (TREE_OPERAND (expr, 0)) == ADDR_EXPR);
> +      tree type = TREE_TYPE (expr);
> +      tree sz = TYPE_SIZE (TREE_TYPE (expr));
> +      tree val = TREE_OPERAND (TREE_OPERAND (expr, 0), 0);
> +      val = subst_constant_pool_initial (val, var);
> +      if (TREE_CODE (val) != CONSTRUCTOR
> +         || !tree_fits_uhwi_p (TREE_OPERAND (expr, 1))
> +         || !tree_fits_uhwi_p (sz))
> +       return NULL_TREE;
> +      unsigned HOST_WIDE_INT offset =
> +         tree_to_uhwi (TREE_OPERAND (expr, 1)) * BITS_PER_UNIT;
> +      return fold_ctor_reference (type, val, offset, tree_to_uhwi (sz), var);
> +    }
> +
> +  /* Unknown expression, or could not constant-fold.  */
> +  return NULL_TREE;
> +}
> +
> +static bool maybe_add_sra_candidate (tree var);
> +
>  /* Create and insert access for EXPR. Return created access, or NULL if it is
> -   not possible.  */
> +   not possible.  Also scan for uses of constant pool as we go along and add
> +   to candidates.  */
>
>  static struct access *
>  create_access (tree expr, gimple *stmt, bool write)
> @@ -865,6 +942,32 @@ create_access (tree expr, gimple *stmt, bool write)
>    else
>      ptr = false;
>
> +  /* For constant-pool entries, check we can substitute the constant value.  */
> +  if (TREE_CODE (base) == VAR_DECL && DECL_IN_CONSTANT_POOL (base)
> +      && (sra_mode == SRA_MODE_EARLY_INTRA || sra_mode == SRA_MODE_INTRA))
> +    {
> +      gcc_assert (!write);
> +      if (bitmap_bit_p (disqualified_constants, DECL_UID (base)))
> +       return NULL;
> +      tree repl = subst_constant_pool_initial (expr, base);
> +      if (dump_file && (dump_flags & TDF_DETAILS))
> +      {
> +       fprintf (dump_file, "Trying to scalarize constant-pool ");
> +       print_generic_expr (dump_file, base, 0);
> +       fprintf (dump_file, "\n In expression: ");
> +       print_generic_expr (dump_file, expr, 0);
> +       fprintf (dump_file, "\n Constant replacement: ");
> +       print_generic_expr (dump_file, repl, 0);
> +       fprintf (dump_file, "\n");
> +      }
> +      /* If we can't resolve the expression to a constant, we risk creating
> +         a malformed expression (e.g. Ada testsuite, ACATS chapter C3).  */
> +      if (!repl || !TREE_CONSTANT (repl))
> +       disqualify_candidate (base, "Cannot constant-propagate out of pool.");
> +      else
> +       maybe_add_sra_candidate (base);
> +    }
> +
>    if (!DECL_P (base) || !bitmap_bit_p (candidate_bitmap, DECL_UID (base)))
>      return NULL;
>
> @@ -923,6 +1026,8 @@ static bool
>  scalarizable_type_p (tree type)
>  {
>    gcc_assert (!is_gimple_reg_type (type));
> +  if (type_contains_placeholder_p (type))
> +    return false;
>
>    switch (TREE_CODE (type))
>    {
> @@ -970,15 +1075,19 @@ scalarizable_type_p (tree type)
>    }
>  }
>
> -static void scalarize_elem (tree, HOST_WIDE_INT, HOST_WIDE_INT, tree, tree);
> +static bool scalarize_elem (tree, HOST_WIDE_INT, HOST_WIDE_INT, tree, tree);
>
>  /* Create total_scalarization accesses for all scalar fields of a member
>     of type DECL_TYPE conforming to scalarizable_type_p.  BASE
>     must be the top-most VAR_DECL representing the variable; within that,
>     OFFSET locates the member and REF must be the memory reference expression for
> -   the member.  */
> +   the member.
>
> -static void
> +   Return TRUE if successful; may fail in the case of constant-pool entry BASE
> +   whose DECL_INITIAL contains values to which we cannot fold ARRAY_REF or
> +   COMPONENT_REF expressions.  */
> +
> +static bool
>  completely_scalarize (tree base, tree decl_type, HOST_WIDE_INT offset, tree ref)
>  {
>    switch (TREE_CODE (decl_type))
> @@ -991,10 +1100,11 @@ completely_scalarize (tree base, tree decl_type, HOST_WIDE_INT offset, tree ref)
>             tree ft = TREE_TYPE (fld);
>             tree nref = build3 (COMPONENT_REF, ft, ref, fld, NULL_TREE);
>
> -           scalarize_elem (base, pos, tree_to_uhwi (DECL_SIZE (fld)), nref,
> -                           ft);
> +           if (!scalarize_elem (base, pos, tree_to_uhwi (DECL_SIZE (fld)),
> +                                nref, ft))
> +             return false;
>           }
> -      break;
> +      return true;
>      case ARRAY_TYPE:
>        {
>         tree elemtype = TREE_TYPE (decl_type);
> @@ -1026,12 +1136,13 @@ completely_scalarize (tree base, tree decl_type, HOST_WIDE_INT offset, tree ref)
>                                     ref, size_int (idx + min),
>                                     NULL_TREE, NULL_TREE);
>                 int el_off = offset + idx * el_size;
> -               scalarize_elem (base, el_off, el_size, nref, elemtype);
> +               if (!scalarize_elem (base, el_off, el_size, nref, elemtype))
> +                 return false;
>               }
>             while (++idx <= lenlessone);
>           }
>        }
> -      break;
> +      return true;
>      default:
>        gcc_unreachable ();
>      }
> @@ -1040,22 +1151,32 @@ completely_scalarize (tree base, tree decl_type, HOST_WIDE_INT offset, tree ref)
>  /* Create total_scalarization accesses for a member of type TYPE, which must
>     satisfy either is_gimple_reg_type or scalarizable_type_p.  BASE must be the
>     top-most VAR_DECL representing the variable; within that, POS and SIZE locate
> -   the member and REF must be the reference expression for it.  */
> +   the member and REF must be the reference expression for it.
>
> -static void
> +   Return TRUE if successful; may fail in the case of constant-pool entry BASE
> +   whose DECL_INITIAL contains values to which we cannot fold ARRAY_REF or
> +   COMPONENT_REF expressions.  */
> +
> +static bool
>  scalarize_elem (tree base, HOST_WIDE_INT pos, HOST_WIDE_INT size,
>                  tree ref, tree type)
>  {
>    if (is_gimple_reg_type (type))
>    {
> +    if (TREE_CODE (base) == VAR_DECL
> +       && DECL_IN_CONSTANT_POOL (base)
> +       && subst_constant_pool_initial (ref, base) == NULL_TREE)
> +      return false;
>      struct access *access = create_access_1 (base, pos, size);
>      access->expr = ref;
>      access->type = type;
>      access->grp_total_scalarization = 1;
>      /* Accesses for intraprocedural SRA can have their stmt NULL.  */
> +
> +    return true;
>    }
> -  else
> -    completely_scalarize (base, type, pos, ref);
> +
> +  return completely_scalarize (base, type, pos, ref);
>  }
>
>  /* Create a total_scalarization access for VAR as a whole.  VAR must be of a
> @@ -1843,7 +1964,12 @@ maybe_add_sra_candidate (tree var)
>        reject (var, "not aggregate");
>        return false;
>      }
> -  if (needs_to_live_in_memory (var))
> +  /* Allow constant-pool entries (that "need to live in memory")
> +     unless we are doing IPA SRA.  */
> +  if (needs_to_live_in_memory (var)
> +      && (sra_mode == SRA_MODE_EARLY_IPA
> +         || TREE_CODE (var) != VAR_DECL
> +         || !DECL_IN_CONSTANT_POOL (var)))
>      {
>        reject (var, "needs to live in memory");
>        return false;
> @@ -2343,6 +2469,10 @@ analyze_access_subtree (struct access *root, struct access *parent,
>                        (unsigned) root->offset, (unsigned) root->size);
>               fprintf (dump_file, " to an integer.\n");
>             }
> +         gcc_checking_assert (TREE_CODE (root->base) != VAR_DECL
> +                              || !DECL_IN_CONSTANT_POOL (root->base)
> +                              || subst_constant_pool_initial (root->expr,
> +                                                              root->base));
>         }
>
>        root->grp_to_be_replaced = 1;
> @@ -2604,8 +2734,17 @@ analyze_all_variable_accesses (void)
>             if (tree_to_uhwi (TYPE_SIZE (TREE_TYPE (var)))
>                 <= max_scalarization_size)
>               {
> +               vec<access_p> &accesses = base_access_vec->get_or_insert (var);
> +               unsigned orig_count = accesses.length ();
> +               if (!completely_scalarize (var, TREE_TYPE (var), 0, var))
> +                 {
> +                   /* Failed.  Discard any accesses created during attempt.  */
> +                   for (unsigned i = orig_count; i < accesses.length (); i++)
> +                     access_pool.remove (accesses[i]);
> +                   accesses.truncate (orig_count);
> +                   continue;
> +                 }
>                 create_total_scalarization_access (var);
> -               completely_scalarize (var, TREE_TYPE (var), 0, var);
>                 if (dump_file && (dump_flags & TDF_DETAILS))
>                   {
>                     fprintf (dump_file, "Will attempt to totally scalarize ");
> @@ -3480,6 +3619,56 @@ sra_modify_assign (gimple *stmt, gimple_stmt_iterator *gsi)
>      }
>  }
>
> +static void
> +initialize_constant_pool_replacements (void)
> +{
> +  gimple_seq seq = NULL;
> +  gimple_stmt_iterator gsi = gsi_start (seq);
> +  bitmap_iterator bi;
> +  unsigned i;
> +
> +  EXECUTE_IF_SET_IN_BITMAP (candidate_bitmap, 0, i, bi)
> +    if (bitmap_bit_p (should_scalarize_away_bitmap, i)
> +       && !bitmap_bit_p (cannot_scalarize_away_bitmap, i))
> +      {
> +       tree var = candidate (i);
> +       if (TREE_CODE (var) != VAR_DECL || !DECL_IN_CONSTANT_POOL (var))
> +         continue;
> +       vec<access_p> *access_vec = get_base_access_vector (var);
> +       if (!access_vec)
> +         continue;
> +       for (unsigned i = 0; i < access_vec->length (); i++)
> +         {
> +           struct access *access = (*access_vec)[i];
> +           tree val = subst_constant_pool_initial (access->expr, access->base);
> +           if (dump_file && (dump_flags & TDF_DETAILS))
> +             {
> +               fprintf (dump_file, "Expression ");
> +               print_generic_expr (dump_file, access->expr, 0);
> +               fprintf (dump_file, " replaced with ");
> +               print_generic_expr (dump_file, access->replacement_decl, 0);
> +               fprintf (dump_file, " initialized to ");
> +               print_generic_expr (dump_file, val, 0);
> +               fprintf (dump_file, "\n");
> +               if (!access->replacement_decl)
> +                 dump_access (dump_file, access, true);
> +             }
> +           if (!access->replacement_decl)
> +             continue;
> +           gcc_assert (val);
> +           gassign *stmt = gimple_build_assign (
> +             get_access_replacement (access), val);
> +           gsi_insert_after (&gsi, stmt, GSI_NEW_STMT);
> +           update_stmt (stmt);
> +         }
> +      }
> +
> +  seq = gsi_seq (gsi);
> +  if (seq)
> +    gsi_insert_seq_on_edge_immediate (
> +      single_succ_edge (ENTRY_BLOCK_PTR_FOR_FN (cfun)), seq);
> +}
> +
>  /* Traverse the function body and all modifications as decided in
>     analyze_all_variable_accesses.  Return true iff the CFG has been
>     changed.  */
> @@ -3490,6 +3679,8 @@ sra_modify_function_body (void)
>    bool cfg_changed = false;
>    basic_block bb;
>
> +  initialize_constant_pool_replacements ();
> +
>    FOR_EACH_BB_FN (bb, cfun)
>      {
>        gimple_stmt_iterator gsi = gsi_start_bb (bb);
> --
> 1.9.1
>


* Re: [PATCH 3/6] Share code from fold_array_ctor_reference with fold.
  2015-11-03  3:20   ` Jeff Law
@ 2015-11-04 14:35     ` Alan Lawrence
  2015-11-06 21:19       ` Jeff Law
  0 siblings, 1 reply; 28+ messages in thread
From: Alan Lawrence @ 2015-11-04 14:35 UTC (permalink / raw)
  To: law; +Cc: gcc-patches

> s/explicitely/explicitly/  And remove the '*' from the 2nd and 3rd lines
> of the comment.
>
> It looks like get_ctor_element_at_index has numerous formatting
> problems.  In particular you didn't indent the braces across the board
> properly.  Also check for tabs vs spaces issues please.

Yes, you are quite right; I'm not quite sure how those crept in. (Well, the
'explicitely' I am sure was a copy-paste error, but the others!)  Apologies...

I already committed the offending code as r229605, but here's a patch to clean
it up - does it look ok now?

Thanks, Alan
---
 gcc/fold-const.c | 58 ++++++++++++++++++++++++++++----------------------------
 1 file changed, 29 insertions(+), 29 deletions(-)

diff --git a/gcc/fold-const.c b/gcc/fold-const.c
index ee9b349..e977b49 100644
--- a/gcc/fold-const.c
+++ b/gcc/fold-const.c
@@ -11855,16 +11855,16 @@ get_array_ctor_element_at_index (tree ctor, offset_int access_index)
   offset_int low_bound = 0;
 
   if (TREE_CODE (TREE_TYPE (ctor)) == ARRAY_TYPE)
-  {
-    tree domain_type = TYPE_DOMAIN (TREE_TYPE (ctor));
-    if (domain_type && TYPE_MIN_VALUE (domain_type))
     {
-      /* Static constructors for variably sized objects makes no sense.  */
-      gcc_assert (TREE_CODE (TYPE_MIN_VALUE (domain_type)) == INTEGER_CST);
-      index_type = TREE_TYPE (TYPE_MIN_VALUE (domain_type));
-      low_bound = wi::to_offset (TYPE_MIN_VALUE (domain_type));
+      tree domain_type = TYPE_DOMAIN (TREE_TYPE (ctor));
+      if (domain_type && TYPE_MIN_VALUE (domain_type))
+	{
+	  /* Static constructors for variably sized objects makes no sense.  */
+	  gcc_assert (TREE_CODE (TYPE_MIN_VALUE (domain_type)) == INTEGER_CST);
+	  index_type = TREE_TYPE (TYPE_MIN_VALUE (domain_type));
+	  low_bound = wi::to_offset (TYPE_MIN_VALUE (domain_type));
+	}
     }
-  }
 
   if (index_type)
     access_index = wi::ext (access_index, TYPE_PRECISION (index_type),
@@ -11880,29 +11880,29 @@ get_array_ctor_element_at_index (tree ctor, offset_int access_index)
   tree cfield, cval;
 
   FOR_EACH_CONSTRUCTOR_ELT (CONSTRUCTOR_ELTS (ctor), cnt, cfield, cval)
-  {
-    /* Array constructor might explicitely set index, or specify range
-     * or leave index NULL meaning that it is next index after previous
-     * one.  */
-    if (cfield)
     {
-      if (TREE_CODE (cfield) == INTEGER_CST)
-	max_index = index = wi::to_offset (cfield);
+      /* Array constructor might explicitly set index, or specify a range,
+	 or leave index NULL meaning that it is next index after previous
+	 one.  */
+      if (cfield)
+	{
+	  if (TREE_CODE (cfield) == INTEGER_CST)
+	    max_index = index = wi::to_offset (cfield);
+	  else
+	    {
+	      gcc_assert (TREE_CODE (cfield) == RANGE_EXPR);
+	      index = wi::to_offset (TREE_OPERAND (cfield, 0));
+	      max_index = wi::to_offset (TREE_OPERAND (cfield, 1));
+	    }
+	}
       else
-      {
-	gcc_assert (TREE_CODE (cfield) == RANGE_EXPR);
-	index = wi::to_offset (TREE_OPERAND (cfield, 0));
-	max_index = wi::to_offset (TREE_OPERAND (cfield, 1));
-      }
-    }
-    else
-    {
-      index += 1;
-      if (index_type)
-	index = wi::ext (index, TYPE_PRECISION (index_type),
-			 TYPE_SIGN (index_type));
-	max_index = index;
-    }
+	{
+	  index += 1;
+	  if (index_type)
+	    index = wi::ext (index, TYPE_PRECISION (index_type),
+			     TYPE_SIGN (index_type));
+	  max_index = index;
+	}
 
     /* Do we have match?  */
     if (wi::cmpu (access_index, index) >= 0
-- 
1.9.1


* Re: [PATCH 5/6]tree-sra.c: Fix completely_scalarize for negative array indices
  2015-10-30 10:55     ` Eric Botcazou
@ 2015-11-05 13:22       ` Alan Lawrence
  2015-11-06 12:30         ` Richard Biener
  0 siblings, 1 reply; 28+ messages in thread
From: Alan Lawrence @ 2015-11-05 13:22 UTC (permalink / raw)
  To: gcc-patches; +Cc: ebotcazou, richard.guenther

On 30/10/15 10:54, Eric Botcazou wrote:
> On 30/10/15 10:44, Richard Biener wrote:
>>
>> I think you want to use wide-ints here and
>>
>>     wide_int idx = wi::from (minidx, TYPE_PRECISION (TYPE_DOMAIN
>> (...)), TYPE_SIGN (TYPE_DOMAIN (..)));
>>     wide_int maxidx = ...
>>
>> you can then simply iterate minidx with ++ and do the final compare
>> against maxidx
>> with while (++idx <= maxidx).  For the array ref index we want to use
>> TYPE_DOMAIN
>> as type as well, not size_int.  Thus wide_int_to_tree (TYPE_DOMAIN (...)..idx).
[...]
> But using offset_int should be OK, see for example get_ref_base_and_extent.
>

Here's a patch using offset_int. (It is not as easy to construct as with
wide_int::from; the sign-extend mirrors what other code that constructs
offset_ints appears to do.)

Tested by bootstrap+check-{gcc,g++,ada,fortran} with the rest of the patchset
(which causes an array[-1..1] to be completely scalarized, among others), on
x86_64 and ARM.

I don't have a test without all that (such would have to be in Ada, and trigger
SRA of such an array but not involving the constant pool); is it OK without?

gcc/ChangeLog:

	* tree-sra.c (completely_scalarize): Properly handle negative array
	indices using offset_int.
---
 gcc/tree-sra.c | 23 +++++++++++++++--------
 1 file changed, 15 insertions(+), 8 deletions(-)

diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
index e15df1f..6168a7e 100644
--- a/gcc/tree-sra.c
+++ b/gcc/tree-sra.c
@@ -1010,18 +1010,25 @@ completely_scalarize (tree base, tree decl_type, HOST_WIDE_INT offset, tree ref)
 	if (maxidx)
 	  {
 	    gcc_assert (TREE_CODE (maxidx) == INTEGER_CST);
-	    /* MINIDX and MAXIDX are inclusive.  Try to avoid overflow.  */
-	    unsigned HOST_WIDE_INT lenp1 = tree_to_shwi (maxidx)
-					- tree_to_shwi (minidx);
-	    unsigned HOST_WIDE_INT idx = 0;
-	    do
+	    tree domain = TYPE_DOMAIN (decl_type);
+	    /* MINIDX and MAXIDX are inclusive, and must be interpreted in
+	       DOMAIN (e.g. signed int, whereas min/max may be size_int).  */
+	    offset_int idx = wi::to_offset (minidx);
+	    offset_int max = wi::to_offset (maxidx);
+	    if (!TYPE_UNSIGNED (domain))
 	      {
-		tree nref = build4 (ARRAY_REF, elemtype, ref, size_int (idx),
+		idx = wi::sext (idx, TYPE_PRECISION (domain));
+		max = wi::sext (max, TYPE_PRECISION (domain));
+	      }
+	    for (int el_off = offset; wi::les_p (idx, max); ++idx)
+	      {
+		tree nref = build4 (ARRAY_REF, elemtype,
+				    ref,
+				    wide_int_to_tree (domain, idx),
 				    NULL_TREE, NULL_TREE);
-		int el_off = offset + idx * el_size;
 		scalarize_elem (base, el_off, el_size, nref, elemtype);
+		el_off += el_size;
 	      }
-	    while (++idx <= lenp1);
 	  }
       }
       break;
-- 
1.9.1

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 6/6] Make SRA replace constant-pool loads
  2015-11-03 14:01   ` Richard Biener
@ 2015-11-05 18:55     ` Alan Lawrence
  2015-11-06 10:25       ` Eric Botcazou
  2015-11-06 16:29       ` Richard Biener
  0 siblings, 2 replies; 28+ messages in thread
From: Alan Lawrence @ 2015-11-05 18:55 UTC (permalink / raw)
  To: Richard Biener; +Cc: GCC Patches

On 3 November 2015 at 14:01, Richard Biener <richard.guenther@gmail.com> wrote:
>
> Hum.  I still wonder why we need all this complication ...

Well, certainly I'd love to make it simpler, and if the complication
is because I've gone about trying to deal with Ada in particular in the
wrong way...

>  I would
> expect that if
> we simply decided to completely scalarize a constant pool aggregate load then
> we can always do that.  SRA should then simply emit the _same_ IL as it does
> for other complete scalarization element accesses.  The only difference is
> that the result should be simplifiable to omit the element load from
> the constant pool.
>
> 1) The FRE pass following SRA should do that for you.
...
> That is, I'd like to see the patch greatly simplified to just consider
> constant pool
> complete scalarization as if it were a regular variable.

Hmm, can you clarify, do you mean I should *not* replace constant pool
values with their DECL_INITIAL? The attempt to substitute in the
initial value is what leads to most of the problems. For example, in
gnat/opt31.adb, create_access finds this expression accessing *.LC0:

MEM[(interfaces__unsigned_8[(sizetype) <PLACEHOLDER_EXPR struct
opt31__messages_t___XUP>.P_BOUNDS->LB0:<PLACEHOLDER_EXPR struct
opt31__messages_t___XUP>.P_BOUNDS->UB0 >= <PLACEHOLDER_EXPR struct
opt31__messages_t___XUP>.P_BOUNDS->LB0 ? (sizetype) <PLACEHOLDER_EXPR
struct opt31__messages_t___XUP>.P_BOUNDS->UB0 : (sizetype)
<PLACEHOLDER_EXPR struct opt31__messages_t___XUP>.P_BOUNDS->LB0 +
4294967295] *)&*.LC0][1 ...]{lb: 1 sz: 1}

this is an ARRAY_RANGE_REF of a MEM_REF of an ADDR_EXPR of *.LC0. So
far I haven't extended subst_constant_pool_initial to handle
ARRAY_RANGE_REFs, as it can't even handle this MEM_REF:

MEM[(interfaces__unsigned_8[(sizetype) <PLACEHOLDER_EXPR struct
opt31__messages_t___XUP>.P_BOUNDS->LB0:<PLACEHOLDER_EXPR struct
opt31__messages_t___XUP>.P_BOUNDS->UB0 >= <PLACEHOLDER_EXPR struct
opt31__messages_t___XUP>.P_BOUNDS->LB0 ? (sizetype) <PLACEHOLDER_EXPR
struct opt31__messages_t___XUP>.P_BOUNDS->UB0 : (sizetype)
<PLACEHOLDER_EXPR struct opt31__messages_t___XUP>.P_BOUNDS->LB0 +
4294967295] *)&*.LC0]

because the type here has size:

MIN_EXPR <_GLOBAL.SZ2.ada_opt31 (<PLACEHOLDER_EXPR struct
opt31__messages_t___XUP>.P_BOUNDS->UB0, <PLACEHOLDER_EXPR struct
opt31__messages_t___XUP>.P_BOUNDS->LB0), 17179869176>

inside the MEM_REF of the ADDR_EXPR is *.LC0, whose DECL_INITIAL is a
4-element array (fine). Sadly while the MEM_REF
type_contains_placeholder_p, the type of the outer ARRAY_RANGE_REF
does not....

One possibility is that this whole construct, ARRAY_RANGE_REF that it
is, should mark *.LC0 in cannot_scalarize_away_bitmap.

However, disqualified_constants is also necessary if I want a
meaningful assertion that we do not re-add constant pool entries as
candidates after we've discovered them - or should I try to rely on
there only being one expression that accesses each constant pool
entry? (I don't think that's guaranteed, as tree_output_constant_def
does hashing; I admit I haven't really tried to break that assumption.)

> 2) You should be able to use fold_ctor_reference directly (in place of
> all your code
> in case offset and size are readily available - don't remember exactly how
> complete scalarization "walks" elements).  Alternatively use
> fold_const_aggregate_ref.

That should work for completely_scalarize, yes, i.e. if I can remove
the other route in create_access.

> 3) You can simplify the stmt SRA generated by simply calling fold_stmt on it,
> that will do a bit more (wasted) work compared to 2) but may be easier.
>
> I wouldn't bother with the case where we for some reason do not simplify
> the constant pool load.

Hmmm. As above, I'm not quite sure what you mean by "the constant pool
load" - if that means, substituting the DECL_INITIAL in place of the
VAR_DECL that is DECL_IN_CONSTANT_POOL, then I didn't find a general
tree substitution routine, hence writing subst_constant_pool_initial.
It might be possible to make that simpler/more generic (e.g. just call
copy_node, recurse on each operand, return the original if nothing
changed) and then fold at the end.

Thanks,
Alan

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 6/6] Make SRA replace constant-pool loads
  2015-11-05 18:55     ` Alan Lawrence
@ 2015-11-06 10:25       ` Eric Botcazou
  2015-11-06 16:29       ` Richard Biener
  1 sibling, 0 replies; 28+ messages in thread
From: Eric Botcazou @ 2015-11-06 10:25 UTC (permalink / raw)
  To: Alan Lawrence; +Cc: gcc-patches, Richard Biener

> Hmm, can you clarify, do you mean I should *not* replace constant pool
> values with their DECL_INITIAL? The attempt to substitute in the
> initial value is what leads to most of the problems. For example, in
> gnat/opt31.adb, create_access finds this expression accessing *.LC0:
> 
> MEM[(interfaces__unsigned_8[(sizetype) <PLACEHOLDER_EXPR struct
> opt31__messages_t___XUP>.P_BOUNDS->LB0:<PLACEHOLDER_EXPR struct
> opt31__messages_t___XUP>.P_BOUNDS->UB0 >= <PLACEHOLDER_EXPR struct
> opt31__messages_t___XUP>.P_BOUNDS->LB0 ? (sizetype) <PLACEHOLDER_EXPR
> struct opt31__messages_t___XUP>.P_BOUNDS->UB0 : (sizetype)
> <PLACEHOLDER_EXPR struct opt31__messages_t___XUP>.P_BOUNDS->LB0 +
> 4294967295] *)&*.LC0][1 ...]{lb: 1 sz: 1}
> 
> this is an ARRAY_RANGE_REF of a MEM_REF of an ADDR_EXPR of *.LC0. So
> far I haven't extended subst_constant_pool_initial to handle
> ARRAY_RANGE_REFs, as it can't even handle this MEM_REF:
> 
> MEM[(interfaces__unsigned_8[(sizetype) <PLACEHOLDER_EXPR struct
> opt31__messages_t___XUP>.P_BOUNDS->LB0:<PLACEHOLDER_EXPR struct
> opt31__messages_t___XUP>.P_BOUNDS->UB0 >= <PLACEHOLDER_EXPR struct
> opt31__messages_t___XUP>.P_BOUNDS->LB0 ? (sizetype) <PLACEHOLDER_EXPR
> struct opt31__messages_t___XUP>.P_BOUNDS->UB0 : (sizetype)
> <PLACEHOLDER_EXPR struct opt31__messages_t___XUP>.P_BOUNDS->LB0 +
> 4294967295] *)&*.LC0]
> 
> because the type here has size:
> 
> MIN_EXPR <_GLOBAL.SZ2.ada_opt31 (<PLACEHOLDER_EXPR struct
> opt31__messages_t___XUP>.P_BOUNDS->UB0, <PLACEHOLDER_EXPR struct
> opt31__messages_t___XUP>.P_BOUNDS->LB0), 17179869176>
> 
> inside the MEM_REF of the ADDR_EXPR is *.LC0, whose DECL_INITIAL is a
> 4-element array (fine). Sadly while the MEM_REF
> type_contains_placeholder_p, the type of the outer ARRAY_RANGE_REF
> does not....

FWIW you are allowed to punt on this kind of complex expression, which appears 
only in Ada.  New optimizations are sort of allowed to work on the C family of 
languages first, and be extended (or not) to the rest of the languages afterwards.

> One possibility is that this whole construct, ARRAY_RANGE_REF that it
> is, should mark *.LC0 in cannot_scalarize_away_bitmap.

ARRAY_RANGE_REF is only used in Ada so you can do that for now (unless this 
introduces regressions in the gnat.dg testsuite but I doubt it).

-- 
Eric Botcazou

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 5/6]tree-sra.c: Fix completely_scalarize for negative array indices
  2015-11-05 13:22       ` Alan Lawrence
@ 2015-11-06 12:30         ` Richard Biener
  0 siblings, 0 replies; 28+ messages in thread
From: Richard Biener @ 2015-11-06 12:30 UTC (permalink / raw)
  To: Alan Lawrence; +Cc: GCC Patches, Eric Botcazou

On Thu, Nov 5, 2015 at 2:22 PM, Alan Lawrence <alan.lawrence@arm.com> wrote:
> On 30/10/15 10:54, Eric Botcazou wrote:
>> On 30/10/15 10:44, Richard Biener wrote:
>>>
>>> I think you want to use wide-ints here and
>>>
>>>     wide_int idx = wi::from (minidx, TYPE_PRECISION (TYPE_DOMAIN
>>> (...)), TYPE_SIGN (TYPE_DOMAIN (..)));
>>>     wide_int maxidx = ...
>>>
>>> you can then simply iterate minidx with ++ and do the final compare
>>> against maxidx
>>> with while (++idx <= maxidx).  For the array ref index we want to use
>>> TYPE_DOMAIN
>>> as type as well, not size_int.  Thus wide_int_to_tree (TYPE_DOMAIN (...)..idx).
> [...]
>> But using offset_int should be OK, see for example get_ref_base_and_extent.
>>
>
> Here's a patch using offset_int. (It is not as easy to construct as with
> wide_int::from; the sign-extend mirrors what other code that constructs
> offset_ints appears to do.)
>
> Tested by bootstrap+check-{gcc,g++,ada,fortran} with the rest of the patchset
> (which causes an array[-1..1] to be completely scalarized, among others), on
> x86_64 and ARM.
>
> I don't have a test without all that (such would have to be in Ada, and trigger
> SRA of such an array but not involving the constant pool); is it OK without?

Ok.

Richard.

> gcc/ChangeLog:
>
>         * tree-sra.c (completely_scalarize): Properly handle negative array
>         indices using offset_int.
> ---
>  gcc/tree-sra.c | 23 +++++++++++++++--------
>  1 file changed, 15 insertions(+), 8 deletions(-)
>
> diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
> index e15df1f..6168a7e 100644
> --- a/gcc/tree-sra.c
> +++ b/gcc/tree-sra.c
> @@ -1010,18 +1010,25 @@ completely_scalarize (tree base, tree decl_type, HOST_WIDE_INT offset, tree ref)
>         if (maxidx)
>           {
>             gcc_assert (TREE_CODE (maxidx) == INTEGER_CST);
> -           /* MINIDX and MAXIDX are inclusive.  Try to avoid overflow.  */
> -           unsigned HOST_WIDE_INT lenp1 = tree_to_shwi (maxidx)
> -                                       - tree_to_shwi (minidx);
> -           unsigned HOST_WIDE_INT idx = 0;
> -           do
> +           tree domain = TYPE_DOMAIN (decl_type);
> +           /* MINIDX and MAXIDX are inclusive, and must be interpreted in
> +              DOMAIN (e.g. signed int, whereas min/max may be size_int).  */
> +           offset_int idx = wi::to_offset (minidx);
> +           offset_int max = wi::to_offset (maxidx);
> +           if (!TYPE_UNSIGNED (domain))
>               {
> -               tree nref = build4 (ARRAY_REF, elemtype, ref, size_int (idx),
> +               idx = wi::sext (idx, TYPE_PRECISION (domain));
> +               max = wi::sext (max, TYPE_PRECISION (domain));
> +             }
> +           for (int el_off = offset; wi::les_p (idx, max); ++idx)
> +             {
> +               tree nref = build4 (ARRAY_REF, elemtype,
> +                                   ref,
> +                                   wide_int_to_tree (domain, idx),
>                                     NULL_TREE, NULL_TREE);
> -               int el_off = offset + idx * el_size;
>                 scalarize_elem (base, el_off, el_size, nref, elemtype);
> +               el_off += el_size;
>               }
> -           while (++idx <= lenp1);
>           }
>        }
>        break;
> --
> 1.9.1
>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 6/6] Make SRA replace constant-pool loads
  2015-11-05 18:55     ` Alan Lawrence
  2015-11-06 10:25       ` Eric Botcazou
@ 2015-11-06 16:29       ` Richard Biener
  1 sibling, 0 replies; 28+ messages in thread
From: Richard Biener @ 2015-11-06 16:29 UTC (permalink / raw)
  To: Alan Lawrence; +Cc: GCC Patches

On November 5, 2015 7:55:12 PM GMT+01:00, Alan Lawrence <alan.lawrence@arm.com> wrote:
>On 3 November 2015 at 14:01, Richard Biener
><richard.guenther@gmail.com> wrote:
>>
>> Hum.  I still wonder why we need all this complication ...
>
>Well, certainly I'd love to make it simpler, and if the complication
>is because I've gone about trying to deal with especially Ada in the
>wrong way...
>
>>  I would
>> expect that if
>> we simply decided to completely scalarize a constant pool aggregate
>load then
>> we can always do that.  SRA should then simply emit the _same_ IL as
>it does
>> for other complete scalarization element accesses.  The only
>difference is
>> that the result should be simplifiable to omit the element load from
>> the constant pool.
>>
>> 1) The FRE pass following SRA should do that for you.
>...
>> That is, I'd like to see the patch greatly simplified to just
>consider
>> constant pool
>> complete scalarization as if it were a regular variable.
>
>Hmm, can you clarify, do you mean I should *not* replace constant pool
>values with their DECL_INITIAL? The attempt to substitute in the
>initial value is what leads to most of the problems. For example, in
>gnat/opt31.adb, create_access finds this expression accessing *.LC0:
>
>MEM[(interfaces__unsigned_8[(sizetype) <PLACEHOLDER_EXPR struct
>opt31__messages_t___XUP>.P_BOUNDS->LB0:<PLACEHOLDER_EXPR struct
>opt31__messages_t___XUP>.P_BOUNDS->UB0 >= <PLACEHOLDER_EXPR struct
>opt31__messages_t___XUP>.P_BOUNDS->LB0 ? (sizetype) <PLACEHOLDER_EXPR
>struct opt31__messages_t___XUP>.P_BOUNDS->UB0 : (sizetype)
><PLACEHOLDER_EXPR struct opt31__messages_t___XUP>.P_BOUNDS->LB0 +
>4294967295] *)&*.LC0][1 ...]{lb: 1 sz: 1}
>
>this is an ARRAY_RANGE_REF of a MEM_REF of an ADDR_EXPR of *.LC0. So
>far I haven't extended subst_constant_pool_initial to handle
>ARRAY_RANGE_REFs, as it can't even handle this MEM_REF:
>
>MEM[(interfaces__unsigned_8[(sizetype) <PLACEHOLDER_EXPR struct
>opt31__messages_t___XUP>.P_BOUNDS->LB0:<PLACEHOLDER_EXPR struct
>opt31__messages_t___XUP>.P_BOUNDS->UB0 >= <PLACEHOLDER_EXPR struct
>opt31__messages_t___XUP>.P_BOUNDS->LB0 ? (sizetype) <PLACEHOLDER_EXPR
>struct opt31__messages_t___XUP>.P_BOUNDS->UB0 : (sizetype)
><PLACEHOLDER_EXPR struct opt31__messages_t___XUP>.P_BOUNDS->LB0 +
>4294967295] *)&*.LC0]
>
>because the type here has size:
>
>MIN_EXPR <_GLOBAL.SZ2.ada_opt31 (<PLACEHOLDER_EXPR struct
>opt31__messages_t___XUP>.P_BOUNDS->UB0, <PLACEHOLDER_EXPR struct
>opt31__messages_t___XUP>.P_BOUNDS->LB0), 17179869176>
>
>inside the MEM_REF of the ADDR_EXPR is *.LC0, whose DECL_INITIAL is a
>4-element array (fine). Sadly while the MEM_REF
>type_contains_placeholder_p, the type of the outer ARRAY_RANGE_REF
>does not....
>
>One possibility is that this whole construct, ARRAY_RANGE_REF that it
>is, should mark *.LC0 in cannot_scalarize_away_bitmap.
>
>However, disqualified_constants is also necessary if I want a
>meaningful assertion that we do not re-add constant pool entries as
>candidates after we've discovered them - or should I try to rely on
>there only being one expression that accesses each constant pool
>entry? (I don't think that's guaranteed, as tree_output_constant_def
>does hashing; I admit I haven't really tried to break that assumption.)

I see.

>> 2) You should be able to use fold_ctor_reference directly (in place
>of
>> all your code
>> in case offset and size are readily available - don't remember
>exactly how
>> complete scalarization "walks" elements).  Alternatively use
>> fold_const_aggregate_ref.
>
>That should work for completely_scalarize, yes, i.e. if I can remove
>the other route in create_access.
>
>> 3) You can simplify the stmt SRA generated by simply calling
>fold_stmt on it,
>> that will do a bit more (wasted) work compared to 2) but may be
>easier.
>>
>> I wouldn't bother with the case where we for some reason do not
>simplify
>> the constant pool load.
>
>Hmmm. As above, I'm not quite sure what you mean by "the constant pool
>load" - if that means, substituting the DECL_INITIAL in place of the
>VAR_DECL that is DECL_IN_CONSTANT_POOL, then I didn't find a general
>tree substitution routine, hence writing subst_constant_pool_initial.
>It might be possible to make that simpler/more generic (e.g. just call
>copy_node, recurse on each operand, return the original if nothing
>changed) and then fold at the end.

I tried to suggest creating piecewise loads from the constant pool as opposed to the aggregate one.  But I suppose the difficulty is to determine 'pieces', not their values (that followup passes and folding can handle quite efficiently).

As Eric I suggest you simply disqualify the candidate based on its type.

Richard.

>Thanks,
>Alan


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 3/6] Share code from fold_array_ctor_reference with fold.
  2015-11-04 14:35     ` Alan Lawrence
@ 2015-11-06 21:19       ` Jeff Law
  0 siblings, 0 replies; 28+ messages in thread
From: Jeff Law @ 2015-11-06 21:19 UTC (permalink / raw)
  To: Alan Lawrence; +Cc: gcc-patches

On 11/04/2015 07:35 AM, Alan Lawrence wrote:
>> s/explicitely/explicitly/  And remove the '*' from the 2nd and 3rd lines
>> of the comment.
>>
>> It looks like get_ctor_element_at_index has numerous formatting
>> problems.  In particular you didn't indent the braces across the board
>> properly.  Also check for tabs vs spaces issues please.
>
> Yes, you are quite right, I'm not quite sure how those crept in. (Well, the
> 'explicitely' I am sure was a copy-paste error but the others!). Apologies...
Yea, the explicitely was in the original.  I'd actually gone back to the 
original thinking the bogus formatting might have been in there too. 
But it looked reasonable.

>
> I already committed the offending code as r229605, but here's a patch to clean
> it up - does it look ok now?
Yes.

jeff

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 6/6] Make SRA replace constant-pool loads
  2015-11-12 18:36 ` Alan Lawrence
@ 2015-11-16 13:44   ` Richard Biener
  0 siblings, 0 replies; 28+ messages in thread
From: Richard Biener @ 2015-11-16 13:44 UTC (permalink / raw)
  To: Alan Lawrence; +Cc: GCC Patches, Jeff Law

On Thu, Nov 12, 2015 at 7:35 PM, Alan Lawrence <alan.lawrence@arm.com> wrote:
> On 06/11/15 16:29, Richard Biener wrote:
>>>> 2) You should be able to use fold_ctor_reference directly (in place
>>> of
>>>> all your code
>>>> in case offset and size are readily available - don't remember
>>> exactly how
>>>> complete scalarization "walks" elements).  Alternatively use
>>>> fold_const_aggregate_ref.
>
> Well, yes I was able to remove subst_constant_pool_initial by using
> fold_ctor_reference, with some fairly strict checks about the access expressions
> found while scanning the input. I still needed to either deep-scan the constant
> value itself, or allow completely_scalarize to fail, due to Ada c460010 where
> a variable has DECL_INITIAL: {VIEW_CONVERT_EXPR<integer[2:4]>({1, 2, 3})}
> (that is, a CONSTRUCTOR whose only element is a V.C.E. of another CONSTRUCTOR).
>
> However:
>
>> I tried to suggest creating piecewise loads from the constant pool as opposed to the aggregate one.  But I suppose the difficulty is to determine 'pieces', not their values (that followup passes and folding can handle quite efficiently).
>
> I've now got this working, and it simplifies the patch massively :).
>
> We now replace e.g. a constant-pool load a = *.LC0, with a series of loads e.g.
> SR_a1 = SRlc0_1
> SR_a2 = SRlc0_2
> etc. etc. (well those wouldn't be quite the names, but you get the idea.)
>
> And then at the start of the function we initialise
> SRlc0_1 = *.LC0[0]
> SRlc0_2 = *.LC0[1]
> etc. etc.
> - I needed to use SRlc0_1 etc. variables because some of the constant-pool loads
> coming from Ada are rather more complicated than 'a = *.LC0' and substituting
> the access expression (e.g. '*.LC0[0].foo'), rather than the replacement_decl,
> into those leads to just too many verify_gimple failures.
>
> However, the initialization seems to work well enough, if I put a check into
> sra_modify_expr to avoid changing 'SRlc0_1 = *.LC0[0]' into 'SRlc0_1 = SRlc0_1'
> (!). Slightly hacky perhaps but harmless as there cannot be a statement writing
> to a scalar replacement prior to sra_modify_expr.
>
> Also I had to prevent writing scalar replacements back to the constant pool
> (gnat opt31.adb).
>
> Finally - I've left the disqualified_constants code in, but am now only using it
> for an assert...so I guess it'd be reasonable to take that out. In an ideal
> world we could do those checks only in checking builds but allocating bitmaps
> only for checking builds feels like it would make checking vs non-checking
> diverge too much.
>
> This is now passing bootstrap+check-{gcc,g++,fortran} on AArch64, and the same
> plus Ada on x64_64 and ARM, except for some additional guality failures in
> pr54970-1.c on AArch64 (tests which are already failing on ARM) - I haven't had
> any success in figuring those out yet, suggestions welcome.
>
> However, without the DOM patches, this does not fix the original ssa-dom-cse-2.c
> testcase, even though the load is scalarized in esra and the individual element
> loads are resolved in ccp2. I don't think I'll be able to respin those between now
> and stage 1 ending on Saturday, so I've had to drop the testsuite change,
> and can only describe the remaining problem in PR/63679. I'm
> now benchmarking on AArch64 to see what difference this patch makes on its own.
>
> Thoughts?

The new function needs a comment.  Otherwise, given there are no testcases,
I wonder what this does to nested constructs like multi-dimensional arrays
in a struct.

The patch looks _much_ better now though.  But I also wonder what the effect
on code generation is for the non-reg-type load-of-part case.

I'd say the patch is ok with the comment added and a testcase (or two) verifying
it works as desired.

Thanks,
Richard.

> gcc/ChangeLog:
>
>         PR target/63679
>         * tree-sra.c (disqualified_constants): New.
>         (sra_initialize): Allocate disqualified_constants.
>         (sra_deinitialize): Free disqualified_constants.
>         (disqualify_candidate): Update disqualified_constants when appropriate.
>         (create_access): Scan for constant-pool entries as we go along.
>         (scalarizable_type_p): Add check against type_contains_placeholder_p.
>         (maybe_add_sra_candidate): Allow constant-pool entries.
>         (initialize_constant_pool_replacements): New.
>         (sra_modify_expr): Avoid mangling assignments created by previous.
>         (sra_modify_function_body): Call initialize_constant_pool_replacements.
> ---
>  gcc/tree-sra.c | 93 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++--
>  1 file changed, 90 insertions(+), 3 deletions(-)
>
> diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
> index 1d4a632..aa60f06 100644
> --- a/gcc/tree-sra.c
> +++ b/gcc/tree-sra.c
> @@ -325,6 +325,10 @@ candidate (unsigned uid)
>     those which cannot be (because they are and need be used as a whole).  */
>  static bitmap should_scalarize_away_bitmap, cannot_scalarize_away_bitmap;
>
> +/* Bitmap of candidates in the constant pool, which cannot be scalarized
> +   because this would produce non-constant expressions (e.g. Ada).  */
> +static bitmap disqualified_constants;
> +
>  /* Obstack for creation of fancy names.  */
>  static struct obstack name_obstack;
>
> @@ -647,6 +651,7 @@ sra_initialize (void)
>      (vec_safe_length (cfun->local_decls) / 2);
>    should_scalarize_away_bitmap = BITMAP_ALLOC (NULL);
>    cannot_scalarize_away_bitmap = BITMAP_ALLOC (NULL);
> +  disqualified_constants = BITMAP_ALLOC (NULL);
>    gcc_obstack_init (&name_obstack);
>    base_access_vec = new hash_map<tree, auto_vec<access_p> >;
>    memset (&sra_stats, 0, sizeof (sra_stats));
> @@ -665,6 +670,7 @@ sra_deinitialize (void)
>    candidates = NULL;
>    BITMAP_FREE (should_scalarize_away_bitmap);
>    BITMAP_FREE (cannot_scalarize_away_bitmap);
> +  BITMAP_FREE (disqualified_constants);
>    access_pool.release ();
>    assign_link_pool.release ();
>    obstack_free (&name_obstack, NULL);
> @@ -679,6 +685,8 @@ disqualify_candidate (tree decl, const char *reason)
>  {
>    if (bitmap_clear_bit (candidate_bitmap, DECL_UID (decl)))
>      candidates->remove_elt_with_hash (decl, DECL_UID (decl));
> +  if (TREE_CODE (decl) == VAR_DECL && DECL_IN_CONSTANT_POOL (decl))
> +    bitmap_set_bit (disqualified_constants, DECL_UID (decl));
>
>    if (dump_file && (dump_flags & TDF_DETAILS))
>      {
> @@ -830,8 +838,11 @@ create_access_1 (tree base, HOST_WIDE_INT offset, HOST_WIDE_INT size)
>    return access;
>  }
>
> +static bool maybe_add_sra_candidate (tree);
> +
>  /* Create and insert access for EXPR. Return created access, or NULL if it is
> -   not possible.  */
> +   not possible.  Also scan for uses of constant pool as we go along and add
> +   to candidates.  */
>
>  static struct access *
>  create_access (tree expr, gimple *stmt, bool write)
> @@ -854,6 +865,25 @@ create_access (tree expr, gimple *stmt, bool write)
>    else
>      ptr = false;
>
> +  /* For constant-pool entries, check we can substitute the constant value.  */
> +  if (TREE_CODE (base) == VAR_DECL && DECL_IN_CONSTANT_POOL (base)
> +      && (sra_mode == SRA_MODE_EARLY_INTRA || sra_mode == SRA_MODE_INTRA))
> +    {
> +      gcc_assert (!bitmap_bit_p (disqualified_constants, DECL_UID (base)));
> +      if (expr != base
> +         && !is_gimple_reg_type (TREE_TYPE (expr))
> +         && dump_file && (dump_flags & TDF_DETAILS))
> +       {
> +         /* This occurs in Ada with accesses to ARRAY_RANGE_REFs,
> +            and elements of multidimensional arrays (which are
> +            multi-element arrays in their own right).  */
> +         fprintf (dump_file, "Allowing non-reg-type load of part"
> +                             " of constant-pool entry: ");
> +         print_generic_expr (dump_file, expr, 0);
> +       }
> +      maybe_add_sra_candidate (base);
> +    }
> +
>    if (!DECL_P (base) || !bitmap_bit_p (candidate_bitmap, DECL_UID (base)))
>      return NULL;
>
> @@ -912,6 +942,8 @@ static bool
>  scalarizable_type_p (tree type)
>  {
>    gcc_assert (!is_gimple_reg_type (type));
> +  if (type_contains_placeholder_p (type))
> +    return false;
>
>    switch (TREE_CODE (type))
>    {
> @@ -1832,7 +1864,12 @@ maybe_add_sra_candidate (tree var)
>        reject (var, "not aggregate");
>        return false;
>      }
> -  if (needs_to_live_in_memory (var))
> +  /* Allow constant-pool entries (that "need to live in memory")
> +     unless we are doing IPA SRA.  */
> +  if (needs_to_live_in_memory (var)
> +      && (sra_mode == SRA_MODE_EARLY_IPA
> +         || TREE_CODE (var) != VAR_DECL
> +         || !DECL_IN_CONSTANT_POOL (var)))
>      {
>        reject (var, "needs to live in memory");
>        return false;
> @@ -3250,6 +3287,9 @@ sra_modify_assign (gimple *stmt, gimple_stmt_iterator *gsi)
>    racc = get_access_for_expr (rhs);
>    if (!lacc && !racc)
>      return SRA_AM_NONE;
> +  /* Avoid modifying initializations of constant-pool replacements.  */
> +  if (racc && (racc->replacement_decl == lhs))
> +    return SRA_AM_NONE;
>
>    loc = gimple_location (stmt);
>    if (lacc && lacc->grp_to_be_replaced)
> @@ -3366,7 +3406,10 @@ sra_modify_assign (gimple *stmt, gimple_stmt_iterator *gsi)
>        || contains_vce_or_bfcref_p (lhs)
>        || stmt_ends_bb_p (stmt))
>      {
> -      if (access_has_children_p (racc))
> +      /* No need to copy into a constant-pool, it comes pre-initialized.  */
> +      if (access_has_children_p (racc)
> +         && (TREE_CODE (racc->base) != VAR_DECL
> +             || !DECL_IN_CONSTANT_POOL (racc->base)))
>         generate_subtree_copies (racc->first_child, rhs, racc->offset, 0, 0,
>                                  gsi, false, false, loc);
>        if (access_has_children_p (lacc))
> @@ -3469,6 +3512,48 @@ sra_modify_assign (gimple *stmt, gimple_stmt_iterator *gsi)
>      }
>  }
>
> +static void
> +initialize_constant_pool_replacements (void)
> +{
> +  gimple_seq seq = NULL;
> +  gimple_stmt_iterator gsi = gsi_start (seq);
> +  bitmap_iterator bi;
> +  unsigned i;
> +
> +  EXECUTE_IF_SET_IN_BITMAP (candidate_bitmap, 0, i, bi)
> +    if (bitmap_bit_p (should_scalarize_away_bitmap, i)
> +       && !bitmap_bit_p (cannot_scalarize_away_bitmap, i))
> +      {
> +       tree var = candidate (i);
> +       if (TREE_CODE (var) != VAR_DECL || !DECL_IN_CONSTANT_POOL (var))
> +         continue;
> +       vec<access_p> *access_vec = get_base_access_vector (var);
> +       if (!access_vec)
> +         continue;
> +       for (unsigned i = 0; i < access_vec->length (); i++)
> +         {
> +           struct access *access = (*access_vec)[i];
> +           if (!access->replacement_decl)
> +             continue;
> +           gassign *stmt = gimple_build_assign (
> +             get_access_replacement (access), unshare_expr (access->expr));
> +           if (dump_file && (dump_flags & TDF_DETAILS))
> +             {
> +               fprintf (dump_file, "Generating constant initializer: ");
> +               print_gimple_stmt (dump_file, stmt, 0, 1);
> +               fprintf (dump_file, "\n");
> +             }
> +           gsi_insert_after (&gsi, stmt, GSI_NEW_STMT);
> +           update_stmt (stmt);
> +         }
> +      }
> +
> +  seq = gsi_seq (gsi);
> +  if (seq)
> +    gsi_insert_seq_on_edge_immediate (
> +      single_succ_edge (ENTRY_BLOCK_PTR_FOR_FN (cfun)), seq);
> +}
> +
>  /* Traverse the function body and all modifications as decided in
>     analyze_all_variable_accesses.  Return true iff the CFG has been
>     changed.  */
> @@ -3479,6 +3564,8 @@ sra_modify_function_body (void)
>    bool cfg_changed = false;
>    basic_block bb;
>
> +  initialize_constant_pool_replacements ();
> +
>    FOR_EACH_BB_FN (bb, cfun)
>      {
>        gimple_stmt_iterator gsi = gsi_start_bb (bb);
> --
> 1.9.1
>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 6/6] Make SRA replace constant-pool loads
       [not found] <F9F317CB-3AA4-4583-95D6-72A3E3ABED2F@gmail.com>
@ 2015-11-12 18:36 ` Alan Lawrence
  2015-11-16 13:44   ` Richard Biener
  0 siblings, 1 reply; 28+ messages in thread
From: Alan Lawrence @ 2015-11-12 18:36 UTC (permalink / raw)
  To: gcc-patches; +Cc: richard.guenther, law

On 06/11/15 16:29, Richard Biener wrote:
>>> 2) You should be able to use fold_ctor_reference directly (in place
>> of
>>> all your code
>>> in case offset and size are readily available - don't remember
>> exactly how
>>> complete scalarization "walks" elements).  Alternatively use
>>> fold_const_aggregate_ref.

Well, yes: I was able to remove subst_constant_pool_initial by using
fold_ctor_reference, with some fairly strict checks on the access expressions
found while scanning the input. I still needed to either deep-scan the constant
value itself, or allow completely_scalarize to fail, due to Ada test c460010, where
a variable has DECL_INITIAL: {VIEW_CONVERT_EXPR<integer[2:4]>({1, 2, 3})}
(that is, a CONSTRUCTOR whose only element is a V.C.E. of another CONSTRUCTOR).

However:

> I tried to suggest creating piecewise loads from the constant pool as opposed to the aggregate one.  But I suppose the difficulty is to determine 'pieces', not their values (that followup passes and folding can handle quite efficiently).

I've now got this working, and it simplifies the patch massively :).

We now replace e.g. a constant-pool load a = *.LC0, with a series of loads e.g.
SR_a1 = SRlc0_1
SR_a2 = SRlc0_2
etc. etc. (well those wouldn't be quite the names, but you get the idea.)

And then at the start of the function we initialise
SRlc0_1 = *.LC0[0]
SRlc0_2 = *.LC0[1]
etc. etc.
- I needed to use the SRlc0_1 etc. variables because some of the constant-pool loads
coming from Ada are rather more complicated than 'a = *.LC0', and substituting
the access expression (e.g. '*.LC0[0].foo'), rather than the replacement_decl,
into those led to just too many verify_gimple failures.

However, the initialization seems to work well enough, if I put a check into
sra_modify_assign to avoid changing 'SRlc0_1 = *.LC0[0]' into 'SRlc0_1 = SRlc0_1'
(!). Slightly hacky perhaps, but harmless, as there cannot be a statement writing
to a scalar replacement prior to sra_modify_assign.

Also I had to prevent writing scalar replacements back to the constant pool
(gnat opt31.adb).

Finally - I've left the disqualified_constants code in, but am now only using it
for an assert...so I guess it'd be reasonable to take that out. In an ideal
world we could do those checks only in checking builds but allocating bitmaps
only for checking builds feels like it would make checking vs non-checking
diverge too much.

This is now passing bootstrap+check-{gcc,g++,fortran} on AArch64, and the same
plus Ada on x86_64 and ARM, except for some additional guality failures in
pr54970-1.c on AArch64 (tests which are already failing on ARM) - I haven't had
any success in figuring those out yet; suggestions welcome.

However, without the DOM patches, this does not fix the original ssa-dom-cse-2.c
testcase, even though the load is scalarized in esra and the individual element
loads are resolved in ccp2. I don't think I'll be able to respin those between now
and stage 1 ending on Saturday, so I've had to drop the testsuite change
and will merely describe the remaining problem in PR/63679. I'm
now benchmarking on AArch64 to see what difference this patch makes on its own.

Thoughts?

gcc/ChangeLog:

	PR target/63679
	* tree-sra.c (disqualified_constants): New.
	(sra_initialize): Allocate disqualified_constants.
	(sra_deinitialize): Free disqualified_constants.
	(disqualify_candidate): Update disqualified_constants when appropriate.
	(create_access): Scan for constant-pool entries as we go along.
	(scalarizable_type_p): Add check against type_contains_placeholder_p.
	(maybe_add_sra_candidate): Allow constant-pool entries.
	(initialize_constant_pool_replacements): New.
	(sra_modify_assign): Avoid mangling the initializer assignments
	created by the above.
	(sra_modify_function_body): Call initialize_constant_pool_replacements.
---
 gcc/tree-sra.c | 93 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 90 insertions(+), 3 deletions(-)

diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
index 1d4a632..aa60f06 100644
--- a/gcc/tree-sra.c
+++ b/gcc/tree-sra.c
@@ -325,6 +325,10 @@ candidate (unsigned uid)
    those which cannot be (because they are and need be used as a whole).  */
 static bitmap should_scalarize_away_bitmap, cannot_scalarize_away_bitmap;
 
+/* Bitmap of candidates in the constant pool, which cannot be scalarized
+   because this would produce non-constant expressions (e.g. Ada).  */
+static bitmap disqualified_constants;
+
 /* Obstack for creation of fancy names.  */
 static struct obstack name_obstack;
 
@@ -647,6 +651,7 @@ sra_initialize (void)
     (vec_safe_length (cfun->local_decls) / 2);
   should_scalarize_away_bitmap = BITMAP_ALLOC (NULL);
   cannot_scalarize_away_bitmap = BITMAP_ALLOC (NULL);
+  disqualified_constants = BITMAP_ALLOC (NULL);
   gcc_obstack_init (&name_obstack);
   base_access_vec = new hash_map<tree, auto_vec<access_p> >;
   memset (&sra_stats, 0, sizeof (sra_stats));
@@ -665,6 +670,7 @@ sra_deinitialize (void)
   candidates = NULL;
   BITMAP_FREE (should_scalarize_away_bitmap);
   BITMAP_FREE (cannot_scalarize_away_bitmap);
+  BITMAP_FREE (disqualified_constants);
   access_pool.release ();
   assign_link_pool.release ();
   obstack_free (&name_obstack, NULL);
@@ -679,6 +685,8 @@ disqualify_candidate (tree decl, const char *reason)
 {
   if (bitmap_clear_bit (candidate_bitmap, DECL_UID (decl)))
     candidates->remove_elt_with_hash (decl, DECL_UID (decl));
+  if (TREE_CODE (decl) == VAR_DECL && DECL_IN_CONSTANT_POOL (decl))
+    bitmap_set_bit (disqualified_constants, DECL_UID (decl));
 
   if (dump_file && (dump_flags & TDF_DETAILS))
     {
@@ -830,8 +838,11 @@ create_access_1 (tree base, HOST_WIDE_INT offset, HOST_WIDE_INT size)
   return access;
 }
 
+static bool maybe_add_sra_candidate (tree);
+
 /* Create and insert access for EXPR. Return created access, or NULL if it is
-   not possible.  */
+   not possible.  Also scan for uses of constant pool as we go along and add
+   to candidates.  */
 
 static struct access *
 create_access (tree expr, gimple *stmt, bool write)
@@ -854,6 +865,25 @@ create_access (tree expr, gimple *stmt, bool write)
   else
     ptr = false;
 
+  /* For constant-pool entries, check we can substitute the constant value.  */
+  if (TREE_CODE (base) == VAR_DECL && DECL_IN_CONSTANT_POOL (base)
+      && (sra_mode == SRA_MODE_EARLY_INTRA || sra_mode == SRA_MODE_INTRA))
+    {
+      gcc_assert (!bitmap_bit_p (disqualified_constants, DECL_UID (base)));
+      if (expr != base
+	  && !is_gimple_reg_type (TREE_TYPE (expr))
+	  && dump_file && (dump_flags & TDF_DETAILS))
+	{
+	  /* This occurs in Ada with accesses to ARRAY_RANGE_REFs,
+	     and elements of multidimensional arrays (which are
+	     multi-element arrays in their own right).  */
+	  fprintf (dump_file, "Allowing non-reg-type load of part"
+			      " of constant-pool entry: ");
+	  print_generic_expr (dump_file, expr, 0);
+	}
+      maybe_add_sra_candidate (base);
+    }
+
   if (!DECL_P (base) || !bitmap_bit_p (candidate_bitmap, DECL_UID (base)))
     return NULL;
 
@@ -912,6 +942,8 @@ static bool
 scalarizable_type_p (tree type)
 {
   gcc_assert (!is_gimple_reg_type (type));
+  if (type_contains_placeholder_p (type))
+    return false;
 
   switch (TREE_CODE (type))
   {
@@ -1832,7 +1864,12 @@ maybe_add_sra_candidate (tree var)
       reject (var, "not aggregate");
       return false;
     }
-  if (needs_to_live_in_memory (var))
+  /* Allow constant-pool entries (that "need to live in memory")
+     unless we are doing IPA SRA.  */
+  if (needs_to_live_in_memory (var)
+      && (sra_mode == SRA_MODE_EARLY_IPA
+	  || TREE_CODE (var) != VAR_DECL
+	  || !DECL_IN_CONSTANT_POOL (var)))
     {
       reject (var, "needs to live in memory");
       return false;
@@ -3250,6 +3287,9 @@ sra_modify_assign (gimple *stmt, gimple_stmt_iterator *gsi)
   racc = get_access_for_expr (rhs);
   if (!lacc && !racc)
     return SRA_AM_NONE;
+  /* Avoid modifying initializations of constant-pool replacements.  */
+  if (racc && (racc->replacement_decl == lhs))
+    return SRA_AM_NONE;
 
   loc = gimple_location (stmt);
   if (lacc && lacc->grp_to_be_replaced)
@@ -3366,7 +3406,10 @@ sra_modify_assign (gimple *stmt, gimple_stmt_iterator *gsi)
       || contains_vce_or_bfcref_p (lhs)
       || stmt_ends_bb_p (stmt))
     {
-      if (access_has_children_p (racc))
+      /* No need to copy into a constant-pool, it comes pre-initialized.  */
+      if (access_has_children_p (racc)
+	  && (TREE_CODE (racc->base) != VAR_DECL
+	      || !DECL_IN_CONSTANT_POOL (racc->base)))
 	generate_subtree_copies (racc->first_child, rhs, racc->offset, 0, 0,
 				 gsi, false, false, loc);
       if (access_has_children_p (lacc))
@@ -3469,6 +3512,48 @@ sra_modify_assign (gimple *stmt, gimple_stmt_iterator *gsi)
     }
 }
 
+static void
+initialize_constant_pool_replacements (void)
+{
+  gimple_seq seq = NULL;
+  gimple_stmt_iterator gsi = gsi_start (seq);
+  bitmap_iterator bi;
+  unsigned i;
+
+  EXECUTE_IF_SET_IN_BITMAP (candidate_bitmap, 0, i, bi)
+    if (bitmap_bit_p (should_scalarize_away_bitmap, i)
+	&& !bitmap_bit_p (cannot_scalarize_away_bitmap, i))
+      {
+	tree var = candidate (i);
+	if (TREE_CODE (var) != VAR_DECL || !DECL_IN_CONSTANT_POOL (var))
+	  continue;
+	vec<access_p> *access_vec = get_base_access_vector (var);
+	if (!access_vec)
+	  continue;
+	for (unsigned i = 0; i < access_vec->length (); i++)
+	  {
+	    struct access *access = (*access_vec)[i];
+	    if (!access->replacement_decl)
+	      continue;
+	    gassign *stmt = gimple_build_assign (
+	      get_access_replacement (access), unshare_expr (access->expr));
+	    if (dump_file && (dump_flags & TDF_DETAILS))
+	      {
+		fprintf (dump_file, "Generating constant initializer: ");
+		print_gimple_stmt (dump_file, stmt, 0, 1);
+		fprintf (dump_file, "\n");
+	      }
+	    gsi_insert_after (&gsi, stmt, GSI_NEW_STMT);
+	    update_stmt (stmt);
+	  }
+      }
+
+  seq = gsi_seq (gsi);
+  if (seq)
+    gsi_insert_seq_on_edge_immediate (
+      single_succ_edge (ENTRY_BLOCK_PTR_FOR_FN (cfun)), seq);
+}
+
 /* Traverse the function body and all modifications as decided in
    analyze_all_variable_accesses.  Return true iff the CFG has been
    changed.  */
@@ -3479,6 +3564,8 @@ sra_modify_function_body (void)
   bool cfg_changed = false;
   basic_block bb;
 
+  initialize_constant_pool_replacements ();
+
   FOR_EACH_BB_FN (bb, cfun)
     {
       gimple_stmt_iterator gsi = gsi_start_bb (bb);
-- 
1.9.1


end of thread, other threads:[~2015-11-16 13:44 UTC | newest]

Thread overview: 28+ messages
2015-10-29 19:22 [PATCH 0/6 v2] PR/63679 Make SRA scalarize constant-pool loads Alan Lawrence
2015-10-29 19:20 ` [PATCH 1/6]tree-ssa-dom.c: Normalize exprs, starting with ARRAY_REF to MEM_REF Alan Lawrence
2015-10-30  5:39   ` Jeff Law
2015-10-30  9:21     ` Richard Biener
2015-11-03 10:27     ` Alan Lawrence
2015-11-03 11:13       ` Alan Lawrence
2015-11-03 11:49         ` Richard Biener
2015-10-29 19:21 ` [PATCH 2/6] tree-ssa-dom.c: Normalize data types in MEM_REFs Alan Lawrence
2015-10-29 19:22 ` [PATCH 3/6] Share code from fold_array_ctor_reference with fold Alan Lawrence
2015-10-30  9:25   ` Richard Biener
2015-11-03  3:20   ` Jeff Law
2015-11-04 14:35     ` Alan Lawrence
2015-11-06 21:19       ` Jeff Law
2015-10-29 19:22 ` [PATCH 4/6][Trivial] tree-sra.c: A few comment fixes/additions Alan Lawrence
2015-10-30  5:35   ` Jeff Law
2015-10-29 19:22 ` [PATCH 5/6]tree-sra.c: Fix completely_scalarize for negative array indices Alan Lawrence
2015-10-30 10:54   ` Richard Biener
2015-10-30 10:55     ` Eric Botcazou
2015-11-05 13:22       ` Alan Lawrence
2015-11-06 12:30         ` Richard Biener
2015-10-29 19:32 ` [PATCH 6/6] Make SRA replace constant-pool loads Alan Lawrence
2015-10-31 23:48   ` Bernhard Reutner-Fischer
2015-11-03 14:01   ` Richard Biener
2015-11-05 18:55     ` Alan Lawrence
2015-11-06 10:25       ` Eric Botcazou
2015-11-06 16:29       ` Richard Biener
     [not found] <F9F317CB-3AA4-4583-95D6-72A3E3ABED2F@gmail.com>
2015-11-12 18:36 ` Alan Lawrence
2015-11-16 13:44   ` Richard Biener
