public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH 1/4] Make SRA scalarize constant-pool loads
  2015-12-21 13:14 [PATCH 0/4 v3] Fix PR/63679 when --param sra-max-scalarization-size specified Alan Lawrence
@ 2015-12-21 13:14 ` Alan Lawrence
  2015-12-24 11:53   ` Alan Lawrence
  2015-12-21 13:14 ` [PATCH 3/4] Add a pass_copy_prop following pass_ch_vect Alan Lawrence
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 48+ messages in thread
From: Alan Lawrence @ 2015-12-21 13:14 UTC (permalink / raw)
  To: gcc-patches; +Cc: james.greenhalgh, marcus.shawcroft, richard.earnshaw

This is the same as the version conditionally approved near the end of stage 1
https://gcc.gnu.org/ml/gcc-patches/2015-11/msg01948.html, plus the
requested comment and a couple of testcases for platforms where the constants
are pushed into the constant pool. (I didn't commit this at the time because
without the rest of the series, benchmarking results didn't really show any
improvement, and the PR was not fixed.)

I've left the disqualified_constants check in, although I could be persuaded to
remove it if people felt strongly.

Also on AArch64, this still causes FAILs in guality pr54970.c:
FAIL: gcc.dg/guality/pr54970.c   -Os  line 15 a[0] == 1
FAIL: gcc.dg/guality/pr54970.c   -Os  line 20 a[0] == 1
FAIL: gcc.dg/guality/pr54970.c   -Os  line 25 a[0] == 1
and I think I must ask the AArch64 maintainers to take these for the time being.

Bootstrapped + check-gcc,g++,ada on x86_64, ARM, AArch64.

gcc/ChangeLog:

	PR target/63679
	* tree-sra.c (disqualified_constants): New.
	(sra_initialize): Allocate disqualified_constants.
	(sra_deinitialize): Free disqualified_constants.
	(disqualify_candidate): Update disqualified_constants when appropriate.
	(create_access): Scan for constant-pool entries as we go along.
	(scalarizable_type_p): Add check against type_contains_placeholder_p.
	(maybe_add_sra_candidate): Allow constant-pool entries.
	(initialize_constant_pool_replacements): New.
	(sra_modify_expr): Avoid mangling assignments created by previous.
	(sra_modify_function_body): Call initialize_constant_pool_replacements.

gcc/testsuite/ChangeLog:

	* gcc.dg/tree-ssa/sra-17.c: New.
	* gcc.dg/tree-ssa/sra-18.c: New.
---
 gcc/testsuite/gcc.dg/tree-ssa/sra-17.c | 19 +++++++
 gcc/testsuite/gcc.dg/tree-ssa/sra-18.c | 28 ++++++++++
 gcc/tree-sra.c                         | 99 ++++++++++++++++++++++++++++++++--
 3 files changed, 143 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/sra-17.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/sra-18.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/sra-17.c b/gcc/testsuite/gcc.dg/tree-ssa/sra-17.c
new file mode 100644
index 0000000..cff868a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/sra-17.c
@@ -0,0 +1,19 @@
+/* { dg-do run { target { aarch64*-*-* alpha*-*-* arm*-*-* hppa*-*-* powerpc*-*-* s390*-*-* } } } */
+/* { dg-options "-O2 -fdump-tree-esra --param sra-max-scalarization-size-Ospeed=16" } */
+
+extern void abort (void);
+
+int
+main (int argc, char **argv)
+{
+  int a[4] = { 7, 19, 11, 255 };
+  int tot = 0;
+  for (int i = 0; i < 4; i++)
+    tot = (tot*256) + a[i];
+  if (tot == 0x07130bff)
+    return 0;
+  abort ();
+}
+
+/* { dg-final { scan-tree-dump-times "Removing load: a = \\\*.LC0;" 1 "esra" } } */
+/* { dg-final { scan-tree-dump-times "SR.\[0-9_\]+ = \\\*.LC0\\\[" 4 "esra" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/sra-18.c b/gcc/testsuite/gcc.dg/tree-ssa/sra-18.c
new file mode 100644
index 0000000..6c0d52b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/sra-18.c
@@ -0,0 +1,28 @@
+/* { dg-do run { target { aarch64*-*-* alpha*-*-* arm*-*-* hppa*-*-* powerpc*-*-* s390*-*-* } } } */
+/* { dg-options "-O2 -fdump-tree-esra --param sra-max-scalarization-size-Ospeed=16" } */
+
+extern void abort (void);
+struct foo { int x; };
+
+struct bar { struct foo f[2]; };
+
+struct baz { struct bar b[2]; };
+
+int
+main (int argc, char **argv)
+{
+  struct baz a = { { { { { 4 }, { 5 } } }, { { { 6 }, { 7 } } }  } };
+  int tot = 0;
+  for (int i = 0; i < 2; i++)
+    for (int j = 0; j < 2; j++)
+      tot = (tot*256) + a.b[i].f[j].x;
+  if (tot == 0x04050607)
+    return 0;
+  abort ();
+}
+
+/* { dg-final { scan-tree-dump-times "Removing load: a = \\\*.LC0;" 1 "esra" } } */
+/* { dg-final { scan-tree-dump-times "SR.\[0-9_\]+ = \\\*.LC0\\.b\\\[0\\\]\\.f\\\[0\\\]\\.x" 1 "esra" } } */
+/* { dg-final { scan-tree-dump-times "SR.\[0-9_\]+ = \\\*.LC0\\.b\\\[0\\\]\\.f\\\[1\\\]\\.x" 1 "esra" } } */
+/* { dg-final { scan-tree-dump-times "SR.\[0-9_\]+ = \\\*.LC0\\.b\\\[1\\\]\\.f\\\[0\\\]\\.x" 1 "esra" } } */
+/* { dg-final { scan-tree-dump-times "SR.\[0-9_\]+ = \\\*.LC0\\.b\\\[1\\\]\\.f\\\[1\\\]\\.x" 1 "esra" } } */
diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
index c4fea5b..4ea8550 100644
--- a/gcc/tree-sra.c
+++ b/gcc/tree-sra.c
@@ -328,6 +328,10 @@ candidate (unsigned uid)
    those which cannot be (because they are and need be used as a whole).  */
 static bitmap should_scalarize_away_bitmap, cannot_scalarize_away_bitmap;
 
+/* Bitmap of candidates in the constant pool, which cannot be scalarized
+   because this would produce non-constant expressions (e.g. Ada).  */
+static bitmap disqualified_constants;
+
 /* Obstack for creation of fancy names.  */
 static struct obstack name_obstack;
 
@@ -652,6 +656,7 @@ sra_initialize (void)
     (vec_safe_length (cfun->local_decls) / 2);
   should_scalarize_away_bitmap = BITMAP_ALLOC (NULL);
   cannot_scalarize_away_bitmap = BITMAP_ALLOC (NULL);
+  disqualified_constants = BITMAP_ALLOC (NULL);
   gcc_obstack_init (&name_obstack);
   base_access_vec = new hash_map<tree, auto_vec<access_p> >;
   memset (&sra_stats, 0, sizeof (sra_stats));
@@ -670,6 +675,7 @@ sra_deinitialize (void)
   candidates = NULL;
   BITMAP_FREE (should_scalarize_away_bitmap);
   BITMAP_FREE (cannot_scalarize_away_bitmap);
+  BITMAP_FREE (disqualified_constants);
   access_pool.release ();
   assign_link_pool.release ();
   obstack_free (&name_obstack, NULL);
@@ -684,6 +690,8 @@ disqualify_candidate (tree decl, const char *reason)
 {
   if (bitmap_clear_bit (candidate_bitmap, DECL_UID (decl)))
     candidates->remove_elt_with_hash (decl, DECL_UID (decl));
+  if (TREE_CODE (decl) == VAR_DECL && DECL_IN_CONSTANT_POOL (decl))
+    bitmap_set_bit (disqualified_constants, DECL_UID (decl));
 
   if (dump_file && (dump_flags & TDF_DETAILS))
     {
@@ -835,8 +843,11 @@ create_access_1 (tree base, HOST_WIDE_INT offset, HOST_WIDE_INT size)
   return access;
 }
 
+static bool maybe_add_sra_candidate (tree);
+
 /* Create and insert access for EXPR. Return created access, or NULL if it is
-   not possible.  */
+   not possible.  Also scan for uses of constant pool as we go along and add
+   to candidates.  */
 
 static struct access *
 create_access (tree expr, gimple *stmt, bool write)
@@ -859,6 +870,25 @@ create_access (tree expr, gimple *stmt, bool write)
   else
     ptr = false;
 
+  /* For constant-pool entries, check we can substitute the constant value.  */
+  if (TREE_CODE (base) == VAR_DECL && DECL_IN_CONSTANT_POOL (base)
+      && (sra_mode == SRA_MODE_EARLY_INTRA || sra_mode == SRA_MODE_INTRA))
+    {
+      gcc_assert (!bitmap_bit_p (disqualified_constants, DECL_UID (base)));
+      if (expr != base
+	  && !is_gimple_reg_type (TREE_TYPE (expr))
+	  && dump_file && (dump_flags & TDF_DETAILS))
+	{
+	  /* This occurs in Ada with accesses to ARRAY_RANGE_REFs,
+	     and elements of multidimensional arrays (which are
+	     multi-element arrays in their own right).  */
+	  fprintf (dump_file, "Allowing non-reg-type load of part"
+			      " of constant-pool entry: ");
+	  print_generic_expr (dump_file, expr, 0);
+	}
+      maybe_add_sra_candidate (base);
+    }
+
   if (!DECL_P (base) || !bitmap_bit_p (candidate_bitmap, DECL_UID (base)))
     return NULL;
 
@@ -918,6 +948,8 @@ static bool
 scalarizable_type_p (tree type)
 {
   gcc_assert (!is_gimple_reg_type (type));
+  if (type_contains_placeholder_p (type))
+    return false;
 
   switch (TREE_CODE (type))
   {
@@ -1852,7 +1884,12 @@ maybe_add_sra_candidate (tree var)
       reject (var, "not aggregate");
       return false;
     }
-  if (needs_to_live_in_memory (var))
+  /* Allow constant-pool entries (that "need to live in memory")
+     unless we are doing IPA SRA.  */
+  if (needs_to_live_in_memory (var)
+      && (sra_mode == SRA_MODE_EARLY_IPA
+	  || TREE_CODE (var) != VAR_DECL
+	  || !DECL_IN_CONSTANT_POOL (var)))
     {
       reject (var, "needs to live in memory");
       return false;
@@ -3272,6 +3309,9 @@ sra_modify_assign (gimple *stmt, gimple_stmt_iterator *gsi)
   racc = get_access_for_expr (rhs);
   if (!lacc && !racc)
     return SRA_AM_NONE;
+  /* Avoid modifying initializations of constant-pool replacements.  */
+  if (racc && (racc->replacement_decl == lhs))
+    return SRA_AM_NONE;
 
   loc = gimple_location (stmt);
   if (lacc && lacc->grp_to_be_replaced)
@@ -3388,7 +3428,10 @@ sra_modify_assign (gimple *stmt, gimple_stmt_iterator *gsi)
       || contains_vce_or_bfcref_p (lhs)
       || stmt_ends_bb_p (stmt))
     {
-      if (access_has_children_p (racc))
+      /* No need to copy into a constant-pool, it comes pre-initialized.  */
+      if (access_has_children_p (racc)
+	  && (TREE_CODE (racc->base) != VAR_DECL
+	      || !DECL_IN_CONSTANT_POOL (racc->base)))
 	generate_subtree_copies (racc->first_child, rhs, racc->offset, 0, 0,
 				 gsi, false, false, loc);
       if (access_has_children_p (lacc))
@@ -3491,6 +3534,54 @@ sra_modify_assign (gimple *stmt, gimple_stmt_iterator *gsi)
     }
 }
 
+/* Set any scalar replacements of values in the constant pool to the initial
+   value of the constant.  (Constant-pool decls like *.LC0 have effectively
+   been initialized before the program starts, we must do the same for their
+   replacements.)  Thus, we output statements like 'SR.1 = *.LC0[0];' into
+   the function's entry block.  */
+
+static void
+initialize_constant_pool_replacements (void)
+{
+  gimple_seq seq = NULL;
+  gimple_stmt_iterator gsi = gsi_start (seq);
+  bitmap_iterator bi;
+  unsigned i;
+
+  EXECUTE_IF_SET_IN_BITMAP (candidate_bitmap, 0, i, bi)
+    if (bitmap_bit_p (should_scalarize_away_bitmap, i)
+	&& !bitmap_bit_p (cannot_scalarize_away_bitmap, i))
+      {
+	tree var = candidate (i);
+	if (TREE_CODE (var) != VAR_DECL || !DECL_IN_CONSTANT_POOL (var))
+	  continue;
+	vec<access_p> *access_vec = get_base_access_vector (var);
+	if (!access_vec)
+	  continue;
+	for (unsigned i = 0; i < access_vec->length (); i++)
+	  {
+	    struct access *access = (*access_vec)[i];
+	    if (!access->replacement_decl)
+	      continue;
+	    gassign *stmt = gimple_build_assign (
+	      get_access_replacement (access), unshare_expr (access->expr));
+	    if (dump_file && (dump_flags & TDF_DETAILS))
+	      {
+		fprintf (dump_file, "Generating constant initializer: ");
+		print_gimple_stmt (dump_file, stmt, 0, 1);
+		fprintf (dump_file, "\n");
+	      }
+	    gsi_insert_after (&gsi, stmt, GSI_NEW_STMT);
+	    update_stmt (stmt);
+	  }
+      }
+
+  seq = gsi_seq (gsi);
+  if (seq)
+    gsi_insert_seq_on_edge_immediate (
+      single_succ_edge (ENTRY_BLOCK_PTR_FOR_FN (cfun)), seq);
+}
+
 /* Traverse the function body and all modifications as decided in
    analyze_all_variable_accesses.  Return true iff the CFG has been
    changed.  */
@@ -3501,6 +3592,8 @@ sra_modify_function_body (void)
   bool cfg_changed = false;
   basic_block bb;
 
+  initialize_constant_pool_replacements ();
+
   FOR_EACH_BB_FN (bb, cfun)
     {
       gimple_stmt_iterator gsi = gsi_start_bb (bb);
-- 
1.9.1

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH 3/4] Add a pass_copy_prop following pass_ch_vect
  2015-12-21 13:14 [PATCH 0/4 v3] Fix PR/63679 when --param sra-max-scalarization-size specified Alan Lawrence
  2015-12-21 13:14 ` [PATCH 1/4] Make SRA scalarize constant-pool loads Alan Lawrence
@ 2015-12-21 13:14 ` Alan Lawrence
  2015-12-26 17:58   ` Richard Biener
  2015-12-21 13:14 ` [PATCH 2/4] Equate MEM_REFs and ARRAY_REFs in tree-ssa-scopedtables.c Alan Lawrence
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 48+ messages in thread
From: Alan Lawrence @ 2015-12-21 13:14 UTC (permalink / raw)
  To: gcc-patches

This fixes the missed vectorization of gcc.dg/vect/pr65947-2.c following the
previous patch, by cleaning up:

  i_49 = 1;
  ...
    goto <bb 3>;

  <bb 3>:

  <bb 4>: <--- LOOP HEADER
  # i_53 = PHI <i_42(7), i_49(3)>

into:

<bb 3>:
  # i_53 = PHI <i_42(4), 1(2)>

Allowing scalar evolution and vectorization to proceed.

A similar sequence of partial peeling, header-copying, and constant propagation
also leads to pr65947-9.c successfully vectorizing (it previously didn't as
the iteration count was too high) when inlined and so using known data.

I'm not sure whether adding a pass_copy_prop is the right thing here, but since
loop-header-copying can create such opportunities, it seems good to take them!

gcc/ChangeLog:

	* passes.def: Add a pass_copy_prop following pass_ch_vect.

gcc/testsuite/ChangeLog:

	* gcc.dg/vect/pr65947-9.c: Add __attribute__ ((noinline)).

--Alan
---
 gcc/passes.def                        | 1 +
 gcc/testsuite/gcc.dg/vect/pr65947-9.c | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/passes.def b/gcc/passes.def
index 624d121..1043548 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -269,6 +269,7 @@ along with GCC; see the file COPYING3.  If not see
 	      NEXT_PASS (pass_expand_omp_ssa);
 	  POP_INSERT_PASSES ()
 	  NEXT_PASS (pass_ch_vect);
+	  NEXT_PASS (pass_copy_prop);
 	  NEXT_PASS (pass_if_conversion);
 	  /* pass_vectorize must immediately follow pass_if_conversion.
 	     Please do not add any other passes in between.  */
diff --git a/gcc/testsuite/gcc.dg/vect/pr65947-9.c b/gcc/testsuite/gcc.dg/vect/pr65947-9.c
index d0da13f..c7aac8d 100644
--- a/gcc/testsuite/gcc.dg/vect/pr65947-9.c
+++ b/gcc/testsuite/gcc.dg/vect/pr65947-9.c
@@ -7,7 +7,7 @@ extern void abort (void) __attribute__ ((noreturn));
 /* Condition reduction with maximum possible loop size.  Will fail to
    vectorize because the vectorisation requires a slot for default values.  */
 
-char
+__attribute__ ((noinline)) char
 condition_reduction (char *a, char min_v)
 {
   char last = -72;
-- 
1.9.1

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH 0/4 v3] Fix PR/63679 when --param sra-max-scalarization-size specified
@ 2015-12-21 13:14 Alan Lawrence
  2015-12-21 13:14 ` [PATCH 1/4] Make SRA scalarize constant-pool loads Alan Lawrence
                   ` (4 more replies)
  0 siblings, 5 replies; 48+ messages in thread
From: Alan Lawrence @ 2015-12-21 13:14 UTC (permalink / raw)
  To: gcc-patches

This is a respin of previous patch series:
https://gcc.gnu.org/ml/gcc-patches/2015-10/msg03271.html
Minus three of the smaller patches having already been committed; with the
updated version of the main patch to SRA; and the patches to DOM reworked
to avoid constructing gimple stmt's.

IMHO this feels quite invasive for stage 3, however, I note the PR/63679
(ssa-dom-cse-2.c being XFAILed on a bunch of platforms, and currently still
FAILing on ARM) is a regression from gcc 4.9.1. So I'd ask maintainer's thoughts
as to whether we should take this patch set for gcc 6.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH 2/4] Equate MEM_REFs and ARRAY_REFs in tree-ssa-scopedtables.c
  2015-12-21 13:14 [PATCH 0/4 v3] Fix PR/63679 when --param sra-max-scalarization-size specified Alan Lawrence
  2015-12-21 13:14 ` [PATCH 1/4] Make SRA scalarize constant-pool loads Alan Lawrence
  2015-12-21 13:14 ` [PATCH 3/4] Add a pass_copy_prop following pass_ch_vect Alan Lawrence
@ 2015-12-21 13:14 ` Alan Lawrence
  2015-12-24 11:55   ` Alan Lawrence
  2016-01-04 19:08   ` [PATCH 2/4] Equate MEM_REFs and ARRAY_REFs in tree-ssa-scopedtables.c Jeff Law
  2015-12-21 13:22 ` [PATCH 4/4] Un-XFAIL ssa-dom-cse-2.c for most platforms Alan Lawrence
  2015-12-22 15:54 ` [PATCH 0/4 v3] Fix PR/63679 when --param sra-max-scalarization-size specified Alan Lawrence
  4 siblings, 2 replies; 48+ messages in thread
From: Alan Lawrence @ 2015-12-21 13:14 UTC (permalink / raw)
  To: gcc-patches

This is a respin of patches
https://gcc.gnu.org/ml/gcc-patches/2015-10/msg03266.html and
https://gcc.gnu.org/ml/gcc-patches/2015-10/msg03267.html, which were
"too quickly" approved before concerns with efficiency were pointed out.

I tried to change the hashing just in tree-ssa-dom.c using C++ subclassing, but
couldn't cleanly separate this out from tree-ssa-scopedtables and
tree-ssa-threadedge.c due to use of avail_exprs_stack. So I figured it was
probably appropriate to use the equivalences in jump threading too. Also,
using get_ref_base_and_extent unifies handling of MEM_REFs and ARRAY_REFs
(hence only one patch rather than two).

I've added a couple of testcases that show the improvement in DOM, but in all
cases I had to disable FRE, even PRE, to get any improvement, apart from on
ssa-dom-cse-2.c itself (where on the affected platforms FRE still does not do
the optimization). This makes me wonder if this is the right approach or whether
changing the references output by SRA (as per
https://gcc.gnu.org/ml/gcc-patches/2015-08/msg01490.html , judged as a hack to
SRA to work around limitations in DOM - or is it?) would be better.

Also, this causes gcc to miss vectorization of pr65947-2.c, because the
additional equivalents spotted in dom1 lead to an extra jump-thread (partially
peeling the loop's first iter, as this has become entirely predictable);
loop-header-copying (ch_vect) then kicks in, restoring the loop structure but
leading to:

      i_49 = 1;
      ivtmp_46 = 253;
      if (ivtmp_46 != 0) goto <bb 3>; else goto <bb 8>;

      <bb 3>: <--- OUTSIDE LOOP

      <bb 4>: <--- LOOP HEADER
      # i_53 = PHI <i_42(7), i_49(3)>

and scalar evolution is unable to analyze i_49 (defined outside the loop) even
though it's equal to a constant (analyze_scalar_evolution_1, test of
flow_bb_inside_loop_p).

This is fixed in the next patch...

gcc/ChangeLog:

	* tree-ssa-scopedtables.c (avail_expr_hash): Hash MEM_REF and ARRAY_REF
	using get_ref_base_and_extent.
	(equal_mem_array_ref_p): New.
	(hashable_expr_equal_p): Add call to previous.

gcc/testsuite/ChangeLog:

	* gcc.dg/tree-ssa/ssa-dom-cse-5.c: New.
	* gcc.dg/tree-ssa/ssa-dom-cse-6.c: New.
	* gcc.dg/tree-ssa/ssa-dom-cse-7.c: New.
---
 gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-5.c | 18 +++++++
 gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-6.c | 16 +++++++
 gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-7.c | 19 ++++++++
 gcc/tree-ssa-scopedtables.c                   | 67 +++++++++++++++++++++++++--
 4 files changed, 117 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-5.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-6.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-7.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-5.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-5.c
new file mode 100644
index 0000000..cd38d3e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-5.c
@@ -0,0 +1,18 @@
+/* Test normalization of ARRAY_REF expressions to MEM_REFs in dom.  */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fno-tree-fre -fdump-tree-dom2" } */
+
+#define N 8
+
+int
+main (int argc, char **argv)
+{
+  int a[N];
+  for (int i = 0; i < N; i++)
+    a[i] = 2*i + 1;
+  int *p = &a[0];
+  __builtin_printf ("%d\n", a[argc]);
+  return *(++p);
+}
+
+/* { dg-final { scan-tree-dump-times "return 3;" 1 "dom2"} } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-6.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-6.c
new file mode 100644
index 0000000..b0c5138
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-6.c
@@ -0,0 +1,16 @@
+/* Test normalization of ARRAY_REF expressions to MEM_REFs in dom.  */
+/* { dg-do compile } */
+/* { dg-options "-O1 -fno-tree-fre -fdump-tree-dom1" } */
+
+int
+main (int argc, char **argv)
+{
+  union {
+    int a[8];
+    int b[2];
+  } u = { .a = { 1, 42, 3, 4, 5, 6, 7, 8 } };
+  __builtin_printf ("%d\n", u.a[argc]);
+  return u.b[1];
+}
+
+/* { dg-final { scan-assembler-times "return 42;" 1 "dom1" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-7.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-7.c
new file mode 100644
index 0000000..1fc4c5e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-7.c
@@ -0,0 +1,19 @@
+/* Test normalization of MEM_REF expressions in dom.  */
+/* { dg-do compile } */
+/* { dg-options "-O3 -fno-tree-fre -fno-tree-pre -fdump-tree-optimized" } */
+
+typedef struct {
+  int a[8];
+} foo;
+
+foo f;
+
+int
+test ()
+{
+  foo g = { { 1, 2, 3, 4, 5, 6, 7, 8 } };
+  f=g;
+  return f.a[2];
+}
+
+/* { dg-final { scan-tree-dump-times "return 3;" 1 "optimized" } } */
diff --git a/gcc/tree-ssa-scopedtables.c b/gcc/tree-ssa-scopedtables.c
index 6034f79..8df7125 100644
--- a/gcc/tree-ssa-scopedtables.c
+++ b/gcc/tree-ssa-scopedtables.c
@@ -32,6 +32,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "fold-const.h"
 #include "tree-eh.h"
 #include "internal-fn.h"
+#include "tree-dfa.h"
 
 static bool hashable_expr_equal_p (const struct hashable_expr *,
 				   const struct hashable_expr *);
@@ -209,11 +210,70 @@ avail_expr_hash (class expr_hash_elt *p)
   const struct hashable_expr *expr = p->expr ();
   inchash::hash hstate;
 
+  if (expr->kind == EXPR_SINGLE)
+    {
+      /* T could potentially be a switch index or a goto dest.  */
+      tree t = expr->ops.single.rhs;
+      if (TREE_CODE (t) == MEM_REF || TREE_CODE (t) == ARRAY_REF)
+	{
+	  /* Make equivalent statements of both these kinds hash together.
+	     Dealing with both MEM_REF and ARRAY_REF allows us not to care
+	     about equivalence with other statements not considered here.  */
+	  bool reverse;
+	  HOST_WIDE_INT offset, size, max_size;
+	  tree base = get_ref_base_and_extent (t, &offset, &size, &max_size,
+					       &reverse);
+	  /* Strictly, we could try to normalize variable-sized accesses too,
+	    but here we just deal with the common case.  */
+	  if (size == max_size)
+	    {
+	      enum tree_code code = MEM_REF;
+	      hstate.add_object (code);
+	      inchash::add_expr (base, hstate);
+	      hstate.add_object (offset);
+	      hstate.add_object (size);
+	      return hstate.end ();
+	    }
+	}
+    }
+
   inchash::add_hashable_expr (expr, hstate);
 
   return hstate.end ();
 }
 
+/* Compares trees T0 and T1 to see if they are MEM_REF or ARRAY_REFs equivalent
+   to each other.  (That is, they return the value of the same bit of memory.)
+
+   Return TRUE if the two are so equivalent; FALSE if not (which could still
+   mean the two are equivalent by other means).  */
+
+static bool
+equal_mem_array_ref_p (tree t0, tree t1)
+{
+  if (TREE_CODE (t0) != MEM_REF && TREE_CODE (t0) != ARRAY_REF)
+    return false;
+  if (TREE_CODE (t1) != MEM_REF && TREE_CODE (t1) != ARRAY_REF)
+    return false;
+
+  if (!types_compatible_p (TREE_TYPE (t0), TREE_TYPE (t1)))
+    return false;
+  bool rev0;
+  HOST_WIDE_INT off0, sz0, max0;
+  tree base0 = get_ref_base_and_extent (t0, &off0, &sz0, &max0, &rev0);
+
+  bool rev1;
+  HOST_WIDE_INT off1, sz1, max1;
+  tree base1 = get_ref_base_and_extent (t1, &off1, &sz1, &max1, &rev1);
+
+  /* Types were compatible, so these are sanity checks.  */
+  gcc_assert (sz0 == sz1);
+  gcc_assert (max0 == max1);
+  gcc_assert (rev0 == rev1);
+
+  return (off0 == off1) && operand_equal_p (base0, base1, 0);
+}
+
 /* Compare two hashable_expr structures for equivalence.  They are
    considered equivalent when the expressions they denote must
    necessarily be equal.  The logic is intended to follow that of
@@ -246,9 +306,10 @@ hashable_expr_equal_p (const struct hashable_expr *expr0,
   switch (expr0->kind)
     {
     case EXPR_SINGLE:
-      return operand_equal_p (expr0->ops.single.rhs,
-                              expr1->ops.single.rhs, 0);
-
+      return equal_mem_array_ref_p (expr0->ops.single.rhs,
+				    expr1->ops.single.rhs)
+	     || operand_equal_p (expr0->ops.single.rhs,
+				 expr1->ops.single.rhs, 0);
     case EXPR_UNARY:
       if (expr0->ops.unary.op != expr1->ops.unary.op)
         return false;
-- 
1.9.1

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH 4/4] Un-XFAIL ssa-dom-cse-2.c for most platforms
  2015-12-21 13:14 [PATCH 0/4 v3] Fix PR/63679 when --param sra-max-scalarization-size specified Alan Lawrence
                   ` (2 preceding siblings ...)
  2015-12-21 13:14 ` [PATCH 2/4] Equate MEM_REFs and ARRAY_REFs in tree-ssa-scopedtables.c Alan Lawrence
@ 2015-12-21 13:22 ` Alan Lawrence
  2015-12-21 14:10   ` David Edelsohn
  2016-01-26 12:23   ` Dominik Vogt
  2015-12-22 15:54 ` [PATCH 0/4 v3] Fix PR/63679 when --param sra-max-scalarization-size specified Alan Lawrence
  4 siblings, 2 replies; 48+ messages in thread
From: Alan Lawrence @ 2015-12-21 13:22 UTC (permalink / raw)
  To: gcc-patches; +Cc: David Edelsohn

...the test passes with --param sra-max-scalarization-size-Ospeed.

Verified on aarch64 and with stage1 compiler for hppa, powerpc, sparc, s390.

On alpha, tree-optimized is:

  MEM[(int[8] *)&a] = { 0, 1 };
  MEM[(int[8] *)&a + 8B] = { 2, 3 };
  MEM[(int[8] *)&a + 16B] = { 4, 5 };
  MEM[(int[8] *)&a + 24B] = { 6, 7 };
  _23 = a[0];
  _29 = a[1];
  sum_30 = _23 + _29;
  _36 = a[2];
  sum_37 = sum_30 + _36;

Which is beyond the scope of these changes to DOM to optimize.

On powerpc64, the test passes with -mcpu=power8 (the loop is vectorized as a
reduction); however, without that, similar code is generated to Alpha (the
vectorizer decides the reduction is not worthwhile without SIMD support), and
the test fails; hence, I've XFAILed for powerpc, but I think I could condition
the XFAIL on powerpc64 && !check_p8vector_hw_available, if preferred?

gcc/testsuite/ChangeLog:

	* gcc.dg/tree-ssa/ssa-dom-cse-2.c: Remove XFAIL for powerpc(32), hppa,
	aarch64, sparc, s390. Add --param sra-max-scalarization-size-Ospeed.
---
 gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c
index 9eccdc9..748448e 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O3 -fno-tree-fre -fno-tree-pre -fdump-tree-optimized" } */
+/* { dg-options "-O3 -fno-tree-fre -fno-tree-pre -fdump-tree-optimized --param sra-max-scalarization-size-Ospeed=32" } */
 
 int
 foo ()
@@ -17,7 +17,8 @@ foo ()
 /* After late unrolling the above loop completely DOM should be
    able to optimize this to return 28.  */
 
-/* See PR63679 and PR64159, if the target forces the initializer to memory then
-   DOM is not able to perform this optimization.  */
+/* On alpha, the vectorizer generates writes of two vector elements at once,
+   but the loop reads only one element at a time, and DOM cannot resolve these.
+   The same happens on powerpc depending on the SIMD support available.  */
 
-/* { dg-final { scan-tree-dump "return 28;" "optimized" { xfail aarch64*-*-* alpha*-*-* hppa*-*-* powerpc*-*-* sparc*-*-* s390*-*-* } } } */
+/* { dg-final { scan-tree-dump "return 28;" "optimized" { xfail alpha*-*-* powerpc64*-*-* } } } */
-- 
1.9.1

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 4/4] Un-XFAIL ssa-dom-cse-2.c for most platforms
  2015-12-21 13:22 ` [PATCH 4/4] Un-XFAIL ssa-dom-cse-2.c for most platforms Alan Lawrence
@ 2015-12-21 14:10   ` David Edelsohn
  2015-12-21 14:59     ` Bill Schmidt
  2016-01-26 12:23   ` Dominik Vogt
  1 sibling, 1 reply; 48+ messages in thread
From: David Edelsohn @ 2015-12-21 14:10 UTC (permalink / raw)
  To: Alan Lawrence, William J. Schmidt; +Cc: GCC Patches

On Mon, Dec 21, 2015 at 8:13 AM, Alan Lawrence <alan.lawrence@arm.com> wrote:
> ...the test passes with --param sra-max-scalarization-size-Ospeed.
>
> Verified on aarch64 and with stage1 compiler for hppa, powerpc, sparc, s390.
>
> On alpha, tree-optimized is:
>
>   MEM[(int[8] *)&a] = { 0, 1 };
>   MEM[(int[8] *)&a + 8B] = { 2, 3 };
>   MEM[(int[8] *)&a + 16B] = { 4, 5 };
>   MEM[(int[8] *)&a + 24B] = { 6, 7 };
>   _23 = a[0];
>   _29 = a[1];
>   sum_30 = _23 + _29;
>   _36 = a[2];
>   sum_37 = sum_30 + _36;
>
> Which is beyond the scope of these changes to DOM to optimize.
>
> On powerpc64, the test passes with -mcpu=power8 (the loop is vectorized as a
> reduction); however, without that, similar code is generated to Alpha (the
> vectorizer decides the reduction is not worthwhile without SIMD support), and
> the test fails; hence, I've XFAILed for powerpc, but I think I could condition
> the XFAIL on powerpc64 && !check_p8vector_hw_available, if preferred?

Fun.

Does it work with -mcpu=power7?

Bill: What GCC DejaGNU incantation would you like to see?

- David

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 4/4] Un-XFAIL ssa-dom-cse-2.c for most platforms
  2015-12-21 14:10   ` David Edelsohn
@ 2015-12-21 14:59     ` Bill Schmidt
  2015-12-21 15:22       ` Alan Lawrence
  2015-12-21 17:34       ` Alan Lawrence
  0 siblings, 2 replies; 48+ messages in thread
From: Bill Schmidt @ 2015-12-21 14:59 UTC (permalink / raw)
  To: David Edelsohn; +Cc: Alan Lawrence, GCC Patches

On Mon, 2015-12-21 at 09:10 -0500, David Edelsohn wrote:
> On Mon, Dec 21, 2015 at 8:13 AM, Alan Lawrence <alan.lawrence@arm.com> wrote:
> > ...the test passes with --param sra-max-scalarization-size-Ospeed.
> >
> > Verified on aarch64 and with stage1 compiler for hppa, powerpc, sparc, s390.
> >
> > On alpha, tree-optimized is:
> >
> >   MEM[(int[8] *)&a] = { 0, 1 };
> >   MEM[(int[8] *)&a + 8B] = { 2, 3 };
> >   MEM[(int[8] *)&a + 16B] = { 4, 5 };
> >   MEM[(int[8] *)&a + 24B] = { 6, 7 };
> >   _23 = a[0];
> >   _29 = a[1];
> >   sum_30 = _23 + _29;
> >   _36 = a[2];
> >   sum_37 = sum_30 + _36;
> >
> > Which is beyond the scope of these changes to DOM to optimize.
> >
> > On powerpc64, the test passes with -mcpu=power8 (the loop is vectorized as a
> > reduction); however, without that, similar code is generated to Alpha (the
> > vectorizer decides the reduction is not worthwhile without SIMD support), and
> > the test fails; hence, I've XFAILed for powerpc, but I think I could condition
> > the XFAIL on powerpc64 && !check_p8vector_hw_available, if preferred?
> 
> Fun.
> 
> Does it work with -mcpu=power7?
> 
> Bill: What GCC DejaGNU incantation would you like to see?

This sounds like more fallout from unaligned accesses being faster on
POWER8 than previous hardware.  What about conditioning the XFAIL on

{ powerpc*-*-* && { ! vect_hw_misalign } }

-- does this work properly?  Right now that's just an alternative way of
saying what you suggested, but I think will better document the reason
for the XFAIL.

Thanks,
Bill

> 
> - David
> 


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 4/4] Un-XFAIL ssa-dom-cse-2.c for most platforms
  2015-12-21 14:59     ` Bill Schmidt
@ 2015-12-21 15:22       ` Alan Lawrence
  2015-12-21 15:33         ` Bill Schmidt
  2015-12-21 17:34       ` Alan Lawrence
  1 sibling, 1 reply; 48+ messages in thread
From: Alan Lawrence @ 2015-12-21 15:22 UTC (permalink / raw)
  To: Bill Schmidt, David Edelsohn; +Cc: GCC Patches

On 21/12/15 14:59, Bill Schmidt wrote:
>>>
>>> On powerpc64, the test passes with -mcpu=power8 (the loop is vectorized as a
>>> reduction); however, without that, similar code is generated to Alpha (the
>>> vectorizer decides the reduction is not worthwhile without SIMD support), and
>>> the test fails; hence, I've XFAILed for powerpc, but I think I could condition
>>> the XFAIL on powerpc64 && !check_p8vector_hw_available, if preferred?
>>
>> Fun.
>>
>> Does it work with -mcpu=power7?

Yes, it works with -mcpu=power7, as well as -mcpu=power8, but not e.g. -mcpu=power6.

>> Bill: What GCC DejaGNU incantation would you like to see?
>
> This sounds like more fallout from unaligned accesses being faster on
> POWER8 than previous hardware.  What about conditioning the XFAIL on
>
> { powerpc*-*-* && { ! vect_hw_misalign } }
>
> -- does this work properly?

Not on a stage1 compiler - check_p8vector_hw_available itself requires being 
able to run executables - I'll check on gcc112. However, both look like they're 
really about the host (ability to execute an asm instruction), not the target 
(/ability for gcc to output such an instruction)....

--Alan

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 4/4] Un-XFAIL ssa-dom-cse-2.c for most platforms
  2015-12-21 15:22       ` Alan Lawrence
@ 2015-12-21 15:33         ` Bill Schmidt
  2015-12-22 16:00           ` Alan Lawrence
  2016-01-15 10:46           ` Alan Lawrence
  0 siblings, 2 replies; 48+ messages in thread
From: Bill Schmidt @ 2015-12-21 15:33 UTC (permalink / raw)
  To: Alan Lawrence; +Cc: David Edelsohn, GCC Patches

On Mon, 2015-12-21 at 15:22 +0000, Alan Lawrence wrote:
> On 21/12/15 14:59, Bill Schmidt wrote:
> >>>
> >>> On powerpc64, the test passes with -mcpu=power8 (the loop is vectorized as a
> >>> reduction); however, without that, similar code is generated to Alpha (the
> >>> vectorizer decides the reduction is not worthwhile without SIMD support), and
> >>> the test fails; hence, I've XFAILed for powerpc, but I think I could condition
> >>> the XFAIL on powerpc64 && !check_p8vector_hw_available, if preferred?
> >>
> >> Fun.
> >>
> >> Does it work with -mcpu=power7?
> 
> Yes, it works with -mcpu=power7, as well as -mcpu=power8, but not e.g. -mcpu=power6.
> 
> >> Bill: What GCC DejaGNU incantation would you like to see?
> >
> > This sounds like more fallout from unaligned accesses being faster on
> > POWER8 than previous hardware.  What about conditioning the XFAIL on
> >
> > { powerpc*-*-* && { ! vect_hw_misalign } }
> >
> > -- does this work properly?
> 
> Not on a stage1 compiler - check_p8vector_hw_available itself requires being 
> able to run executables - I'll check on gcc112. However, both look like they're 
> really about the host (ability to execute an asm instruction), not the target 
> (/ability for gcc to output such an instruction)....

Hm, that looks like a pervasive problem for powerpc.  There are a number
of things that are supposed to be testing effective target but rely on
check_p8vector_hw_available, which as you note requires executing an
instruction and is really about the host.  We need to clean that up; I
should probably open a bug.  Kind of amazed this has gotten past us for
a couple of years.

For now, just XFAILing for powerpc seems the best alternative for this
test.

Thanks,
Bill

> 
> --Alan
> 


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 4/4] Un-XFAIL ssa-dom-cse-2.c for most platforms
  2015-12-21 14:59     ` Bill Schmidt
  2015-12-21 15:22       ` Alan Lawrence
@ 2015-12-21 17:34       ` Alan Lawrence
  1 sibling, 0 replies; 48+ messages in thread
From: Alan Lawrence @ 2015-12-21 17:34 UTC (permalink / raw)
  To: Bill Schmidt, David Edelsohn; +Cc: GCC Patches

On 21/12/15 14:59, Bill Schmidt wrote:

>>> On powerpc64, the test passes with -mcpu=power8 (the loop is vectorized as a
>>> reduction); however, without that, similar code is generated to Alpha (the
>>> vectorizer decides the reduction is not worthwhile without SIMD support), and
>>> the test fails; hence, I've XFAILed for powerpc, but I think I could condition
>>> the XFAIL on powerpc64 && !check_p8vector_hw_available, if preferred?
>>
>> Fun.
>>
>> Does it work with -mcpu=power7?
>>
>> Bill: What GCC DejaGNU incantation would you like to see?
>
> This sounds like more fallout from unaligned accesses being faster on
> POWER8 than previous hardware.  What about conditioning the XFAIL on
>
> { powerpc*-*-* && { ! vect_hw_misalign } }
>
> -- does this work properly?  Right now that's just an alternative way of
> saying what you suggested

It works properly in that it is equivalent to what I suggested. However, the 
test passes on power7 as well, so my suggestion of check_p8vector_hw_available 
is itself wrong...

--Alan

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 0/4 v3] Fix PR/63679 when --param sra-max-scalarization-size specified
  2015-12-21 13:14 [PATCH 0/4 v3] Fix PR/63679 when --param sra-max-scalarization-size specified Alan Lawrence
                   ` (3 preceding siblings ...)
  2015-12-21 13:22 ` [PATCH 4/4] Un-XFAIL ssa-dom-cse-2.c for most platforms Alan Lawrence
@ 2015-12-22 15:54 ` Alan Lawrence
  2015-12-22 16:58   ` Bill Schmidt
  4 siblings, 1 reply; 48+ messages in thread
From: Alan Lawrence @ 2015-12-22 15:54 UTC (permalink / raw)
  To: gcc-patches; +Cc: David Edelsohn, Bill Schmidt

On 21/12/15 13:13, Alan Lawrence wrote:
> This is a respin of previous patch series:
> https://gcc.gnu.org/ml/gcc-patches/2015-10/msg03271.html
> Minus three of the smaller patches having already been committed; with the
> updated version of the main patch to SRA; and the patches to DOM reworked
> to avoid constructing gimple stmt's.
>
> IMHO this feels quite invasive for stage 3, however, I note the PR/63679
> (ssa-dom-cse-2.c being XFAILed on a bunch of platforms, and currently still
> FAILing on ARM) is a regression from gcc 4.9.1. So I'd ask maintainer's thoughts
> as to whether we should take this patch set for gcc 6.
>
>

I've now tested the series on powerpc64-none-linux-gnu (gcc110) and 
powerpc64le-none-linux-gnu (gcc112).

On powerpc64le, this causes the same guality failures as on AArch64:
gcc.dg/guality/pr54970.c   -O1  line 15 a[0] == 1
gcc.dg/guality/pr54970.c   -O1  line 20 a[0] == 1
gcc.dg/guality/pr54970.c   -O1  line 25 a[0] == 1
and the same with other optimization flags. (I note that there are quite a lot 
of guality failures on both powerpc and also aarch64, which are generally not 
included in the posts on the gcc-testresults mailing list).

The same pr54970 tests still seem to pass on powerpc64 big-endian even with the 
patches.

On both powerpc64 and powerpc64le, the ssa-dom-cse-7.c test also fails, because 
the constant gets pushed to the constant pool :(, which was not the point of 
that test (I'd tried to construct it to test normalization of MEM_REFs in DOM 
prior to the SRA changes). So, adding --param 
sra-max-scalarization-size-Ospeed=32 fixes this once the SRA patch is in (!), or 
we could xfail on powerpc64*-*-*-*.

--Alan

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 4/4] Un-XFAIL ssa-dom-cse-2.c for most platforms
  2015-12-21 15:33         ` Bill Schmidt
@ 2015-12-22 16:00           ` Alan Lawrence
  2015-12-22 17:02             ` Bill Schmidt
  2015-12-24 20:01             ` Mike Stump
  2016-01-15 10:46           ` Alan Lawrence
  1 sibling, 2 replies; 48+ messages in thread
From: Alan Lawrence @ 2015-12-22 16:00 UTC (permalink / raw)
  To: Bill Schmidt; +Cc: David Edelsohn, GCC Patches

On 21/12/15 15:33, Bill Schmidt wrote:
>>
>> Not on a stage1 compiler - check_p8vector_hw_available itself requires being
>> able to run executables - I'll check on gcc112. However, both look like they're
>> really about the host (ability to execute an asm instruction), not the target
>> (/ability for gcc to output such an instruction)....
>
> Hm, that looks like a pervasive problem for powerpc.  There are a number
> of things that are supposed to be testing effective target but rely on
> check_p8vector_hw_available, which as you note requires executing an
> instruction and is really about the host.  We need to clean that up; I
> should probably open a bug.  Kind of amazed this has gotten past us for
> a couple of years.

Well, I was about to apologize for making a bogus remark. A really "proper" 
setup, would be to tell dejagnu to run your execution tests in some kind of 
emulator/simulator (on your host, perhaps one kind of powerpc) that 
only/additionally runs instructions for the other, _target_, kind of 
powerpc...and whatever setup you'd need for all that probably does not live in 
the GCC repository!

> For now, just XFAILing for powerpc seems the best alternative for this
> test.

Ok, thanks.


--Alan

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 0/4 v3] Fix PR/63679 when --param sra-max-scalarization-size specified
  2015-12-22 15:54 ` [PATCH 0/4 v3] Fix PR/63679 when --param sra-max-scalarization-size specified Alan Lawrence
@ 2015-12-22 16:58   ` Bill Schmidt
  2015-12-22 17:36     ` Alan Lawrence
  0 siblings, 1 reply; 48+ messages in thread
From: Bill Schmidt @ 2015-12-22 16:58 UTC (permalink / raw)
  To: Alan Lawrence; +Cc: gcc-patches, David Edelsohn

On Tue, 2015-12-22 at 15:54 +0000, Alan Lawrence wrote:
> On 21/12/15 13:13, Alan Lawrence wrote:
> > This is a respin of previous patch series:
> > https://gcc.gnu.org/ml/gcc-patches/2015-10/msg03271.html
> > Minus three of the smaller patches having already been committed; with the
> > updated version of the main patch to SRA; and the patches to DOM reworked
> > to avoid constructing gimple stmt's.
> >
> > IMHO this feels quite invasive for stage 3, however, I note the PR/63679
> > (ssa-dom-cse-2.c being XFAILed on a bunch of platforms, and currently still
> > FAILing on ARM) is a regression from gcc 4.9.1. So I'd ask maintainer's thoughts
> > as to whether we should take this patch set for gcc 6.
> >
> >
> 
> I've now tested the series on powerpc64-none-linux-gnu (gcc110) and 
> powerpc64le-none-linux-gnu (gcc112).
> 
> On powerpc64le, this causes the same guality failures as on AArch64:
> gcc.dg/guality/pr54970.c   -O1  line 15 a[0] == 1
> gcc.dg/guality/pr54970.c   -O1  line 20 a[0] == 1
> gcc.dg/guality/pr54970.c   -O1  line 25 a[0] == 1
> and the same with other optimization flags. (I note that there are quite a lot 
> of guality failures on both powerpc and also aarch64, which are generally not 
> included in the posts on the gcc-testresults mailing list).

That's interesting.  I never see these for powerpc64le on my internal
build system.  I wonder what the difference is?

> 
> The same pr54970 tests still seem to pass on powerpc64 big-endian even with the 
> patches.
> 
> On both powerpc64 and powerpc64le, the ssa-dom-cse-7.c test also fails, because 
> the constant gets pushed to the constant pool :(, which was not the point of 
> that test (I'd tried to construct it to test normalization of MEM_REFs in DOM 
> prior to the SRA changes). So, adding --param 
> sra-max-scalarization-size-Ospeed=32 fixes this once the SRA patch is in (!), or 
> we could xfail on powerpc64*-*-*-*.

If this is a democracy, I'd vote for adding the --param to keep the test
active, but I won't lie down on the tracks over it. ;)

Thanks for including the powerpc64 testing!

Bill

> 
> --Alan
> 


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 4/4] Un-XFAIL ssa-dom-cse-2.c for most platforms
  2015-12-22 16:00           ` Alan Lawrence
@ 2015-12-22 17:02             ` Bill Schmidt
  2015-12-24 20:01             ` Mike Stump
  1 sibling, 0 replies; 48+ messages in thread
From: Bill Schmidt @ 2015-12-22 17:02 UTC (permalink / raw)
  To: Alan Lawrence; +Cc: David Edelsohn, GCC Patches

On Tue, 2015-12-22 at 16:00 +0000, Alan Lawrence wrote:
> On 21/12/15 15:33, Bill Schmidt wrote:
> >>
> >> Not on a stage1 compiler - check_p8vector_hw_available itself requires being
> >> able to run executables - I'll check on gcc112. However, both look like they're
> >> really about the host (ability to execute an asm instruction), not the target
> >> (/ability for gcc to output such an instruction)....
> >
> > Hm, that looks like a pervasive problem for powerpc.  There are a number
> > of things that are supposed to be testing effective target but rely on
> > check_p8vector_hw_available, which as you note requires executing an
> > instruction and is really about the host.  We need to clean that up; I
> > should probably open a bug.  Kind of amazed this has gotten past us for
> > a couple of years.
> 
> Well, I was about to apologize for making a bogus remark. A really "proper" 
> setup, would be to tell dejagnu to run your execution tests in some kind of 
> emulator/simulator (on your host, perhaps one kind of powerpc) that 
> only/additionally runs instructions for the other, _target_, kind of 
> powerpc...and whatever setup you'd need for all that probably does not live in 
> the GCC repository!

Yeah -- after I wrote that, I looked around some more, and it seems that
this is relatively common practice.  I agree that it would be pretty
tough to set this up properly...

> 
> > For now, just XFAILing for powerpc seems the best alternative for this
> > test.
> 
> Ok, thanks.
> 
> 
> --Alan
> 


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 0/4 v3] Fix PR/63679 when --param sra-max-scalarization-size specified
  2015-12-22 16:58   ` Bill Schmidt
@ 2015-12-22 17:36     ` Alan Lawrence
  2015-12-22 23:02       ` Bill Schmidt
  0 siblings, 1 reply; 48+ messages in thread
From: Alan Lawrence @ 2015-12-22 17:36 UTC (permalink / raw)
  To: Bill Schmidt; +Cc: gcc-patches, David Edelsohn

On 22/12/15 16:05, Bill Schmidt wrote:
> On Tue, 2015-12-22 at 15:54 +0000, Alan Lawrence wrote:
>> On 21/12/15 13:13, Alan Lawrence wrote:
>>> This is a respin of previous patch series:
>>> https://gcc.gnu.org/ml/gcc-patches/2015-10/msg03271.html
>>> Minus three of the smaller patches having already been committed; with the
>>> updated version of the main patch to SRA; and the patches to DOM reworked
>>> to avoid constructing gimple stmt's.
>>>
>>> IMHO this feels quite invasive for stage 3, however, I note the PR/63679
>>> (ssa-dom-cse-2.c being XFAILed on a bunch of platforms, and currently still
>>> FAILing on ARM) is a regression from gcc 4.9.1. So I'd ask maintainer's thoughts
>>> as to whether we should take this patch set for gcc 6.
>>>
>>>
>>
>> I've now tested the series on powerpc64-none-linux-gnu (gcc110) and
>> powerpc64le-none-linux-gnu (gcc112).
>>
>> On powerpc64le, this causes the same guality failures as on AArch64:
>> gcc.dg/guality/pr54970.c   -O1  line 15 a[0] == 1
>> gcc.dg/guality/pr54970.c   -O1  line 20 a[0] == 1
>> gcc.dg/guality/pr54970.c   -O1  line 25 a[0] == 1
>> and the same with other optimization flags. (I note that there are quite a lot
>> of guality failures on both powerpc and also aarch64, which are generally not
>> included in the posts on the gcc-testresults mailing list).
>
> That's interesting.  I never see these for powerpc64le on my internal
> build system.  I wonder what the difference is?

I've ssh'd into gcc112.fsffrance.org, a power8 powerpc64le system;
and configure --with-cpu=power8 --disable-libsanitizer --with-long-double-128 
--enable-languages=c,c++,lto.

Hmmm...ISTR one variable that can make a big difference is the version (or 
absence!) of gdb...gcc112 has gdb "Fedora 7.8.2-38.fc21", copyright 2014, and 
GDB 7.8 looks to have been released at the end of 2014.

>> The same pr54970 tests still seem to pass on powerpc64 big-endian even with the
>> patches.

Ach, not quite. In fact those three are failing on powerpc64 bigendian even 
*without* the patches (at all optimization levels besides -O0 where they pass). 
This is on gcc110, configure --enable-languages=c,c++,lto --disable-libsanitizer 
--with-cpu=power7 --with-long-double-128. GDB is Fedora 7.7.1-21.fc20, a bit 
older, and I tested both unix/-m32 and unix/-m64 (following Bill Seurer's posts 
on gcc-testresults).

I'll email a list of the failures I'm seeing offlist (summary: 186 on gcc112,
148 on gcc110 with -m32, 395 on gcc112 with -m64); however, I suspect gdb 
version is the difference we are looking for.

Which *may* mean that with a more up-to-date GDB, it's possible those failures 
may not be introduced on ppc64le. (Or similarly AArch64.) Hmmm....

--Alan

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 0/4 v3] Fix PR/63679 when --param sra-max-scalarization-size specified
  2015-12-22 17:36     ` Alan Lawrence
@ 2015-12-22 23:02       ` Bill Schmidt
  0 siblings, 0 replies; 48+ messages in thread
From: Bill Schmidt @ 2015-12-22 23:02 UTC (permalink / raw)
  To: Alan Lawrence; +Cc: gcc-patches, David Edelsohn

On Tue, 2015-12-22 at 17:36 +0000, Alan Lawrence wrote:
> On 22/12/15 16:05, Bill Schmidt wrote:
> > On Tue, 2015-12-22 at 15:54 +0000, Alan Lawrence wrote:
> >> On 21/12/15 13:13, Alan Lawrence wrote:
> >>> This is a respin of previous patch series:
> >>> https://gcc.gnu.org/ml/gcc-patches/2015-10/msg03271.html
> >>> Minus three of the smaller patches having already been committed; with the
> >>> updated version of the main patch to SRA; and the patches to DOM reworked
> >>> to avoid constructing gimple stmt's.
> >>>
> >>> IMHO this feels quite invasive for stage 3, however, I note the PR/63679
> >>> (ssa-dom-cse-2.c being XFAILed on a bunch of platforms, and currently still
> >>> FAILing on ARM) is a regression from gcc 4.9.1. So I'd ask maintainer's thoughts
> >>> as to whether we should take this patch set for gcc 6.
> >>>
> >>>
> >>
> >> I've now tested the series on powerpc64-none-linux-gnu (gcc110) and
> >> powerpc64le-none-linux-gnu (gcc112).
> >>
> >> On powerpc64le, this causes the same guality failures as on AArch64:
> >> gcc.dg/guality/pr54970.c   -O1  line 15 a[0] == 1
> >> gcc.dg/guality/pr54970.c   -O1  line 20 a[0] == 1
> >> gcc.dg/guality/pr54970.c   -O1  line 25 a[0] == 1
> >> and the same with other optimization flags. (I note that there are quite a lot
> >> of guality failures on both powerpc and also aarch64, which are generally not
> >> included in the posts on the gcc-testresults mailing list).
> >
> > That's interesting.  I never see these for powerpc64le on my internal
> > build system.  I wonder what the difference is?
> 
> I've ssh'd into gcc112.fsffrance.org, a power8 powerpc64le system;
> and configure --with-cpu=power8 --disable-libsanitizer --with-long-double-128 
> --enable-languages=c,c++,lto.
> 
> Hmmm...ISTR one variable that can make a big difference is the version (or 
> absence!) of gdb...gcc112 has gdb "Fedora 7.8.2-38.fc21", copyright 2014, and 
> GDB 7.8 looks to have been released at the end of 2014.

I no longer use --disable-libsanitizer, but that doesn't seem relevant.
I also use --disable-multilib, though I think thats the default for
powerpc64le now; not sure.

The GDB on my internal system is actually even older (7.7.1, from
earlier in 2014), so I don't think that's necessarily to blame.  Still
seems mysterious.

That said, for better or worse we've had a history of ignoring the
guality failures for big-endian because we've seen so many of them, and
they don't seem to reflect a real problem with the debugger.
Understanding why they break hasn't managed to become a priority yet.  I
personally won't be upset if we have a few more than we did before, as
it probably doesn't indicate anything relevant about your patch.

Bill

> 
> >> The same pr54970 tests still seem to pass on powerpc64 big-endian even with the
> >> patches.
> 
> Ach, not quite. In fact those three are failing on powerpc64 bigendian even 
> *without* the patches (at all optimization levels besides -O0 where they pass). 
> This is on gcc110, configure --enable-languages=c,c++,lto --disable-libsanitizer 
> --with-cpu=power7 --with-long-double-128. GDB is Fedora 7.7.1-21.fc20, a bit 
> older, and I tested both unix/-m32 and unix/-m64 (following Bill Seurer's posts 
> on gcc-testresults).
> 
> I'll email a list of the failures I'm seeing offlist (summary: 186 on gcc112,
> 148 on gcc110 with -m32, 395 on gcc112 with -m64); however, I suspect gdb 
> version is the difference we are looking for.
> 
> Which *may* mean that with a more up-to-date GDB, it's possible those failures 
> may not be introduced on ppc64le. (Or similarly AArch64.) Hmmm....
> 
> --Alan
> 


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 1/4] Make SRA scalarize constant-pool loads
  2015-12-21 13:14 ` [PATCH 1/4] Make SRA scalarize constant-pool loads Alan Lawrence
@ 2015-12-24 11:53   ` Alan Lawrence
  2016-01-04 12:13     ` Alan Lawrence
  2016-01-15 10:27     ` Alan Lawrence
  0 siblings, 2 replies; 48+ messages in thread
From: Alan Lawrence @ 2015-12-24 11:53 UTC (permalink / raw)
  To: gcc-patches; +Cc: wschmidt

Here's a new version that fixes the gcc.dg/guality/pr54970.c failures seen on
aarch64 and powerpc64. Prior to SRA handling constant pool decls,
-fdump-tree-esra-details (at -O1 -g) had shown:
  <bb 2>:
  a = *.LC0;
  # DEBUG a$0 => MEM[(int[3] *)&*.LC0]
  a$4_3 = MEM[(int[3] *)&*.LC0 + 4B];
  # DEBUG a$4 => a$4_3
  a$8_4 = MEM[(int[3] *)&*.LC0 + 8B];

The previous patch changed this to:
  <bb 2>:
  SR.5_3 = *.LC0[0];
  SR.6_4 = *.LC0[1];
  SR.7_19 = *.LC0[2];
  SR.8_21 = *.LC1[0];
  SR.9_22 = *.LC1[1];
  SR.10_23 = *.LC1[2];
  # DEBUG a$0 => NULL   // Note here
  a$4_24 = SR.6_4;
  # DEBUG a$4 => a$4_24
  a$8_25 = SR.7_19;

Turns out the DEBUG a$0 => NULL was produced in load_assign_lhs_subreplacements:

	  if (lacc && lacc->grp_to_be_debug_replaced)
	    {
	      gdebug *ds;
	      tree drhs;
	      struct access *racc = find_access_in_subtree (sad->top_racc,
							    offset,
							    lacc->size);

	      if (racc && racc->grp_to_be_replaced)
		{
		  if (racc->grp_write)
		    drhs = get_access_replacement (racc);
		  else
		    drhs = NULL;  // <=== HERE
		}
...
	      ds = gimple_build_debug_bind (get_access_replacement (lacc),
					    drhs, gsi_stmt (sad->old_gsi));

Prior to the patch, we'd skipped around load_assign_lhs_subreplacements, because
access_has_children_p(racc) (for racc = *.LC0) didn't hold in sra_modify_assign.

I also added a constant_decl_p function, combining the two checks, plus some
testcase fixes.

This also fixes a bunch of other guality tests on AArch64 that were failing
prior to the patch series, and another bunch on PowerPC64 (bigendian -m32), listed below.

Bootstrapped + check-gcc,g++ on x86_64, ARM, AArch64,
  also on powerpc64{,le}-none-linux-gnu *in combination with the other patches
  in the series* (I haven't tested the individual patches on PPC),
  plus Ada on ARM and x86_64.

gcc/ChangeLog:

	PR target/63679
	* tree-sra.c (disqualified_constants, constant_decl_p): New.
	(sra_initialize): Allocate disqualified_constants.
	(sra_deinitialize): Free disqualified_constants.
	(disqualify_candidate): Update disqualified_constants when appropriate.
	(create_access): Scan for constant-pool entries as we go along.
	(scalarizable_type_p): Add check against type_contains_placeholder_p.
	(maybe_add_sra_candidate): Allow constant-pool entries.
	(load_assign_lhs_subreplacements): Bind debug for constant pool vars.
	(initialize_constant_pool_replacements): New.
	(sra_modify_assign): Avoid mangling assignments created by previous,
	and don't generate writes into constant pool.
	(sra_modify_function_body): Call initialize_constant_pool_replacements.

gcc/testsuite/ChangeLog:

	* gcc.dg/tree-ssa/sra-17.c: New.
	* gcc.dg/tree-ssa/sra-18.c: New.

Tests fixed on aarch64-none-linux-gnu:
gcc.dg/guality/pr54970.c   -O1  line 15 *p == 3
gcc.dg/guality/pr54970.c   -O1  line 15 *q == 2
gcc.dg/guality/pr54970.c   -O1  line 20 *p == 13
gcc.dg/guality/pr54970.c   -O1  line 20 *q == 2
gcc.dg/guality/pr54970.c   -O1  line 25 *p == 13
gcc.dg/guality/pr54970.c   -O1  line 25 *q == 12
gcc.dg/guality/pr54970.c   -O1  line 31 *p == 6
gcc.dg/guality/pr54970.c   -O1  line 31 *q == 5
gcc.dg/guality/pr54970.c   -O1  line 36 *p == 26
gcc.dg/guality/pr54970.c   -O1  line 36 *q == 5
gcc.dg/guality/pr54970.c   -O1  line 45 *p == 26
gcc.dg/guality/pr54970.c   -O1  line 45 *q == 25
gcc.dg/guality/pr54970.c   -O1  line 45 p[-1] == 25
gcc.dg/guality/pr54970.c   -O1  line 45 p[-2] == 4
gcc.dg/guality/pr54970.c   -O1  line 45 q[-1] == 4
gcc.dg/guality/pr54970.c   -O1  line 45 q[1] == 26
gcc.dg/guality/pr56154-1.c   -O1  line pr56154-1.c:20 x.a == 6
gcc.dg/guality/pr59776.c   -O1  line pr59776.c:17 s1.f == 5.0
gcc.dg/guality/pr59776.c   -O1  line pr59776.c:17 s1.g == 6.0
gcc.dg/guality/pr59776.c   -O1  line pr59776.c:17 s2.f == 0.0
gcc.dg/guality/pr59776.c   -O1  line pr59776.c:17 s2.g == 6.0
gcc.dg/guality/pr59776.c   -O1  line pr59776.c:20 s1.f == 5.0
gcc.dg/guality/pr59776.c   -O1  line pr59776.c:20 s1.g == 6.0
gcc.dg/guality/pr59776.c   -O1  line pr59776.c:20 s2.f == 5.0
gcc.dg/guality/pr59776.c   -O1  line pr59776.c:20 s2.g == 6.0
gcc.dg/guality/sra-1.c   -O1  line 21 a.j == 14
gcc.dg/guality/sra-1.c   -O1  line 32 a[1] == 14
gcc.dg/guality/sra-1.c   -O1  line 43 a.i == 4
gcc.dg/guality/sra-1.c   -O1  line 43 a.j == 14
(and other optimization levels)
On ppc64 bigendian, with the rest of the series (but I expect it's this patch):
unix/-m32: gcc.dg/guality/pr54970.c   -O1  line 15 a[0] == 1
unix/-m32: gcc.dg/guality/pr54970.c   -O1  line 20 a[0] == 1
unix/-m32: gcc.dg/guality/pr54970.c   -O1  line 25 a[0] == 1
unix/-m32: gcc.dg/guality/pr54970.c   -O1  line 31 a[0] == 4
unix/-m32: gcc.dg/guality/pr54970.c   -O1  line 36 a[0] == 4
unix/-m32: gcc.dg/guality/pr54970.c   -O1  line 45 a[0] == 4
unix/-m32: gcc.dg/guality/pr54970.c   -O1  line 45 p[-2] == 4
unix/-m32: gcc.dg/guality/pr54970.c   -O1  line 45 q[-1] == 4
(again, and other optimization levels).

---
 gcc/testsuite/gcc.dg/tree-ssa/sra-17.c |  19 ++++++
 gcc/testsuite/gcc.dg/tree-ssa/sra-18.c |  28 +++++++++
 gcc/tree-sra.c                         | 104 +++++++++++++++++++++++++++++++--
 3 files changed, 147 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/sra-17.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/sra-18.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/sra-17.c b/gcc/testsuite/gcc.dg/tree-ssa/sra-17.c
new file mode 100644
index 0000000..25667f4
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/sra-17.c
@@ -0,0 +1,19 @@
+/* { dg-do run { target { aarch64*-*-* alpha*-*-* arm*-*-* hppa*-*-* powerpc*-*-* s390*-*-* } } } */
+/* { dg-options "-O2 -fdump-tree-esra --param sra-max-scalarization-size-Ospeed=32" } */
+
+extern void abort (void);
+
+int
+main (int argc, char **argv)
+{
+  long a[4] = { 7, 19, 11, 255 };
+  int tot = 0;
+  for (int i = 0; i < 4; i++)
+    tot = (tot*256) + a[i];
+  if (tot == 0x07130bff)
+    return 0;
+  abort ();
+}
+
+/* { dg-final { scan-tree-dump-times "Removing load: a = \\\*.LC0;" 1 "esra" } } */
+/* { dg-final { scan-tree-dump-times "SR.\[0-9_\]+ = \\\*.LC0\\\[" 4 "esra" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/sra-18.c b/gcc/testsuite/gcc.dg/tree-ssa/sra-18.c
new file mode 100644
index 0000000..609fb11
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/sra-18.c
@@ -0,0 +1,28 @@
+/* { dg-do run { target { aarch64*-*-* alpha*-*-* arm*-*-* hppa*-*-* powerpc*-*-* s390*-*-* } } } */
+/* { dg-options "-O2 -fdump-tree-esra --param sra-max-scalarization-size-Ospeed=32" } */
+
+extern void abort (void);
+struct foo { long x; };
+
+struct bar { struct foo f[2]; };
+
+struct baz { struct bar b[2]; };
+
+int
+main (int argc, char **argv)
+{
+  struct baz a = { { { { { 4 }, { 5 } } }, { { { 6 }, { 7 } } }  } };
+  int tot = 0;
+  for (int i = 0; i < 2; i++)
+    for (int j = 0; j < 2; j++)
+      tot = (tot*256) + a.b[i].f[j].x;
+  if (tot == 0x04050607)
+    return 0;
+  abort ();
+}
+
+/* { dg-final { scan-tree-dump-times "Removing load: a = \\\*.LC0;" 1 "esra" } } */
+/* { dg-final { scan-tree-dump-times "SR.\[0-9_\]+ = \\\*.LC0\\.b\\\[0\\\]\\.f\\\[0\\\]\\.x" 1 "esra" } } */
+/* { dg-final { scan-tree-dump-times "SR.\[0-9_\]+ = \\\*.LC0\\.b\\\[0\\\]\\.f\\\[1\\\]\\.x" 1 "esra" } } */
+/* { dg-final { scan-tree-dump-times "SR.\[0-9_\]+ = \\\*.LC0\\.b\\\[1\\\]\\.f\\\[0\\\]\\.x" 1 "esra" } } */
+/* { dg-final { scan-tree-dump-times "SR.\[0-9_\]+ = \\\*.LC0\\.b\\\[1\\\]\\.f\\\[1\\\]\\.x" 1 "esra" } } */
diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
index c4fea5b..77a2e49 100644
--- a/gcc/tree-sra.c
+++ b/gcc/tree-sra.c
@@ -328,6 +328,10 @@ candidate (unsigned uid)
    those which cannot be (because they are and need be used as a whole).  */
 static bitmap should_scalarize_away_bitmap, cannot_scalarize_away_bitmap;
 
+/* Bitmap of candidates in the constant pool, which cannot be scalarized
+   because this would produce non-constant expressions (e.g. Ada).  */
+static bitmap disqualified_constants;
+
 /* Obstack for creation of fancy names.  */
 static struct obstack name_obstack;
 
@@ -652,6 +656,7 @@ sra_initialize (void)
     (vec_safe_length (cfun->local_decls) / 2);
   should_scalarize_away_bitmap = BITMAP_ALLOC (NULL);
   cannot_scalarize_away_bitmap = BITMAP_ALLOC (NULL);
+  disqualified_constants = BITMAP_ALLOC (NULL);
   gcc_obstack_init (&name_obstack);
   base_access_vec = new hash_map<tree, auto_vec<access_p> >;
   memset (&sra_stats, 0, sizeof (sra_stats));
@@ -670,6 +675,7 @@ sra_deinitialize (void)
   candidates = NULL;
   BITMAP_FREE (should_scalarize_away_bitmap);
   BITMAP_FREE (cannot_scalarize_away_bitmap);
+  BITMAP_FREE (disqualified_constants);
   access_pool.release ();
   assign_link_pool.release ();
   obstack_free (&name_obstack, NULL);
@@ -677,6 +683,13 @@ sra_deinitialize (void)
   delete base_access_vec;
 }
 
+/* Return true if DECL is a VAR_DECL in the constant pool, false otherwise.  */
+
+static bool constant_decl_p (tree decl)
+{
+  return TREE_CODE (decl) == VAR_DECL && DECL_IN_CONSTANT_POOL (decl);
+}
+
 /* Remove DECL from candidates for SRA and write REASON to the dump file if
    there is one.  */
 static void
@@ -684,6 +697,8 @@ disqualify_candidate (tree decl, const char *reason)
 {
   if (bitmap_clear_bit (candidate_bitmap, DECL_UID (decl)))
     candidates->remove_elt_with_hash (decl, DECL_UID (decl));
+  if (constant_decl_p (decl))
+    bitmap_set_bit (disqualified_constants, DECL_UID (decl));
 
   if (dump_file && (dump_flags & TDF_DETAILS))
     {
@@ -835,8 +850,11 @@ create_access_1 (tree base, HOST_WIDE_INT offset, HOST_WIDE_INT size)
   return access;
 }
 
+static bool maybe_add_sra_candidate (tree);
+
 /* Create and insert access for EXPR. Return created access, or NULL if it is
-   not possible.  */
+   not possible.  Also scan for uses of constant pool as we go along and add
+   to candidates.  */
 
 static struct access *
 create_access (tree expr, gimple *stmt, bool write)
@@ -859,6 +877,25 @@ create_access (tree expr, gimple *stmt, bool write)
   else
     ptr = false;
 
+  /* For constant-pool entries, check we can substitute the constant value.  */
+  if (constant_decl_p (base)
+      && (sra_mode == SRA_MODE_EARLY_INTRA || sra_mode == SRA_MODE_INTRA))
+    {
+      gcc_assert (!bitmap_bit_p (disqualified_constants, DECL_UID (base)));
+      if (expr != base
+	  && !is_gimple_reg_type (TREE_TYPE (expr))
+	  && dump_file && (dump_flags & TDF_DETAILS))
+	{
+	  /* This occurs in Ada with accesses to ARRAY_RANGE_REFs,
+	     and elements of multidimensional arrays (which are
+	     multi-element arrays in their own right).  */
+	  fprintf (dump_file, "Allowing non-reg-type load of part"
+			      " of constant-pool entry: ");
+	  print_generic_expr (dump_file, expr, 0);
+	}
+      maybe_add_sra_candidate (base);
+    }
+
   if (!DECL_P (base) || !bitmap_bit_p (candidate_bitmap, DECL_UID (base)))
     return NULL;
 
@@ -918,6 +955,8 @@ static bool
 scalarizable_type_p (tree type)
 {
   gcc_assert (!is_gimple_reg_type (type));
+  if (type_contains_placeholder_p (type))
+    return false;
 
   switch (TREE_CODE (type))
   {
@@ -1852,7 +1891,10 @@ maybe_add_sra_candidate (tree var)
       reject (var, "not aggregate");
       return false;
     }
-  if (needs_to_live_in_memory (var))
+  /* Allow constant-pool entries (that "need to live in memory")
+     unless we are doing IPA SRA.  */
+  if (needs_to_live_in_memory (var)
+      && (sra_mode == SRA_MODE_EARLY_IPA || !constant_decl_p (var)))
     {
       reject (var, "needs to live in memory");
       return false;
@@ -3113,7 +3155,7 @@ load_assign_lhs_subreplacements (struct access *lacc,
 
 	      if (racc && racc->grp_to_be_replaced)
 		{
-		  if (racc->grp_write)
+		  if (racc->grp_write || constant_decl_p (racc->base))
 		    drhs = get_access_replacement (racc);
 		  else
 		    drhs = NULL;
@@ -3272,6 +3314,9 @@ sra_modify_assign (gimple *stmt, gimple_stmt_iterator *gsi)
   racc = get_access_for_expr (rhs);
   if (!lacc && !racc)
     return SRA_AM_NONE;
+  /* Avoid modifying initializations of constant-pool replacements.  */
+  if (racc && (racc->replacement_decl == lhs))
+    return SRA_AM_NONE;
 
   loc = gimple_location (stmt);
   if (lacc && lacc->grp_to_be_replaced)
@@ -3388,7 +3433,8 @@ sra_modify_assign (gimple *stmt, gimple_stmt_iterator *gsi)
       || contains_vce_or_bfcref_p (lhs)
       || stmt_ends_bb_p (stmt))
     {
-      if (access_has_children_p (racc))
+      /* No need to copy into a constant-pool, it comes pre-initialized.  */
+      if (access_has_children_p (racc) && !constant_decl_p (racc->base))
 	generate_subtree_copies (racc->first_child, rhs, racc->offset, 0, 0,
 				 gsi, false, false, loc);
       if (access_has_children_p (lacc))
@@ -3491,6 +3537,54 @@ sra_modify_assign (gimple *stmt, gimple_stmt_iterator *gsi)
     }
 }
 
+/* Set any scalar replacements of values in the constant pool to the initial
+   value of the constant.  (Constant-pool decls like *.LC0 have effectively
+   been initialized before the program starts, we must do the same for their
+   replacements.)  Thus, we output statements like 'SR.1 = *.LC0[0];' into
+   the function's entry block.  */
+
+static void
+initialize_constant_pool_replacements (void)
+{
+  gimple_seq seq = NULL;
+  gimple_stmt_iterator gsi = gsi_start (seq);
+  bitmap_iterator bi;
+  unsigned i;
+
+  EXECUTE_IF_SET_IN_BITMAP (candidate_bitmap, 0, i, bi)
+    if (bitmap_bit_p (should_scalarize_away_bitmap, i)
+	&& !bitmap_bit_p (cannot_scalarize_away_bitmap, i))
+      {
+	tree var = candidate (i);
+	if (!constant_decl_p (var))
+	  continue;
+	vec<access_p> *access_vec = get_base_access_vector (var);
+	if (!access_vec)
+	  continue;
+	for (unsigned i = 0; i < access_vec->length (); i++)
+	  {
+	    struct access *access = (*access_vec)[i];
+	    if (!access->replacement_decl)
+	      continue;
+	    gassign *stmt = gimple_build_assign (
+	      get_access_replacement (access), unshare_expr (access->expr));
+	    if (dump_file && (dump_flags & TDF_DETAILS))
+	      {
+		fprintf (dump_file, "Generating constant initializer: ");
+		print_gimple_stmt (dump_file, stmt, 0, 1);
+		fprintf (dump_file, "\n");
+	      }
+	    gsi_insert_after (&gsi, stmt, GSI_NEW_STMT);
+	    update_stmt (stmt);
+	  }
+      }
+
+  seq = gsi_seq (gsi);
+  if (seq)
+    gsi_insert_seq_on_edge_immediate (
+      single_succ_edge (ENTRY_BLOCK_PTR_FOR_FN (cfun)), seq);
+}
+
 /* Traverse the function body and all modifications as decided in
    analyze_all_variable_accesses.  Return true iff the CFG has been
    changed.  */
@@ -3501,6 +3595,8 @@ sra_modify_function_body (void)
   bool cfg_changed = false;
   basic_block bb;
 
+  initialize_constant_pool_replacements ();
+
   FOR_EACH_BB_FN (bb, cfun)
     {
       gimple_stmt_iterator gsi = gsi_start_bb (bb);
-- 
1.9.1

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 2/4] Equate MEM_REFs and ARRAY_REFs in tree-ssa-scopedtables.c
  2015-12-21 13:14 ` [PATCH 2/4] Equate MEM_REFs and ARRAY_REFs in tree-ssa-scopedtables.c Alan Lawrence
@ 2015-12-24 11:55   ` Alan Lawrence
  2016-01-04 19:05     ` Jeff Law
  2016-01-19  3:05     ` H.J. Lu
  2016-01-04 19:08   ` [PATCH 2/4] Equate MEM_REFs and ARRAY_REFs in tree-ssa-scopedtables.c Jeff Law
  1 sibling, 2 replies; 48+ messages in thread
From: Alan Lawrence @ 2015-12-24 11:55 UTC (permalink / raw)
  To: gcc-patches

This version changes the test cases to fix failures on some platforms, by
rewriting the initializers so that they aren't pushed out to the constant pool.

gcc/ChangeLog:

	* tree-ssa-scopedtables.c (avail_expr_hash): Hash MEM_REF and ARRAY_REF
	using get_ref_base_and_extent.
	(equal_mem_array_ref_p): New.
	(hashable_expr_equal_p): Add call to previous.

gcc/testsuite/ChangeLog:

	* gcc.dg/tree-ssa/ssa-dom-cse-5.c: New.
	* gcc.dg/tree-ssa/ssa-dom-cse-6.c: New.
	* gcc.dg/tree-ssa/ssa-dom-cse-7.c: New.

---
 gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-5.c | 18 +++++++
 gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-6.c | 20 ++++++++
 gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-7.c | 21 +++++++++
 gcc/tree-ssa-scopedtables.c                   | 67 +++++++++++++++++++++++++--
 4 files changed, 123 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-5.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-6.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-7.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-5.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-5.c
new file mode 100644
index 0000000..cd38d3e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-5.c
@@ -0,0 +1,18 @@
+/* Test normalization of ARRAY_REF expressions to MEM_REFs in dom.  */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fno-tree-fre -fdump-tree-dom2" } */
+
+#define N 8
+
+int
+main (int argc, char **argv)
+{
+  int a[N];
+  for (int i = 0; i < N; i++)
+    a[i] = 2*i + 1;
+  int *p = &a[0];
+  __builtin_printf ("%d\n", a[argc]);
+  return *(++p);
+}
+
+/* { dg-final { scan-tree-dump-times "return 3;" 1 "dom2"} } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-6.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-6.c
new file mode 100644
index 0000000..002fd81
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-6.c
@@ -0,0 +1,20 @@
+/* Test normalization of ARRAY_REF expressions to MEM_REFs in dom.  */
+/* { dg-do compile } */
+/* { dg-options "-O1 -fno-tree-fre -fdump-tree-dom2" } */
+
+int
+main (int argc, char **argv)
+{
+  union {
+    int a[4];
+    int b[2];
+  } u;
+  u.a[0] = 1;
+  u.a[1] = 42;
+  u.a[2] = 3;
+  u.a[3] = 4;
+  __builtin_printf ("%d\n", u.a[argc]);
+  return u.b[1];
+}
+
+/* { dg-final { scan-tree-dump-times "return 42;" 1 "dom2" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-7.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-7.c
new file mode 100644
index 0000000..151e5d4e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-7.c
@@ -0,0 +1,21 @@
+/* Test normalization of MEM_REF expressions in dom.  */
+/* { dg-do compile } */
+/* { dg-options "-O3 -fno-tree-fre -fno-tree-pre -fdump-tree-optimized" } */
+
+typedef struct {
+  int a[8];
+} foo;
+
+foo f;
+
+int
+test ()
+{
+  foo g;
+  g.a[0] = 1; g.a[1] = 2; g.a[2] = 3; g.a[3] = 4;
+  g.a[4] = 5; g.a[5] = 6; g.a[6] = 7; g.a[7] = 8;
+  f=g;
+  return f.a[2];
+}
+
+/* { dg-final { scan-tree-dump-times "return 3;" 1 "optimized" } } */
diff --git a/gcc/tree-ssa-scopedtables.c b/gcc/tree-ssa-scopedtables.c
index 6034f79..8df7125 100644
--- a/gcc/tree-ssa-scopedtables.c
+++ b/gcc/tree-ssa-scopedtables.c
@@ -32,6 +32,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "fold-const.h"
 #include "tree-eh.h"
 #include "internal-fn.h"
+#include "tree-dfa.h"
 
 static bool hashable_expr_equal_p (const struct hashable_expr *,
 				   const struct hashable_expr *);
@@ -209,11 +210,70 @@ avail_expr_hash (class expr_hash_elt *p)
   const struct hashable_expr *expr = p->expr ();
   inchash::hash hstate;
 
+  if (expr->kind == EXPR_SINGLE)
+    {
+      /* T could potentially be a switch index or a goto dest.  */
+      tree t = expr->ops.single.rhs;
+      if (TREE_CODE (t) == MEM_REF || TREE_CODE (t) == ARRAY_REF)
+	{
+	  /* Make equivalent statements of both these kinds hash together.
+	     Dealing with both MEM_REF and ARRAY_REF allows us not to care
+	     about equivalence with other statements not considered here.  */
+	  bool reverse;
+	  HOST_WIDE_INT offset, size, max_size;
+	  tree base = get_ref_base_and_extent (t, &offset, &size, &max_size,
+					       &reverse);
+	  /* Strictly, we could try to normalize variable-sized accesses too,
+	    but here we just deal with the common case.  */
+	  if (size == max_size)
+	    {
+	      enum tree_code code = MEM_REF;
+	      hstate.add_object (code);
+	      inchash::add_expr (base, hstate);
+	      hstate.add_object (offset);
+	      hstate.add_object (size);
+	      return hstate.end ();
+	    }
+	}
+    }
+
   inchash::add_hashable_expr (expr, hstate);
 
   return hstate.end ();
 }
 
+/* Compares trees T0 and T1 to see if they are MEM_REF or ARRAY_REFs equivalent
+   to each other.  (That is, they return the value of the same bit of memory.)
+
+   Return TRUE if the two are so equivalent; FALSE if not (which could still
+   mean the two are equivalent by other means).  */
+
+static bool
+equal_mem_array_ref_p (tree t0, tree t1)
+{
+  if (TREE_CODE (t0) != MEM_REF && TREE_CODE (t0) != ARRAY_REF)
+    return false;
+  if (TREE_CODE (t1) != MEM_REF && TREE_CODE (t1) != ARRAY_REF)
+    return false;
+
+  if (!types_compatible_p (TREE_TYPE (t0), TREE_TYPE (t1)))
+    return false;
+  bool rev0;
+  HOST_WIDE_INT off0, sz0, max0;
+  tree base0 = get_ref_base_and_extent (t0, &off0, &sz0, &max0, &rev0);
+
+  bool rev1;
+  HOST_WIDE_INT off1, sz1, max1;
+  tree base1 = get_ref_base_and_extent (t1, &off1, &sz1, &max1, &rev1);
+
+  /* Types were compatible, so these are sanity checks.  */
+  gcc_assert (sz0 == sz1);
+  gcc_assert (max0 == max1);
+  gcc_assert (rev0 == rev1);
+
+  return (off0 == off1) && operand_equal_p (base0, base1, 0);
+}
+
 /* Compare two hashable_expr structures for equivalence.  They are
    considered equivalent when the expressions they denote must
    necessarily be equal.  The logic is intended to follow that of
@@ -246,9 +306,10 @@ hashable_expr_equal_p (const struct hashable_expr *expr0,
   switch (expr0->kind)
     {
     case EXPR_SINGLE:
-      return operand_equal_p (expr0->ops.single.rhs,
-                              expr1->ops.single.rhs, 0);
-
+      return equal_mem_array_ref_p (expr0->ops.single.rhs,
+				    expr1->ops.single.rhs)
+	     || operand_equal_p (expr0->ops.single.rhs,
+				 expr1->ops.single.rhs, 0);
     case EXPR_UNARY:
       if (expr0->ops.unary.op != expr1->ops.unary.op)
         return false;
-- 
1.9.1

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 4/4] Un-XFAIL ssa-dom-cse-2.c for most platforms
  2015-12-22 16:00           ` Alan Lawrence
  2015-12-22 17:02             ` Bill Schmidt
@ 2015-12-24 20:01             ` Mike Stump
  2016-01-04 12:06               ` Alan Lawrence
  1 sibling, 1 reply; 48+ messages in thread
From: Mike Stump @ 2015-12-24 20:01 UTC (permalink / raw)
  To: Alan Lawrence; +Cc: Bill Schmidt, David Edelsohn, GCC Patches

On Dec 22, 2015, at 8:00 AM, Alan Lawrence <alan.lawrence@foss.arm.com> wrote:
> On 21/12/15 15:33, Bill Schmidt wrote:
>>> 
>>> Not on a stage1 compiler - check_p8vector_hw_available itself requires being
>>> able to run executables - I'll check on gcc112. However, both look like they're
>>> really about the host (ability to execute an asm instruction), not the target
>>> (/ability for gcc to output such an instruction)....
>> 
>> Hm, that looks like a pervasive problem for powerpc.  There are a number
>> of things that are supposed to be testing effective target but rely on
>> check_p8vector_hw_available, which as you note requires executing an
>> instruction and is really about the host.  We need to clean that up; I
>> should probably open a bug.  Kind of amazed this has gotten past us for
>> a couple of years.
> 
> Well, I was about to apologize for making a bogus remark. A really "proper" setup, would be to tell dejagnu to run your execution tests in some kind of emulator/simulator (on your host, perhaps one kind of powerpc) that only/additionally runs instructions for the other, _target_, kind of powerpc...and whatever setup you'd need for all that probably does not live in the GCC repository!

I’m not following.  dejagnu can already run tests on the target to makes decisions on which tests to run and what to expect from them, if it wants.  Some ports already do this.  Further, this is pretty typical and standard and easy to do.

You confuse the issue by mentioning host, but this I think is wrong.  These decisions have nothing to do with the host.  The are properties of the target execution environment.

I’d be happy to help if you’d like.  I’d just need the details of what you’d like help with.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 3/4] Add a pass_copy_prop following pass_ch_vect
  2015-12-21 13:14 ` [PATCH 3/4] Add a pass_copy_prop following pass_ch_vect Alan Lawrence
@ 2015-12-26 17:58   ` Richard Biener
  2016-01-14 10:29     ` [PATCH 3/4 v2] Enhance SCEV to follow copies of SSA_NAMEs Alan Lawrence
  0 siblings, 1 reply; 48+ messages in thread
From: Richard Biener @ 2015-12-26 17:58 UTC (permalink / raw)
  To: Alan Lawrence; +Cc: GCC Patches

On Mon, Dec 21, 2015 at 2:13 PM, Alan Lawrence <alan.lawrence@arm.com> wrote:
> This fixes the missed vectorization of gcc.dg/vect/pr65947-2.c following the
> previous patch, by cleaning up:
>
>   i_49 = 1;
>   ...
>     goto <bb 3>;
>
>   <bb 3>:
>
>   <bb 4>: <--- LOOP HEADER
>   # i_53 = PHI <i_42(7), i_49(3)>
>
> into:
>
> <bb 3>:
>   # i_53 = PHI <i_42(4), 1(2)>
>
> Allowing scalar evolution and vectorization to proceed.
>
> A similar sequence of partial peeling, header-copying, and constant propagation
> also leads to pr65947-9.c successfully vectorizing (it previously didn't as
> the iteration count was too high) when inlined and so using known data.
>
> I'm not sure whether adding a pass_copy_prop is the right thing here, but since
> loop-header-copying can create such opportunities, it seems good to take them!

Aww.  I'd rather have general infrastructure (like SCEV) deal with
those non-propagated
copies.

There are likely other missed propagation / folding opportunities
caused from partial peeling.

Richard.

> gcc/ChangeLog:
>
>         * passes.def: Add a pass_copy_prop following pass_ch_vect.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.dg/vect/pr65947-9.c: Add __attribute__ ((noinline)).
>
> --Alan
> ---
>  gcc/passes.def                        | 1 +
>  gcc/testsuite/gcc.dg/vect/pr65947-9.c | 2 +-
>  2 files changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/passes.def b/gcc/passes.def
> index 624d121..1043548 100644
> --- a/gcc/passes.def
> +++ b/gcc/passes.def
> @@ -269,6 +269,7 @@ along with GCC; see the file COPYING3.  If not see
>               NEXT_PASS (pass_expand_omp_ssa);
>           POP_INSERT_PASSES ()
>           NEXT_PASS (pass_ch_vect);
> +         NEXT_PASS (pass_copy_prop);
>           NEXT_PASS (pass_if_conversion);
>           /* pass_vectorize must immediately follow pass_if_conversion.
>              Please do not add any other passes in between.  */
> diff --git a/gcc/testsuite/gcc.dg/vect/pr65947-9.c b/gcc/testsuite/gcc.dg/vect/pr65947-9.c
> index d0da13f..c7aac8d 100644
> --- a/gcc/testsuite/gcc.dg/vect/pr65947-9.c
> +++ b/gcc/testsuite/gcc.dg/vect/pr65947-9.c
> @@ -7,7 +7,7 @@ extern void abort (void) __attribute__ ((noreturn));
>  /* Condition reduction with maximum possible loop size.  Will fail to
>     vectorize because the vectorisation requires a slot for default values.  */
>
> -char
> +__attribute__ ((noinline)) char
>  condition_reduction (char *a, char min_v)
>  {
>    char last = -72;
> --
> 1.9.1
>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 4/4] Un-XFAIL ssa-dom-cse-2.c for most platforms
  2015-12-24 20:01             ` Mike Stump
@ 2016-01-04 12:06               ` Alan Lawrence
  0 siblings, 0 replies; 48+ messages in thread
From: Alan Lawrence @ 2016-01-04 12:06 UTC (permalink / raw)
  To: Mike Stump; +Cc: Bill Schmidt, David Edelsohn, GCC Patches

On 24/12/15 19:59, Mike Stump wrote:
> On Dec 22, 2015, at 8:00 AM, Alan Lawrence <alan.lawrence@foss.arm.com> wrote:
>> On 21/12/15 15:33, Bill Schmidt wrote:
>>>>
>>>> Not on a stage1 compiler - check_p8vector_hw_available itself requires being
>>>> able to run executables - I'll check on gcc112. However, both look like they're
>>>> really about the host (ability to execute an asm instruction), not the target
>>>> (/ability for gcc to output such an instruction)....
>>>
>>> Hm, that looks like a pervasive problem for powerpc.  There are a number
>>> of things that are supposed to be testing effective target but rely on
>>> check_p8vector_hw_available, which as you note requires executing an
>>> instruction and is really about the host.  We need to clean that up; I
>>> should probably open a bug.  Kind of amazed this has gotten past us for
>>> a couple of years.
>>
>> Well, I was about to apologize for making a bogus remark. A really "proper" setup, would be to tell dejagnu to run your execution tests in some kind of emulator/simulator (on your host, perhaps one kind of powerpc) that only/additionally runs instructions for the other, _target_, kind of powerpc...and whatever setup you'd need for all that probably does not live in the GCC repository!
>
> IÂ’m not following.  dejagnu can already run tests on the target to makes decisions on which tests to run and what to expect from them, if it wants.  Some ports already do this.  Further, this is pretty typical and standard and easy to do
>
> You confuse the issue by mentioning host, but this I think is wrong.  These decisions have nothing to do with the host.  The are properties of the target execution environment.
>
> IÂ’d be happy to help if youÂ’d like.  IÂ’d just need the details of what youÂ’d like help with.

You're right, which is why I described my first (wrong) remark as bogus. That 
is, check_p8vector_hw_available is executing an assembly instruction, and on a 
well-configured test setup, that would potentially invoke an emulator etc. - 
whereas I am just doing 'native' testing on gcc110/gcc112 on the compile farm.

So (as Mike says) there is no bug here, but one just needs to be aware that 
passing -mcpu=power7 (say) is not sufficient to make check_p8vector_hw_available 
return false when executing on a power8 host; you would also need to set up some 
kind of power7 emulator/simulator.

Hope that's clear!

Thanks,
Alan

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 1/4] Make SRA scalarize constant-pool loads
  2015-12-24 11:53   ` Alan Lawrence
@ 2016-01-04 12:13     ` Alan Lawrence
  2016-01-15 10:27     ` Alan Lawrence
  1 sibling, 0 replies; 48+ messages in thread
From: Alan Lawrence @ 2016-01-04 12:13 UTC (permalink / raw)
  To: gcc-patches; +Cc: wschmidt

On 24/12/15 11:53, Alan Lawrence wrote:
> Here's a new version that fixes the gcc.dg/guality/pr54970.c failures seen on
> aarch64 and powerpc64.
[snip]
> This also fixes a bunch of other guality tests on AArch64 that were failing
> prior to the patch series, and another bunch on PowerPC64 (bigendian -m32), listed below.

Ach, sorry, not quite. That version avoids any regressions (e.g. in pr54970.c), 
but does not fix all those other tests, unless you also have this hunk 
(https://gcc.gnu.org/ml/gcc-patches/2015-08/msg01483.html):

diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
index a3ff2df..2a741b8 100644
--- a/gcc/tree-sra.c
+++ b/gcc/tree-sra.c
@@ -2651,7 +2651,8 @@ analyze_all_variable_accesses (void)
  	    && scalarizable_type_p (TREE_TYPE (var)))
  	  {
  	    if (tree_to_uhwi (TYPE_SIZE (TREE_TYPE (var)))
-		<= max_scalarization_size)
+		  <= max_scalarization_size
+		|| DECL_IN_CONSTANT_POOL (var))
  	      {
  		create_total_scalarization_access (var);
  		completely_scalarize (var, TREE_TYPE (var), 0, var);


...which I was using to increase test coverage of the SRA changes. 
(Alternatively, you can "fix" the tests by running the testsuite with a forced 
--param sra-max-scalarization-size. But this is only saying that the dwarf info 
now generated by scalarizing constant-pools, is better than whatever dwarf was 
being generated by whatever other part of the compiler before.)

--Alan

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 2/4] Equate MEM_REFs and ARRAY_REFs in tree-ssa-scopedtables.c
  2015-12-24 11:55   ` Alan Lawrence
@ 2016-01-04 19:05     ` Jeff Law
  2016-01-19  3:05     ` H.J. Lu
  1 sibling, 0 replies; 48+ messages in thread
From: Jeff Law @ 2016-01-04 19:05 UTC (permalink / raw)
  To: Alan Lawrence, gcc-patches

On 12/24/2015 04:55 AM, Alan Lawrence wrote:
> This version changes the test cases to fix failures on some platforms, by
> rewriting the initializers so that they aren't pushed out to the constant pool.
>
> gcc/ChangeLog:
>
> 	* tree-ssa-scopedtables.c (avail_expr_hash): Hash MEM_REF and ARRAY_REF
> 	using get_ref_base_and_extent.
> 	(equal_mem_array_ref_p): New.
> 	(hashable_expr_equal_p): Add call to previous.
>
> gcc/testsuite/ChangeLog:
>
> 	* gcc.dg/tree-ssa/ssa-dom-cse-5.c: New.
> 	* gcc.dg/tree-ssa/ssa-dom-cse-6.c: New.
> 	* gcc.dg/tree-ssa/ssa-dom-cse-7.c: New.
This is fine.

Thanks,
Jeff

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 2/4] Equate MEM_REFs and ARRAY_REFs in tree-ssa-scopedtables.c
  2015-12-21 13:14 ` [PATCH 2/4] Equate MEM_REFs and ARRAY_REFs in tree-ssa-scopedtables.c Alan Lawrence
  2015-12-24 11:55   ` Alan Lawrence
@ 2016-01-04 19:08   ` Jeff Law
  2016-01-05  7:29     ` Richard Biener
  1 sibling, 1 reply; 48+ messages in thread
From: Jeff Law @ 2016-01-04 19:08 UTC (permalink / raw)
  To: Alan Lawrence, gcc-patches

On 12/21/2015 06:13 AM, Alan Lawrence wrote:
> This is a respin of patches
> https://gcc.gnu.org/ml/gcc-patches/2015-10/msg03266.html and
> https://gcc.gnu.org/ml/gcc-patches/2015-10/msg03267.html, which were
> "too quickly" approved before concerns with efficiency were pointed out.
>
> I tried to change the hashing just in tree-ssa-dom.c using C++ subclassing, but
> couldn't cleanly separate this out from tree-ssa-scopedtables and
> tree-ssa-threadedge.c due to use of avail_exprs_stack. So I figured it was
> probably appropriate to use the equivalences in jump threading too. Also,
> using get_ref_base_and_extent unifies handling of MEM_REFs and ARRAY_REFs
> (hence only one patch rather than two).
It is appropriate.


> I've added a couple of testcases that show the improvement in DOM, but in all
> cases I had to disable FRE, even PRE, to get any improvement, apart from on
> ssa-dom-cse-2.c itself (where on the affected platforms FRE still does not do
> the optimization). This makes me wonder if this is the right approach or whether
> changing the references output by SRA (as per
> https://gcc.gnu.org/ml/gcc-patches/2015-08/msg01490.html , judged as a hack to
> SRA to work around limitations in DOM - or is it?) would be better.
I just doubt it happens all that much.



Jeff

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 2/4] Equate MEM_REFs and ARRAY_REFs in tree-ssa-scopedtables.c
  2016-01-04 19:08   ` [PATCH 2/4] Equate MEM_REFs and ARRAY_REFs in tree-ssa-scopedtables.c Jeff Law
@ 2016-01-05  7:29     ` Richard Biener
  2016-01-05 16:29       ` Alan Lawrence
  0 siblings, 1 reply; 48+ messages in thread
From: Richard Biener @ 2016-01-05  7:29 UTC (permalink / raw)
  To: Jeff Law, Alan Lawrence, gcc-patches

On January 4, 2016 8:08:17 PM GMT+01:00, Jeff Law <law@redhat.com> wrote:
>On 12/21/2015 06:13 AM, Alan Lawrence wrote:
>> This is a respin of patches
>> https://gcc.gnu.org/ml/gcc-patches/2015-10/msg03266.html and
>> https://gcc.gnu.org/ml/gcc-patches/2015-10/msg03267.html, which were
>> "too quickly" approved before concerns with efficiency were pointed
>out.
>>
>> I tried to change the hashing just in tree-ssa-dom.c using C++
>subclassing, but
>> couldn't cleanly separate this out from tree-ssa-scopedtables and
>> tree-ssa-threadedge.c due to use of avail_exprs_stack. So I figured
>it was
>> probably appropriate to use the equivalences in jump threading too.
>Also,
>> using get_ref_base_and_extent unifies handling of MEM_REFs and
>ARRAY_REFs

Without looking at the patch, ARRAY_REFs can have non-constant indices which get_ref_base_and_extend handles conservative.  You should make sure to not regress here.

Richard.

>> (hence only one patch rather than two).
>It is appropriate.
>
>
>> I've added a couple of testcases that show the improvement in DOM,
>but in all
>> cases I had to disable FRE, even PRE, to get any improvement, apart
>from on
>> ssa-dom-cse-2.c itself (where on the affected platforms FRE still
>does not do
>> the optimization). This makes me wonder if this is the right approach
>or whether
>> changing the references output by SRA (as per
>> https://gcc.gnu.org/ml/gcc-patches/2015-08/msg01490.html , judged as
>a hack to
>> SRA to work around limitations in DOM - or is it?) would be better.
>I just doubt it happens all that much.
>
>
>
>Jeff


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 2/4] Equate MEM_REFs and ARRAY_REFs in tree-ssa-scopedtables.c
  2016-01-05  7:29     ` Richard Biener
@ 2016-01-05 16:29       ` Alan Lawrence
  2016-01-05 16:36         ` Jeff Law
  0 siblings, 1 reply; 48+ messages in thread
From: Alan Lawrence @ 2016-01-05 16:29 UTC (permalink / raw)
  To: Richard Biener, Jeff Law, gcc-patches

On 05/01/16 07:29, Richard Biener wrote:
> On January 4, 2016 8:08:17 PM GMT+01:00, Jeff Law <law@redhat.com> wrote:
>> On 12/21/2015 06:13 AM, Alan Lawrence wrote:
>>> This is a respin of patches
>>> https://gcc.gnu.org/ml/gcc-patches/2015-10/msg03266.html and
>>> https://gcc.gnu.org/ml/gcc-patches/2015-10/msg03267.html, which were
>>> "too quickly" approved before concerns with efficiency were pointed
>> out.
>>>
>>> I tried to change the hashing just in tree-ssa-dom.c using C++
>> subclassing, but
>>> couldn't cleanly separate this out from tree-ssa-scopedtables and
>>> tree-ssa-threadedge.c due to use of avail_exprs_stack. So I figured
>> it was
>>> probably appropriate to use the equivalences in jump threading too.
>> Also,
>>> using get_ref_base_and_extent unifies handling of MEM_REFs and
>> ARRAY_REFs
>
> Without looking at the patch, ARRAY_REFs can have non-constant indices which get_ref_base_and_extend handles conservative.  You should make sure to not regress here.

Thanks for the warning - my understanding is that in such a case, 
get_ref_base_and_extent returns max_size=(size of the whole array), size=(size 
of one element); and I only handle cases where size==max_size. Arrays of unknown 
length have size -1, so will never be equal.

Thanks, Alan

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 2/4] Equate MEM_REFs and ARRAY_REFs in tree-ssa-scopedtables.c
  2016-01-05 16:29       ` Alan Lawrence
@ 2016-01-05 16:36         ` Jeff Law
  2016-01-08 10:45           ` Richard Biener
  0 siblings, 1 reply; 48+ messages in thread
From: Jeff Law @ 2016-01-05 16:36 UTC (permalink / raw)
  To: Alan Lawrence, Richard Biener, gcc-patches

On 01/05/2016 09:29 AM, Alan Lawrence wrote:
>> Without looking at the patch, ARRAY_REFs can have non-constant indices
>> which get_ref_base_and_extend handles conservative.  You should make
>> sure to not regress here.
>
> Thanks for the warning - my understanding is that in such a case,
> get_ref_base_and_extent returns max_size=(size of the whole array),
> size=(size of one element); and I only handle cases where
> size==max_size. Arrays of unknown length have size -1, so will never be
> equal.
That was my understanding as well -- I'd been looking at that mostly in 
terms of making sure we were hashing the right stuff and that we were 
checking the right stuff in the equality function.

jeff

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 2/4] Equate MEM_REFs and ARRAY_REFs in tree-ssa-scopedtables.c
  2016-01-05 16:36         ` Jeff Law
@ 2016-01-08 10:45           ` Richard Biener
  0 siblings, 0 replies; 48+ messages in thread
From: Richard Biener @ 2016-01-08 10:45 UTC (permalink / raw)
  To: Jeff Law; +Cc: Alan Lawrence, GCC Patches

On Tue, Jan 5, 2016 at 5:36 PM, Jeff Law <law@redhat.com> wrote:
> On 01/05/2016 09:29 AM, Alan Lawrence wrote:
>>>
>>> Without looking at the patch, ARRAY_REFs can have non-constant indices
>>> which get_ref_base_and_extend handles conservative.  You should make
>>> sure to not regress here.
>>
>>
>> Thanks for the warning - my understanding is that in such a case,
>> get_ref_base_and_extent returns max_size=(size of the whole array),
>> size=(size of one element); and I only handle cases where
>> size==max_size. Arrays of unknown length have size -1, so will never be
>> equal.
>
> That was my understanding as well -- I'd been looking at that mostly in
> terms of making sure we were hashing the right stuff and that we were
> checking the right stuff in the equality function.

Now looking at the patch I wonder why you restrict this to plain MEM_REFs
and ARRAY_REFs.  MEM_REFs should already be in the desired canonical
form and I don't see why you can't extend ARRAY_REFs to handled_component_p ()
to make it also equate component references and things like real/imagparts.

Richard.

> jeff

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH 3/4 v2] Enhance SCEV to follow copies of SSA_NAMEs.
  2015-12-26 17:58   ` Richard Biener
@ 2016-01-14 10:29     ` Alan Lawrence
  2016-01-14 12:30       ` Richard Biener
  0 siblings, 1 reply; 48+ messages in thread
From: Alan Lawrence @ 2016-01-14 10:29 UTC (permalink / raw)
  To: gcc-patches; +Cc: richard.guenther

(Previous message: https://gcc.gnu.org/ml/gcc-patches/2015-12/msg02159.html)
On Sat, Dec 26, 2015 at 18:58 PM, Richard Biener <richard.guenther@gmail.com> wrote:

>> I'm not sure whether adding a pass_copy_prop is the right thing here, but since
>> loop-header-copying can create such opportunities, it seems good to take them!
>
> Aww.  I'd rather have general infrastructure (like SCEV) deal with
> those non-propagated copies.
>
> There are likely other missed propagation / folding opportunities
> caused from partial peeling.
>
> Richard.

I take your second point, but I am concerned that this leads to duplication of
copy-propagation code throughout the compiler?

However, this patch does that. Making analyze_initial_condition (alone) follow
copies, is enough to solve my case of missed vectorization of pr65947-2.c;
but I also used the routine in analyze_scalar_evolution_1, as it found copies
to follow in both bootstrap and a significant number of testcases:

c-c++-common/gomp/pr58472.c
gcc.c-torture/execute/20000422-1.c
gcc.c-torture/execute/20041213-2.c
gcc.c-torture/execute/20100430-1.c
gcc.c-torture/execute/pr49712.c
gcc.c-torture/execute/pr58640.c
gcc.dg/Warray-bounds-13.c
gcc.dg/graphite/block-{1,5,6}.c
gcc.dg/graphite/interchange-12.c
gcc.dg/graphite/interchange-mvt.c
gcc.dg/graphite/pr19910.c
gcc.dg/graphite/uns-block-1.c
gcc.dg/loop-unswitch-{2,4}.c
gcc.dg/pr59670.c
gcc.dg/torture/matrix-{1,2,3,4,5,6}.c
gcc.dg/torture/pr24750-1.c
gcc.dg/torture/transpose-{1,2,3,4,5,6}.c

...some of which were resolved to constants.

(Of course, this approach is not as thorough as adding a pass_copy_prop. E.g. it
does *not* enable vectorization of the inlined copy in pr65947-9.c.)

Bootstrapped + check-gcc + check-g++ on ARM, AArch64, x86_64.

gcc/ChangeLog:

	* tree-scalar-evolution.c (follow_copies): New.
	(analyze_initial_condition, analyze_scalar_evolution_1): Call previous.
---
 gcc/tree-scalar-evolution.c | 53 ++++++++++++++++++++++++++++++---------------
 1 file changed, 36 insertions(+), 17 deletions(-)

diff --git a/gcc/tree-scalar-evolution.c b/gcc/tree-scalar-evolution.c
index 9b33693..357eb8f 100644
--- a/gcc/tree-scalar-evolution.c
+++ b/gcc/tree-scalar-evolution.c
@@ -1522,6 +1522,37 @@ analyze_evolution_in_loop (gphi *loop_phi_node,
   return evolution_function;
 }
 
+/* While VAR is an SSA_NAME whose definition is a straight copy of another name,
+   or (perhaps via a degenerate phi) a constant, follows that definition.
+   Returns the constant, or the earliest SSA_NAME whose definition is neither
+   constant nor copy.  */
+
+static tree
+follow_copies (tree var)
+{
+  while (TREE_CODE (var) == SSA_NAME)
+    {
+      gimple *def = SSA_NAME_DEF_STMT (var);
+      if (gphi *phi = dyn_cast <gphi *> (def))
+	{
+	  tree rhs = degenerate_phi_result (phi);
+	  /* Don't follow degenerate phi's of SSA_NAMES, that can break
+	     loop-closed SSA form.  */
+	  if (rhs && is_gimple_min_invariant (rhs))
+	    var = rhs;
+	  break;
+	}
+      else if (gimple_assign_single_p (def)
+	       && !gimple_vuse (def)
+	       && (gimple_assign_rhs_code (def) == SSA_NAME
+		   || is_gimple_min_invariant (gimple_assign_rhs1 (def))))
+	var = gimple_assign_rhs1 (def);
+      else
+	break;
+    }
+  return var;
+}
+
 /* Given a loop-phi-node, return the initial conditions of the
    variable on entry of the loop.  When the CCP has propagated
    constants into the loop-phi-node, the initial condition is
@@ -1574,21 +1605,9 @@ analyze_initial_condition (gphi *loop_phi_node)
   if (init_cond == chrec_not_analyzed_yet)
     init_cond = chrec_dont_know;
 
-  /* During early loop unrolling we do not have fully constant propagated IL.
-     Handle degenerate PHIs here to not miss important unrollings.  */
-  if (TREE_CODE (init_cond) == SSA_NAME)
-    {
-      gimple *def = SSA_NAME_DEF_STMT (init_cond);
-      if (gphi *phi = dyn_cast <gphi *> (def))
-	{
-	  tree res = degenerate_phi_result (phi);
-	  if (res != NULL_TREE
-	      /* Only allow invariants here, otherwise we may break
-		 loop-closed SSA form.  */
-	      && is_gimple_min_invariant (res))
-	    init_cond = res;
-	}
-    }
+  /* We may not have fully constant propagated IL.  Handle degenerate PHIs here
+     to not miss important early loop unrollings.  */
+  init_cond = follow_copies (init_cond);
 
   if (dump_file && (dump_flags & TDF_SCEV))
     {
@@ -1968,8 +1987,8 @@ analyze_scalar_evolution_1 (struct loop *loop, tree var, tree res)
   if (bb == NULL
       || !flow_bb_inside_loop_p (loop, bb))
     {
-      /* Keep the symbolic form.  */
-      res = var;
+      /* Keep symbolic form, but look through obvious copies for constants.  */
+      res = follow_copies (var);
       goto set_and_end;
     }
 
-- 
1.9.1

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 3/4 v2] Enhance SCEV to follow copies of SSA_NAMEs.
  2016-01-14 10:29     ` [PATCH 3/4 v2] Enhance SCEV to follow copies of SSA_NAMEs Alan Lawrence
@ 2016-01-14 12:30       ` Richard Biener
  2016-01-15  9:48         ` Alan Lawrence
  0 siblings, 1 reply; 48+ messages in thread
From: Richard Biener @ 2016-01-14 12:30 UTC (permalink / raw)
  To: Alan Lawrence; +Cc: GCC Patches

On Thu, Jan 14, 2016 at 11:29 AM, Alan Lawrence <alan.lawrence@arm.com> wrote:
> (Previous message: https://gcc.gnu.org/ml/gcc-patches/2015-12/msg02159.html)
> On Sat, Dec 26, 2015 at 18:58 PM, Richard Biener <richard.guenther@gmail.com> wrote:
>
>>> I'm not sure whether adding a pass_copy_prop is the right thing here, but since
>>> loop-header-copying can create such opportunities, it seems good to take them!
>>
>> Aww.  I'd rather have general infrastructure (like SCEV) deal with
>> those non-propagated copies.
>>
>> There are likely other missed propagation / folding opportunities
>> caused from partial peeling.
>>
>> Richard.
>
> I take your second point, but I am concerned that this leads to duplication of
> copy-propagation code throughout the compiler?
>
> However, this patch does that. Making analyze_initial_condition (alone) follow
> copies, is enough to solve my case of missed vectorization of pr65947-2.c;
> but I also used the routine in analyze_scalar_evolution_1, as it found copies
> to follow in both bootstrap and a significant number of testcases:
>
> c-c++-common/gomp/pr58472.c
> gcc.c-torture/execute/20000422-1.c
> gcc.c-torture/execute/20041213-2.c
> gcc.c-torture/execute/20100430-1.c
> gcc.c-torture/execute/pr49712.c
> gcc.c-torture/execute/pr58640.c
> gcc.dg/Warray-bounds-13.c
> gcc.dg/graphite/block-{1,5,6}.c
> gcc.dg/graphite/interchange-12.c
> gcc.dg/graphite/interchange-mvt.c
> gcc.dg/graphite/pr19910.c
> gcc.dg/graphite/uns-block-1.c
> gcc.dg/loop-unswitch-{2,4}.c
> gcc.dg/pr59670.c
> gcc.dg/torture/matrix-{1,2,3,4,5,6}.c
> gcc.dg/torture/pr24750-1.c
> gcc.dg/torture/transpose-{1,2,3,4,5,6}.c
>
> ...some of which were resolved to constants.
>
> (Of course, this approach is not as thorough as adding a pass_copy_prop. E.g. it
> does *not* enable vectorization of the inlined copy in pr65947-9.c.)
>
> Bootstrapped + check-gcc + check-g++ on ARM, AArch64, x86_64.
>
> gcc/ChangeLog:
>
>         * tree-scalar-evolution.c (follow_copies): New.
>         (analyze_initial_condition, analyze_scalar_evolution_1): Call previous.
> ---
>  gcc/tree-scalar-evolution.c | 53 ++++++++++++++++++++++++++++++---------------
>  1 file changed, 36 insertions(+), 17 deletions(-)
>
> diff --git a/gcc/tree-scalar-evolution.c b/gcc/tree-scalar-evolution.c
> index 9b33693..357eb8f 100644
> --- a/gcc/tree-scalar-evolution.c
> +++ b/gcc/tree-scalar-evolution.c
> @@ -1522,6 +1522,37 @@ analyze_evolution_in_loop (gphi *loop_phi_node,
>    return evolution_function;
>  }
>
> +/* While VAR is an SSA_NAME whose definition is a straight copy of another name,
> +   or (perhaps via a degenerate phi) a constant, follows that definition.
> +   Returns the constant, or the earliest SSA_NAME whose definition is neither
> +   constant nor copy.  */
> +
> +static tree
> +follow_copies (tree var)
> +{
> +  while (TREE_CODE (var) == SSA_NAME)
> +    {
> +      gimple *def = SSA_NAME_DEF_STMT (var);
> +      if (gphi *phi = dyn_cast <gphi *> (def))
> +       {
> +         tree rhs = degenerate_phi_result (phi);
> +         /* Don't follow degenerate phi's of SSA_NAMES, that can break
> +            loop-closed SSA form.  */
> +         if (rhs && is_gimple_min_invariant (rhs))
> +           var = rhs;
> +         break;
> +       }
> +      else if (gimple_assign_single_p (def)
> +              && !gimple_vuse (def)

The vuse test is not necessary

> +              && (gimple_assign_rhs_code (def) == SSA_NAME
> +                  || is_gimple_min_invariant (gimple_assign_rhs1 (def))))

and the is_gimple_min_invariant (rhs1) test is not sufficient if you
consider - (-INT_MIN) with -ftrapv for example.

I'd say you should do

  else if (gassign *ass = dyn_cast <gassign *> (def))
    {
       tree_code code = gimple_assign_rhs_code (ass);
       if (code == SSA_NAME
           || CONSTANT_CLASS_P (code))
         var = gimple_assign_rhs1 (ass);
       else
         break;
    }

> +       var = gimple_assign_rhs1 (def);
> +      else
> +       break;
> +    }
> +  return var;
> +}
> +
>  /* Given a loop-phi-node, return the initial conditions of the
>     variable on entry of the loop.  When the CCP has propagated
>     constants into the loop-phi-node, the initial condition is
> @@ -1574,21 +1605,9 @@ analyze_initial_condition (gphi *loop_phi_node)
>    if (init_cond == chrec_not_analyzed_yet)
>      init_cond = chrec_dont_know;
>
> -  /* During early loop unrolling we do not have fully constant propagated IL.
> -     Handle degenerate PHIs here to not miss important unrollings.  */
> -  if (TREE_CODE (init_cond) == SSA_NAME)
> -    {
> -      gimple *def = SSA_NAME_DEF_STMT (init_cond);
> -      if (gphi *phi = dyn_cast <gphi *> (def))
> -       {
> -         tree res = degenerate_phi_result (phi);
> -         if (res != NULL_TREE
> -             /* Only allow invariants here, otherwise we may break
> -                loop-closed SSA form.  */
> -             && is_gimple_min_invariant (res))
> -           init_cond = res;
> -       }
> -    }
> +  /* We may not have fully constant propagated IL.  Handle degenerate PHIs here
> +     to not miss important early loop unrollings.  */
> +  init_cond = follow_copies (init_cond);
>
>    if (dump_file && (dump_flags & TDF_SCEV))
>      {
> @@ -1968,8 +1987,8 @@ analyze_scalar_evolution_1 (struct loop *loop, tree var, tree res)
>    if (bb == NULL
>        || !flow_bb_inside_loop_p (loop, bb))
>      {
> -      /* Keep the symbolic form.  */
> -      res = var;
> +      /* Keep symbolic form, but look through obvious copies for constants.  */

You're also looing for SSA names copied so the comment is sligntly wrong.

Ok with those changes.

Thanks,
Richard.

> +      res = follow_copies (var);
>        goto set_and_end;
>      }
>
> --
> 1.9.1
>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 3/4 v2] Enhance SCEV to follow copies of SSA_NAMEs.
  2016-01-14 12:30       ` Richard Biener
@ 2016-01-15  9:48         ` Alan Lawrence
  2016-01-15 10:07           ` Richard Biener
  0 siblings, 1 reply; 48+ messages in thread
From: Alan Lawrence @ 2016-01-15  9:48 UTC (permalink / raw)
  To: gcc-patches; +Cc: richard.guenther

On Thu, Jan 14, 2016 at 12:30 PM, Richard Biener <richard.guenther@gmail.com> wrote:

>
> The vuse test is not necessary
>
>> +              && (gimple_assign_rhs_code (def) == SSA_NAME
>> +                  || is_gimple_min_invariant (gimple_assign_rhs1 (def))))
>
> and the is_gimple_min_invariant (rhs1) test is not sufficient if you
> consider - (-INT_MIN) with -ftrapv for example.

Thanks, I didn't realize gimple_min_invariant would allow such cases.

>> +      /* Keep symbolic form, but look through obvious copies for constants.  */
>
> You're also looing for SSA names copied so the comment is sligntly wrong.

So it occurs to me that we do only need to look for constants: picking one SSA
name instead of another (when both are outside the loop) doesn't really help
SCEV; and it seems safer/less change to avoid fiddling around with which names
are used. Moreover, given we are in LC _SSA_, I think we can look through
degenerate phi's even of SSA_NAMES, without violating loop-closed-ness,
so long as we only use cases where we eventually reach a constant.

Therefore, how about this version?

(Bootstrapped + check-gcc + check-g++ on x86_64, AArch64, ARM; again, fixing
vectorization of pr65947-2.c).

If this is OK - that leaves only the first patch to SRA,
https://gcc.gnu.org/ml/gcc-patches/2015-12/msg02117.html (the "extra" fixes
I claimed in the guality testsuite were a testing mistake, but this
version still fixes the regressions in guality pr54970.c from the original
"ok with the comment added and a testcase (or two)"
https://gcc.gnu.org/ml/gcc-patches/2015-11/msg01948.html, as well as the
comment + 2*testcase). I think I'd still like to propose these as a stage 3/4
fix (with --param sra-max-scalarization-size-Ospeed=32) of PR/63679, a
regression against gcc 4.9.

Thanks, Alan

gcc/ChangeLog:

	* tree-scalar-evolution.c (follow_copies_to_constant): New.
	(analyze_initial_condition, analyze_scalar_evolution_1): Call previous.
---
 gcc/tree-scalar-evolution.c | 50 ++++++++++++++++++++++++++++++---------------
 1 file changed, 33 insertions(+), 17 deletions(-)

diff --git a/gcc/tree-scalar-evolution.c b/gcc/tree-scalar-evolution.c
index 9b33693..470c3ea 100644
--- a/gcc/tree-scalar-evolution.c
+++ b/gcc/tree-scalar-evolution.c
@@ -1522,6 +1522,34 @@ analyze_evolution_in_loop (gphi *loop_phi_node,
   return evolution_function;
 }
 
+/* Looks to see if VAR is a copy of a constant (via straightforward assignments
+   or degenerate phi's).  If so, returns the constant; else, returns VAR.  */
+
+static tree
+follow_copies_to_constant (tree var)
+{
+  tree res = var;
+  while (TREE_CODE (res) == SSA_NAME)
+    {
+      gimple *def = SSA_NAME_DEF_STMT (res);
+      if (gphi *phi = dyn_cast <gphi *> (def))
+	{
+	  if (tree rhs = degenerate_phi_result (phi))
+	    res = rhs;
+	  else
+	    break;
+	}
+      else if (gimple_assign_single_p (def))
+	/* Will exit loop if not an SSA_NAME.  */
+	res = gimple_assign_rhs1 (def);
+      else
+	break;
+    }
+  if (CONSTANT_CLASS_P (res))
+    return res;
+  return var;
+}
+
 /* Given a loop-phi-node, return the initial conditions of the
    variable on entry of the loop.  When the CCP has propagated
    constants into the loop-phi-node, the initial condition is
@@ -1574,21 +1602,9 @@ analyze_initial_condition (gphi *loop_phi_node)
   if (init_cond == chrec_not_analyzed_yet)
     init_cond = chrec_dont_know;
 
-  /* During early loop unrolling we do not have fully constant propagated IL.
-     Handle degenerate PHIs here to not miss important unrollings.  */
-  if (TREE_CODE (init_cond) == SSA_NAME)
-    {
-      gimple *def = SSA_NAME_DEF_STMT (init_cond);
-      if (gphi *phi = dyn_cast <gphi *> (def))
-	{
-	  tree res = degenerate_phi_result (phi);
-	  if (res != NULL_TREE
-	      /* Only allow invariants here, otherwise we may break
-		 loop-closed SSA form.  */
-	      && is_gimple_min_invariant (res))
-	    init_cond = res;
-	}
-    }
+  /* We may not have fully constant propagated IL.  Handle degenerate PHIs here
+     to not miss important early loop unrollings.  */
+  init_cond = follow_copies_to_constant (init_cond);
 
   if (dump_file && (dump_flags & TDF_SCEV))
     {
@@ -1968,8 +1984,8 @@ analyze_scalar_evolution_1 (struct loop *loop, tree var, tree res)
   if (bb == NULL
       || !flow_bb_inside_loop_p (loop, bb))
     {
-      /* Keep the symbolic form.  */
-      res = var;
+      /* Keep symbolic form, but look through obvious copies for constants.  */
+      res = follow_copies_to_constant (var);
       goto set_and_end;
     }
 
-- 
1.9.1

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 3/4 v2] Enhance SCEV to follow copies of SSA_NAMEs.
  2016-01-15  9:48         ` Alan Lawrence
@ 2016-01-15 10:07           ` Richard Biener
  2016-01-15 10:38             ` Alan Lawrence
  0 siblings, 1 reply; 48+ messages in thread
From: Richard Biener @ 2016-01-15 10:07 UTC (permalink / raw)
  To: Alan Lawrence; +Cc: GCC Patches

On Fri, Jan 15, 2016 at 10:47 AM, Alan Lawrence <alan.lawrence@arm.com> wrote:
> On Thu, Jan 14, 2016 at 12:30 PM, Richard Biener <richard.guenther@gmail.com> wrote:
>
>>
>> The vuse test is not necessary
>>
>>> +              && (gimple_assign_rhs_code (def) == SSA_NAME
>>> +                  || is_gimple_min_invariant (gimple_assign_rhs1 (def))))
>>
>> and the is_gimple_min_invariant (rhs1) test is not sufficient if you
>> consider - (-INT_MIN) with -ftrapv for example.
>
> Thanks, I didn't realize gimple_min_invariant would allow such cases.

Well, the invariant would be -INT_MIN but gimple_assign_rhs_code (def) would
be NEGATE_EXPR.  Basically you forgot about unary operators.

>>> +      /* Keep symbolic form, but look through obvious copies for constants.  */
>>
>> You're also looing for SSA names copied so the comment is sligntly wrong.
>
> So it occurs to me that we do only need to look for constants: picking one SSA
> name instead of another (when both are outside the loop) doesn't really help
> SCEV; and it seems safer/less change to avoid fiddling around with which names
> are used. Moreover, given we are in LC _SSA_, I think we can look through
> degenerate phi's even of SSA_NAMES, without violating loop-closed-ness,
> so long as we only use cases where we eventually reach a constant.

True.

> Therefore, how about this version?
>
> (Bootstrapped + check-gcc + check-g++ on x86_64, AArch64, ARM; again, fixing
> vectorization of pr65947-2.c).

This version is ok.  I wonder if you have a testcase (maybe it comes with the
still pending patches)

> If this is OK - that leaves only the first patch to SRA,
> https://gcc.gnu.org/ml/gcc-patches/2015-12/msg02117.html (the "extra" fixes
> I claimed in the guality testsuite were a testing mistake, but this
> version still fixes the regressions in guality pr54970.c from the original
> "ok with the comment added and a testcase (or two)"
> https://gcc.gnu.org/ml/gcc-patches/2015-11/msg01948.html, as well as the
> comment + 2*testcase). I think I'd still like to propose these as a stage 3/4
> fix (with --param sra-max-scalarization-size-Ospeed=32) of PR/63679, a
> regression against gcc 4.9.

Can you ping the specific patches?

Thanks,
Richard.

> Thanks, Alan
>
> gcc/ChangeLog:
>
>         * tree-scalar-evolution.c (follow_copies_to_constant): New.
>         (analyze_initial_condition, analyze_scalar_evolution_1): Call previous.
> ---
>  gcc/tree-scalar-evolution.c | 50 ++++++++++++++++++++++++++++++---------------
>  1 file changed, 33 insertions(+), 17 deletions(-)
>
> diff --git a/gcc/tree-scalar-evolution.c b/gcc/tree-scalar-evolution.c
> index 9b33693..470c3ea 100644
> --- a/gcc/tree-scalar-evolution.c
> +++ b/gcc/tree-scalar-evolution.c
> @@ -1522,6 +1522,34 @@ analyze_evolution_in_loop (gphi *loop_phi_node,
>    return evolution_function;
>  }
>
> +/* Looks to see if VAR is a copy of a constant (via straightforward assignments
> +   or degenerate phi's).  If so, returns the constant; else, returns VAR.  */
> +
> +static tree
> +follow_copies_to_constant (tree var)
> +{
> +  tree res = var;
> +  while (TREE_CODE (res) == SSA_NAME)
> +    {
> +      gimple *def = SSA_NAME_DEF_STMT (res);
> +      if (gphi *phi = dyn_cast <gphi *> (def))
> +       {
> +         if (tree rhs = degenerate_phi_result (phi))
> +           res = rhs;
> +         else
> +           break;
> +       }
> +      else if (gimple_assign_single_p (def))
> +       /* Will exit loop if not an SSA_NAME.  */
> +       res = gimple_assign_rhs1 (def);
> +      else
> +       break;
> +    }
> +  if (CONSTANT_CLASS_P (res))
> +    return res;
> +  return var;
> +}
> +
>  /* Given a loop-phi-node, return the initial conditions of the
>     variable on entry of the loop.  When the CCP has propagated
>     constants into the loop-phi-node, the initial condition is
> @@ -1574,21 +1602,9 @@ analyze_initial_condition (gphi *loop_phi_node)
>    if (init_cond == chrec_not_analyzed_yet)
>      init_cond = chrec_dont_know;
>
> -  /* During early loop unrolling we do not have fully constant propagated IL.
> -     Handle degenerate PHIs here to not miss important unrollings.  */
> -  if (TREE_CODE (init_cond) == SSA_NAME)
> -    {
> -      gimple *def = SSA_NAME_DEF_STMT (init_cond);
> -      if (gphi *phi = dyn_cast <gphi *> (def))
> -       {
> -         tree res = degenerate_phi_result (phi);
> -         if (res != NULL_TREE
> -             /* Only allow invariants here, otherwise we may break
> -                loop-closed SSA form.  */
> -             && is_gimple_min_invariant (res))
> -           init_cond = res;
> -       }
> -    }
> +  /* We may not have fully constant propagated IL.  Handle degenerate PHIs here
> +     to not miss important early loop unrollings.  */
> +  init_cond = follow_copies_to_constant (init_cond);
>
>    if (dump_file && (dump_flags & TDF_SCEV))
>      {
> @@ -1968,8 +1984,8 @@ analyze_scalar_evolution_1 (struct loop *loop, tree var, tree res)
>    if (bb == NULL
>        || !flow_bb_inside_loop_p (loop, bb))
>      {
> -      /* Keep the symbolic form.  */
> -      res = var;
> +      /* Keep symbolic form, but look through obvious copies for constants.  */
> +      res = follow_copies_to_constant (var);
>        goto set_and_end;
>      }
>
> --
> 1.9.1
>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 1/4] Make SRA scalarize constant-pool loads
  2015-12-24 11:53   ` Alan Lawrence
  2016-01-04 12:13     ` Alan Lawrence
@ 2016-01-15 10:27     ` Alan Lawrence
  2016-01-18 11:04       ` Richard Biener
  1 sibling, 1 reply; 48+ messages in thread
From: Alan Lawrence @ 2016-01-15 10:27 UTC (permalink / raw)
  To: gcc-patches

On 24/12/15 11:53, Alan Lawrence wrote:
> Here's a new version that fixes the gcc.dg/guality/pr54970.c failures seen on
> aarch64 and powerpc64. Prior to SRA handling constant pool decls,
> -fdump-tree-esra-details (at -O1 -g) had shown:
>    <bb 2>:
>    a = *.LC0;
>    # DEBUG a$0 => MEM[(int[3] *)&*.LC0]
>    a$4_3 = MEM[(int[3] *)&*.LC0 + 4B];
>    # DEBUG a$4 => a$4_3
>    a$8_4 = MEM[(int[3] *)&*.LC0 + 8B];
>
> The previous patch changed this to:
>    <bb 2>:
>    SR.5_3 = *.LC0[0];
>    SR.6_4 = *.LC0[1];
>    SR.7_19 = *.LC0[2];
>    SR.8_21 = *.LC1[0];
>    SR.9_22 = *.LC1[1];
>    SR.10_23 = *.LC1[2];
>    # DEBUG a$0 => NULL   // Note here
>    a$4_24 = SR.6_4;
>    # DEBUG a$4 => a$4_24
>    a$8_25 = SR.7_19;
>
> Turns out the DEBUG a$0 => NULL was produced in load_assign_lhs_subreplacements:
>
> 	  if (lacc && lacc->grp_to_be_debug_replaced)
> 	    {
> 	      gdebug *ds;
> 	      tree drhs;
> 	      struct access *racc = find_access_in_subtree (sad->top_racc,
> 							    offset,
> 							    lacc->size);
>
> 	      if (racc && racc->grp_to_be_replaced)
> 		{
> 		  if (racc->grp_write)
> 		    drhs = get_access_replacement (racc);
> 		  else
> 		    drhs = NULL;  // <=== HERE
> 		}
> ...
> 	      ds = gimple_build_debug_bind (get_access_replacement (lacc),
> 					    drhs, gsi_stmt (sad->old_gsi));
>
> Prior to the patch, we'd skipped around load_assign_lhs_subreplacements, because
> access_has_children_p(racc) (for racc = *.LC0) didn't hold in sra_modify_assign.
>
> I also added a constant_decl_p function, combining the two checks, plus some
> testcase fixes.
>
> Bootstrapped + check-gcc,g++ on x86_64, ARM, AArch64,
>    also on powerpc64{,le}-none-linux-gnu *in combination with the other patches
>    in the series* (I haven't tested the individual patches on PPC),
>    plus Ada on ARM and x86_64.
>
> gcc/ChangeLog:
>
> 	PR target/63679
> 	* tree-sra.c (disqualified_constants, constant_decl_p): New.
> 	(sra_initialize): Allocate disqualified_constants.
> 	(sra_deinitialize): Free disqualified_constants.
> 	(disqualify_candidate): Update disqualified_constants when appropriate.
> 	(create_access): Scan for constant-pool entries as we go along.
> 	(scalarizable_type_p): Add check against type_contains_placeholder_p.
> 	(maybe_add_sra_candidate): Allow constant-pool entries.
> 	(load_assign_lhs_subreplacements): Bind debug for constant pool vars.
> 	(initialize_constant_pool_replacements): New.
> 	(sra_modify_assign): Avoid mangling assignments created by previous,
> 	and don't generate writes into constant pool.
> 	(sra_modify_function_body): Call initialize_constant_pool_replacements.
>
> gcc/testsuite/ChangeLog:
>
> 	* gcc.dg/tree-ssa/sra-17.c: New.
> 	* gcc.dg/tree-ssa/sra-18.c: New.

Ping.

(The next bit is false, unless you force SRA to happen more widely, but all the 
above stands)

> This also fixes a bunch of other guality tests on AArch64 that were failing
> prior to the patch series, and another bunch on PowerPC64 (bigendian -m32), listed below.
 >
> Tests fixed on aarch64-none-linux-gnu:
> gcc.dg/guality/pr54970.c   -O1  line 15 *p == 3
> gcc.dg/guality/pr54970.c   -O1  line 15 *q == 2
> gcc.dg/guality/pr54970.c   -O1  line 20 *p == 13
> gcc.dg/guality/pr54970.c   -O1  line 20 *q == 2
> gcc.dg/guality/pr54970.c   -O1  line 25 *p == 13
> gcc.dg/guality/pr54970.c   -O1  line 25 *q == 12
> gcc.dg/guality/pr54970.c   -O1  line 31 *p == 6
> gcc.dg/guality/pr54970.c   -O1  line 31 *q == 5
> gcc.dg/guality/pr54970.c   -O1  line 36 *p == 26
> gcc.dg/guality/pr54970.c   -O1  line 36 *q == 5
> gcc.dg/guality/pr54970.c   -O1  line 45 *p == 26
> gcc.dg/guality/pr54970.c   -O1  line 45 *q == 25
> gcc.dg/guality/pr54970.c   -O1  line 45 p[-1] == 25
> gcc.dg/guality/pr54970.c   -O1  line 45 p[-2] == 4
> gcc.dg/guality/pr54970.c   -O1  line 45 q[-1] == 4
> gcc.dg/guality/pr54970.c   -O1  line 45 q[1] == 26
> gcc.dg/guality/pr56154-1.c   -O1  line pr56154-1.c:20 x.a == 6
> gcc.dg/guality/pr59776.c   -O1  line pr59776.c:17 s1.f == 5.0
> gcc.dg/guality/pr59776.c   -O1  line pr59776.c:17 s1.g == 6.0
> gcc.dg/guality/pr59776.c   -O1  line pr59776.c:17 s2.f == 0.0
> gcc.dg/guality/pr59776.c   -O1  line pr59776.c:17 s2.g == 6.0
> gcc.dg/guality/pr59776.c   -O1  line pr59776.c:20 s1.f == 5.0
> gcc.dg/guality/pr59776.c   -O1  line pr59776.c:20 s1.g == 6.0
> gcc.dg/guality/pr59776.c   -O1  line pr59776.c:20 s2.f == 5.0
> gcc.dg/guality/pr59776.c   -O1  line pr59776.c:20 s2.g == 6.0
> gcc.dg/guality/sra-1.c   -O1  line 21 a.j == 14
> gcc.dg/guality/sra-1.c   -O1  line 32 a[1] == 14
> gcc.dg/guality/sra-1.c   -O1  line 43 a.i == 4
> gcc.dg/guality/sra-1.c   -O1  line 43 a.j == 14
> (and other optimization levels)
> On ppc64 bigendian, with the rest of the series (but I expect it's this patch):
> unix/-m32: gcc.dg/guality/pr54970.c   -O1  line 15 a[0] == 1
> unix/-m32: gcc.dg/guality/pr54970.c   -O1  line 20 a[0] == 1
> unix/-m32: gcc.dg/guality/pr54970.c   -O1  line 25 a[0] == 1
> unix/-m32: gcc.dg/guality/pr54970.c   -O1  line 31 a[0] == 4
> unix/-m32: gcc.dg/guality/pr54970.c   -O1  line 36 a[0] == 4
> unix/-m32: gcc.dg/guality/pr54970.c   -O1  line 45 a[0] == 4
> unix/-m32: gcc.dg/guality/pr54970.c   -O1  line 45 p[-2] == 4
> unix/-m32: gcc.dg/guality/pr54970.c   -O1  line 45 q[-1] == 4
> (again, and other optimization levels).

Cheers, Alan

> ---
>   gcc/testsuite/gcc.dg/tree-ssa/sra-17.c |  19 ++++++
>   gcc/testsuite/gcc.dg/tree-ssa/sra-18.c |  28 +++++++++
>   gcc/tree-sra.c                         | 104 +++++++++++++++++++++++++++++++--
>   3 files changed, 147 insertions(+), 4 deletions(-)
>   create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/sra-17.c
>   create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/sra-18.c
>
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/sra-17.c b/gcc/testsuite/gcc.dg/tree-ssa/sra-17.c
> new file mode 100644
> index 0000000..25667f4
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/sra-17.c
> @@ -0,0 +1,19 @@
> +/* { dg-do run { target { aarch64*-*-* alpha*-*-* arm*-*-* hppa*-*-* powerpc*-*-* s390*-*-* } } } */
> +/* { dg-options "-O2 -fdump-tree-esra --param sra-max-scalarization-size-Ospeed=32" } */
> +
> +extern void abort (void);
> +
> +int
> +main (int argc, char **argv)
> +{
> +  long a[4] = { 7, 19, 11, 255 };
> +  int tot = 0;
> +  for (int i = 0; i < 4; i++)
> +    tot = (tot*256) + a[i];
> +  if (tot == 0x07130bff)
> +    return 0;
> +  abort ();
> +}
> +
> +/* { dg-final { scan-tree-dump-times "Removing load: a = \\\*.LC0;" 1 "esra" } } */
> +/* { dg-final { scan-tree-dump-times "SR.\[0-9_\]+ = \\\*.LC0\\\[" 4 "esra" } } */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/sra-18.c b/gcc/testsuite/gcc.dg/tree-ssa/sra-18.c
> new file mode 100644
> index 0000000..609fb11
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/sra-18.c
> @@ -0,0 +1,28 @@
> +/* { dg-do run { target { aarch64*-*-* alpha*-*-* arm*-*-* hppa*-*-* powerpc*-*-* s390*-*-* } } } */
> +/* { dg-options "-O2 -fdump-tree-esra --param sra-max-scalarization-size-Ospeed=32" } */
> +
> +extern void abort (void);
> +struct foo { long x; };
> +
> +struct bar { struct foo f[2]; };
> +
> +struct baz { struct bar b[2]; };
> +
> +int
> +main (int argc, char **argv)
> +{
> +  struct baz a = { { { { { 4 }, { 5 } } }, { { { 6 }, { 7 } } }  } };
> +  int tot = 0;
> +  for (int i = 0; i < 2; i++)
> +    for (int j = 0; j < 2; j++)
> +      tot = (tot*256) + a.b[i].f[j].x;
> +  if (tot == 0x04050607)
> +    return 0;
> +  abort ();
> +}
> +
> +/* { dg-final { scan-tree-dump-times "Removing load: a = \\\*.LC0;" 1 "esra" } } */
> +/* { dg-final { scan-tree-dump-times "SR.\[0-9_\]+ = \\\*.LC0\\.b\\\[0\\\]\\.f\\\[0\\\]\\.x" 1 "esra" } } */
> +/* { dg-final { scan-tree-dump-times "SR.\[0-9_\]+ = \\\*.LC0\\.b\\\[0\\\]\\.f\\\[1\\\]\\.x" 1 "esra" } } */
> +/* { dg-final { scan-tree-dump-times "SR.\[0-9_\]+ = \\\*.LC0\\.b\\\[1\\\]\\.f\\\[0\\\]\\.x" 1 "esra" } } */
> +/* { dg-final { scan-tree-dump-times "SR.\[0-9_\]+ = \\\*.LC0\\.b\\\[1\\\]\\.f\\\[1\\\]\\.x" 1 "esra" } } */
> diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
> index c4fea5b..77a2e49 100644
> --- a/gcc/tree-sra.c
> +++ b/gcc/tree-sra.c
> @@ -328,6 +328,10 @@ candidate (unsigned uid)
>      those which cannot be (because they are and need be used as a whole).  */
>   static bitmap should_scalarize_away_bitmap, cannot_scalarize_away_bitmap;
>
> +/* Bitmap of candidates in the constant pool, which cannot be scalarized
> +   because this would produce non-constant expressions (e.g. Ada).  */
> +static bitmap disqualified_constants;
> +
>   /* Obstack for creation of fancy names.  */
>   static struct obstack name_obstack;
>
> @@ -652,6 +656,7 @@ sra_initialize (void)
>       (vec_safe_length (cfun->local_decls) / 2);
>     should_scalarize_away_bitmap = BITMAP_ALLOC (NULL);
>     cannot_scalarize_away_bitmap = BITMAP_ALLOC (NULL);
> +  disqualified_constants = BITMAP_ALLOC (NULL);
>     gcc_obstack_init (&name_obstack);
>     base_access_vec = new hash_map<tree, auto_vec<access_p> >;
>     memset (&sra_stats, 0, sizeof (sra_stats));
> @@ -670,6 +675,7 @@ sra_deinitialize (void)
>     candidates = NULL;
>     BITMAP_FREE (should_scalarize_away_bitmap);
>     BITMAP_FREE (cannot_scalarize_away_bitmap);
> +  BITMAP_FREE (disqualified_constants);
>     access_pool.release ();
>     assign_link_pool.release ();
>     obstack_free (&name_obstack, NULL);
> @@ -677,6 +683,13 @@ sra_deinitialize (void)
>     delete base_access_vec;
>   }
>
> +/* Return true if DECL is a VAR_DECL in the constant pool, false otherwise.  */
> +
> +static bool constant_decl_p (tree decl)
> +{
> +  return TREE_CODE (decl) == VAR_DECL && DECL_IN_CONSTANT_POOL (decl);
> +}
> +
>   /* Remove DECL from candidates for SRA and write REASON to the dump file if
>      there is one.  */
>   static void
> @@ -684,6 +697,8 @@ disqualify_candidate (tree decl, const char *reason)
>   {
>     if (bitmap_clear_bit (candidate_bitmap, DECL_UID (decl)))
>       candidates->remove_elt_with_hash (decl, DECL_UID (decl));
> +  if (constant_decl_p (decl))
> +    bitmap_set_bit (disqualified_constants, DECL_UID (decl));
>
>     if (dump_file && (dump_flags & TDF_DETAILS))
>       {
> @@ -835,8 +850,11 @@ create_access_1 (tree base, HOST_WIDE_INT offset, HOST_WIDE_INT size)
>     return access;
>   }
>
> +static bool maybe_add_sra_candidate (tree);
> +
>   /* Create and insert access for EXPR. Return created access, or NULL if it is
> -   not possible.  */
> +   not possible.  Also scan for uses of constant pool as we go along and add
> +   to candidates.  */
>
>   static struct access *
>   create_access (tree expr, gimple *stmt, bool write)
> @@ -859,6 +877,25 @@ create_access (tree expr, gimple *stmt, bool write)
>     else
>       ptr = false;
>
> +  /* For constant-pool entries, check we can substitute the constant value.  */
> +  if (constant_decl_p (base)
> +      && (sra_mode == SRA_MODE_EARLY_INTRA || sra_mode == SRA_MODE_INTRA))
> +    {
> +      gcc_assert (!bitmap_bit_p (disqualified_constants, DECL_UID (base)));
> +      if (expr != base
> +	  && !is_gimple_reg_type (TREE_TYPE (expr))
> +	  && dump_file && (dump_flags & TDF_DETAILS))
> +	{
> +	  /* This occurs in Ada with accesses to ARRAY_RANGE_REFs,
> +	     and elements of multidimensional arrays (which are
> +	     multi-element arrays in their own right).  */
> +	  fprintf (dump_file, "Allowing non-reg-type load of part"
> +			      " of constant-pool entry: ");
> +	  print_generic_expr (dump_file, expr, 0);
> +	}
> +      maybe_add_sra_candidate (base);
> +    }
> +
>     if (!DECL_P (base) || !bitmap_bit_p (candidate_bitmap, DECL_UID (base)))
>       return NULL;
>
> @@ -918,6 +955,8 @@ static bool
>   scalarizable_type_p (tree type)
>   {
>     gcc_assert (!is_gimple_reg_type (type));
> +  if (type_contains_placeholder_p (type))
> +    return false;
>
>     switch (TREE_CODE (type))
>     {
> @@ -1852,7 +1891,10 @@ maybe_add_sra_candidate (tree var)
>         reject (var, "not aggregate");
>         return false;
>       }
> -  if (needs_to_live_in_memory (var))
> +  /* Allow constant-pool entries (that "need to live in memory")
> +     unless we are doing IPA SRA.  */
> +  if (needs_to_live_in_memory (var)
> +      && (sra_mode == SRA_MODE_EARLY_IPA || !constant_decl_p (var)))
>       {
>         reject (var, "needs to live in memory");
>         return false;
> @@ -3113,7 +3155,7 @@ load_assign_lhs_subreplacements (struct access *lacc,
>
>   	      if (racc && racc->grp_to_be_replaced)
>   		{
> -		  if (racc->grp_write)
> +		  if (racc->grp_write || constant_decl_p (racc->base))
>   		    drhs = get_access_replacement (racc);
>   		  else
>   		    drhs = NULL;
> @@ -3272,6 +3314,9 @@ sra_modify_assign (gimple *stmt, gimple_stmt_iterator *gsi)
>     racc = get_access_for_expr (rhs);
>     if (!lacc && !racc)
>       return SRA_AM_NONE;
> +  /* Avoid modifying initializations of constant-pool replacements.  */
> +  if (racc && (racc->replacement_decl == lhs))
> +    return SRA_AM_NONE;
>
>     loc = gimple_location (stmt);
>     if (lacc && lacc->grp_to_be_replaced)
> @@ -3388,7 +3433,8 @@ sra_modify_assign (gimple *stmt, gimple_stmt_iterator *gsi)
>         || contains_vce_or_bfcref_p (lhs)
>         || stmt_ends_bb_p (stmt))
>       {
> -      if (access_has_children_p (racc))
> +      /* No need to copy into a constant-pool, it comes pre-initialized.  */
> +      if (access_has_children_p (racc) && !constant_decl_p (racc->base))
>   	generate_subtree_copies (racc->first_child, rhs, racc->offset, 0, 0,
>   				 gsi, false, false, loc);
>         if (access_has_children_p (lacc))
> @@ -3491,6 +3537,54 @@ sra_modify_assign (gimple *stmt, gimple_stmt_iterator *gsi)
>       }
>   }
>
> +/* Set any scalar replacements of values in the constant pool to the initial
> +   value of the constant.  (Constant-pool decls like *.LC0 have effectively
> +   been initialized before the program starts, we must do the same for their
> +   replacements.)  Thus, we output statements like 'SR.1 = *.LC0[0];' into
> +   the function's entry block.  */
> +
> +static void
> +initialize_constant_pool_replacements (void)
> +{
> +  gimple_seq seq = NULL;
> +  gimple_stmt_iterator gsi = gsi_start (seq);
> +  bitmap_iterator bi;
> +  unsigned i;
> +
> +  EXECUTE_IF_SET_IN_BITMAP (candidate_bitmap, 0, i, bi)
> +    if (bitmap_bit_p (should_scalarize_away_bitmap, i)
> +	&& !bitmap_bit_p (cannot_scalarize_away_bitmap, i))
> +      {
> +	tree var = candidate (i);
> +	if (!constant_decl_p (var))
> +	  continue;
> +	vec<access_p> *access_vec = get_base_access_vector (var);
> +	if (!access_vec)
> +	  continue;
> +	for (unsigned i = 0; i < access_vec->length (); i++)
> +	  {
> +	    struct access *access = (*access_vec)[i];
> +	    if (!access->replacement_decl)
> +	      continue;
> +	    gassign *stmt = gimple_build_assign (
> +	      get_access_replacement (access), unshare_expr (access->expr));
> +	    if (dump_file && (dump_flags & TDF_DETAILS))
> +	      {
> +		fprintf (dump_file, "Generating constant initializer: ");
> +		print_gimple_stmt (dump_file, stmt, 0, 1);
> +		fprintf (dump_file, "\n");
> +	      }
> +	    gsi_insert_after (&gsi, stmt, GSI_NEW_STMT);
> +	    update_stmt (stmt);
> +	  }
> +      }
> +
> +  seq = gsi_seq (gsi);
> +  if (seq)
> +    gsi_insert_seq_on_edge_immediate (
> +      single_succ_edge (ENTRY_BLOCK_PTR_FOR_FN (cfun)), seq);
> +}
> +
>   /* Traverse the function body and all modifications as decided in
>      analyze_all_variable_accesses.  Return true iff the CFG has been
>      changed.  */
> @@ -3501,6 +3595,8 @@ sra_modify_function_body (void)
>     bool cfg_changed = false;
>     basic_block bb;
>
> +  initialize_constant_pool_replacements ();
> +
>     FOR_EACH_BB_FN (bb, cfun)
>       {
>         gimple_stmt_iterator gsi = gsi_start_bb (bb);
>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 3/4 v2] Enhance SCEV to follow copies of SSA_NAMEs.
  2016-01-15 10:07           ` Richard Biener
@ 2016-01-15 10:38             ` Alan Lawrence
  2016-01-15 10:41               ` Richard Biener
  0 siblings, 1 reply; 48+ messages in thread
From: Alan Lawrence @ 2016-01-15 10:38 UTC (permalink / raw)
  To: Richard Biener; +Cc: GCC Patches

On 15/01/16 10:07, Richard Biener wrote:
> On Fri, Jan 15, 2016 at 10:47 AM, Alan Lawrence <alan.lawrence@arm.com> wrote:
>> On Thu, Jan 14, 2016 at 12:30 PM, Richard Biener <richard.guenther@gmail.com> wrote:
>>
>>>
>>> The vuse test is not necessary
>>>
>>>> +              && (gimple_assign_rhs_code (def) == SSA_NAME
>>>> +                  || is_gimple_min_invariant (gimple_assign_rhs1 (def))))
>>>
>>> and the is_gimple_min_invariant (rhs1) test is not sufficient if you
>>> consider - (-INT_MIN) with -ftrapv for example.
>>
>> Thanks, I didn't realize gimple_min_invariant would allow such cases.
>
> Well, the invariant would be -INT_MIN but gimple_assign_rhs_code (def) would
> be NEGATE_EXPR.  Basically you forgot about unary operators.

Hmm, shouldn't those have get_gimple_rhs_class(gimple_assign_rhs_code(stmt)) == 
GIMPLE_UNARY_RHS, rather than GIMPLE_SINGLE_RHS as checked for by 
gimple_assign_single_p?

If SINGLE_RHS includes unary operators, the new version of the patch is as 
flawed as the previous, in that it drops the unary operator altogether.

--Alan

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 3/4 v2] Enhance SCEV to follow copies of SSA_NAMEs.
  2016-01-15 10:38             ` Alan Lawrence
@ 2016-01-15 10:41               ` Richard Biener
  0 siblings, 0 replies; 48+ messages in thread
From: Richard Biener @ 2016-01-15 10:41 UTC (permalink / raw)
  To: Alan Lawrence; +Cc: GCC Patches

On Fri, Jan 15, 2016 at 11:38 AM, Alan Lawrence
<alan.lawrence@foss.arm.com> wrote:
> On 15/01/16 10:07, Richard Biener wrote:
>>
>> On Fri, Jan 15, 2016 at 10:47 AM, Alan Lawrence <alan.lawrence@arm.com>
>> wrote:
>>>
>>> On Thu, Jan 14, 2016 at 12:30 PM, Richard Biener
>>> <richard.guenther@gmail.com> wrote:
>>>
>>>>
>>>> The vuse test is not necessary
>>>>
>>>>> +              && (gimple_assign_rhs_code (def) == SSA_NAME
>>>>> +                  || is_gimple_min_invariant (gimple_assign_rhs1
>>>>> (def))))
>>>>
>>>>
>>>> and the is_gimple_min_invariant (rhs1) test is not sufficient if you
>>>> consider - (-INT_MIN) with -ftrapv for example.
>>>
>>>
>>> Thanks, I didn't realize gimple_min_invariant would allow such cases.
>>
>>
>> Well, the invariant would be -INT_MIN but gimple_assign_rhs_code (def)
>> would
>> be NEGATE_EXPR.  Basically you forgot about unary operators.
>
>
> Hmm, shouldn't those have get_gimple_rhs_class(gimple_assign_rhs_code(stmt))
> == GIMPLE_UNARY_RHS, rather than GIMPLE_SINGLE_RHS as checked for by
> gimple_assign_single_p?

Doh, of course.

> If SINGLE_RHS includes unary operators, the new version of the patch is as
> flawed as the previous, in that it drops the unary operator altogether.
>
> --Alan

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 4/4] Un-XFAIL ssa-dom-cse-2.c for most platforms
  2015-12-21 15:33         ` Bill Schmidt
  2015-12-22 16:00           ` Alan Lawrence
@ 2016-01-15 10:46           ` Alan Lawrence
  2016-01-15 10:50             ` Richard Biener
  1 sibling, 1 reply; 48+ messages in thread
From: Alan Lawrence @ 2016-01-15 10:46 UTC (permalink / raw)
  To: GCC Patches; +Cc: Bill Schmidt

It seems the conclusion on PowerPC is to XFAIL the test on powerpc64 (there will 
be XPASSes with -mcpu=power7 or -mcpu=power8). Which is what the original patch 
does (https://gcc.gnu.org/ml/gcc-patches/2015-12/msg01979.html). So,

Ping.

Thanks, Alan

On 21/12/15 15:33, Bill Schmidt wrote:
> On Mon, 2015-12-21 at 15:22 +0000, Alan Lawrence wrote:
>> On 21/12/15 14:59, Bill Schmidt wrote:
>>>>>
>>>>> On powerpc64, the test passes with -mcpu=power8 (the loop is vectorized as a
>>>>> reduction); however, without that, similar code is generated to Alpha (the
>>>>> vectorizer decides the reduction is not worthwhile without SIMD support), and
>>>>> the test fails; hence, I've XFAILed for powerpc, but I think I could condition
>>>>> the XFAIL on powerpc64 && !check_p8vector_hw_available, if preferred?
>>>>
>>>> Fun.
>>>>
>>>> Does it work with -mcpu=power7?
>>
>> Yes, it works with -mcpu=power7, as well as -mcpu=power8, but not e.g. -mcpu=power6.
>>
>>>> Bill: What GCC DejaGNU incantation would you like to see?
>>>
>>> This sounds like more fallout from unaligned accesses being faster on
>>> POWER8 than previous hardware.  What about conditioning the XFAIL on
>>>
>>> { powerpc*-*-* && { ! vect_hw_misalign } }
>>>
>>> -- does this work properly?
>>
>> Not on a stage1 compiler - check_p8vector_hw_available itself requires being
>> able to run executables - I'll check on gcc112. However, both look like they're
>> really about the host (ability to execute an asm instruction), not the target
>> (/ability for gcc to output such an instruction)....
>
> Hm, that looks like a pervasive problem for powerpc.  There are a number
> of things that are supposed to be testing effective target but rely on
> check_p8vector_hw_available, which as you note requires executing an
> instruction and is really about the host.  We need to clean that up; I
> should probably open a bug.  Kind of amazed this has gotten past us for
> a couple of years.
>
> For now, just XFAILing for powerpc seems the best alternative for this
> test.
>
> Thanks,
> Bill
>
>>
>> --Alan
>>
>
>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 4/4] Un-XFAIL ssa-dom-cse-2.c for most platforms
  2016-01-15 10:46           ` Alan Lawrence
@ 2016-01-15 10:50             ` Richard Biener
  0 siblings, 0 replies; 48+ messages in thread
From: Richard Biener @ 2016-01-15 10:50 UTC (permalink / raw)
  To: Alan Lawrence; +Cc: GCC Patches, Bill Schmidt

On Fri, Jan 15, 2016 at 11:46 AM, Alan Lawrence
<alan.lawrence@foss.arm.com> wrote:
> It seems the conclusion on PowerPC is to XFAIL the test on powerpc64 (there
> will be XPASSes with -mcpu=power7 or -mcpu=power8). Which is what the
> original patch does
> (https://gcc.gnu.org/ml/gcc-patches/2015-12/msg01979.html). So,
>
> Ping.

Ok.

Richard.

> Thanks, Alan
>
> On 21/12/15 15:33, Bill Schmidt wrote:
>>
>> On Mon, 2015-12-21 at 15:22 +0000, Alan Lawrence wrote:
>>>
>>> On 21/12/15 14:59, Bill Schmidt wrote:
>>>>>>
>>>>>>
>>>>>> On powerpc64, the test passes with -mcpu=power8 (the loop is
>>>>>> vectorized as a
>>>>>> reduction); however, without that, similar code is generated to Alpha
>>>>>> (the
>>>>>> vectorizer decides the reduction is not worthwhile without SIMD
>>>>>> support), and
>>>>>> the test fails; hence, I've XFAILed for powerpc, but I think I could
>>>>>> condition
>>>>>> the XFAIL on powerpc64 && !check_p8vector_hw_available, if preferred?
>>>>>
>>>>>
>>>>> Fun.
>>>>>
>>>>> Does it work with -mcpu=power7?
>>>
>>>
>>> Yes, it works with -mcpu=power7, as well as -mcpu=power8, but not e.g.
>>> -mcpu=power6.
>>>
>>>>> Bill: What GCC DejaGNU incantation would you like to see?
>>>>
>>>>
>>>> This sounds like more fallout from unaligned accesses being faster on
>>>> POWER8 than previous hardware.  What about conditioning the XFAIL on
>>>>
>>>> { powerpc*-*-* && { ! vect_hw_misalign } }
>>>>
>>>> -- does this work properly?
>>>
>>>
>>> Not on a stage1 compiler - check_p8vector_hw_available itself requires
>>> being
>>> able to run executables - I'll check on gcc112. However, both look like
>>> they're
>>> really about the host (ability to execute an asm instruction), not the
>>> target
>>> (/ability for gcc to output such an instruction)....
>>
>>
>> Hm, that looks like a pervasive problem for powerpc.  There are a number
>> of things that are supposed to be testing effective target but rely on
>> check_p8vector_hw_available, which as you note requires executing an
>> instruction and is really about the host.  We need to clean that up; I
>> should probably open a bug.  Kind of amazed this has gotten past us for
>> a couple of years.
>>
>> For now, just XFAILing for powerpc seems the best alternative for this
>> test.
>>
>> Thanks,
>> Bill
>>
>>>
>>> --Alan
>>>
>>
>>
>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 1/4] Make SRA scalarize constant-pool loads
  2016-01-15 10:27     ` Alan Lawrence
@ 2016-01-18 11:04       ` Richard Biener
  0 siblings, 0 replies; 48+ messages in thread
From: Richard Biener @ 2016-01-18 11:04 UTC (permalink / raw)
  To: Alan Lawrence; +Cc: GCC Patches

On Fri, Jan 15, 2016 at 11:27 AM, Alan Lawrence
<alan.lawrence@foss.arm.com> wrote:
> On 24/12/15 11:53, Alan Lawrence wrote:
>>
>> Here's a new version that fixes the gcc.dg/guality/pr54970.c failures seen
>> on
>> aarch64 and powerpc64. Prior to SRA handling constant pool decls,
>> -fdump-tree-esra-details (at -O1 -g) had shown:
>>    <bb 2>:
>>    a = *.LC0;
>>    # DEBUG a$0 => MEM[(int[3] *)&*.LC0]
>>    a$4_3 = MEM[(int[3] *)&*.LC0 + 4B];
>>    # DEBUG a$4 => a$4_3
>>    a$8_4 = MEM[(int[3] *)&*.LC0 + 8B];
>>
>> The previous patch changed this to:
>>    <bb 2>:
>>    SR.5_3 = *.LC0[0];
>>    SR.6_4 = *.LC0[1];
>>    SR.7_19 = *.LC0[2];
>>    SR.8_21 = *.LC1[0];
>>    SR.9_22 = *.LC1[1];
>>    SR.10_23 = *.LC1[2];
>>    # DEBUG a$0 => NULL   // Note here
>>    a$4_24 = SR.6_4;
>>    # DEBUG a$4 => a$4_24
>>    a$8_25 = SR.7_19;
>>
>> Turns out the DEBUG a$0 => NULL was produced in
>> load_assign_lhs_subreplacements:
>>
>>           if (lacc && lacc->grp_to_be_debug_replaced)
>>             {
>>               gdebug *ds;
>>               tree drhs;
>>               struct access *racc = find_access_in_subtree (sad->top_racc,
>>                                                             offset,
>>                                                             lacc->size);
>>
>>               if (racc && racc->grp_to_be_replaced)
>>                 {
>>                   if (racc->grp_write)
>>                     drhs = get_access_replacement (racc);
>>                   else
>>                     drhs = NULL;  // <=== HERE
>>                 }
>> ...
>>               ds = gimple_build_debug_bind (get_access_replacement (lacc),
>>                                             drhs, gsi_stmt
>> (sad->old_gsi));
>>
>> Prior to the patch, we'd skipped around load_assign_lhs_subreplacements,
>> because
>> access_has_children_p(racc) (for racc = *.LC0) didn't hold in
>> sra_modify_assign.
>>
>> I also added a constant_decl_p function, combining the two checks, plus
>> some
>> testcase fixes.
>>
>> Bootstrapped + check-gcc,g++ on x86_64, ARM, AArch64,
>>    also on powerpc64{,le}-none-linux-gnu *in combination with the other
>> patches
>>    in the series* (I haven't tested the individual patches on PPC),
>>    plus Ada on ARM and x86_64.
>>
>> gcc/ChangeLog:
>>
>>         PR target/63679
>>         * tree-sra.c (disqualified_constants, constant_decl_p): New.
>>         (sra_initialize): Allocate disqualified_constants.
>>         (sra_deinitialize): Free disqualified_constants.
>>         (disqualify_candidate): Update disqualified_constants when
>> appropriate.
>>         (create_access): Scan for constant-pool entries as we go along.
>>         (scalarizable_type_p): Add check against
>> type_contains_placeholder_p.
>>         (maybe_add_sra_candidate): Allow constant-pool entries.
>>         (load_assign_lhs_subreplacements): Bind debug for constant pool
>> vars.
>>         (initialize_constant_pool_replacements): New.
>>         (sra_modify_assign): Avoid mangling assignments created by
>> previous,
>>         and don't generate writes into constant pool.
>>         (sra_modify_function_body): Call
>> initialize_constant_pool_replacements.
>>
>> gcc/testsuite/ChangeLog:
>>
>>         * gcc.dg/tree-ssa/sra-17.c: New.
>>         * gcc.dg/tree-ssa/sra-18.c: New.
>
>
> Ping.

Ok.

Thanks,
Richard.

> (The next bit is false, unless you force SRA to happen more widely, but all
> the above stands)
>
>> This also fixes a bunch of other guality tests on AArch64 that were
>> failing
>> prior to the patch series, and another bunch on PowerPC64 (bigendian
>> -m32), listed below.
>
>>
>>
>> Tests fixed on aarch64-none-linux-gnu:
>>
>> gcc.dg/guality/pr54970.c   -O1  line 15 *p == 3
>> gcc.dg/guality/pr54970.c   -O1  line 15 *q == 2
>> gcc.dg/guality/pr54970.c   -O1  line 20 *p == 13
>> gcc.dg/guality/pr54970.c   -O1  line 20 *q == 2
>> gcc.dg/guality/pr54970.c   -O1  line 25 *p == 13
>> gcc.dg/guality/pr54970.c   -O1  line 25 *q == 12
>> gcc.dg/guality/pr54970.c   -O1  line 31 *p == 6
>> gcc.dg/guality/pr54970.c   -O1  line 31 *q == 5
>> gcc.dg/guality/pr54970.c   -O1  line 36 *p == 26
>> gcc.dg/guality/pr54970.c   -O1  line 36 *q == 5
>> gcc.dg/guality/pr54970.c   -O1  line 45 *p == 26
>> gcc.dg/guality/pr54970.c   -O1  line 45 *q == 25
>> gcc.dg/guality/pr54970.c   -O1  line 45 p[-1] == 25
>> gcc.dg/guality/pr54970.c   -O1  line 45 p[-2] == 4
>> gcc.dg/guality/pr54970.c   -O1  line 45 q[-1] == 4
>> gcc.dg/guality/pr54970.c   -O1  line 45 q[1] == 26
>> gcc.dg/guality/pr56154-1.c   -O1  line pr56154-1.c:20 x.a == 6
>> gcc.dg/guality/pr59776.c   -O1  line pr59776.c:17 s1.f == 5.0
>> gcc.dg/guality/pr59776.c   -O1  line pr59776.c:17 s1.g == 6.0
>> gcc.dg/guality/pr59776.c   -O1  line pr59776.c:17 s2.f == 0.0
>> gcc.dg/guality/pr59776.c   -O1  line pr59776.c:17 s2.g == 6.0
>> gcc.dg/guality/pr59776.c   -O1  line pr59776.c:20 s1.f == 5.0
>> gcc.dg/guality/pr59776.c   -O1  line pr59776.c:20 s1.g == 6.0
>> gcc.dg/guality/pr59776.c   -O1  line pr59776.c:20 s2.f == 5.0
>> gcc.dg/guality/pr59776.c   -O1  line pr59776.c:20 s2.g == 6.0
>> gcc.dg/guality/sra-1.c   -O1  line 21 a.j == 14
>> gcc.dg/guality/sra-1.c   -O1  line 32 a[1] == 14
>> gcc.dg/guality/sra-1.c   -O1  line 43 a.i == 4
>> gcc.dg/guality/sra-1.c   -O1  line 43 a.j == 14
>> (and other optimization levels)
>> On ppc64 bigendian, with the rest of the series (but I expect it's this
>> patch):
>> unix/-m32: gcc.dg/guality/pr54970.c   -O1  line 15 a[0] == 1
>> unix/-m32: gcc.dg/guality/pr54970.c   -O1  line 20 a[0] == 1
>> unix/-m32: gcc.dg/guality/pr54970.c   -O1  line 25 a[0] == 1
>> unix/-m32: gcc.dg/guality/pr54970.c   -O1  line 31 a[0] == 4
>> unix/-m32: gcc.dg/guality/pr54970.c   -O1  line 36 a[0] == 4
>> unix/-m32: gcc.dg/guality/pr54970.c   -O1  line 45 a[0] == 4
>> unix/-m32: gcc.dg/guality/pr54970.c   -O1  line 45 p[-2] == 4
>> unix/-m32: gcc.dg/guality/pr54970.c   -O1  line 45 q[-1] == 4
>> (again, and other optimization levels).
>
>
> Cheers, Alan
>
>
>> ---
>>   gcc/testsuite/gcc.dg/tree-ssa/sra-17.c |  19 ++++++
>>   gcc/testsuite/gcc.dg/tree-ssa/sra-18.c |  28 +++++++++
>>   gcc/tree-sra.c                         | 104
>> +++++++++++++++++++++++++++++++--
>>   3 files changed, 147 insertions(+), 4 deletions(-)
>>   create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/sra-17.c
>>   create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/sra-18.c
>>
>> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/sra-17.c
>> b/gcc/testsuite/gcc.dg/tree-ssa/sra-17.c
>> new file mode 100644
>> index 0000000..25667f4
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/tree-ssa/sra-17.c
>> @@ -0,0 +1,19 @@
>> +/* { dg-do run { target { aarch64*-*-* alpha*-*-* arm*-*-* hppa*-*-*
>> powerpc*-*-* s390*-*-* } } } */
>> +/* { dg-options "-O2 -fdump-tree-esra --param
>> sra-max-scalarization-size-Ospeed=32" } */
>> +
>> +extern void abort (void);
>> +
>> +int
>> +main (int argc, char **argv)
>> +{
>> +  long a[4] = { 7, 19, 11, 255 };
>> +  int tot = 0;
>> +  for (int i = 0; i < 4; i++)
>> +    tot = (tot*256) + a[i];
>> +  if (tot == 0x07130bff)
>> +    return 0;
>> +  abort ();
>> +}
>> +
>> +/* { dg-final { scan-tree-dump-times "Removing load: a = \\\*.LC0;" 1
>> "esra" } } */
>> +/* { dg-final { scan-tree-dump-times "SR.\[0-9_\]+ = \\\*.LC0\\\[" 4
>> "esra" } } */
>> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/sra-18.c
>> b/gcc/testsuite/gcc.dg/tree-ssa/sra-18.c
>> new file mode 100644
>> index 0000000..609fb11
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/tree-ssa/sra-18.c
>> @@ -0,0 +1,28 @@
>> +/* { dg-do run { target { aarch64*-*-* alpha*-*-* arm*-*-* hppa*-*-*
>> powerpc*-*-* s390*-*-* } } } */
>> +/* { dg-options "-O2 -fdump-tree-esra --param
>> sra-max-scalarization-size-Ospeed=32" } */
>> +
>> +extern void abort (void);
>> +struct foo { long x; };
>> +
>> +struct bar { struct foo f[2]; };
>> +
>> +struct baz { struct bar b[2]; };
>> +
>> +int
>> +main (int argc, char **argv)
>> +{
>> +  struct baz a = { { { { { 4 }, { 5 } } }, { { { 6 }, { 7 } } }  } };
>> +  int tot = 0;
>> +  for (int i = 0; i < 2; i++)
>> +    for (int j = 0; j < 2; j++)
>> +      tot = (tot*256) + a.b[i].f[j].x;
>> +  if (tot == 0x04050607)
>> +    return 0;
>> +  abort ();
>> +}
>> +
>> +/* { dg-final { scan-tree-dump-times "Removing load: a = \\\*.LC0;" 1
>> "esra" } } */
>> +/* { dg-final { scan-tree-dump-times "SR.\[0-9_\]+ =
>> \\\*.LC0\\.b\\\[0\\\]\\.f\\\[0\\\]\\.x" 1 "esra" } } */
>> +/* { dg-final { scan-tree-dump-times "SR.\[0-9_\]+ =
>> \\\*.LC0\\.b\\\[0\\\]\\.f\\\[1\\\]\\.x" 1 "esra" } } */
>> +/* { dg-final { scan-tree-dump-times "SR.\[0-9_\]+ =
>> \\\*.LC0\\.b\\\[1\\\]\\.f\\\[0\\\]\\.x" 1 "esra" } } */
>> +/* { dg-final { scan-tree-dump-times "SR.\[0-9_\]+ =
>> \\\*.LC0\\.b\\\[1\\\]\\.f\\\[1\\\]\\.x" 1 "esra" } } */
>> diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
>> index c4fea5b..77a2e49 100644
>> --- a/gcc/tree-sra.c
>> +++ b/gcc/tree-sra.c
>> @@ -328,6 +328,10 @@ candidate (unsigned uid)
>>      those which cannot be (because they are and need be used as a whole).
>> */
>>   static bitmap should_scalarize_away_bitmap,
>> cannot_scalarize_away_bitmap;
>>
>> +/* Bitmap of candidates in the constant pool, which cannot be scalarized
>> +   because this would produce non-constant expressions (e.g. Ada).  */
>> +static bitmap disqualified_constants;
>> +
>>   /* Obstack for creation of fancy names.  */
>>   static struct obstack name_obstack;
>>
>> @@ -652,6 +656,7 @@ sra_initialize (void)
>>       (vec_safe_length (cfun->local_decls) / 2);
>>     should_scalarize_away_bitmap = BITMAP_ALLOC (NULL);
>>     cannot_scalarize_away_bitmap = BITMAP_ALLOC (NULL);
>> +  disqualified_constants = BITMAP_ALLOC (NULL);
>>     gcc_obstack_init (&name_obstack);
>>     base_access_vec = new hash_map<tree, auto_vec<access_p> >;
>>     memset (&sra_stats, 0, sizeof (sra_stats));
>> @@ -670,6 +675,7 @@ sra_deinitialize (void)
>>     candidates = NULL;
>>     BITMAP_FREE (should_scalarize_away_bitmap);
>>     BITMAP_FREE (cannot_scalarize_away_bitmap);
>> +  BITMAP_FREE (disqualified_constants);
>>     access_pool.release ();
>>     assign_link_pool.release ();
>>     obstack_free (&name_obstack, NULL);
>> @@ -677,6 +683,13 @@ sra_deinitialize (void)
>>     delete base_access_vec;
>>   }
>>
>> +/* Return true if DECL is a VAR_DECL in the constant pool, false
>> otherwise.  */
>> +
>> +static bool constant_decl_p (tree decl)
>> +{
>> +  return TREE_CODE (decl) == VAR_DECL && DECL_IN_CONSTANT_POOL (decl);
>> +}
>> +
>>   /* Remove DECL from candidates for SRA and write REASON to the dump file
>> if
>>      there is one.  */
>>   static void
>> @@ -684,6 +697,8 @@ disqualify_candidate (tree decl, const char *reason)
>>   {
>>     if (bitmap_clear_bit (candidate_bitmap, DECL_UID (decl)))
>>       candidates->remove_elt_with_hash (decl, DECL_UID (decl));
>> +  if (constant_decl_p (decl))
>> +    bitmap_set_bit (disqualified_constants, DECL_UID (decl));
>>
>>     if (dump_file && (dump_flags & TDF_DETAILS))
>>       {
>> @@ -835,8 +850,11 @@ create_access_1 (tree base, HOST_WIDE_INT offset,
>> HOST_WIDE_INT size)
>>     return access;
>>   }
>>
>> +static bool maybe_add_sra_candidate (tree);
>> +
>>   /* Create and insert access for EXPR. Return created access, or NULL if
>> it is
>> -   not possible.  */
>> +   not possible.  Also scan for uses of constant pool as we go along and
>> add
>> +   to candidates.  */
>>
>>   static struct access *
>>   create_access (tree expr, gimple *stmt, bool write)
>> @@ -859,6 +877,25 @@ create_access (tree expr, gimple *stmt, bool write)
>>     else
>>       ptr = false;
>>
>> +  /* For constant-pool entries, check we can substitute the constant
>> value.  */
>> +  if (constant_decl_p (base)
>> +      && (sra_mode == SRA_MODE_EARLY_INTRA || sra_mode ==
>> SRA_MODE_INTRA))
>> +    {
>> +      gcc_assert (!bitmap_bit_p (disqualified_constants, DECL_UID
>> (base)));
>> +      if (expr != base
>> +         && !is_gimple_reg_type (TREE_TYPE (expr))
>> +         && dump_file && (dump_flags & TDF_DETAILS))
>> +       {
>> +         /* This occurs in Ada with accesses to ARRAY_RANGE_REFs,
>> +            and elements of multidimensional arrays (which are
>> +            multi-element arrays in their own right).  */
>> +         fprintf (dump_file, "Allowing non-reg-type load of part"
>> +                             " of constant-pool entry: ");
>> +         print_generic_expr (dump_file, expr, 0);
>> +       }
>> +      maybe_add_sra_candidate (base);
>> +    }
>> +
>>     if (!DECL_P (base) || !bitmap_bit_p (candidate_bitmap, DECL_UID
>> (base)))
>>       return NULL;
>>
>> @@ -918,6 +955,8 @@ static bool
>>   scalarizable_type_p (tree type)
>>   {
>>     gcc_assert (!is_gimple_reg_type (type));
>> +  if (type_contains_placeholder_p (type))
>> +    return false;
>>
>>     switch (TREE_CODE (type))
>>     {
>> @@ -1852,7 +1891,10 @@ maybe_add_sra_candidate (tree var)
>>         reject (var, "not aggregate");
>>         return false;
>>       }
>> -  if (needs_to_live_in_memory (var))
>> +  /* Allow constant-pool entries (that "need to live in memory")
>> +     unless we are doing IPA SRA.  */
>> +  if (needs_to_live_in_memory (var)
>> +      && (sra_mode == SRA_MODE_EARLY_IPA || !constant_decl_p (var)))
>>       {
>>         reject (var, "needs to live in memory");
>>         return false;
>> @@ -3113,7 +3155,7 @@ load_assign_lhs_subreplacements (struct access
>> *lacc,
>>
>>               if (racc && racc->grp_to_be_replaced)
>>                 {
>> -                 if (racc->grp_write)
>> +                 if (racc->grp_write || constant_decl_p (racc->base))
>>                     drhs = get_access_replacement (racc);
>>                   else
>>                     drhs = NULL;
>> @@ -3272,6 +3314,9 @@ sra_modify_assign (gimple *stmt,
>> gimple_stmt_iterator *gsi)
>>     racc = get_access_for_expr (rhs);
>>     if (!lacc && !racc)
>>       return SRA_AM_NONE;
>> +  /* Avoid modifying initializations of constant-pool replacements.  */
>> +  if (racc && (racc->replacement_decl == lhs))
>> +    return SRA_AM_NONE;
>>
>>     loc = gimple_location (stmt);
>>     if (lacc && lacc->grp_to_be_replaced)
>> @@ -3388,7 +3433,8 @@ sra_modify_assign (gimple *stmt,
>> gimple_stmt_iterator *gsi)
>>         || contains_vce_or_bfcref_p (lhs)
>>         || stmt_ends_bb_p (stmt))
>>       {
>> -      if (access_has_children_p (racc))
>> +      /* No need to copy into a constant-pool, it comes pre-initialized.
>> */
>> +      if (access_has_children_p (racc) && !constant_decl_p (racc->base))
>>         generate_subtree_copies (racc->first_child, rhs, racc->offset, 0,
>> 0,
>>                                  gsi, false, false, loc);
>>         if (access_has_children_p (lacc))
>> @@ -3491,6 +3537,54 @@ sra_modify_assign (gimple *stmt,
>> gimple_stmt_iterator *gsi)
>>       }
>>   }
>>
>> +/* Set any scalar replacements of values in the constant pool to the
>> initial
>> +   value of the constant.  (Constant-pool decls like *.LC0 have
>> effectively
>> +   been initialized before the program starts, we must do the same for
>> their
>> +   replacements.)  Thus, we output statements like 'SR.1 = *.LC0[0];'
>> into
>> +   the function's entry block.  */
>> +
>> +static void
>> +initialize_constant_pool_replacements (void)
>> +{
>> +  gimple_seq seq = NULL;
>> +  gimple_stmt_iterator gsi = gsi_start (seq);
>> +  bitmap_iterator bi;
>> +  unsigned i;
>> +
>> +  EXECUTE_IF_SET_IN_BITMAP (candidate_bitmap, 0, i, bi)
>> +    if (bitmap_bit_p (should_scalarize_away_bitmap, i)
>> +       && !bitmap_bit_p (cannot_scalarize_away_bitmap, i))
>> +      {
>> +       tree var = candidate (i);
>> +       if (!constant_decl_p (var))
>> +         continue;
>> +       vec<access_p> *access_vec = get_base_access_vector (var);
>> +       if (!access_vec)
>> +         continue;
>> +       for (unsigned i = 0; i < access_vec->length (); i++)
>> +         {
>> +           struct access *access = (*access_vec)[i];
>> +           if (!access->replacement_decl)
>> +             continue;
>> +           gassign *stmt = gimple_build_assign (
>> +             get_access_replacement (access), unshare_expr
>> (access->expr));
>> +           if (dump_file && (dump_flags & TDF_DETAILS))
>> +             {
>> +               fprintf (dump_file, "Generating constant initializer: ");
>> +               print_gimple_stmt (dump_file, stmt, 0, 1);
>> +               fprintf (dump_file, "\n");
>> +             }
>> +           gsi_insert_after (&gsi, stmt, GSI_NEW_STMT);
>> +           update_stmt (stmt);
>> +         }
>> +      }
>> +
>> +  seq = gsi_seq (gsi);
>> +  if (seq)
>> +    gsi_insert_seq_on_edge_immediate (
>> +      single_succ_edge (ENTRY_BLOCK_PTR_FOR_FN (cfun)), seq);
>> +}
>> +
>>   /* Traverse the function body and all modifications as decided in
>>      analyze_all_variable_accesses.  Return true iff the CFG has been
>>      changed.  */
>> @@ -3501,6 +3595,8 @@ sra_modify_function_body (void)
>>     bool cfg_changed = false;
>>     basic_block bb;
>>
>> +  initialize_constant_pool_replacements ();
>> +
>>     FOR_EACH_BB_FN (bb, cfun)
>>       {
>>         gimple_stmt_iterator gsi = gsi_start_bb (bb);
>>
>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 2/4] Equate MEM_REFs and ARRAY_REFs in tree-ssa-scopedtables.c
  2015-12-24 11:55   ` Alan Lawrence
  2016-01-04 19:05     ` Jeff Law
@ 2016-01-19  3:05     ` H.J. Lu
  2016-01-19  9:46       ` Christophe Lyon
  1 sibling, 1 reply; 48+ messages in thread
From: H.J. Lu @ 2016-01-19  3:05 UTC (permalink / raw)
  To: Alan Lawrence; +Cc: GCC Patches

On Thu, Dec 24, 2015 at 3:55 AM, Alan Lawrence <alan.lawrence@arm.com> wrote:
> This version changes the test cases to fix failures on some platforms, by
> rewriting the initializers so that they aren't pushed out to the constant pool.
>
> gcc/ChangeLog:
>
>         * tree-ssa-scopedtables.c (avail_expr_hash): Hash MEM_REF and ARRAY_REF
>         using get_ref_base_and_extent.
>         (equal_mem_array_ref_p): New.
>         (hashable_expr_equal_p): Add call to previous.
>

This caused:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69352

-- 
H.J.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 2/4] Equate MEM_REFs and ARRAY_REFs in tree-ssa-scopedtables.c
  2016-01-19  3:05     ` H.J. Lu
@ 2016-01-19  9:46       ` Christophe Lyon
  2016-01-19 13:22         ` Alan Lawrence
  0 siblings, 1 reply; 48+ messages in thread
From: Christophe Lyon @ 2016-01-19  9:46 UTC (permalink / raw)
  To: Alan Lawrence; +Cc: GCC Patches

On 19 January 2016 at 04:05, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Thu, Dec 24, 2015 at 3:55 AM, Alan Lawrence <alan.lawrence@arm.com> wrote:
>> This version changes the test cases to fix failures on some platforms, by
>> rewriting the initializers so that they aren't pushed out to the constant pool.
>>
>> gcc/ChangeLog:
>>
>>         * tree-ssa-scopedtables.c (avail_expr_hash): Hash MEM_REF and ARRAY_REF
>>         using get_ref_base_and_extent.
>>         (equal_mem_array_ref_p): New.
>>         (hashable_expr_equal_p): Add call to previous.
>>
>
> This caused:
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69352
>

Hi Alan,

This patch also caused regressions on arm-none-linux-gnueabihf
with GCC configured as:
--with-thumb --with-cpu=cortex-a57 --with-fpu=crypto-neon-fp-armv8

These tests now fail:
gcc.dg/torture/pr61742.c   -O2  (test for excess errors)
gcc.dg/torture/pr61742.c   -O2 -flto -fno-use-linker-plugin
-flto-partition=none  (test for excess errors)
gcc.dg/torture/pr61742.c   -O3 -fomit-frame-pointer -funroll-loops
-fpeel-loops -ftracer -finline-functions  (test for excess errors)
gcc.dg/torture/pr61742.c   -O3 -g  (test for excess errors)

Christophe

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 2/4] Equate MEM_REFs and ARRAY_REFs in tree-ssa-scopedtables.c
  2016-01-19  9:46       ` Christophe Lyon
@ 2016-01-19 13:22         ` Alan Lawrence
  2016-01-19 13:33           ` Christophe Lyon
  2016-03-01  4:15           ` hf not implying hf (was: [PATCH 2/4] Equate MEM_REFs and ARRAY_REFs in tree-ssa-scopedtables.c) Hans-Peter Nilsson
  0 siblings, 2 replies; 48+ messages in thread
From: Alan Lawrence @ 2016-01-19 13:22 UTC (permalink / raw)
  To: Christophe Lyon; +Cc: GCC Patches

On 19/01/16 09:46, Christophe Lyon wrote:
> On 19 January 2016 at 04:05, H.J. Lu <hjl.tools@gmail.com> wrote:
>> On Thu, Dec 24, 2015 at 3:55 AM, Alan Lawrence <alan.lawrence@arm.com> wrote:
>>> This version changes the test cases to fix failures on some platforms, by
>>> rewriting the initializers so that they aren't pushed out to the constant pool.
>>>
>>> gcc/ChangeLog:
>>>
>>>          * tree-ssa-scopedtables.c (avail_expr_hash): Hash MEM_REF and ARRAY_REF
>>>          using get_ref_base_and_extent.
>>>          (equal_mem_array_ref_p): New.
>>>          (hashable_expr_equal_p): Add call to previous.
>>>
>>
>> This caused:
>>
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69352
>>
>
> Hi Alan,
>
> This patch also caused regressions on arm-none-linux-gnueabihf
> with GCC configured as:
> --with-thumb --with-cpu=cortex-a57 --with-fpu=crypto-neon-fp-armv8
>
> These tests now fail:
> gcc.dg/torture/pr61742.c   -O2  (test for excess errors)
> gcc.dg/torture/pr61742.c   -O2 -flto -fno-use-linker-plugin
> -flto-partition=none  (test for excess errors)
> gcc.dg/torture/pr61742.c   -O3 -fomit-frame-pointer -funroll-loops
> -fpeel-loops -ftracer -finline-functions  (test for excess errors)
> gcc.dg/torture/pr61742.c   -O3 -g  (test for excess errors)
>

Hmm, I still see these passing, both natively on arm-none-linux-gnueabihf and 
with a cross-build. hf implies --with-float=hard, right? Do you see what the 
error messages are?

Thanks, Alan

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 2/4] Equate MEM_REFs and ARRAY_REFs in tree-ssa-scopedtables.c
  2016-01-19 13:22         ` Alan Lawrence
@ 2016-01-19 13:33           ` Christophe Lyon
  2016-03-01  4:15           ` hf not implying hf (was: [PATCH 2/4] Equate MEM_REFs and ARRAY_REFs in tree-ssa-scopedtables.c) Hans-Peter Nilsson
  1 sibling, 0 replies; 48+ messages in thread
From: Christophe Lyon @ 2016-01-19 13:33 UTC (permalink / raw)
  To: Alan Lawrence; +Cc: GCC Patches

On 19 January 2016 at 14:22, Alan Lawrence <alan.lawrence@foss.arm.com> wrote:
> On 19/01/16 09:46, Christophe Lyon wrote:
>>
>> On 19 January 2016 at 04:05, H.J. Lu <hjl.tools@gmail.com> wrote:
>>>
>>> On Thu, Dec 24, 2015 at 3:55 AM, Alan Lawrence <alan.lawrence@arm.com>
>>> wrote:
>>>>
>>>> This version changes the test cases to fix failures on some platforms,
>>>> by
>>>> rewriting the initializers so that they aren't pushed out to the
>>>> constant pool.
>>>>
>>>> gcc/ChangeLog:
>>>>
>>>>          * tree-ssa-scopedtables.c (avail_expr_hash): Hash MEM_REF and
>>>> ARRAY_REF
>>>>          using get_ref_base_and_extent.
>>>>          (equal_mem_array_ref_p): New.
>>>>          (hashable_expr_equal_p): Add call to previous.
>>>>
>>>
>>> This caused:
>>>
>>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69352
>>>
>>
>> Hi Alan,
>>
>> This patch also caused regressions on arm-none-linux-gnueabihf
>> with GCC configured as:
>> --with-thumb --with-cpu=cortex-a57 --with-fpu=crypto-neon-fp-armv8
>>
>> These tests now fail:
>> gcc.dg/torture/pr61742.c   -O2  (test for excess errors)
>> gcc.dg/torture/pr61742.c   -O2 -flto -fno-use-linker-plugin
>> -flto-partition=none  (test for excess errors)
>> gcc.dg/torture/pr61742.c   -O3 -fomit-frame-pointer -funroll-loops
>> -fpeel-loops -ftracer -finline-functions  (test for excess errors)
>> gcc.dg/torture/pr61742.c   -O3 -g  (test for excess errors)
>>
>
> Hmm, I still see these passing, both natively on arm-none-linux-gnueabihf
> and with a cross-build. hf implies --with-float=hard, right? Do you see what
> the error messages are?
>

Ha! gas complains that "IT blocks containing 32-bit Thumb instructions
are deprecated in ARMv8"

This is PR67591:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67591

So, not related to your patch in fact. Sorry for the noise.

Christophe.

> Thanks, Alan
>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 4/4] Un-XFAIL ssa-dom-cse-2.c for most platforms
  2015-12-21 13:22 ` [PATCH 4/4] Un-XFAIL ssa-dom-cse-2.c for most platforms Alan Lawrence
  2015-12-21 14:10   ` David Edelsohn
@ 2016-01-26 12:23   ` Dominik Vogt
  2016-02-03 11:41     ` Alan Lawrence
  1 sibling, 1 reply; 48+ messages in thread
From: Dominik Vogt @ 2016-01-26 12:23 UTC (permalink / raw)
  To: gcc-patches; +Cc: Alan Lawrence, David Edelsohn, Andreas Krebbel

On Mon, Dec 21, 2015 at 01:13:28PM +0000, Alan Lawrence wrote:
> ...the test passes with --param sra-max-scalarization-size-Ospeed.
> 
> Verified on aarch64 and with stage1 compiler for hppa, powerpc, sparc, s390.

How did you test this on s390?  For me, the test still fails
unless I add -march=z13 (s390x).

> gcc/testsuite/ChangeLog:
> 
> 	* gcc.dg/tree-ssa/ssa-dom-cse-2.c: Remove XFAIL for powerpc(32), hppa,
> 	aarch64, sparc, s390. Add --param sra-max-scalarization-size-Ospeed.
> ---
>  gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c | 9 +++++----
>  1 file changed, 5 insertions(+), 4 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c
> index 9eccdc9..748448e 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O3 -fno-tree-fre -fno-tree-pre -fdump-tree-optimized" } */
> +/* { dg-options "-O3 -fno-tree-fre -fno-tree-pre -fdump-tree-optimized --param sra-max-scalarization-size-Ospeed=32" } */
>  
>  int
>  foo ()
> @@ -17,7 +17,8 @@ foo ()
>  /* After late unrolling the above loop completely DOM should be
>     able to optimize this to return 28.  */
>  
> -/* See PR63679 and PR64159, if the target forces the initializer to memory then
> -   DOM is not able to perform this optimization.  */
> +/* On alpha, the vectorizer generates writes of two vector elements at once,
> +   but the loop reads only one element at a time, and DOM cannot resolve these.
> +   The same happens on powerpc depending on the SIMD support available.  */
>  
> -/* { dg-final { scan-tree-dump "return 28;" "optimized" { xfail aarch64*-*-* alpha*-*-* hppa*-*-* powerpc*-*-* sparc*-*-* s390*-*-* } } } */
> +/* { dg-final { scan-tree-dump "return 28;" "optimized" { xfail alpha*-*-* powerpc64*-*-* } } } */
> -- 
> 1.9.1


Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 4/4] Un-XFAIL ssa-dom-cse-2.c for most platforms
  2016-01-26 12:23   ` Dominik Vogt
@ 2016-02-03 11:41     ` Alan Lawrence
  2016-02-04  9:53       ` Dominik Vogt
  0 siblings, 1 reply; 48+ messages in thread
From: Alan Lawrence @ 2016-02-03 11:41 UTC (permalink / raw)
  To: vogt, gcc-patches, David Edelsohn, Andreas Krebbel

On 26/01/16 12:23, Dominik Vogt wrote:
> On Mon, Dec 21, 2015 at 01:13:28PM +0000, Alan Lawrence wrote:
>> ...the test passes with --param sra-max-scalarization-size-Ospeed.
>>
>> Verified on aarch64 and with stage1 compiler for hppa, powerpc, sparc, s390.
>
> How did you test this on s390?  For me, the test still fails
> unless I add -march=z13 (s390x).

Sorry for the slow response, was away last week. On x86 host, I built a compiler

configure --enable-languages=c,c++,lto --target=s390-none-linux-gnu
make all-gcc
make check-gcc RUNTESTFLAGS=tree-ssa.exp=ssa-dom-cse-2.c

and that shows the tests passing. gcc -v shows little further:
Reading specs from ./gcc/specs
COLLECT_GCC=./gcc/xgcc
COLLECT_LTO_WRAPPER=./gcc/lto-wrapper
Target: s390-none-linux-gnu
Configured with: /work/alalaw01/src2/gcc/configure --enable-languages=c,c++,lto 
--target=s390-none-linux-gnu
Thread model: posix
gcc version 6.0.0 20151206 (experimental) (GCC)

I speculate that perhaps the -march=z13 is default for s390 *linux* ???

If you can send me alternative configury, or dumps from your failing compiler 
(dom{2,3}-details, optimized), I can take a look.

Thanks, Alan

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 4/4] Un-XFAIL ssa-dom-cse-2.c for most platforms
  2016-02-03 11:41     ` Alan Lawrence
@ 2016-02-04  9:53       ` Dominik Vogt
  2016-02-04 11:53         ` Alan Lawrence
  0 siblings, 1 reply; 48+ messages in thread
From: Dominik Vogt @ 2016-02-04  9:53 UTC (permalink / raw)
  To: gcc-patches; +Cc: Alan Lawrence, David Edelsohn, Andreas Krebbel

[-- Attachment #1: Type: text/plain, Size: 1264 bytes --]

On Wed, Feb 03, 2016 at 11:41:02AM +0000, Alan Lawrence wrote:
> On 26/01/16 12:23, Dominik Vogt wrote:
> >On Mon, Dec 21, 2015 at 01:13:28PM +0000, Alan Lawrence wrote:
> >>...the test passes with --param sra-max-scalarization-size-Ospeed.
> >>
> >>Verified on aarch64 and with stage1 compiler for hppa, powerpc, sparc, s390.
> >
> >How did you test this on s390?  For me, the test still fails
> >unless I add -march=z13 (s390x).
> 
> Sorry for the slow response, was away last week. On x86 host, I built a compiler
> 
> configure --enable-languages=c,c++,lto --target=s390-none-linux-gnu
                                                  ^^^^
Looks like the test fails only on s390x (64-Bit ) without
-march=z13 but works on s390 (31-Bit).

> I speculate that perhaps the -march=z13 is default for s390 *linux* ???

No.

> If you can send me alternative configury,

Just replacing the --target option with

  --target=s390x-none-linux-gnu

should do this.  (You have to remove the "dg-additional-options"
that adds -march=z13 from the testcase in the upstream code.)

> or dumps from your failing
> compiler (dom{2,3}-details, optimized), I can take a look.

Thanks; the dumps are attached to this message.

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany

[-- Attachment #2: ssa-dom-cse-2.c.110t.dom2 --]
[-- Type: text/plain, Size: 6543 bytes --]


;; Function foo (foo, funcdef_no=0, decl_uid=1972, cgraph_uid=0, symbol_order=0)

Created preheader block for loop 1
;; 2 loops found
;;
;; Loop 0
;;  header 0, latch 1
;;  depth 0, outer -1
;;  nodes: 0 1 2 3 5 6 4
;;
;; Loop 1
;;  header 5, latch 6
;;  depth 1, outer 0
;;  nodes: 5 6
;; 2 succs { 3 4 }
;; 3 succs { 5 }
;; 5 succs { 6 4 }
;; 6 succs { 5 }
;; 4 succs { 1 }


Optimizing block #0



Optimizing block #2

Optimizing statement MEM[(int[8] *)&a] = 0;
LKUP STMT MEM[(int[8] *)&a] = 0 with .MEM_3(D)
LKUP STMT 0 = MEM[(int[8] *)&a] with .MEM_3(D)
LKUP STMT 0 = MEM[(int[8] *)&a] with .MEM_17
2>>> STMT 0 = MEM[(int[8] *)&a] with .MEM_17
Optimizing statement MEM[(int[8] *)&a + 4B] = 1;
LKUP STMT MEM[(int[8] *)&a + 4B] = 1 with .MEM_17
LKUP STMT 1 = MEM[(int[8] *)&a + 4B] with .MEM_17
LKUP STMT 1 = MEM[(int[8] *)&a + 4B] with .MEM_18
2>>> STMT 1 = MEM[(int[8] *)&a + 4B] with .MEM_18
Optimizing statement MEM[(int[8] *)&a + 8B] = 2;
LKUP STMT MEM[(int[8] *)&a + 8B] = 2 with .MEM_18
LKUP STMT 2 = MEM[(int[8] *)&a + 8B] with .MEM_18
LKUP STMT 2 = MEM[(int[8] *)&a + 8B] with .MEM_19
2>>> STMT 2 = MEM[(int[8] *)&a + 8B] with .MEM_19
Optimizing statement MEM[(int[8] *)&a + 12B] = 3;
LKUP STMT MEM[(int[8] *)&a + 12B] = 3 with .MEM_19
LKUP STMT 3 = MEM[(int[8] *)&a + 12B] with .MEM_19
LKUP STMT 3 = MEM[(int[8] *)&a + 12B] with .MEM_20
2>>> STMT 3 = MEM[(int[8] *)&a + 12B] with .MEM_20
Optimizing statement MEM[(int[8] *)&a + 16B] = 4;
LKUP STMT MEM[(int[8] *)&a + 16B] = 4 with .MEM_20
LKUP STMT 4 = MEM[(int[8] *)&a + 16B] with .MEM_20
LKUP STMT 4 = MEM[(int[8] *)&a + 16B] with .MEM_21
2>>> STMT 4 = MEM[(int[8] *)&a + 16B] with .MEM_21
Optimizing statement MEM[(int[8] *)&a + 20B] = 5;
LKUP STMT MEM[(int[8] *)&a + 20B] = 5 with .MEM_21
LKUP STMT 5 = MEM[(int[8] *)&a + 20B] with .MEM_21
LKUP STMT 5 = MEM[(int[8] *)&a + 20B] with .MEM_22
2>>> STMT 5 = MEM[(int[8] *)&a + 20B] with .MEM_22
Optimizing statement MEM[(int[8] *)&a + 24B] = 6;
LKUP STMT MEM[(int[8] *)&a + 24B] = 6 with .MEM_22
LKUP STMT 6 = MEM[(int[8] *)&a + 24B] with .MEM_22
LKUP STMT 6 = MEM[(int[8] *)&a + 24B] with .MEM_23
2>>> STMT 6 = MEM[(int[8] *)&a + 24B] with .MEM_23
Optimizing statement MEM[(int[8] *)&a + 28B] = 7;
LKUP STMT MEM[(int[8] *)&a + 28B] = 7 with .MEM_23
LKUP STMT 7 = MEM[(int[8] *)&a + 28B] with .MEM_23
LKUP STMT 7 = MEM[(int[8] *)&a + 28B] with .MEM_24
2>>> STMT 7 = MEM[(int[8] *)&a + 28B] with .MEM_24
Optimizing statement i.0_10 = 0;
LKUP STMT i.0_10 = 0
==== ASGN i.0_10 = 0
Optimizing statement if (i.0_10 != 8)
  Replaced 'i.0_10' with constant '0'
gimple_simplified to if (1 != 0)
  Folded to: if (1 != 0)
LKUP STMT 1 ne_expr 0


Optimizing block #3

1>>> STMT 1 = 1 ne_expr 0
1>>> STMT 0 = 1 eq_expr 0
LKUP STMT i_2 = PHI <0>
2>>> STMT i_2 = PHI <0>
LKUP STMT sum_1 = PHI <0>
FIND: 0
0>>> COPY sum_1 = 0
<<<< STMT i_2 = PHI <0>


Optimizing block #5

LKUP STMT i_12 = PHI <0, i_9>
2>>> STMT i_12 = PHI <0, i_9>
LKUP STMT sum_13 = PHI <0, sum_8>
2>>> STMT sum_13 = PHI <0, sum_8>
<<<< STMT sum_13 = PHI <0, sum_8>
<<<< STMT i_12 = PHI <0, i_9>
Optimizing statement _7 = a[i_12];
LKUP STMT _7 = a[i_12] with .MEM_24
2>>> STMT _7 = a[i_12] with .MEM_24
Optimizing statement sum_8 = sum_13 + _7;
LKUP STMT sum_8 = sum_13 plus_expr _7
Optimizing statement i_9 = i_12 + 1;
LKUP STMT i_9 = i_12 plus_expr 1
Optimizing statement i.0_6 = (unsigned int) i_9;
LKUP STMT i.0_6 = nop_expr i_9
2>>> STMT i.0_6 = nop_expr i_9
Optimizing statement if (i.0_6 != 8)
LKUP STMT i.0_6 ne_expr 8
0>>> COPY i.0_6 = 8
<<<< COPY i.0_6 = 8


Optimizing block #6

1>>> STMT 1 = i.0_6 ne_expr 8
1>>> STMT 0 = i.0_6 eq_expr 8
<<<< STMT 0 = i.0_6 eq_expr 8
<<<< STMT 1 = i.0_6 ne_expr 8
<<<< STMT i.0_6 = nop_expr i_9
<<<< STMT _7 = a[i_12] with .MEM_24
0>>> COPY i_12 = 0
0>>> COPY sum_13 = 0
LKUP STMT _7 = a[0] with .MEM_24
FIND: 0
0>>> COPY _7 = 0
Match-and-simplified sum_13 + _7 to 0
0>>> COPY sum_8 = 0
Match-and-simplified i_12 + 1 to 1
0>>> COPY i_9 = 1
Match-and-simplified (unsigned int) i_9 to 1
0>>> COPY i.0_6 = 1
<<<< COPY i.0_6 = 1
<<<< COPY i_9 = 1
<<<< COPY sum_8 = 0
<<<< COPY _7 = 0
<<<< COPY sum_13 = 0
<<<< COPY i_12 = 0
  Registering jump thread: (3, 5) incoming edge;  (5, 6) normal;
<<<< STMT 0 = 1 eq_expr 0
<<<< STMT 1 = 1 ne_expr 0
<<<< COPY sum_1 = 0


Optimizing block #4

LKUP STMT sum_14 = PHI <sum_8, 0>
2>>> STMT sum_14 = PHI <sum_8, 0>
<<<< STMT sum_14 = PHI <sum_8, 0>
Optimizing statement a ={v} {CLOBBER};
Optimizing statement return sum_14;
  Replaced 'sum_14' with variable 'sum_8'
1>>> STMT 1 = 1 ne_expr 0
1>>> STMT 0 = 1 eq_expr 0
0>>> COPY i_2 = 0
0>>> COPY sum_1 = 0
0>>> COPY i_12 = 0
0>>> COPY sum_13 = 0
LKUP STMT _7 = a[0] with .MEM_24
FIND: 0
0>>> COPY _7 = 0
Match-and-simplified sum_13 + _7 to 0
0>>> COPY sum_8 = 0
Match-and-simplified i_12 + 1 to 1
0>>> COPY i_9 = 1
Match-and-simplified (unsigned int) i_9 to 1
0>>> COPY i.0_6 = 1
  Registering jump thread: (2, 3) incoming edge;  (3, 5) joiner;  (5, 6) normal;
<<<< COPY i.0_6 = 1
<<<< COPY i_9 = 1
<<<< COPY sum_8 = 0
<<<< COPY _7 = 0
<<<< COPY sum_13 = 0
<<<< COPY i_12 = 0
<<<< COPY sum_1 = 0
<<<< COPY i_2 = 0
<<<< STMT 0 = 1 eq_expr 0
<<<< STMT 1 = 1 ne_expr 0
<<<< STMT 7 = MEM[(int[8] *)&a + 28B] with .MEM_24
<<<< STMT 6 = MEM[(int[8] *)&a + 24B] with .MEM_23
<<<< STMT 5 = MEM[(int[8] *)&a + 20B] with .MEM_22
<<<< STMT 4 = MEM[(int[8] *)&a + 16B] with .MEM_21
<<<< STMT 3 = MEM[(int[8] *)&a + 12B] with .MEM_20
<<<< STMT 2 = MEM[(int[8] *)&a + 8B] with .MEM_19
<<<< STMT 1 = MEM[(int[8] *)&a + 4B] with .MEM_18
<<<< STMT 0 = MEM[(int[8] *)&a] with .MEM_17
Merging blocks 2 and 3
Removing basic block 6
basic block 6, loop depth 1
 pred:      
goto <bb 5>;
 succ:       5


fix_loop_structure: fixing up loops for function
foo ()
{
  const int SR.8;
  const int SR.7;
  const int SR.6;
  const int SR.5;
  const int SR.4;
  const int SR.3;
  const int SR.2;
  const int SR.1;
  int sum;
  int i;
  const int a[8];
  unsigned int i.0_6;
  int _7;
  unsigned int i.0_10;

  <bb 2>:
  MEM[(int[8] *)&a] = 0;
  MEM[(int[8] *)&a + 4B] = 1;
  MEM[(int[8] *)&a + 8B] = 2;
  MEM[(int[8] *)&a + 12B] = 3;
  MEM[(int[8] *)&a + 16B] = 4;
  MEM[(int[8] *)&a + 20B] = 5;
  MEM[(int[8] *)&a + 24B] = 6;
  MEM[(int[8] *)&a + 28B] = 7;
  i.0_10 = 0;

  <bb 3>:
  # i_12 = PHI <0(2), i_9(3)>
  # sum_13 = PHI <0(2), sum_8(3)>
  _7 = a[i_12];
  sum_8 = sum_13 + _7;
  i_9 = i_12 + 1;
  i.0_6 = (unsigned int) i_9;
  if (i.0_6 != 8)
    goto <bb 3>;
  else
    goto <bb 4>;

  <bb 4>:
  # sum_14 = PHI <sum_8(3)>
  a ={v} {CLOBBER};
  return sum_8;

}



[-- Attachment #3: ssa-dom-cse-2.c.166t.dom3 --]
[-- Type: text/plain, Size: 9241 bytes --]


;; Function foo (foo, funcdef_no=0, decl_uid=1972, cgraph_uid=0, symbol_order=0)

;; 2 loops found
;;
;; Loop 0
;;  header 0, latch 1
;;  depth 0, outer -1
;;  nodes: 0 1 2
;; 2 succs { 1 }


Optimizing block #0



Optimizing block #2

Optimizing statement vect_cst__12 = { 0, 1 };
LKUP STMT vect_cst__12 = { 0, 1 }
==== ASGN vect_cst__12 = { 0, 1 }
Optimizing statement vect_cst__13 = { 2, 3 };
LKUP STMT vect_cst__13 = { 2, 3 }
==== ASGN vect_cst__13 = { 2, 3 }
Optimizing statement vect_cst__6 = { 4, 5 };
LKUP STMT vect_cst__6 = { 4, 5 }
==== ASGN vect_cst__6 = { 4, 5 }
Optimizing statement vect_cst__10 = { 6, 7 };
LKUP STMT vect_cst__10 = { 6, 7 }
==== ASGN vect_cst__10 = { 6, 7 }
Optimizing statement MEM[(int[8] *)&a] = vect_cst__12;
  Replaced 'vect_cst__12' with constant '{ 0, 1 }'
LKUP STMT MEM[(int[8] *)&a] = { 0, 1 } with .MEM_3(D)
LKUP STMT { 0, 1 } = MEM[(int[8] *)&a] with .MEM_3(D)
LKUP STMT { 0, 1 } = MEM[(int[8] *)&a] with .MEM_63
2>>> STMT { 0, 1 } = MEM[(int[8] *)&a] with .MEM_63
Optimizing statement _62 = &MEM[(int[8] *)&a] + 8;
LKUP STMT _62 = &MEM[(int[8] *)&a] pointer_plus_expr 8
2>>> STMT _62 = &MEM[(int[8] *)&a] pointer_plus_expr 8
==== ASGN _62 = &MEM[(void *)&a + 8B]
Optimizing statement MEM[(int[8] *)_62] = vect_cst__13;
  Replaced '_62' with constant '&MEM[(void *)&a + 8B]'
  Replaced 'vect_cst__13' with constant '{ 2, 3 }'
  Folded to: MEM[(int[8] *)&a + 8B] = { 2, 3 };
LKUP STMT MEM[(int[8] *)&a + 8B] = { 2, 3 } with .MEM_63
LKUP STMT { 2, 3 } = MEM[(int[8] *)&a + 8B] with .MEM_63
LKUP STMT { 2, 3 } = MEM[(int[8] *)&a + 8B] with .MEM_61
2>>> STMT { 2, 3 } = MEM[(int[8] *)&a + 8B] with .MEM_61
Optimizing statement _56 = _62 + 8;
  Replaced '_62' with constant '&MEM[(void *)&a + 8B]'
LKUP STMT _56 = &MEM[(void *)&a + 8B] pointer_plus_expr 8
2>>> STMT _56 = &MEM[(void *)&a + 8B] pointer_plus_expr 8
==== ASGN _56 = &MEM[(void *)&a + 16B]
Optimizing statement MEM[(int[8] *)_56] = vect_cst__6;
  Replaced '_56' with constant '&MEM[(void *)&a + 16B]'
  Replaced 'vect_cst__6' with constant '{ 4, 5 }'
  Folded to: MEM[(int[8] *)&a + 16B] = { 4, 5 };
LKUP STMT MEM[(int[8] *)&a + 16B] = { 4, 5 } with .MEM_61
LKUP STMT { 4, 5 } = MEM[(int[8] *)&a + 16B] with .MEM_61
LKUP STMT { 4, 5 } = MEM[(int[8] *)&a + 16B] with .MEM_55
2>>> STMT { 4, 5 } = MEM[(int[8] *)&a + 16B] with .MEM_55
Optimizing statement _54 = _56 + 8;
  Replaced '_56' with constant '&MEM[(void *)&a + 16B]'
LKUP STMT _54 = &MEM[(void *)&a + 16B] pointer_plus_expr 8
2>>> STMT _54 = &MEM[(void *)&a + 16B] pointer_plus_expr 8
==== ASGN _54 = &MEM[(void *)&a + 24B]
Optimizing statement MEM[(int[8] *)_54] = vect_cst__10;
  Replaced '_54' with constant '&MEM[(void *)&a + 24B]'
  Replaced 'vect_cst__10' with constant '{ 6, 7 }'
  Folded to: MEM[(int[8] *)&a + 24B] = { 6, 7 };
LKUP STMT MEM[(int[8] *)&a + 24B] = { 6, 7 } with .MEM_55
LKUP STMT { 6, 7 } = MEM[(int[8] *)&a + 24B] with .MEM_55
LKUP STMT { 6, 7 } = MEM[(int[8] *)&a + 24B] with .MEM_49
2>>> STMT { 6, 7 } = MEM[(int[8] *)&a + 24B] with .MEM_49
Optimizing statement _4 = a[0];
LKUP STMT _4 = a[0] with .MEM_49
2>>> STMT _4 = a[0] with .MEM_49
Optimizing statement sum_15 = _4;
LKUP STMT sum_15 = _4
==== ASGN sum_15 = _4
Optimizing statement i_16 = 1;
LKUP STMT i_16 = 1
==== ASGN i_16 = 1
Optimizing statement ivtmp_25 = 7;
LKUP STMT ivtmp_25 = 7
==== ASGN ivtmp_25 = 7
Optimizing statement _29 = a[i_16];
  Replaced 'i_16' with constant '1'
LKUP STMT _29 = a[1] with .MEM_49
2>>> STMT _29 = a[1] with .MEM_49
Optimizing statement sum_30 = sum_15 + _29;
  Replaced 'sum_15' with variable '_4'
LKUP STMT sum_30 = _4 plus_expr _29
2>>> STMT sum_30 = _4 plus_expr _29
Optimizing statement i_31 = i_16 + 1;
  Replaced 'i_16' with constant '1'
gimple_simplified to i_31 = 2;
  Folded to: i_31 = 2;
LKUP STMT i_31 = 2
==== ASGN i_31 = 2
Optimizing statement _36 = a[i_31];
  Replaced 'i_31' with constant '2'
LKUP STMT _36 = a[2] with .MEM_49
2>>> STMT _36 = a[2] with .MEM_49
Optimizing statement sum_37 = sum_30 + _36;
LKUP STMT sum_37 = sum_30 plus_expr _36
2>>> STMT sum_37 = sum_30 plus_expr _36
Optimizing statement i_38 = i_31 + 1;
  Replaced 'i_31' with constant '2'
gimple_simplified to i_38 = 3;
  Folded to: i_38 = 3;
LKUP STMT i_38 = 3
==== ASGN i_38 = 3
Optimizing statement _43 = a[i_38];
  Replaced 'i_38' with constant '3'
LKUP STMT _43 = a[3] with .MEM_49
2>>> STMT _43 = a[3] with .MEM_49
Optimizing statement sum_44 = sum_37 + _43;
LKUP STMT sum_44 = sum_37 plus_expr _43
2>>> STMT sum_44 = sum_37 plus_expr _43
Optimizing statement i_45 = i_38 + 1;
  Replaced 'i_38' with constant '3'
gimple_simplified to i_45 = 4;
  Folded to: i_45 = 4;
LKUP STMT i_45 = 4
==== ASGN i_45 = 4
Optimizing statement _50 = a[i_45];
  Replaced 'i_45' with constant '4'
LKUP STMT _50 = a[4] with .MEM_49
2>>> STMT _50 = a[4] with .MEM_49
Optimizing statement sum_51 = sum_44 + _50;
LKUP STMT sum_51 = sum_44 plus_expr _50
2>>> STMT sum_51 = sum_44 plus_expr _50
Optimizing statement i_52 = i_45 + 1;
  Replaced 'i_45' with constant '4'
gimple_simplified to i_52 = 5;
  Folded to: i_52 = 5;
LKUP STMT i_52 = 5
==== ASGN i_52 = 5
Optimizing statement _57 = a[i_52];
  Replaced 'i_52' with constant '5'
LKUP STMT _57 = a[5] with .MEM_49
2>>> STMT _57 = a[5] with .MEM_49
Optimizing statement sum_58 = sum_51 + _57;
LKUP STMT sum_58 = sum_51 plus_expr _57
2>>> STMT sum_58 = sum_51 plus_expr _57
Optimizing statement i_59 = i_52 + 1;
  Replaced 'i_52' with constant '5'
gimple_simplified to i_59 = 6;
  Folded to: i_59 = 6;
LKUP STMT i_59 = 6
==== ASGN i_59 = 6
Optimizing statement _64 = a[i_59];
  Replaced 'i_59' with constant '6'
LKUP STMT _64 = a[6] with .MEM_49
2>>> STMT _64 = a[6] with .MEM_49
Optimizing statement sum_65 = sum_58 + _64;
LKUP STMT sum_65 = sum_58 plus_expr _64
2>>> STMT sum_65 = sum_58 plus_expr _64
Optimizing statement i_66 = i_59 + 1;
  Replaced 'i_59' with constant '6'
gimple_simplified to i_66 = 7;
  Folded to: i_66 = 7;
LKUP STMT i_66 = 7
==== ASGN i_66 = 7
Optimizing statement ivtmp_67 = ivtmp_25 + 4294967290;
  Replaced 'ivtmp_25' with constant '7'
gimple_simplified to ivtmp_67 = 1;
  Folded to: ivtmp_67 = 1;
LKUP STMT ivtmp_67 = 1
==== ASGN ivtmp_67 = 1
Optimizing statement _7 = a[i_66];
  Replaced 'i_66' with constant '7'
LKUP STMT _7 = a[7] with .MEM_49
2>>> STMT _7 = a[7] with .MEM_49
Optimizing statement sum_8 = _7 + sum_65;
LKUP STMT sum_8 = _7 plus_expr sum_65
2>>> STMT sum_8 = _7 plus_expr sum_65
Optimizing statement i_9 = i_66 + 1;
  Replaced 'i_66' with constant '7'
gimple_simplified to i_9 = 8;
  Folded to: i_9 = 8;
LKUP STMT i_9 = 8
==== ASGN i_9 = 8
Optimizing statement ivtmp_14 = ivtmp_67 + 4294967295;
  Replaced 'ivtmp_67' with constant '1'
gimple_simplified to ivtmp_14 = 0;
  Folded to: ivtmp_14 = 0;
LKUP STMT ivtmp_14 = 0
==== ASGN ivtmp_14 = 0
Optimizing statement a ={v} {CLOBBER};
Optimizing statement return sum_8;
<<<< STMT sum_8 = _7 plus_expr sum_65
<<<< STMT _7 = a[7] with .MEM_49
<<<< STMT sum_65 = sum_58 plus_expr _64
<<<< STMT _64 = a[6] with .MEM_49
<<<< STMT sum_58 = sum_51 plus_expr _57
<<<< STMT _57 = a[5] with .MEM_49
<<<< STMT sum_51 = sum_44 plus_expr _50
<<<< STMT _50 = a[4] with .MEM_49
<<<< STMT sum_44 = sum_37 plus_expr _43
<<<< STMT _43 = a[3] with .MEM_49
<<<< STMT sum_37 = sum_30 plus_expr _36
<<<< STMT _36 = a[2] with .MEM_49
<<<< STMT sum_30 = _4 plus_expr _29
<<<< STMT _29 = a[1] with .MEM_49
<<<< STMT _4 = a[0] with .MEM_49
<<<< STMT { 6, 7 } = MEM[(int[8] *)&a + 24B] with .MEM_49
<<<< STMT _54 = &MEM[(void *)&a + 16B] pointer_plus_expr 8
<<<< STMT { 4, 5 } = MEM[(int[8] *)&a + 16B] with .MEM_55
<<<< STMT _56 = &MEM[(void *)&a + 8B] pointer_plus_expr 8
<<<< STMT { 2, 3 } = MEM[(int[8] *)&a + 8B] with .MEM_61
<<<< STMT _62 = &MEM[(int[8] *)&a] pointer_plus_expr 8
<<<< STMT { 0, 1 } = MEM[(int[8] *)&a] with .MEM_63
foo ()
{
  const vector(2) int * vectp.10;
  const vector(2) int * vectp_a.9;
  const int SR.8;
  const int SR.7;
  const int SR.6;
  const int SR.5;
  const int SR.4;
  const int SR.3;
  const int SR.2;
  const int SR.1;
  int sum;
  int i;
  const int a[8];
  int _4;
  vector(2) int vect_cst__6;
  int _7;
  vector(2) int vect_cst__10;
  vector(2) int vect_cst__12;
  vector(2) int vect_cst__13;
  unsigned int ivtmp_14;
  unsigned int ivtmp_25;
  int _29;
  int _36;
  int _43;
  int _50;
  const int * _54;
  const int * _56;
  int _57;
  const int * _62;
  int _64;
  unsigned int ivtmp_67;

  <bb 2>:
  vect_cst__12 = { 0, 1 };
  vect_cst__13 = { 2, 3 };
  vect_cst__6 = { 4, 5 };
  vect_cst__10 = { 6, 7 };
  MEM[(int[8] *)&a] = { 0, 1 };
  _62 = &MEM[(int[8] *)&a] + 8;
  MEM[(int[8] *)&a + 8B] = { 2, 3 };
  _56 = &MEM[(void *)&a + 8B] + 8;
  MEM[(int[8] *)&a + 16B] = { 4, 5 };
  _54 = &MEM[(void *)&a + 16B] + 8;
  MEM[(int[8] *)&a + 24B] = { 6, 7 };
  _4 = a[0];
  sum_15 = _4;
  i_16 = 1;
  ivtmp_25 = 7;
  _29 = a[1];
  sum_30 = _4 + _29;
  i_31 = 2;
  _36 = a[2];
  sum_37 = sum_30 + _36;
  i_38 = 3;
  _43 = a[3];
  sum_44 = sum_37 + _43;
  i_45 = 4;
  _50 = a[4];
  sum_51 = sum_44 + _50;
  i_52 = 5;
  _57 = a[5];
  sum_58 = sum_51 + _57;
  i_59 = 6;
  _64 = a[6];
  sum_65 = sum_58 + _64;
  i_66 = 7;
  ivtmp_67 = 1;
  _7 = a[7];
  sum_8 = _7 + sum_65;
  i_9 = 8;
  ivtmp_14 = 0;
  a ={v} {CLOBBER};
  return sum_8;

}



[-- Attachment #4: ssa-dom-cse-2.c.210t.optimized --]
[-- Type: text/plain, Size: 763 bytes --]


;; Function foo (foo, funcdef_no=0, decl_uid=1972, cgraph_uid=0, symbol_order=0)

Scope blocks after cleanups:

{ Scope block #0 
  const int a[8];
  int sum;

}
foo ()
{
  int sum;
  const int a[8];
  int _4;
  int _7;
  int _29;
  int _36;
  int _43;
  int _50;
  int _57;
  int _64;

  <bb 2>:
  MEM[(int[8] *)&a] = { 0, 1 };
  MEM[(int[8] *)&a + 8B] = { 2, 3 };
  MEM[(int[8] *)&a + 16B] = { 4, 5 };
  MEM[(int[8] *)&a + 24B] = { 6, 7 };
  _4 = a[0];
  _29 = a[1];
  sum_30 = _4 + _29;
  _36 = a[2];
  sum_37 = sum_30 + _36;
  _43 = a[3];
  sum_44 = sum_37 + _43;
  _50 = a[4];
  sum_51 = sum_44 + _50;
  _57 = a[5];
  sum_58 = sum_51 + _57;
  _64 = a[6];
  sum_65 = sum_58 + _64;
  _7 = a[7];
  sum_8 = _7 + sum_65;
  a ={v} {CLOBBER};
  return sum_8;

}



^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 4/4] Un-XFAIL ssa-dom-cse-2.c for most platforms
  2016-02-04  9:53       ` Dominik Vogt
@ 2016-02-04 11:53         ` Alan Lawrence
  2016-02-04 12:52           ` Richard Biener
  0 siblings, 1 reply; 48+ messages in thread
From: Alan Lawrence @ 2016-02-04 11:53 UTC (permalink / raw)
  To: vogt, gcc-patches, David Edelsohn, Andreas Krebbel

On 04/02/16 09:53, Dominik Vogt wrote:
> On Wed, Feb 03, 2016 at 11:41:02AM +0000, Alan Lawrence wrote:
>> On 26/01/16 12:23, Dominik Vogt wrote:
>>> On Mon, Dec 21, 2015 at 01:13:28PM +0000, Alan Lawrence wrote:
>>>> ...the test passes with --param sra-max-scalarization-size-Ospeed.
>>>>
>>>> Verified on aarch64 and with stage1 compiler for hppa, powerpc, sparc, s390.
>>>
>>> How did you test this on s390?  For me, the test still fails
>>> unless I add -march=z13 (s390x).
>>
>> Sorry for the slow response, was away last week. On x86 host, I built a compiler
>>
>> configure --enable-languages=c,c++,lto --target=s390-none-linux-gnu
>                                                    ^^^^
> Looks like the test fails only on s390x (64-Bit ) without
> -march=z13 but works on s390 (31-Bit).

Ah, yes, I see. Loop is not unrolled for dom2, then vectorizer kicks in but 
vectorizes only the initialization/loading, and dom can't see the redundancy in

MEM[(int[8] *)&a] = { 0, 1 };
...
_23 = a[0];

as they aren't reading equivalent chunks of memory.

Same problem as on Alpha, and powerpc64 (without -mcpu=power7/power8).

Powerpc chose to XFAIL rather than add -mcpu=power7, but I'm OK with any 
testsuite workaround here, such as yours.

Cheers, Alan

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 4/4] Un-XFAIL ssa-dom-cse-2.c for most platforms
  2016-02-04 11:53         ` Alan Lawrence
@ 2016-02-04 12:52           ` Richard Biener
  0 siblings, 0 replies; 48+ messages in thread
From: Richard Biener @ 2016-02-04 12:52 UTC (permalink / raw)
  To: Alan Lawrence; +Cc: vogt, GCC Patches, David Edelsohn, Andreas Krebbel

On Thu, Feb 4, 2016 at 12:53 PM, Alan Lawrence
<alan.lawrence@foss.arm.com> wrote:
> On 04/02/16 09:53, Dominik Vogt wrote:
>>
>> On Wed, Feb 03, 2016 at 11:41:02AM +0000, Alan Lawrence wrote:
>>>
>>> On 26/01/16 12:23, Dominik Vogt wrote:
>>>>
>>>> On Mon, Dec 21, 2015 at 01:13:28PM +0000, Alan Lawrence wrote:
>>>>>
>>>>> ...the test passes with --param sra-max-scalarization-size-Ospeed.
>>>>>
>>>>> Verified on aarch64 and with stage1 compiler for hppa, powerpc, sparc,
>>>>> s390.
>>>>
>>>>
>>>> How did you test this on s390?  For me, the test still fails
>>>> unless I add -march=z13 (s390x).
>>>
>>>
>>> Sorry for the slow response, was away last week. On x86 host, I built a
>>> compiler
>>>
>>> configure --enable-languages=c,c++,lto --target=s390-none-linux-gnu
>>
>>                                                    ^^^^
>> Looks like the test fails only on s390x (64-Bit ) without
>> -march=z13 but works on s390 (31-Bit).
>
>
> Ah, yes, I see. Loop is not unrolled for dom2, then vectorizer kicks in but
> vectorizes only the initialization/loading, and dom can't see the redundancy
> in
>
> MEM[(int[8] *)&a] = { 0, 1 };
> ...
> _23 = a[0];
>
> as they aren't reading equivalent chunks of memory.

Yep, only FRE/PRE know enough tricks to CSE this.

Richard.

> Same problem as on Alpha, and powerpc64 (without -mcpu=power7/power8).
>
> Powerpc chose to XFAIL rather than add -mcpu=power7, but I'm OK with any
> testsuite workaround here, such as yours.
>
> Cheers, Alan

^ permalink raw reply	[flat|nested] 48+ messages in thread

* hf not implying hf (was: [PATCH 2/4] Equate MEM_REFs and ARRAY_REFs in tree-ssa-scopedtables.c)
  2016-01-19 13:22         ` Alan Lawrence
  2016-01-19 13:33           ` Christophe Lyon
@ 2016-03-01  4:15           ` Hans-Peter Nilsson
  1 sibling, 0 replies; 48+ messages in thread
From: Hans-Peter Nilsson @ 2016-03-01  4:15 UTC (permalink / raw)
  To: alan.lawrence; +Cc: christophe.lyon, gcc-patches

> From: Alan Lawrence <alan.lawrence@foss.arm.com>
> Date: Tue, 19 Jan 2016 13:22:13 +0000

(Regarding some incidentally failing tests)

> Hmm, I still see these passing, both natively on arm-none-linux-gnueabihf and 
> with a cross-build. hf implies --with-float=hard, right?

(Since you mention it...)

Oddly, it doesn't; you have to pass --with-float=hard too for
--target=arm-none-linux-gnueabihf to actually default to "hf".

IMHO this should obviously change but maybe it's too late?

And, supposedly all packagers know this wart by now and new ones
re-discover it soon enough...

brgds, H-P

^ permalink raw reply	[flat|nested] 48+ messages in thread

end of thread, other threads:[~2016-03-01  4:15 UTC | newest]

Thread overview: 48+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-12-21 13:14 [PATCH 0/4 v3] Fix PR/63679 when --param sra-max-scalarization-size specified Alan Lawrence
2015-12-21 13:14 ` [PATCH 1/4] Make SRA scalarize constant-pool loads Alan Lawrence
2015-12-24 11:53   ` Alan Lawrence
2016-01-04 12:13     ` Alan Lawrence
2016-01-15 10:27     ` Alan Lawrence
2016-01-18 11:04       ` Richard Biener
2015-12-21 13:14 ` [PATCH 3/4] Add a pass_copy_prop following pass_ch_vect Alan Lawrence
2015-12-26 17:58   ` Richard Biener
2016-01-14 10:29     ` [PATCH 3/4 v2] Enhance SCEV to follow copies of SSA_NAMEs Alan Lawrence
2016-01-14 12:30       ` Richard Biener
2016-01-15  9:48         ` Alan Lawrence
2016-01-15 10:07           ` Richard Biener
2016-01-15 10:38             ` Alan Lawrence
2016-01-15 10:41               ` Richard Biener
2015-12-21 13:14 ` [PATCH 2/4] Equate MEM_REFs and ARRAY_REFs in tree-ssa-scopedtables.c Alan Lawrence
2015-12-24 11:55   ` Alan Lawrence
2016-01-04 19:05     ` Jeff Law
2016-01-19  3:05     ` H.J. Lu
2016-01-19  9:46       ` Christophe Lyon
2016-01-19 13:22         ` Alan Lawrence
2016-01-19 13:33           ` Christophe Lyon
2016-03-01  4:15           ` hf not implying hf (was: [PATCH 2/4] Equate MEM_REFs and ARRAY_REFs in tree-ssa-scopedtables.c) Hans-Peter Nilsson
2016-01-04 19:08   ` [PATCH 2/4] Equate MEM_REFs and ARRAY_REFs in tree-ssa-scopedtables.c Jeff Law
2016-01-05  7:29     ` Richard Biener
2016-01-05 16:29       ` Alan Lawrence
2016-01-05 16:36         ` Jeff Law
2016-01-08 10:45           ` Richard Biener
2015-12-21 13:22 ` [PATCH 4/4] Un-XFAIL ssa-dom-cse-2.c for most platforms Alan Lawrence
2015-12-21 14:10   ` David Edelsohn
2015-12-21 14:59     ` Bill Schmidt
2015-12-21 15:22       ` Alan Lawrence
2015-12-21 15:33         ` Bill Schmidt
2015-12-22 16:00           ` Alan Lawrence
2015-12-22 17:02             ` Bill Schmidt
2015-12-24 20:01             ` Mike Stump
2016-01-04 12:06               ` Alan Lawrence
2016-01-15 10:46           ` Alan Lawrence
2016-01-15 10:50             ` Richard Biener
2015-12-21 17:34       ` Alan Lawrence
2016-01-26 12:23   ` Dominik Vogt
2016-02-03 11:41     ` Alan Lawrence
2016-02-04  9:53       ` Dominik Vogt
2016-02-04 11:53         ` Alan Lawrence
2016-02-04 12:52           ` Richard Biener
2015-12-22 15:54 ` [PATCH 0/4 v3] Fix PR/63679 when --param sra-max-scalarization-size specified Alan Lawrence
2015-12-22 16:58   ` Bill Schmidt
2015-12-22 17:36     ` Alan Lawrence
2015-12-22 23:02       ` Bill Schmidt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).