public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH] Fix PR43023: fuse_partitions_with_similar_memory_accesses.
@ 2010-12-10  0:38 Sebastian Pop
  2010-12-10 13:13 ` Richard Guenther
  0 siblings, 1 reply; 7+ messages in thread
From: Sebastian Pop @ 2010-12-10  0:38 UTC (permalink / raw)
  To: gcc-patches; +Cc: rguenther, Sebastian Pop

Hi,

This patch replaces the heuristic that aggregates the partitions of
the loop distribution.  The original heuristic was acting during the
construction of a partition, mixing legality with data locality
concerns.  This patch simplifies the logic by removing this heuristic,
and postponing the aggregation of partitions: with this patch we build
all the legal partitions, and then we fuse the partitions that have
some memory affinity.

Ok for trunk after regstrap on amd64-linux?

Thanks,
Sebastian

2010-12-09  Sebastian Pop  <sebastian.pop@amd.com>

	PR tree-optimization/43023
	* tree-data-ref.c (mem_write_stride_of_same_size_as_unit_type_p):
	Removed.
	(stores_zero_from_loop): Call stmt_stores_zero.
	* tree-data-ref.h (stmt_stores_zero): New.
	* tree-loop-distribution.c (generate_memset_zero): Do not return a
	boolean.  Call gcc_assert on stride_of_unit_type_p.
	(generate_builtin): Call stmt_stores_zero.
	(rdg_flag_all_uses): Removed.
	(rdg_flag_similar_memory_accesses): Removed.
	(build_rdg_partition_for_component): Removed parameter
	other_stores.  Removed call to rdg_flag_similar_memory_accesses.
	(can_generate_builtin): New.
	(similar_memory_accesses): New.
	(fuse_partitions_with_similar_memory_accesses): New.
	(rdg_build_partitions): Call
	fuse_partitions_with_similar_memory_accesses.

	* gfortran.dg/ldist-1.f90: Adjust pattern.
	* gfortran.dg/ldist-pr43023.f90: New.
---
 gcc/ChangeLog                               |   20 +++
 gcc/testsuite/ChangeLog                     |    6 +
 gcc/testsuite/gcc.dg/tree-ssa/ldist-1.c     |    2 +-
 gcc/testsuite/gcc.dg/tree-ssa/ldist-10.c    |    2 +-
 gcc/testsuite/gcc.dg/tree-ssa/ldist-11.c    |    2 +-
 gcc/testsuite/gcc.dg/tree-ssa/ldist-12.c    |    2 +-
 gcc/testsuite/gcc.dg/tree-ssa/ldist-1a.c    |    2 +-
 gcc/testsuite/gcc.dg/tree-ssa/ldist-2.c     |    2 +-
 gcc/testsuite/gcc.dg/tree-ssa/ldist-3.c     |    4 +-
 gcc/testsuite/gcc.dg/tree-ssa/ldist-4.c     |    2 +-
 gcc/testsuite/gcc.dg/tree-ssa/ldist-5.c     |    8 +-
 gcc/testsuite/gcc.dg/tree-ssa/ldist-6.c     |    2 +-
 gcc/testsuite/gcc.dg/tree-ssa/ldist-7.c     |    2 +-
 gcc/testsuite/gcc.dg/tree-ssa/ldist-8.c     |    2 +-
 gcc/testsuite/gcc.dg/tree-ssa/ldist-9.c     |    2 +-
 gcc/testsuite/gfortran.dg/ldist-1.f90       |    7 +-
 gcc/testsuite/gfortran.dg/ldist-pr43023.f90 |   31 +++++
 gcc/tree-data-ref.c                         |   31 +----
 gcc/tree-data-ref.h                         |   32 +++++
 gcc/tree-loop-distribution.c                |  191 ++++++++++++---------------
 20 files changed, 197 insertions(+), 155 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/ldist-pr43023.f90

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 9b3e799..1d1d3f0 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,23 @@
+2010-12-09  Sebastian Pop  <sebastian.pop@amd.com>
+
+	PR tree-optimization/43023
+	* tree-data-ref.c (mem_write_stride_of_same_size_as_unit_type_p):
+	Removed.
+	(stores_zero_from_loop): Call stmt_stores_zero.
+	* tree-data-ref.h (stmt_stores_zero): New.
+	* tree-loop-distribution.c (generate_memset_zero): Do not return a
+	boolean.  Call gcc_assert on stride_of_unit_type_p.
+	(generate_builtin): Call stmt_stores_zero.
+	(rdg_flag_all_uses): Removed.
+	(rdg_flag_similar_memory_accesses): Removed.
+	(build_rdg_partition_for_component): Removed parameter
+	other_stores.  Removed call to rdg_flag_similar_memory_accesses.
+	(can_generate_builtin): New.
+	(similar_memory_accesses): New.
+	(fuse_partitions_with_similar_memory_accesses): New.
+	(rdg_build_partitions): Call
+	fuse_partitions_with_similar_memory_accesses.
+
 2010-12-08  Richard Guenther  <rguenther@suse.de>
 	    Sebastian Pop  <sebastian.pop@amd.com>
 
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index c0af4e3..392e2cb 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,9 @@
+2010-12-09  Sebastian Pop  <sebastian.pop@amd.com>
+
+	PR tree-optimization/43023
+	* gfortran.dg/ldist-1.f90: Adjust pattern.
+	* gfortran.dg/ldist-pr43023.f90: New.
+
 2010-12-08  Richard Guenther  <rguenther@suse.de>
 	    Sebastian Pop  <sebastian.pop@amd.com>
 
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ldist-1.c b/gcc/testsuite/gcc.dg/tree-ssa/ldist-1.c
index 43c1046..dcff6fb 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ldist-1.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ldist-1.c
@@ -1,4 +1,4 @@
-/* { dg-do compile } */ 
+/* { dg-do compile } */
 /* { dg-options "-O2 -ftree-loop-distribution -fdump-tree-ldist-all" } */
 
 void foo (int * __restrict__ ia,
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ldist-10.c b/gcc/testsuite/gcc.dg/tree-ssa/ldist-10.c
index 0790c18..2e69563 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ldist-10.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ldist-10.c
@@ -1,4 +1,4 @@
-/* { dg-do compile } */ 
+/* { dg-do compile } */
 /* { dg-options "-O2 -ftree-loop-distribution -fdump-tree-ldist-all" } */
 
 int loop1 (int k)
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ldist-11.c b/gcc/testsuite/gcc.dg/tree-ssa/ldist-11.c
index 88651e7..4cc5edc 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ldist-11.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ldist-11.c
@@ -1,4 +1,4 @@
-/* { dg-do compile } */ 
+/* { dg-do compile } */
 /* { dg-options "-O2 -ftree-loop-distribution -fdump-tree-ldist-all" } */
 
 void foo (int * __restrict__ ia,
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ldist-12.c b/gcc/testsuite/gcc.dg/tree-ssa/ldist-12.c
index 1e555fe..dcfe2de 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ldist-12.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ldist-12.c
@@ -1,4 +1,4 @@
-/* { dg-do compile } */ 
+/* { dg-do compile } */
 /* { dg-options "-O2 -ftree-loop-distribution -fdump-tree-ldist-all" } */
 
 int foo (int * __restrict__ ia,
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ldist-1a.c b/gcc/testsuite/gcc.dg/tree-ssa/ldist-1a.c
index 623aacf..2eaa140 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ldist-1a.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ldist-1a.c
@@ -1,4 +1,4 @@
-/* { dg-do compile } */ 
+/* { dg-do compile } */
 /* { dg-options "-O2 -ftree-loop-distribution -fdump-tree-ldist-all" } */
 
 int foo (int * __restrict__ ia,
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ldist-2.c b/gcc/testsuite/gcc.dg/tree-ssa/ldist-2.c
index de98ccc..d8b2545 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ldist-2.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ldist-2.c
@@ -1,4 +1,4 @@
-/* { dg-do compile } */ 
+/* { dg-do compile } */
 /* { dg-options "-O2 -ftree-loop-distribution -fdump-tree-ldist-all" } */
 
 void foo (int * __restrict__ a,
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ldist-3.c b/gcc/testsuite/gcc.dg/tree-ssa/ldist-3.c
index 40adfe1..f2a96d8 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ldist-3.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ldist-3.c
@@ -1,11 +1,11 @@
-/* { dg-do compile { target int32plus } } */ 
+/* { dg-do compile { target int32plus } } */
 /* { dg-options "-O2 -ftree-loop-distribution -fdump-tree-ldist-all" } */
 
 int loop1 (int k)
 {
   unsigned int i;
   int a[10000], b[10000], c[10000], d[10000];
-	
+
   a[0] = k; a[3] = k*2;
   c[1] = k+1;
   for (i = 2; i < (10000-1); i ++)
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ldist-4.c b/gcc/testsuite/gcc.dg/tree-ssa/ldist-4.c
index a744fea..7030515 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ldist-4.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ldist-4.c
@@ -1,4 +1,4 @@
-/* { dg-do compile } */ 
+/* { dg-do compile } */
 /* { dg-options "-O2 -ftree-loop-distribution -fdump-tree-ldist-all" } */
 
 int loop1 (int k)
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ldist-5.c b/gcc/testsuite/gcc.dg/tree-ssa/ldist-5.c
index 9a03dc1..7381b27 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ldist-5.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ldist-5.c
@@ -1,4 +1,4 @@
-/* { dg-do compile { target int32plus } } */ 
+/* { dg-do compile { target int32plus } } */
 /* { dg-options "-O2 -ftree-loop-distribution -fdump-tree-ldist-all" } */
 
 int loop1 (int k)
@@ -9,8 +9,8 @@ int loop1 (int k)
 
   a[0][0] = k;
   for (i = 1; i < 100; i ++)
-    for (j = 1; j < (100-1); j++) 
-      {   
+    for (j = 1; j < (100-1); j++)
+      {
         a[i][j] = k * i; /* S1 */
         b[i][j] = a[i][j-1] + k; /* S2 */
         c[i][j] = b[i][j] + a[i][j+1]; /* S3 */
@@ -22,7 +22,7 @@ int loop1 (int k)
      S2->S3 (flow, level 0)
      S3->S4 (flow, level 0)
   */
-  
+
   return a[100-1][100-1] + b[100-1][100-1] + c[100-1][100-1] + d[100-1][100-1];
 }
 
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ldist-6.c b/gcc/testsuite/gcc.dg/tree-ssa/ldist-6.c
index 7a38c86..2b21780 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ldist-6.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ldist-6.c
@@ -1,4 +1,4 @@
-/* { dg-do compile } */ 
+/* { dg-do compile } */
 /* { dg-options "-O2 -ftree-loop-distribution -fdump-tree-ldist-all" } */
 
 int loop1 (int k)
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ldist-7.c b/gcc/testsuite/gcc.dg/tree-ssa/ldist-7.c
index 124fcde..39397be 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ldist-7.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ldist-7.c
@@ -1,4 +1,4 @@
-/* { dg-do compile } */ 
+/* { dg-do compile } */
 /* { dg-options "-O2 -ftree-loop-distribution -fdump-tree-ldist-all" } */
 
 int loop1 (int k)
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ldist-8.c b/gcc/testsuite/gcc.dg/tree-ssa/ldist-8.c
index 4a8e066..f45f70e 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ldist-8.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ldist-8.c
@@ -1,4 +1,4 @@
-/* { dg-do compile } */ 
+/* { dg-do compile } */
 /* { dg-options "-O2 -ftree-loop-distribution -fdump-tree-ldist-all" } */
 
 int loop1 (int k)
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ldist-9.c b/gcc/testsuite/gcc.dg/tree-ssa/ldist-9.c
index ee8d023..9c328a7 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ldist-9.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ldist-9.c
@@ -1,4 +1,4 @@
-/* { dg-do compile } */ 
+/* { dg-do compile } */
 /* { dg-options "-O2 -ftree-loop-distribution -fdump-tree-ldist-all" } */
 
 int loop1 (int k)
diff --git a/gcc/testsuite/gfortran.dg/ldist-1.f90 b/gcc/testsuite/gfortran.dg/ldist-1.f90
index dd1f02a..315e698 100644
--- a/gcc/testsuite/gfortran.dg/ldist-1.f90
+++ b/gcc/testsuite/gfortran.dg/ldist-1.f90
@@ -1,4 +1,4 @@
-! { dg-do compile }     
+! { dg-do compile }
 ! { dg-options "-O2 -ftree-loop-distribution -fdump-tree-ldist-all" }
 
 Subroutine PADEC(DKS,DKDS,HVAR,WM,WG,FN,NS,AN,BN,CN,IT)
@@ -29,5 +29,8 @@ Subroutine PADEC(DKS,DKDS,HVAR,WM,WG,FN,NS,AN,BN,CN,IT)
   return
 end Subroutine PADEC
 
-! { dg-final { scan-tree-dump-times "distributed: split to 4 loops" 1 "ldist" } }
+! There are 5 legal partitions in this code.  Based on the data
+! locality heuristic, this loop should not be split.
+
+! { dg-final { scan-tree-dump-not "distributed: split to" "ldist" } }
 ! { dg-final { cleanup-tree-dump "ldist" } }
diff --git a/gcc/testsuite/gfortran.dg/ldist-pr43023.f90 b/gcc/testsuite/gfortran.dg/ldist-pr43023.f90
new file mode 100644
index 0000000..3e2d04c
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/ldist-pr43023.f90
@@ -0,0 +1,31 @@
+! { dg-do compile }
+! { dg-options "-O2 -ftree-loop-distribution" }
+
+MODULE NFT_mod
+
+implicit none
+integer :: Nangle
+real:: Z0
+real, dimension(:,:), allocatable :: Angle
+real, dimension(:), allocatable :: exth, ezth, hxth, hyth, hyphi
+
+CONTAINS
+
+SUBROUTINE NFT_Init()
+
+real :: th, fi
+integer :: n
+
+do n = 1,Nangle
+  th = Angle(n,1)
+  fi = Angle(n,2)
+
+  exth(n) =  cos(fi)*cos(th)
+  ezth(n) = -sin(th)
+  hxth(n) = -sin(fi)
+  hyth(n) =  cos(fi)
+  hyphi(n) = -sin(fi)
+end do
+END SUBROUTINE NFT_Init
+
+END MODULE NFT_mod
diff --git a/gcc/tree-data-ref.c b/gcc/tree-data-ref.c
index 4dfcd5c..3276079 100644
--- a/gcc/tree-data-ref.c
+++ b/gcc/tree-data-ref.c
@@ -4509,7 +4509,7 @@ dump_rdg_vertex (FILE *file, struct graph *rdg, int i)
     for (e = v->succ; e; e = e->succ_next)
       fprintf (file, " %d", e->dest);
 
-  fprintf (file, ") \n");
+  fprintf (file, ")\n");
   print_gimple_stmt (file, RDGV_STMT (v), 0, TDF_VOPS|TDF_MEMSYMS);
   fprintf (file, ")\n");
 }
@@ -4976,27 +4976,6 @@ stores_from_loop (struct loop *loop, VEC (gimple, heap) **stmts)
   free (bbs);
 }
 
-/* Returns true when STMT is an assignment that contains a data
-   reference on its LHS with a stride of the same size as its unit
-   type.  */
-
-static bool
-mem_write_stride_of_same_size_as_unit_type_p (gimple stmt)
-{
-  struct data_reference *dr = XCNEW (struct data_reference);
-  tree op0 = gimple_assign_lhs (stmt);
-  bool res;
-
-  DR_STMT (dr) = stmt;
-  DR_REF (dr) = op0;
-
-  res = dr_analyze_innermost (dr)
-    && stride_of_unit_type_p (DR_STEP (dr), TREE_TYPE (op0));
-
-  free_data_ref (dr);
-  return res;
-}
-
 /* Initialize STMTS with all the statements of LOOP that contain a
    store to memory of the form "A[i] = 0".  */
 
@@ -5007,18 +4986,12 @@ stores_zero_from_loop (struct loop *loop, VEC (gimple, heap) **stmts)
   basic_block bb;
   gimple_stmt_iterator si;
   gimple stmt;
-  tree op;
   basic_block *bbs = get_loop_body_in_dom_order (loop);
 
   for (i = 0; i < loop->num_nodes; i++)
     for (bb = bbs[i], si = gsi_start_bb (bb); !gsi_end_p (si); gsi_next (&si))
       if ((stmt = gsi_stmt (si))
-	  && gimple_vdef (stmt)
-	  && is_gimple_assign (stmt)
-	  && gimple_assign_rhs_code (stmt) == INTEGER_CST
-	  && (op = gimple_assign_rhs1 (stmt))
-	  && (integer_zerop (op) || real_zerop (op))
-	  && mem_write_stride_of_same_size_as_unit_type_p (stmt))
+	  && stmt_stores_zero (stmt))
 	VEC_safe_push (gimple, heap, *stmts, gsi_stmt (si));
 
   free (bbs);
diff --git a/gcc/tree-data-ref.h b/gcc/tree-data-ref.h
index d929f31..e1006aa 100644
--- a/gcc/tree-data-ref.h
+++ b/gcc/tree-data-ref.h
@@ -614,6 +614,38 @@ stride_of_unit_type_p (tree stride, tree type)
 			     TYPE_SIZE_UNIT (type));
 }
 
+/* Returns true when the statement at STMT is of the form "A[i] = 0"
+   that contains a data reference on its LHS with a stride of the same
+   size as its unit type.  */
+
+static inline bool
+stmt_stores_zero (gimple stmt)
+{
+  tree op0, op1;
+  bool res;
+  struct data_reference *dr;
+
+  if (!stmt
+      || !gimple_vdef (stmt)
+      || !is_gimple_assign (stmt)
+      || !gimple_assign_single_p (stmt)
+      || !(op1 = gimple_assign_rhs1 (stmt))
+      || !(integer_zerop (op1) || real_zerop (op1)))
+    return false;
+
+  dr = XCNEW (struct data_reference);
+  op0 = gimple_assign_lhs (stmt);
+
+  DR_STMT (dr) = stmt;
+  DR_REF (dr) = op0;
+
+  res = dr_analyze_innermost (dr)
+    && stride_of_unit_type_p (DR_STEP (dr), TREE_TYPE (op0));
+
+  free_data_ref (dr);
+  return res;
+}
+
 /* Determines whether RDG vertices V1 and V2 access to similar memory
    locations, in which case they have to be in the same partition.  */
 
diff --git a/gcc/tree-loop-distribution.c b/gcc/tree-loop-distribution.c
index 357f51f..4fc246e 100644
--- a/gcc/tree-loop-distribution.c
+++ b/gcc/tree-loop-distribution.c
@@ -241,7 +241,7 @@ build_size_arg_loc (location_t loc, tree nb_iter, tree op,
 
 /* Generate a call to memset.  Return true when the operation succeeded.  */
 
-static bool
+static void
 generate_memset_zero (gimple stmt, tree op0, tree nb_iter,
 		      gimple_stmt_iterator bsi)
 {
@@ -255,11 +255,8 @@ generate_memset_zero (gimple stmt, tree op0, tree nb_iter,
 
   DR_STMT (dr) = stmt;
   DR_REF (dr) = op0;
-  if (!dr_analyze_innermost (dr))
-    goto end;
-
-  if (!stride_of_unit_type_p (DR_STEP (dr), TREE_TYPE (op0)))
-    goto end;
+  res = dr_analyze_innermost (dr);
+  gcc_assert (res && stride_of_unit_type_p (DR_STEP (dr), TREE_TYPE (op0)));
 
   nb_bytes = build_size_arg_loc (loc, nb_iter, op0, &stmt_list);
   addr_base = size_binop_loc (loc, PLUS_EXPR, DR_OFFSET (dr), DR_INIT (dr));
@@ -286,14 +283,11 @@ generate_memset_zero (gimple stmt, tree op0, tree nb_iter,
   fn_call = gimple_build_call (fn, 3, mem, integer_zero_node, nb_bytes);
   gimple_seq_add_stmt (&stmt_list, fn_call);
   gsi_insert_seq_after (&bsi, stmt_list, GSI_CONTINUE_LINKING);
-  res = true;
 
   if (dump_file && (dump_flags & TDF_DETAILS))
     fprintf (dump_file, "generated memset zero\n");
 
- end:
   free_data_ref (dr);
-  return res;
 }
 
 /* Tries to generate a builtin function for the instructions of LOOP
@@ -307,7 +301,6 @@ generate_builtin (struct loop *loop, bitmap partition, bool copy_p)
   unsigned i, x = 0;
   basic_block *bbs;
   gimple write = NULL;
-  tree op0, op1;
   gimple_stmt_iterator bsi;
   tree nb_iter = number_of_exit_cond_executions (loop);
 
@@ -343,26 +336,17 @@ generate_builtin (struct loop *loop, bitmap partition, bool copy_p)
 	}
     }
 
-  if (!write)
-    goto end;
-
-  op0 = gimple_assign_lhs (write);
-  op1 = gimple_assign_rhs1 (write);
-
-  if (!(TREE_CODE (op0) == ARRAY_REF
-	|| TREE_CODE (op0) == MEM_REF))
+  if (!stmt_stores_zero (write))
     goto end;
 
   /* The new statements will be placed before LOOP.  */
   bsi = gsi_last_bb (loop_preheader_edge (loop)->src);
-
-  if (gimple_assign_rhs_code (write) == INTEGER_CST
-      && (integer_zerop (op1) || real_zerop (op1)))
-    res = generate_memset_zero (write, op0, nb_iter, bsi);
+  generate_memset_zero (write, gimple_assign_lhs (write), nb_iter, bsi);
+  res = true;
 
   /* If this is the last partition for which we generate code, we have
      to destroy the loop.  */
-  if (res && !copy_p)
+  if (!copy_p)
     {
       unsigned nbbs = loop->num_nodes;
       edge exit = single_exit (loop);
@@ -504,24 +488,6 @@ has_upstream_mem_writes (int u)
 static void rdg_flag_vertex_and_dependent (struct graph *, int, bitmap, bitmap,
 					   bitmap, bool *);
 
-/* Flag all the uses of U.  */
-
-static void
-rdg_flag_all_uses (struct graph *rdg, int u, bitmap partition, bitmap loops,
-		   bitmap processed, bool *part_has_writes)
-{
-  struct graph_edge *e;
-
-  for (e = rdg->vertices[u].succ; e; e = e->succ_next)
-    if (!bitmap_bit_p (processed, e->dest))
-      {
-	rdg_flag_vertex_and_dependent (rdg, e->dest, partition, loops,
-				       processed, part_has_writes);
-	rdg_flag_all_uses (rdg, e->dest, partition, loops, processed,
-			   part_has_writes);
-      }
-}
-
 /* Flag the uses of U stopping following the information from
    upstream_mem_writes.  */
 
@@ -689,68 +655,13 @@ rdg_flag_loop_exits (struct graph *rdg, bitmap loops, bitmap partition,
     }
 }
 
-/* Flag all the nodes of RDG containing memory accesses that could
-   potentially belong to arrays already accessed in the current
-   PARTITION.  */
-
-static void
-rdg_flag_similar_memory_accesses (struct graph *rdg, bitmap partition,
-				  bitmap loops, bitmap processed,
-				  VEC (int, heap) **other_stores)
-{
-  bool foo;
-  unsigned i, n;
-  int j, k, kk;
-  bitmap_iterator ii;
-  struct graph_edge *e;
-
-  EXECUTE_IF_SET_IN_BITMAP (partition, 0, i, ii)
-    if (RDG_MEM_WRITE_STMT (rdg, i)
-	|| RDG_MEM_READS_STMT (rdg, i))
-      {
-	for (j = 0; j < rdg->n_vertices; j++)
-	  if (!bitmap_bit_p (processed, j)
-	      && (RDG_MEM_WRITE_STMT (rdg, j)
-		  || RDG_MEM_READS_STMT (rdg, j))
-	      && rdg_has_similar_memory_accesses (rdg, i, j))
-	    {
-	      /* Flag first the node J itself, and all the nodes that
-		 are needed to compute J.  */
-	      rdg_flag_vertex_and_dependent (rdg, j, partition, loops,
-					     processed, &foo);
-
-	      /* When J is a read, we want to coalesce in the same
-		 PARTITION all the nodes that are using J: this is
-		 needed for better cache locality.  */
-	      rdg_flag_all_uses (rdg, j, partition, loops, processed, &foo);
-
-	      /* Remove from OTHER_STORES the vertex that we flagged.  */
-	      if (RDG_MEM_WRITE_STMT (rdg, j))
-		FOR_EACH_VEC_ELT (int, *other_stores, k, kk)
-		  if (kk == j)
-		    {
-		      VEC_unordered_remove (int, *other_stores, k);
-		      break;
-		    }
-	    }
-
-	/* If the node I has two uses, then keep these together in the
-	   same PARTITION.  */
-	for (n = 0, e = rdg->vertices[i].succ; e; e = e->succ_next, n++);
-
-	if (n > 1)
-	  rdg_flag_all_uses (rdg, i, partition, loops, processed, &foo);
-      }
-}
-
 /* Returns a bitmap in which all the statements needed for computing
    the strongly connected component C of the RDG are flagged, also
    including the loop exit conditions.  */
 
 static bitmap
 build_rdg_partition_for_component (struct graph *rdg, rdgc c,
-				   bool *part_has_writes,
-				   VEC (int, heap) **other_stores)
+				   bool *part_has_writes)
 {
   int i, v;
   bitmap partition = BITMAP_ALLOC (NULL);
@@ -762,14 +673,6 @@ build_rdg_partition_for_component (struct graph *rdg, rdgc c,
       rdg_flag_vertex_and_dependent (rdg, v, partition, loops, processed,
 				     part_has_writes);
 
-  /* Also iterate on the array of stores not in the starting vertices,
-     and determine those vertices that have some memory affinity with
-     the current nodes in the component: these are stores to the same
-     arrays, i.e. we're taking care of cache locality.  */
-  if (!flag_tree_loop_distribute_patterns)
-    rdg_flag_similar_memory_accesses (rdg, partition, loops, processed,
-				      other_stores);
-
   rdg_flag_loop_exits (rdg, loops, partition, processed, part_has_writes);
 
   BITMAP_FREE (processed);
@@ -832,6 +735,79 @@ rdg_build_components (struct graph *rdg, VEC (int, heap) *starting_vertices,
   BITMAP_FREE (saved_components);
 }
 
+/* Returns true when it is possible to generate a builtin pattern for
+   the PARTITION of RDG.  For the moment we detect only the memset
+   zero pattern.  */
+
+static bool
+can_generate_builtin (struct graph *rdg, bitmap partition)
+{
+  unsigned i;
+  bitmap_iterator bi;
+  int nb_reads = 0;
+  int nb_writes = 0;
+  int stores_zero = 0;
+
+  EXECUTE_IF_SET_IN_BITMAP (partition, 0, i, bi)
+    if (RDG_MEM_READS_STMT (rdg, i))
+      nb_reads++;
+    else if (RDG_MEM_WRITE_STMT (rdg, i))
+      {
+	nb_writes++;
+	if (stmt_stores_zero (RDG_STMT (rdg, i)))
+	  stores_zero++;
+      }
+
+  return stores_zero == 1 && nb_writes == 1 && nb_reads == 0;
+}
+
+/* Returns true when PARTITION1 and PARTITION2 have similar memory
+   accesses in RDG.  */
+
+static bool
+similar_memory_accesses (struct graph *rdg, bitmap partition1,
+			 bitmap partition2)
+{
+  unsigned i, j;
+  bitmap_iterator bi, bj;
+
+  EXECUTE_IF_SET_IN_BITMAP (partition1, 0, i, bi)
+    if (RDG_MEM_WRITE_STMT (rdg, i)
+	|| RDG_MEM_READS_STMT (rdg, i))
+      EXECUTE_IF_SET_IN_BITMAP (partition2, 0, j, bj)
+	if (RDG_MEM_WRITE_STMT (rdg, j)
+	    || RDG_MEM_READS_STMT (rdg, j))
+	  if (rdg_has_similar_memory_accesses (rdg, i, j))
+	    return true;
+
+  return false;
+}
+
+/* Fuse all the partitions from PARTITIONS that contain similar memory
+   references, i.e., we're taking care of cache locality.  This
+   function does not fuse those partitions that contain patterns that
+   can be code generated with builtins.  */
+
+static void
+fuse_partitions_with_similar_memory_accesses (struct graph *rdg,
+					      VEC (bitmap, heap) **partitions)
+{
+  int p1, p2;
+  bitmap partition1, partition2;
+
+  FOR_EACH_VEC_ELT (bitmap, *partitions, p1, partition1)
+    if (!can_generate_builtin (rdg, partition1))
+      FOR_EACH_VEC_ELT (bitmap, *partitions, p2, partition2)
+	if (p1 != p2
+	    && !can_generate_builtin (rdg, partition2)
+	    && similar_memory_accesses (rdg, partition1, partition2))
+	  {
+	    bitmap_ior_into (partition1, partition2);
+	    VEC_ordered_remove (bitmap, *partitions, p2);
+	    p2--;
+	  }
+}
+
 /* Aggregate several components into a useful partition that is
    registered in the PARTITIONS vector.  Partitions will be
    distributed in different loops.  */
@@ -854,8 +830,7 @@ rdg_build_partitions (struct graph *rdg, VEC (rdgc, heap) *components,
       if (bitmap_bit_p (processed, v))
 	continue;
 
-      np = build_rdg_partition_for_component (rdg, x, &part_has_writes,
-					      other_stores);
+      np = build_rdg_partition_for_component (rdg, x, &part_has_writes);
       bitmap_ior_into (partition, np);
       bitmap_ior_into (processed, np);
       BITMAP_FREE (np);
@@ -901,6 +876,8 @@ rdg_build_partitions (struct graph *rdg, VEC (rdgc, heap) *components,
     VEC_safe_push (bitmap, heap, *partitions, partition);
   else
     BITMAP_FREE (partition);
+
+  fuse_partitions_with_similar_memory_accesses (rdg, partitions);
 }
 
 /* Dump to FILE the PARTITIONS.  */
-- 
1.7.1

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] Fix PR43023: fuse_partitions_with_similar_memory_accesses.
  2010-12-10  0:38 [PATCH] Fix PR43023: fuse_partitions_with_similar_memory_accesses Sebastian Pop
@ 2010-12-10 13:13 ` Richard Guenther
  2010-12-10 19:45   ` Sebastian Pop
  0 siblings, 1 reply; 7+ messages in thread
From: Richard Guenther @ 2010-12-10 13:13 UTC (permalink / raw)
  To: Sebastian Pop; +Cc: gcc-patches

On Thu, 9 Dec 2010, Sebastian Pop wrote:

> Hi,
> 
> This patch replaces the heuristic that aggregates the partitions of
> the loop distribution.  The original heuristic was acting during the
> construction of a partition, mixing legality with data locality
> concerns.  This patch simplifies the logic by removing this heuristic,
> and postponing the aggregation of partitions: with this patch we build
> all the legal partitions, and then we fuse the partitions that have
> some memory affinity.
> 
> Ok for trunk after regstrap on amd64-linux?
> 
> Thanks,
> Sebastian
> 
> 2010-12-09  Sebastian Pop  <sebastian.pop@amd.com>
> 
> 	PR tree-optimization/43023
> 	* tree-data-ref.c (mem_write_stride_of_same_size_as_unit_type_p):
> 	Removed.
> 	(stores_zero_from_loop): Call stmt_stores_zero.
> 	* tree-data-ref.h (stmt_stores_zero): New.
> 	* tree-loop-distribution.c (generate_memset_zero): Do not return a
> 	boolean.  Call gcc_assert on stride_of_unit_type_p.
> 	(generate_builtin): Call stmt_stores_zero.
> 	(rdg_flag_all_uses): Removed.
> 	(rdg_flag_similar_memory_accesses): Removed.
> 	(build_rdg_partition_for_component): Removed parameter
> 	other_stores.  Removed call to rdg_flag_similar_memory_accesses.
> 	(can_generate_builtin): New.
> 	(similar_memory_accesses): New.
> 	(fuse_partitions_with_similar_memory_accesses): New.
> 	(rdg_build_partitions): Call
> 	fuse_partitions_with_similar_memory_accesses.
> 
> 	* gfortran.dg/ldist-1.f90: Adjust pattern.
> 	* gfortran.dg/ldist-pr43023.f90: New.
> ---
>  gcc/ChangeLog                               |   20 +++
>  gcc/testsuite/ChangeLog                     |    6 +
>  gcc/testsuite/gcc.dg/tree-ssa/ldist-1.c     |    2 +-
>  gcc/testsuite/gcc.dg/tree-ssa/ldist-10.c    |    2 +-
>  gcc/testsuite/gcc.dg/tree-ssa/ldist-11.c    |    2 +-
>  gcc/testsuite/gcc.dg/tree-ssa/ldist-12.c    |    2 +-
>  gcc/testsuite/gcc.dg/tree-ssa/ldist-1a.c    |    2 +-
>  gcc/testsuite/gcc.dg/tree-ssa/ldist-2.c     |    2 +-
>  gcc/testsuite/gcc.dg/tree-ssa/ldist-3.c     |    4 +-
>  gcc/testsuite/gcc.dg/tree-ssa/ldist-4.c     |    2 +-
>  gcc/testsuite/gcc.dg/tree-ssa/ldist-5.c     |    8 +-
>  gcc/testsuite/gcc.dg/tree-ssa/ldist-6.c     |    2 +-
>  gcc/testsuite/gcc.dg/tree-ssa/ldist-7.c     |    2 +-
>  gcc/testsuite/gcc.dg/tree-ssa/ldist-8.c     |    2 +-
>  gcc/testsuite/gcc.dg/tree-ssa/ldist-9.c     |    2 +-
>  gcc/testsuite/gfortran.dg/ldist-1.f90       |    7 +-
>  gcc/testsuite/gfortran.dg/ldist-pr43023.f90 |   31 +++++
>  gcc/tree-data-ref.c                         |   31 +----
>  gcc/tree-data-ref.h                         |   32 +++++
>  gcc/tree-loop-distribution.c                |  191 ++++++++++++---------------
>  20 files changed, 197 insertions(+), 155 deletions(-)
>  create mode 100644 gcc/testsuite/gfortran.dg/ldist-pr43023.f90
> 
> diff --git a/gcc/ChangeLog b/gcc/ChangeLog
> index 9b3e799..1d1d3f0 100644
> --- a/gcc/ChangeLog
> +++ b/gcc/ChangeLog
> @@ -1,3 +1,23 @@
> +2010-12-09  Sebastian Pop  <sebastian.pop@amd.com>
> +
> +	PR tree-optimization/43023
> +	* tree-data-ref.c (mem_write_stride_of_same_size_as_unit_type_p):
> +	Removed.
> +	(stores_zero_from_loop): Call stmt_stores_zero.
> +	* tree-data-ref.h (stmt_stores_zero): New.
> +	* tree-loop-distribution.c (generate_memset_zero): Do not return a
> +	boolean.  Call gcc_assert on stride_of_unit_type_p.
> +	(generate_builtin): Call stmt_stores_zero.
> +	(rdg_flag_all_uses): Removed.
> +	(rdg_flag_similar_memory_accesses): Removed.
> +	(build_rdg_partition_for_component): Removed parameter
> +	other_stores.  Removed call to rdg_flag_similar_memory_accesses.
> +	(can_generate_builtin): New.
> +	(similar_memory_accesses): New.
> +	(fuse_partitions_with_similar_memory_accesses): New.
> +	(rdg_build_partitions): Call
> +	fuse_partitions_with_similar_memory_accesses.
> +
>  2010-12-08  Richard Guenther  <rguenther@suse.de>
>  	    Sebastian Pop  <sebastian.pop@amd.com>
>  
> diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
> index c0af4e3..392e2cb 100644
> --- a/gcc/testsuite/ChangeLog
> +++ b/gcc/testsuite/ChangeLog
> @@ -1,3 +1,9 @@
> +2010-12-09  Sebastian Pop  <sebastian.pop@amd.com>
> +
> +	PR tree-optimization/43023
> +	* gfortran.dg/ldist-1.f90: Adjust pattern.
> +	* gfortran.dg/ldist-pr43023.f90: New.
> +
>  2010-12-08  Richard Guenther  <rguenther@suse.de>
>  	    Sebastian Pop  <sebastian.pop@amd.com>
>  
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ldist-1.c b/gcc/testsuite/gcc.dg/tree-ssa/ldist-1.c
> index 43c1046..dcff6fb 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/ldist-1.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ldist-1.c
> @@ -1,4 +1,4 @@
> -/* { dg-do compile } */ 
> +/* { dg-do compile } */
>  /* { dg-options "-O2 -ftree-loop-distribution -fdump-tree-ldist-all" } */

These seems to be spurious changes.

>  void foo (int * __restrict__ ia,
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ldist-10.c b/gcc/testsuite/gcc.dg/tree-ssa/ldist-10.c
> index 0790c18..2e69563 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/ldist-10.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ldist-10.c
> @@ -1,4 +1,4 @@
> -/* { dg-do compile } */ 
> +/* { dg-do compile } */
>  /* { dg-options "-O2 -ftree-loop-distribution -fdump-tree-ldist-all" } */
>  
>  int loop1 (int k)
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ldist-11.c b/gcc/testsuite/gcc.dg/tree-ssa/ldist-11.c
> index 88651e7..4cc5edc 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/ldist-11.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ldist-11.c
> @@ -1,4 +1,4 @@
> -/* { dg-do compile } */ 
> +/* { dg-do compile } */
>  /* { dg-options "-O2 -ftree-loop-distribution -fdump-tree-ldist-all" } */
>  
>  void foo (int * __restrict__ ia,
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ldist-12.c b/gcc/testsuite/gcc.dg/tree-ssa/ldist-12.c
> index 1e555fe..dcfe2de 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/ldist-12.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ldist-12.c
> @@ -1,4 +1,4 @@
> -/* { dg-do compile } */ 
> +/* { dg-do compile } */
>  /* { dg-options "-O2 -ftree-loop-distribution -fdump-tree-ldist-all" } */
>  
>  int foo (int * __restrict__ ia,
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ldist-1a.c b/gcc/testsuite/gcc.dg/tree-ssa/ldist-1a.c
> index 623aacf..2eaa140 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/ldist-1a.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ldist-1a.c
> @@ -1,4 +1,4 @@
> -/* { dg-do compile } */ 
> +/* { dg-do compile } */
>  /* { dg-options "-O2 -ftree-loop-distribution -fdump-tree-ldist-all" } */
>  
>  int foo (int * __restrict__ ia,
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ldist-2.c b/gcc/testsuite/gcc.dg/tree-ssa/ldist-2.c
> index de98ccc..d8b2545 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/ldist-2.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ldist-2.c
> @@ -1,4 +1,4 @@
> -/* { dg-do compile } */ 
> +/* { dg-do compile } */
>  /* { dg-options "-O2 -ftree-loop-distribution -fdump-tree-ldist-all" } */
>  
>  void foo (int * __restrict__ a,
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ldist-3.c b/gcc/testsuite/gcc.dg/tree-ssa/ldist-3.c
> index 40adfe1..f2a96d8 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/ldist-3.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ldist-3.c
> @@ -1,11 +1,11 @@
> -/* { dg-do compile { target int32plus } } */ 
> +/* { dg-do compile { target int32plus } } */
>  /* { dg-options "-O2 -ftree-loop-distribution -fdump-tree-ldist-all" } */
>  
>  int loop1 (int k)
>  {
>    unsigned int i;
>    int a[10000], b[10000], c[10000], d[10000];
> -	
> +
>    a[0] = k; a[3] = k*2;
>    c[1] = k+1;
>    for (i = 2; i < (10000-1); i ++)
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ldist-4.c b/gcc/testsuite/gcc.dg/tree-ssa/ldist-4.c
> index a744fea..7030515 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/ldist-4.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ldist-4.c
> @@ -1,4 +1,4 @@
> -/* { dg-do compile } */ 
> +/* { dg-do compile } */
>  /* { dg-options "-O2 -ftree-loop-distribution -fdump-tree-ldist-all" } */
>  
>  int loop1 (int k)
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ldist-5.c b/gcc/testsuite/gcc.dg/tree-ssa/ldist-5.c
> index 9a03dc1..7381b27 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/ldist-5.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ldist-5.c
> @@ -1,4 +1,4 @@
> -/* { dg-do compile { target int32plus } } */ 
> +/* { dg-do compile { target int32plus } } */
>  /* { dg-options "-O2 -ftree-loop-distribution -fdump-tree-ldist-all" } */
>  
>  int loop1 (int k)
> @@ -9,8 +9,8 @@ int loop1 (int k)
>  
>    a[0][0] = k;
>    for (i = 1; i < 100; i ++)
> -    for (j = 1; j < (100-1); j++) 
> -      {   
> +    for (j = 1; j < (100-1); j++)
> +      {
>          a[i][j] = k * i; /* S1 */
>          b[i][j] = a[i][j-1] + k; /* S2 */
>          c[i][j] = b[i][j] + a[i][j+1]; /* S3 */
> @@ -22,7 +22,7 @@ int loop1 (int k)
>       S2->S3 (flow, level 0)
>       S3->S4 (flow, level 0)
>    */
> -  
> +
>    return a[100-1][100-1] + b[100-1][100-1] + c[100-1][100-1] + d[100-1][100-1];
>  }
>  
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ldist-6.c b/gcc/testsuite/gcc.dg/tree-ssa/ldist-6.c
> index 7a38c86..2b21780 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/ldist-6.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ldist-6.c
> @@ -1,4 +1,4 @@
> -/* { dg-do compile } */ 
> +/* { dg-do compile } */
>  /* { dg-options "-O2 -ftree-loop-distribution -fdump-tree-ldist-all" } */
>  
>  int loop1 (int k)
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ldist-7.c b/gcc/testsuite/gcc.dg/tree-ssa/ldist-7.c
> index 124fcde..39397be 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/ldist-7.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ldist-7.c
> @@ -1,4 +1,4 @@
> -/* { dg-do compile } */ 
> +/* { dg-do compile } */
>  /* { dg-options "-O2 -ftree-loop-distribution -fdump-tree-ldist-all" } */
>  
>  int loop1 (int k)
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ldist-8.c b/gcc/testsuite/gcc.dg/tree-ssa/ldist-8.c
> index 4a8e066..f45f70e 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/ldist-8.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ldist-8.c
> @@ -1,4 +1,4 @@
> -/* { dg-do compile } */ 
> +/* { dg-do compile } */
>  /* { dg-options "-O2 -ftree-loop-distribution -fdump-tree-ldist-all" } */
>  
>  int loop1 (int k)
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ldist-9.c b/gcc/testsuite/gcc.dg/tree-ssa/ldist-9.c
> index ee8d023..9c328a7 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/ldist-9.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ldist-9.c
> @@ -1,4 +1,4 @@
> -/* { dg-do compile } */ 
> +/* { dg-do compile } */
>  /* { dg-options "-O2 -ftree-loop-distribution -fdump-tree-ldist-all" } */
>  
>  int loop1 (int k)
> diff --git a/gcc/testsuite/gfortran.dg/ldist-1.f90 b/gcc/testsuite/gfortran.dg/ldist-1.f90
> index dd1f02a..315e698 100644
> --- a/gcc/testsuite/gfortran.dg/ldist-1.f90
> +++ b/gcc/testsuite/gfortran.dg/ldist-1.f90
> @@ -1,4 +1,4 @@
> -! { dg-do compile }     
> +! { dg-do compile }
>  ! { dg-options "-O2 -ftree-loop-distribution -fdump-tree-ldist-all" }
>  
>  Subroutine PADEC(DKS,DKDS,HVAR,WM,WG,FN,NS,AN,BN,CN,IT)
> @@ -29,5 +29,8 @@ Subroutine PADEC(DKS,DKDS,HVAR,WM,WG,FN,NS,AN,BN,CN,IT)
>    return
>  end Subroutine PADEC
>  
> -! { dg-final { scan-tree-dump-times "distributed: split to 4 loops" 1 "ldist" } }
> +! There are 5 legal partitions in this code.  Based on the data
> +! locality heuristic, this loop should not be split.
> +
> +! { dg-final { scan-tree-dump-not "distributed: split to" "ldist" } }
>  ! { dg-final { cleanup-tree-dump "ldist" } }
> diff --git a/gcc/testsuite/gfortran.dg/ldist-pr43023.f90 b/gcc/testsuite/gfortran.dg/ldist-pr43023.f90
> new file mode 100644
> index 0000000..3e2d04c
> --- /dev/null
> +++ b/gcc/testsuite/gfortran.dg/ldist-pr43023.f90
> @@ -0,0 +1,31 @@
> +! { dg-do compile }
> +! { dg-options "-O2 -ftree-loop-distribution" }
> +
> +MODULE NFT_mod
> +
> +implicit none
> +integer :: Nangle
> +real:: Z0
> +real, dimension(:,:), allocatable :: Angle
> +real, dimension(:), allocatable :: exth, ezth, hxth, hyth, hyphi
> +
> +CONTAINS
> +
> +SUBROUTINE NFT_Init()
> +
> +real :: th, fi
> +integer :: n
> +
> +do n = 1,Nangle
> +  th = Angle(n,1)
> +  fi = Angle(n,2)
> +
> +  exth(n) =  cos(fi)*cos(th)
> +  ezth(n) = -sin(th)
> +  hxth(n) = -sin(fi)
> +  hyth(n) =  cos(fi)
> +  hyphi(n) = -sin(fi)
> +end do
> +END SUBROUTINE NFT_Init
> +
> +END MODULE NFT_mod
> diff --git a/gcc/tree-data-ref.c b/gcc/tree-data-ref.c
> index 4dfcd5c..3276079 100644
> --- a/gcc/tree-data-ref.c
> +++ b/gcc/tree-data-ref.c
> @@ -4509,7 +4509,7 @@ dump_rdg_vertex (FILE *file, struct graph *rdg, int i)
>      for (e = v->succ; e; e = e->succ_next)
>        fprintf (file, " %d", e->dest);
>  
> -  fprintf (file, ") \n");
> +  fprintf (file, ")\n");
>    print_gimple_stmt (file, RDGV_STMT (v), 0, TDF_VOPS|TDF_MEMSYMS);
>    fprintf (file, ")\n");
>  }
> @@ -4976,27 +4976,6 @@ stores_from_loop (struct loop *loop, VEC (gimple, heap) **stmts)
>    free (bbs);
>  }
>  
> -/* Returns true when STMT is an assignment that contains a data
> -   reference on its LHS with a stride of the same size as its unit
> -   type.  */
> -
> -static bool
> -mem_write_stride_of_same_size_as_unit_type_p (gimple stmt)
> -{
> -  struct data_reference *dr = XCNEW (struct data_reference);
> -  tree op0 = gimple_assign_lhs (stmt);
> -  bool res;
> -
> -  DR_STMT (dr) = stmt;
> -  DR_REF (dr) = op0;
> -
> -  res = dr_analyze_innermost (dr)
> -    && stride_of_unit_type_p (DR_STEP (dr), TREE_TYPE (op0));
> -
> -  free_data_ref (dr);
> -  return res;
> -}
> -
>  /* Initialize STMTS with all the statements of LOOP that contain a
>     store to memory of the form "A[i] = 0".  */
>  
> @@ -5007,18 +4986,12 @@ stores_zero_from_loop (struct loop *loop, VEC (gimple, heap) **stmts)
>    basic_block bb;
>    gimple_stmt_iterator si;
>    gimple stmt;
> -  tree op;
>    basic_block *bbs = get_loop_body_in_dom_order (loop);
>  
>    for (i = 0; i < loop->num_nodes; i++)
>      for (bb = bbs[i], si = gsi_start_bb (bb); !gsi_end_p (si); gsi_next (&si))
>        if ((stmt = gsi_stmt (si))
> -	  && gimple_vdef (stmt)
> -	  && is_gimple_assign (stmt)
> -	  && gimple_assign_rhs_code (stmt) == INTEGER_CST
> -	  && (op = gimple_assign_rhs1 (stmt))
> -	  && (integer_zerop (op) || real_zerop (op))
> -	  && mem_write_stride_of_same_size_as_unit_type_p (stmt))
> +	  && stmt_stores_zero (stmt))
>  	VEC_safe_push (gimple, heap, *stmts, gsi_stmt (si));
>  
>    free (bbs);
> diff --git a/gcc/tree-data-ref.h b/gcc/tree-data-ref.h
> index d929f31..e1006aa 100644
> --- a/gcc/tree-data-ref.h
> +++ b/gcc/tree-data-ref.h
> @@ -614,6 +614,38 @@ stride_of_unit_type_p (tree stride, tree type)
>  			     TYPE_SIZE_UNIT (type));
>  }
>  
> +/* Returns true when the statement at STMT is of the form "A[i] = 0"
> +   that contains a data reference on its LHS with a stride of the same
> +   size as its unit type.  */
> +
> +static inline bool
> +stmt_stores_zero (gimple stmt)

stmt_with_adjacent_zero_store_dr_p would be a better name, at
least stmt_stores_zero suggests something simpler.

> +{
> +  tree op0, op1;
> +  bool res;
> +  struct data_reference *dr;
> +
> +  if (!stmt
> +      || !gimple_vdef (stmt)
> +      || !is_gimple_assign (stmt)
> +      || !gimple_assign_single_p (stmt)
> +      || !(op1 = gimple_assign_rhs1 (stmt))
> +      || !(integer_zerop (op1) || real_zerop (op1)))
> +    return false;
> +
> +  dr = XCNEW (struct data_reference);
> +  op0 = gimple_assign_lhs (stmt);
> +
> +  DR_STMT (dr) = stmt;
> +  DR_REF (dr) = op0;
> +
> +  res = dr_analyze_innermost (dr)
> +    && stride_of_unit_type_p (DR_STEP (dr), TREE_TYPE (op0));
> +
> +  free_data_ref (dr);
> +  return res;
> +}

Please move this to tree-data-ref.c instead, the function is too
large for inlining anyway.


Ok with that change(s).

Thanks,
Richar.

>  /* Determines whether RDG vertices V1 and V2 access to similar memory
>     locations, in which case they have to be in the same partition.  */
>  
> diff --git a/gcc/tree-loop-distribution.c b/gcc/tree-loop-distribution.c
> index 357f51f..4fc246e 100644
> --- a/gcc/tree-loop-distribution.c
> +++ b/gcc/tree-loop-distribution.c
> @@ -241,7 +241,7 @@ build_size_arg_loc (location_t loc, tree nb_iter, tree op,
>  
>  /* Generate a call to memset.  Return true when the operation succeeded.  */
>  
> -static bool
> +static void
>  generate_memset_zero (gimple stmt, tree op0, tree nb_iter,
>  		      gimple_stmt_iterator bsi)
>  {
> @@ -255,11 +255,8 @@ generate_memset_zero (gimple stmt, tree op0, tree nb_iter,
>  
>    DR_STMT (dr) = stmt;
>    DR_REF (dr) = op0;
> -  if (!dr_analyze_innermost (dr))
> -    goto end;
> -
> -  if (!stride_of_unit_type_p (DR_STEP (dr), TREE_TYPE (op0)))
> -    goto end;
> +  res = dr_analyze_innermost (dr);
> +  gcc_assert (res && stride_of_unit_type_p (DR_STEP (dr), TREE_TYPE (op0)));
>  
>    nb_bytes = build_size_arg_loc (loc, nb_iter, op0, &stmt_list);
>    addr_base = size_binop_loc (loc, PLUS_EXPR, DR_OFFSET (dr), DR_INIT (dr));
> @@ -286,14 +283,11 @@ generate_memset_zero (gimple stmt, tree op0, tree nb_iter,
>    fn_call = gimple_build_call (fn, 3, mem, integer_zero_node, nb_bytes);
>    gimple_seq_add_stmt (&stmt_list, fn_call);
>    gsi_insert_seq_after (&bsi, stmt_list, GSI_CONTINUE_LINKING);
> -  res = true;
>  
>    if (dump_file && (dump_flags & TDF_DETAILS))
>      fprintf (dump_file, "generated memset zero\n");
>  
> - end:
>    free_data_ref (dr);
> -  return res;
>  }
>  
>  /* Tries to generate a builtin function for the instructions of LOOP
> @@ -307,7 +301,6 @@ generate_builtin (struct loop *loop, bitmap partition, bool copy_p)
>    unsigned i, x = 0;
>    basic_block *bbs;
>    gimple write = NULL;
> -  tree op0, op1;
>    gimple_stmt_iterator bsi;
>    tree nb_iter = number_of_exit_cond_executions (loop);
>  
> @@ -343,26 +336,17 @@ generate_builtin (struct loop *loop, bitmap partition, bool copy_p)
>  	}
>      }
>  
> -  if (!write)
> -    goto end;
> -
> -  op0 = gimple_assign_lhs (write);
> -  op1 = gimple_assign_rhs1 (write);
> -
> -  if (!(TREE_CODE (op0) == ARRAY_REF
> -	|| TREE_CODE (op0) == MEM_REF))
> +  if (!stmt_stores_zero (write))
>      goto end;
>  
>    /* The new statements will be placed before LOOP.  */
>    bsi = gsi_last_bb (loop_preheader_edge (loop)->src);
> -
> -  if (gimple_assign_rhs_code (write) == INTEGER_CST
> -      && (integer_zerop (op1) || real_zerop (op1)))
> -    res = generate_memset_zero (write, op0, nb_iter, bsi);
> +  generate_memset_zero (write, gimple_assign_lhs (write), nb_iter, bsi);
> +  res = true;
>  
>    /* If this is the last partition for which we generate code, we have
>       to destroy the loop.  */
> -  if (res && !copy_p)
> +  if (!copy_p)
>      {
>        unsigned nbbs = loop->num_nodes;
>        edge exit = single_exit (loop);
> @@ -504,24 +488,6 @@ has_upstream_mem_writes (int u)
>  static void rdg_flag_vertex_and_dependent (struct graph *, int, bitmap, bitmap,
>  					   bitmap, bool *);
>  
> -/* Flag all the uses of U.  */
> -
> -static void
> -rdg_flag_all_uses (struct graph *rdg, int u, bitmap partition, bitmap loops,
> -		   bitmap processed, bool *part_has_writes)
> -{
> -  struct graph_edge *e;
> -
> -  for (e = rdg->vertices[u].succ; e; e = e->succ_next)
> -    if (!bitmap_bit_p (processed, e->dest))
> -      {
> -	rdg_flag_vertex_and_dependent (rdg, e->dest, partition, loops,
> -				       processed, part_has_writes);
> -	rdg_flag_all_uses (rdg, e->dest, partition, loops, processed,
> -			   part_has_writes);
> -      }
> -}
> -
>  /* Flag the uses of U stopping following the information from
>     upstream_mem_writes.  */
>  
> @@ -689,68 +655,13 @@ rdg_flag_loop_exits (struct graph *rdg, bitmap loops, bitmap partition,
>      }
>  }
>  
> -/* Flag all the nodes of RDG containing memory accesses that could
> -   potentially belong to arrays already accessed in the current
> -   PARTITION.  */
> -
> -static void
> -rdg_flag_similar_memory_accesses (struct graph *rdg, bitmap partition,
> -				  bitmap loops, bitmap processed,
> -				  VEC (int, heap) **other_stores)
> -{
> -  bool foo;
> -  unsigned i, n;
> -  int j, k, kk;
> -  bitmap_iterator ii;
> -  struct graph_edge *e;
> -
> -  EXECUTE_IF_SET_IN_BITMAP (partition, 0, i, ii)
> -    if (RDG_MEM_WRITE_STMT (rdg, i)
> -	|| RDG_MEM_READS_STMT (rdg, i))
> -      {
> -	for (j = 0; j < rdg->n_vertices; j++)
> -	  if (!bitmap_bit_p (processed, j)
> -	      && (RDG_MEM_WRITE_STMT (rdg, j)
> -		  || RDG_MEM_READS_STMT (rdg, j))
> -	      && rdg_has_similar_memory_accesses (rdg, i, j))
> -	    {
> -	      /* Flag first the node J itself, and all the nodes that
> -		 are needed to compute J.  */
> -	      rdg_flag_vertex_and_dependent (rdg, j, partition, loops,
> -					     processed, &foo);
> -
> -	      /* When J is a read, we want to coalesce in the same
> -		 PARTITION all the nodes that are using J: this is
> -		 needed for better cache locality.  */
> -	      rdg_flag_all_uses (rdg, j, partition, loops, processed, &foo);
> -
> -	      /* Remove from OTHER_STORES the vertex that we flagged.  */
> -	      if (RDG_MEM_WRITE_STMT (rdg, j))
> -		FOR_EACH_VEC_ELT (int, *other_stores, k, kk)
> -		  if (kk == j)
> -		    {
> -		      VEC_unordered_remove (int, *other_stores, k);
> -		      break;
> -		    }
> -	    }
> -
> -	/* If the node I has two uses, then keep these together in the
> -	   same PARTITION.  */
> -	for (n = 0, e = rdg->vertices[i].succ; e; e = e->succ_next, n++);
> -
> -	if (n > 1)
> -	  rdg_flag_all_uses (rdg, i, partition, loops, processed, &foo);
> -      }
> -}
> -
>  /* Returns a bitmap in which all the statements needed for computing
>     the strongly connected component C of the RDG are flagged, also
>     including the loop exit conditions.  */
>  
>  static bitmap
>  build_rdg_partition_for_component (struct graph *rdg, rdgc c,
> -				   bool *part_has_writes,
> -				   VEC (int, heap) **other_stores)
> +				   bool *part_has_writes)
>  {
>    int i, v;
>    bitmap partition = BITMAP_ALLOC (NULL);
> @@ -762,14 +673,6 @@ build_rdg_partition_for_component (struct graph *rdg, rdgc c,
>        rdg_flag_vertex_and_dependent (rdg, v, partition, loops, processed,
>  				     part_has_writes);
>  
> -  /* Also iterate on the array of stores not in the starting vertices,
> -     and determine those vertices that have some memory affinity with
> -     the current nodes in the component: these are stores to the same
> -     arrays, i.e. we're taking care of cache locality.  */
> -  if (!flag_tree_loop_distribute_patterns)
> -    rdg_flag_similar_memory_accesses (rdg, partition, loops, processed,
> -				      other_stores);
> -
>    rdg_flag_loop_exits (rdg, loops, partition, processed, part_has_writes);
>  
>    BITMAP_FREE (processed);
> @@ -832,6 +735,79 @@ rdg_build_components (struct graph *rdg, VEC (int, heap) *starting_vertices,
>    BITMAP_FREE (saved_components);
>  }
>  
> +/* Returns true when it is possible to generate a builtin pattern for
> +   the PARTITION of RDG.  For the moment we detect only the memset
> +   zero pattern.  */
> +
> +static bool
> +can_generate_builtin (struct graph *rdg, bitmap partition)
> +{
> +  unsigned i;
> +  bitmap_iterator bi;
> +  int nb_reads = 0;
> +  int nb_writes = 0;
> +  int stores_zero = 0;
> +
> +  EXECUTE_IF_SET_IN_BITMAP (partition, 0, i, bi)
> +    if (RDG_MEM_READS_STMT (rdg, i))
> +      nb_reads++;
> +    else if (RDG_MEM_WRITE_STMT (rdg, i))
> +      {
> +	nb_writes++;
> +	if (stmt_stores_zero (RDG_STMT (rdg, i)))
> +	  stores_zero++;
> +      }
> +
> +  return stores_zero == 1 && nb_writes == 1 && nb_reads == 0;
> +}
> +
> +/* Returns true when PARTITION1 and PARTITION2 have similar memory
> +   accesses in RDG.  */
> +
> +static bool
> +similar_memory_accesses (struct graph *rdg, bitmap partition1,
> +			 bitmap partition2)
> +{
> +  unsigned i, j;
> +  bitmap_iterator bi, bj;
> +
> +  EXECUTE_IF_SET_IN_BITMAP (partition1, 0, i, bi)
> +    if (RDG_MEM_WRITE_STMT (rdg, i)
> +	|| RDG_MEM_READS_STMT (rdg, i))
> +      EXECUTE_IF_SET_IN_BITMAP (partition2, 0, j, bj)
> +	if (RDG_MEM_WRITE_STMT (rdg, j)
> +	    || RDG_MEM_READS_STMT (rdg, j))
> +	  if (rdg_has_similar_memory_accesses (rdg, i, j))
> +	    return true;
> +
> +  return false;
> +}
> +
> +/* Fuse all the partitions from PARTITIONS that contain similar memory
> +   references, i.e., we're taking care of cache locality.  This
> +   function does not fuse those partitions that contain patterns that
> +   can be code generated with builtins.  */
> +
> +static void
> +fuse_partitions_with_similar_memory_accesses (struct graph *rdg,
> +					      VEC (bitmap, heap) **partitions)
> +{
> +  int p1, p2;
> +  bitmap partition1, partition2;
> +
> +  FOR_EACH_VEC_ELT (bitmap, *partitions, p1, partition1)
> +    if (!can_generate_builtin (rdg, partition1))
> +      FOR_EACH_VEC_ELT (bitmap, *partitions, p2, partition2)
> +	if (p1 != p2
> +	    && !can_generate_builtin (rdg, partition2)
> +	    && similar_memory_accesses (rdg, partition1, partition2))
> +	  {
> +	    bitmap_ior_into (partition1, partition2);
> +	    VEC_ordered_remove (bitmap, *partitions, p2);
> +	    p2--;
> +	  }
> +}
> +
>  /* Aggregate several components into a useful partition that is
>     registered in the PARTITIONS vector.  Partitions will be
>     distributed in different loops.  */
> @@ -854,8 +830,7 @@ rdg_build_partitions (struct graph *rdg, VEC (rdgc, heap) *components,
>        if (bitmap_bit_p (processed, v))
>  	continue;
>  
> -      np = build_rdg_partition_for_component (rdg, x, &part_has_writes,
> -					      other_stores);
> +      np = build_rdg_partition_for_component (rdg, x, &part_has_writes);
>        bitmap_ior_into (partition, np);
>        bitmap_ior_into (processed, np);
>        BITMAP_FREE (np);
> @@ -901,6 +876,8 @@ rdg_build_partitions (struct graph *rdg, VEC (rdgc, heap) *components,
>      VEC_safe_push (bitmap, heap, *partitions, partition);
>    else
>      BITMAP_FREE (partition);
> +
> +  fuse_partitions_with_similar_memory_accesses (rdg, partitions);
>  }
>  
>  /* Dump to FILE the PARTITIONS.  */
> 

-- 
Richard Guenther <rguenther@suse.de>
Novell / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746 - GF: Markus Rex

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] Fix PR43023: fuse_partitions_with_similar_memory_accesses.
  2010-12-10 13:13 ` Richard Guenther
@ 2010-12-10 19:45   ` Sebastian Pop
  2010-12-10 19:53     ` Sebastian Pop
  0 siblings, 1 reply; 7+ messages in thread
From: Sebastian Pop @ 2010-12-10 19:45 UTC (permalink / raw)
  To: Richard Guenther; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 1445 bytes --]

On Fri, Dec 10, 2010 at 06:56, Richard Guenther <rguenther@suse.de> wrote:
>> -/* { dg-do compile } */
>> +/* { dg-do compile } */
>>  /* { dg-options "-O2 -ftree-loop-distribution -fdump-tree-ldist-all" } */
>
> These seems to be spurious changes.
>

I removed these white space changes.

>> +static inline bool
>> +stmt_stores_zero (gimple stmt)
>
> stmt_with_adjacent_zero_store_dr_p would be a better name, at
> least stmt_stores_zero suggests something simpler.
>
>> +{
>> +  tree op0, op1;
>> +  bool res;
>> +  struct data_reference *dr;
>> +
>> +  if (!stmt
>> +      || !gimple_vdef (stmt)
>> +      || !is_gimple_assign (stmt)
>> +      || !gimple_assign_single_p (stmt)
>> +      || !(op1 = gimple_assign_rhs1 (stmt))
>> +      || !(integer_zerop (op1) || real_zerop (op1)))
>> +    return false;
>> +
>> +  dr = XCNEW (struct data_reference);
>> +  op0 = gimple_assign_lhs (stmt);
>> +
>> +  DR_STMT (dr) = stmt;
>> +  DR_REF (dr) = op0;
>> +
>> +  res = dr_analyze_innermost (dr)
>> +    && stride_of_unit_type_p (DR_STEP (dr), TREE_TYPE (op0));
>> +
>> +  free_data_ref (dr);
>> +  return res;
>> +}
>
> Please move this to tree-data-ref.c instead, the function is too
> large for inlining anyway.
>
>
> Ok with that change(s).

Done.  See the attached patch that I committed to trunk after regstrap
on amd64-linux.
Should I backport these changes to the 4.5 branch?

Sebastian

[-- Attachment #2: 0001-Fix-PR43023-fuse_partitions_with_similar_memory_acce.patch --]
[-- Type: text/x-diff, Size: 18241 bytes --]

From a136cad6cead220ad9bb1901bbfb81da8ce37cf1 Mon Sep 17 00:00:00 2001
From: spop <spop@138bc75d-0d04-0410-961f-82ee72b054a4>
Date: Fri, 10 Dec 2010 19:16:48 +0000
Subject: [PATCH] Fix PR43023: fuse_partitions_with_similar_memory_accesses.

2010-12-10  Sebastian Pop  <sebastian.pop@amd.com>

	PR tree-optimization/43023
	* tree-data-ref.c (mem_write_stride_of_same_size_as_unit_type_p):
	Removed.
	(stores_zero_from_loop): Call stmt_stores_zero.
	* tree-data-ref.h (stmt_stores_zero): New.
	* tree-loop-distribution.c (generate_memset_zero): Do not return a
	boolean.  Call gcc_assert on stride_of_unit_type_p.
	(generate_builtin): Call stmt_stores_zero.
	(rdg_flag_all_uses): Removed.
	(rdg_flag_similar_memory_accesses): Removed.
	(build_rdg_partition_for_component): Removed parameter
	other_stores.  Removed call to rdg_flag_similar_memory_accesses.
	(can_generate_builtin): New.
	(similar_memory_accesses): New.
	(fuse_partitions_with_similar_memory_accesses): New.
	(rdg_build_partitions): Call
	fuse_partitions_with_similar_memory_accesses.

	* gfortran.dg/ldist-1.f90: Adjust pattern.
	* gfortran.dg/ldist-pr43023.f90: New.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@167697 138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/ChangeLog                               |   24 +++-
 gcc/testsuite/ChangeLog                     |   14 ++-
 gcc/testsuite/gfortran.dg/ldist-1.f90       |    5 +-
 gcc/testsuite/gfortran.dg/ldist-pr43023.f90 |   31 +++++
 gcc/tree-data-ref.c                         |   35 +++--
 gcc/tree-data-ref.h                         |    1 +
 gcc/tree-loop-distribution.c                |  191 ++++++++++++---------------
 7 files changed, 172 insertions(+), 129 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/ldist-pr43023.f90

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 9b4be73..0389806 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,23 @@
+2010-12-10  Sebastian Pop  <sebastian.pop@amd.com>
+
+	PR tree-optimization/43023
+	* tree-data-ref.c (mem_write_stride_of_same_size_as_unit_type_p):
+	Removed.
+	(stores_zero_from_loop): Call stmt_stores_zero.
+	* tree-data-ref.h (stmt_stores_zero): New.
+	* tree-loop-distribution.c (generate_memset_zero): Do not return a
+	boolean.  Call gcc_assert on stride_of_unit_type_p.
+	(generate_builtin): Call stmt_stores_zero.
+	(rdg_flag_all_uses): Removed.
+	(rdg_flag_similar_memory_accesses): Removed.
+	(build_rdg_partition_for_component): Removed parameter
+	other_stores.  Removed call to rdg_flag_similar_memory_accesses.
+	(can_generate_builtin): New.
+	(similar_memory_accesses): New.
+	(fuse_partitions_with_similar_memory_accesses): New.
+	(rdg_build_partitions): Call
+	fuse_partitions_with_similar_memory_accesses.
+
 2010-12-10  Jakub Jelinek  <jakub@redhat.com>
 
 	PR rtl-optimization/46804
@@ -108,9 +128,9 @@
 	(abshi2): Delete.
 	(neghi2, negqi2): Use PDPint iterator.
 	* config/pdp11/pdp11.c (find_addr_reg, output_move_double,
-	output_move_quad): Delete. 
+	output_move_quad): Delete.
 	(pdp11_expand_operands, output_move_multiple): New functions.
-	
+
 2010-12-09  Joseph Myers  <joseph@codesourcery.com>
 
 	* config/vax/linux.h (WCHAR_TYPE, WCHAR_TYPE_SIZE): Define.
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 47022af..7bb46f3 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,9 @@
+2010-12-10  Sebastian Pop  <sebastian.pop@amd.com>
+
+	PR tree-optimization/43023
+	* gfortran.dg/ldist-1.f90: Adjust pattern.
+	* gfortran.dg/ldist-pr43023.f90: New.
+
 2010-12-10  Jakub Jelinek  <jakub@redhat.com>
 
 	PR rtl-optimization/46804
@@ -45,8 +51,8 @@
 	* obj-c++.dg/class-extension-3.mm: New.
 	* obj-c++.dg/property/at-property-26.mm: New.
 	* obj-c++.dg/property/at-property-27.mm: New.
-	* obj-c++.dg/property/at-property-28.mm: New.	
-	
+	* obj-c++.dg/property/at-property-28.mm: New.
+
 2010-12-09  John David Anglin  <dave.anglin@nrc-cnrc.gc.ca>
 
 	PR target/46057
@@ -113,12 +119,12 @@
 	* obj-c++.dg/exceptions-7.mm: New.
 	* obj-c++.dg/exceptions-3.mm: Adjust for new C++ messages.
 	* obj-c++.dg/exceptions-5.mm: Same change.
-	
+
 2010-12-08  Nicola Pero  <nicola.pero@meta-innovation.com>
 
 	* objc.dg/foreach-6.m: Updated location of error messages.
 	* objc.dg/foreach-7.m: Same change.
-	
+
 2010-12-08  Richard Guenther  <rguenther@suse.de>
 	    Sebastian Pop  <sebastian.pop@amd.com>
 
diff --git a/gcc/testsuite/gfortran.dg/ldist-1.f90 b/gcc/testsuite/gfortran.dg/ldist-1.f90
index dd1f02a..bbce2f3 100644
--- a/gcc/testsuite/gfortran.dg/ldist-1.f90
+++ b/gcc/testsuite/gfortran.dg/ldist-1.f90
@@ -29,5 +29,8 @@ Subroutine PADEC(DKS,DKDS,HVAR,WM,WG,FN,NS,AN,BN,CN,IT)
   return
 end Subroutine PADEC
 
-! { dg-final { scan-tree-dump-times "distributed: split to 4 loops" 1 "ldist" } }
+! There are 5 legal partitions in this code.  Based on the data
+! locality heuristic, this loop should not be split.
+
+! { dg-final { scan-tree-dump-not "distributed: split to" "ldist" } }
 ! { dg-final { cleanup-tree-dump "ldist" } }
diff --git a/gcc/testsuite/gfortran.dg/ldist-pr43023.f90 b/gcc/testsuite/gfortran.dg/ldist-pr43023.f90
new file mode 100644
index 0000000..3e2d04c
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/ldist-pr43023.f90
@@ -0,0 +1,31 @@
+! { dg-do compile }
+! { dg-options "-O2 -ftree-loop-distribution" }
+
+MODULE NFT_mod
+
+implicit none
+integer :: Nangle
+real:: Z0
+real, dimension(:,:), allocatable :: Angle
+real, dimension(:), allocatable :: exth, ezth, hxth, hyth, hyphi
+
+CONTAINS
+
+SUBROUTINE NFT_Init()
+
+real :: th, fi
+integer :: n
+
+do n = 1,Nangle
+  th = Angle(n,1)
+  fi = Angle(n,2)
+
+  exth(n) =  cos(fi)*cos(th)
+  ezth(n) = -sin(th)
+  hxth(n) = -sin(fi)
+  hyth(n) =  cos(fi)
+  hyphi(n) = -sin(fi)
+end do
+END SUBROUTINE NFT_Init
+
+END MODULE NFT_mod
diff --git a/gcc/tree-data-ref.c b/gcc/tree-data-ref.c
index 4dfcd5c..9a81370 100644
--- a/gcc/tree-data-ref.c
+++ b/gcc/tree-data-ref.c
@@ -4509,7 +4509,7 @@ dump_rdg_vertex (FILE *file, struct graph *rdg, int i)
     for (e = v->succ; e; e = e->succ_next)
       fprintf (file, " %d", e->dest);
 
-  fprintf (file, ") \n");
+  fprintf (file, ")\n");
   print_gimple_stmt (file, RDGV_STMT (v), 0, TDF_VOPS|TDF_MEMSYMS);
   fprintf (file, ")\n");
 }
@@ -4976,16 +4976,27 @@ stores_from_loop (struct loop *loop, VEC (gimple, heap) **stmts)
   free (bbs);
 }
 
-/* Returns true when STMT is an assignment that contains a data
-   reference on its LHS with a stride of the same size as its unit
-   type.  */
+/* Returns true when the statement at STMT is of the form "A[i] = 0"
+   that contains a data reference on its LHS with a stride of the same
+   size as its unit type.  */
 
-static bool
-mem_write_stride_of_same_size_as_unit_type_p (gimple stmt)
+bool
+stmt_with_adjacent_zero_store_dr_p (gimple stmt)
 {
-  struct data_reference *dr = XCNEW (struct data_reference);
-  tree op0 = gimple_assign_lhs (stmt);
+  tree op0, op1;
   bool res;
+  struct data_reference *dr;
+
+  if (!stmt
+      || !gimple_vdef (stmt)
+      || !is_gimple_assign (stmt)
+      || !gimple_assign_single_p (stmt)
+      || !(op1 = gimple_assign_rhs1 (stmt))
+      || !(integer_zerop (op1) || real_zerop (op1)))
+    return false;
+
+  dr = XCNEW (struct data_reference);
+  op0 = gimple_assign_lhs (stmt);
 
   DR_STMT (dr) = stmt;
   DR_REF (dr) = op0;
@@ -5007,18 +5018,12 @@ stores_zero_from_loop (struct loop *loop, VEC (gimple, heap) **stmts)
   basic_block bb;
   gimple_stmt_iterator si;
   gimple stmt;
-  tree op;
   basic_block *bbs = get_loop_body_in_dom_order (loop);
 
   for (i = 0; i < loop->num_nodes; i++)
     for (bb = bbs[i], si = gsi_start_bb (bb); !gsi_end_p (si); gsi_next (&si))
       if ((stmt = gsi_stmt (si))
-	  && gimple_vdef (stmt)
-	  && is_gimple_assign (stmt)
-	  && gimple_assign_rhs_code (stmt) == INTEGER_CST
-	  && (op = gimple_assign_rhs1 (stmt))
-	  && (integer_zerop (op) || real_zerop (op))
-	  && mem_write_stride_of_same_size_as_unit_type_p (stmt))
+	  && stmt_with_adjacent_zero_store_dr_p (stmt))
 	VEC_safe_push (gimple, heap, *stmts, gsi_stmt (si));
 
   free (bbs);
diff --git a/gcc/tree-data-ref.h b/gcc/tree-data-ref.h
index d929f31..b4f317f 100644
--- a/gcc/tree-data-ref.h
+++ b/gcc/tree-data-ref.h
@@ -602,6 +602,7 @@ void stores_zero_from_loop (struct loop *, VEC (gimple, heap) **);
 void remove_similar_memory_refs (VEC (gimple, heap) **);
 bool rdg_defs_used_in_other_loops_p (struct graph *, int);
 bool have_similar_memory_accesses (gimple, gimple);
+bool stmt_with_adjacent_zero_store_dr_p (gimple);
 
 /* Returns true when STRIDE is equal in absolute value to the size of
    the unit type of TYPE.  */
diff --git a/gcc/tree-loop-distribution.c b/gcc/tree-loop-distribution.c
index 357f51f..b603209 100644
--- a/gcc/tree-loop-distribution.c
+++ b/gcc/tree-loop-distribution.c
@@ -241,7 +241,7 @@ build_size_arg_loc (location_t loc, tree nb_iter, tree op,
 
 /* Generate a call to memset.  Return true when the operation succeeded.  */
 
-static bool
+static void
 generate_memset_zero (gimple stmt, tree op0, tree nb_iter,
 		      gimple_stmt_iterator bsi)
 {
@@ -255,11 +255,8 @@ generate_memset_zero (gimple stmt, tree op0, tree nb_iter,
 
   DR_STMT (dr) = stmt;
   DR_REF (dr) = op0;
-  if (!dr_analyze_innermost (dr))
-    goto end;
-
-  if (!stride_of_unit_type_p (DR_STEP (dr), TREE_TYPE (op0)))
-    goto end;
+  res = dr_analyze_innermost (dr);
+  gcc_assert (res && stride_of_unit_type_p (DR_STEP (dr), TREE_TYPE (op0)));
 
   nb_bytes = build_size_arg_loc (loc, nb_iter, op0, &stmt_list);
   addr_base = size_binop_loc (loc, PLUS_EXPR, DR_OFFSET (dr), DR_INIT (dr));
@@ -286,14 +283,11 @@ generate_memset_zero (gimple stmt, tree op0, tree nb_iter,
   fn_call = gimple_build_call (fn, 3, mem, integer_zero_node, nb_bytes);
   gimple_seq_add_stmt (&stmt_list, fn_call);
   gsi_insert_seq_after (&bsi, stmt_list, GSI_CONTINUE_LINKING);
-  res = true;
 
   if (dump_file && (dump_flags & TDF_DETAILS))
     fprintf (dump_file, "generated memset zero\n");
 
- end:
   free_data_ref (dr);
-  return res;
 }
 
 /* Tries to generate a builtin function for the instructions of LOOP
@@ -307,7 +301,6 @@ generate_builtin (struct loop *loop, bitmap partition, bool copy_p)
   unsigned i, x = 0;
   basic_block *bbs;
   gimple write = NULL;
-  tree op0, op1;
   gimple_stmt_iterator bsi;
   tree nb_iter = number_of_exit_cond_executions (loop);
 
@@ -343,26 +336,17 @@ generate_builtin (struct loop *loop, bitmap partition, bool copy_p)
 	}
     }
 
-  if (!write)
-    goto end;
-
-  op0 = gimple_assign_lhs (write);
-  op1 = gimple_assign_rhs1 (write);
-
-  if (!(TREE_CODE (op0) == ARRAY_REF
-	|| TREE_CODE (op0) == MEM_REF))
+  if (!stmt_with_adjacent_zero_store_dr_p (write))
     goto end;
 
   /* The new statements will be placed before LOOP.  */
   bsi = gsi_last_bb (loop_preheader_edge (loop)->src);
-
-  if (gimple_assign_rhs_code (write) == INTEGER_CST
-      && (integer_zerop (op1) || real_zerop (op1)))
-    res = generate_memset_zero (write, op0, nb_iter, bsi);
+  generate_memset_zero (write, gimple_assign_lhs (write), nb_iter, bsi);
+  res = true;
 
   /* If this is the last partition for which we generate code, we have
      to destroy the loop.  */
-  if (res && !copy_p)
+  if (!copy_p)
     {
       unsigned nbbs = loop->num_nodes;
       edge exit = single_exit (loop);
@@ -504,24 +488,6 @@ has_upstream_mem_writes (int u)
 static void rdg_flag_vertex_and_dependent (struct graph *, int, bitmap, bitmap,
 					   bitmap, bool *);
 
-/* Flag all the uses of U.  */
-
-static void
-rdg_flag_all_uses (struct graph *rdg, int u, bitmap partition, bitmap loops,
-		   bitmap processed, bool *part_has_writes)
-{
-  struct graph_edge *e;
-
-  for (e = rdg->vertices[u].succ; e; e = e->succ_next)
-    if (!bitmap_bit_p (processed, e->dest))
-      {
-	rdg_flag_vertex_and_dependent (rdg, e->dest, partition, loops,
-				       processed, part_has_writes);
-	rdg_flag_all_uses (rdg, e->dest, partition, loops, processed,
-			   part_has_writes);
-      }
-}
-
 /* Flag the uses of U stopping following the information from
    upstream_mem_writes.  */
 
@@ -689,68 +655,13 @@ rdg_flag_loop_exits (struct graph *rdg, bitmap loops, bitmap partition,
     }
 }
 
-/* Flag all the nodes of RDG containing memory accesses that could
-   potentially belong to arrays already accessed in the current
-   PARTITION.  */
-
-static void
-rdg_flag_similar_memory_accesses (struct graph *rdg, bitmap partition,
-				  bitmap loops, bitmap processed,
-				  VEC (int, heap) **other_stores)
-{
-  bool foo;
-  unsigned i, n;
-  int j, k, kk;
-  bitmap_iterator ii;
-  struct graph_edge *e;
-
-  EXECUTE_IF_SET_IN_BITMAP (partition, 0, i, ii)
-    if (RDG_MEM_WRITE_STMT (rdg, i)
-	|| RDG_MEM_READS_STMT (rdg, i))
-      {
-	for (j = 0; j < rdg->n_vertices; j++)
-	  if (!bitmap_bit_p (processed, j)
-	      && (RDG_MEM_WRITE_STMT (rdg, j)
-		  || RDG_MEM_READS_STMT (rdg, j))
-	      && rdg_has_similar_memory_accesses (rdg, i, j))
-	    {
-	      /* Flag first the node J itself, and all the nodes that
-		 are needed to compute J.  */
-	      rdg_flag_vertex_and_dependent (rdg, j, partition, loops,
-					     processed, &foo);
-
-	      /* When J is a read, we want to coalesce in the same
-		 PARTITION all the nodes that are using J: this is
-		 needed for better cache locality.  */
-	      rdg_flag_all_uses (rdg, j, partition, loops, processed, &foo);
-
-	      /* Remove from OTHER_STORES the vertex that we flagged.  */
-	      if (RDG_MEM_WRITE_STMT (rdg, j))
-		FOR_EACH_VEC_ELT (int, *other_stores, k, kk)
-		  if (kk == j)
-		    {
-		      VEC_unordered_remove (int, *other_stores, k);
-		      break;
-		    }
-	    }
-
-	/* If the node I has two uses, then keep these together in the
-	   same PARTITION.  */
-	for (n = 0, e = rdg->vertices[i].succ; e; e = e->succ_next, n++);
-
-	if (n > 1)
-	  rdg_flag_all_uses (rdg, i, partition, loops, processed, &foo);
-      }
-}
-
 /* Returns a bitmap in which all the statements needed for computing
    the strongly connected component C of the RDG are flagged, also
    including the loop exit conditions.  */
 
 static bitmap
 build_rdg_partition_for_component (struct graph *rdg, rdgc c,
-				   bool *part_has_writes,
-				   VEC (int, heap) **other_stores)
+				   bool *part_has_writes)
 {
   int i, v;
   bitmap partition = BITMAP_ALLOC (NULL);
@@ -762,14 +673,6 @@ build_rdg_partition_for_component (struct graph *rdg, rdgc c,
       rdg_flag_vertex_and_dependent (rdg, v, partition, loops, processed,
 				     part_has_writes);
 
-  /* Also iterate on the array of stores not in the starting vertices,
-     and determine those vertices that have some memory affinity with
-     the current nodes in the component: these are stores to the same
-     arrays, i.e. we're taking care of cache locality.  */
-  if (!flag_tree_loop_distribute_patterns)
-    rdg_flag_similar_memory_accesses (rdg, partition, loops, processed,
-				      other_stores);
-
   rdg_flag_loop_exits (rdg, loops, partition, processed, part_has_writes);
 
   BITMAP_FREE (processed);
@@ -832,6 +735,79 @@ rdg_build_components (struct graph *rdg, VEC (int, heap) *starting_vertices,
   BITMAP_FREE (saved_components);
 }
 
+/* Returns true when it is possible to generate a builtin pattern for
+   the PARTITION of RDG.  For the moment we detect only the memset
+   zero pattern.  */
+
+static bool
+can_generate_builtin (struct graph *rdg, bitmap partition)
+{
+  unsigned i;
+  bitmap_iterator bi;
+  int nb_reads = 0;
+  int nb_writes = 0;
+  int stores_zero = 0;
+
+  EXECUTE_IF_SET_IN_BITMAP (partition, 0, i, bi)
+    if (RDG_MEM_READS_STMT (rdg, i))
+      nb_reads++;
+    else if (RDG_MEM_WRITE_STMT (rdg, i))
+      {
+	nb_writes++;
+	if (stmt_with_adjacent_zero_store_dr_p (RDG_STMT (rdg, i)))
+	  stores_zero++;
+      }
+
+  return stores_zero == 1 && nb_writes == 1 && nb_reads == 0;
+}
+
+/* Returns true when PARTITION1 and PARTITION2 have similar memory
+   accesses in RDG.  */
+
+static bool
+similar_memory_accesses (struct graph *rdg, bitmap partition1,
+			 bitmap partition2)
+{
+  unsigned i, j;
+  bitmap_iterator bi, bj;
+
+  EXECUTE_IF_SET_IN_BITMAP (partition1, 0, i, bi)
+    if (RDG_MEM_WRITE_STMT (rdg, i)
+	|| RDG_MEM_READS_STMT (rdg, i))
+      EXECUTE_IF_SET_IN_BITMAP (partition2, 0, j, bj)
+	if (RDG_MEM_WRITE_STMT (rdg, j)
+	    || RDG_MEM_READS_STMT (rdg, j))
+	  if (rdg_has_similar_memory_accesses (rdg, i, j))
+	    return true;
+
+  return false;
+}
+
+/* Fuse all the partitions from PARTITIONS that contain similar memory
+   references, i.e., we're taking care of cache locality.  This
+   function does not fuse those partitions that contain patterns that
+   can be code generated with builtins.  */
+
+static void
+fuse_partitions_with_similar_memory_accesses (struct graph *rdg,
+					      VEC (bitmap, heap) **partitions)
+{
+  int p1, p2;
+  bitmap partition1, partition2;
+
+  FOR_EACH_VEC_ELT (bitmap, *partitions, p1, partition1)
+    if (!can_generate_builtin (rdg, partition1))
+      FOR_EACH_VEC_ELT (bitmap, *partitions, p2, partition2)
+	if (p1 != p2
+	    && !can_generate_builtin (rdg, partition2)
+	    && similar_memory_accesses (rdg, partition1, partition2))
+	  {
+	    bitmap_ior_into (partition1, partition2);
+	    VEC_ordered_remove (bitmap, *partitions, p2);
+	    p2--;
+	  }
+}
+
 /* Aggregate several components into a useful partition that is
    registered in the PARTITIONS vector.  Partitions will be
    distributed in different loops.  */
@@ -854,8 +830,7 @@ rdg_build_partitions (struct graph *rdg, VEC (rdgc, heap) *components,
       if (bitmap_bit_p (processed, v))
 	continue;
 
-      np = build_rdg_partition_for_component (rdg, x, &part_has_writes,
-					      other_stores);
+      np = build_rdg_partition_for_component (rdg, x, &part_has_writes);
       bitmap_ior_into (partition, np);
       bitmap_ior_into (processed, np);
       BITMAP_FREE (np);
@@ -901,6 +876,8 @@ rdg_build_partitions (struct graph *rdg, VEC (rdgc, heap) *components,
     VEC_safe_push (bitmap, heap, *partitions, partition);
   else
     BITMAP_FREE (partition);
+
+  fuse_partitions_with_similar_memory_accesses (rdg, partitions);
 }
 
 /* Dump to FILE the PARTITIONS.  */
-- 
1.7.1


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH] Fix PR43023: fuse_partitions_with_similar_memory_accesses.
  2010-12-10 19:45   ` Sebastian Pop
@ 2010-12-10 19:53     ` Sebastian Pop
  2010-12-13 21:06       ` Sebastian Pop
  0 siblings, 1 reply; 7+ messages in thread
From: Sebastian Pop @ 2010-12-10 19:53 UTC (permalink / raw)
  To: gcc-patches; +Cc: rguenther, Sebastian Pop

Hi,

here is the backport of the same patch for the gcc4.5 branch.  Please
let me know when you want to commit this patch to the branch.  For now
I sent this out for test on amd64-linux.

Sebastian

2010-12-10  Sebastian Pop  <sebastian.pop@amd.com>

	PR tree-optimization/43023
	* tree-data-ref.c (mem_write_stride_of_same_size_as_unit_type_p):
	Removed.
	(stores_zero_from_loop): Call stmt_stores_zero.
	(stmt_with_adjacent_zero_store_dr_p): New.
	* tree-data-ref.h (stmt_with_adjacent_zero_store_dr_p): Declared.
	(stride_of_unit_type_p): New.
	* tree-loop-distribution.c (generate_memset_zero): Do not return a
	boolean.  Call gcc_assert on stride_of_unit_type_p.
	(generate_builtin): Call stmt_stores_zero.
	(rdg_flag_all_uses): Removed.
	(rdg_flag_similar_memory_accesses): Removed.
	(build_rdg_partition_for_component): Removed parameter
	other_stores.  Removed call to rdg_flag_similar_memory_accesses.
	(can_generate_builtin): New.
	(similar_memory_accesses): New.
	(fuse_partitions_with_similar_memory_accesses): New.
	(rdg_build_partitions): Call
	fuse_partitions_with_similar_memory_accesses.

	* gfortran.dg/ldist-1.f90: Adjust pattern.
	* gfortran.dg/ldist-pr43023.f90: New.
---
 gcc/ChangeLog                               |   22 +++
 gcc/testsuite/ChangeLog                     |    6 +
 gcc/testsuite/gfortran.dg/ldist-1.f90       |    5 +-
 gcc/testsuite/gfortran.dg/ldist-pr43023.f90 |   31 ++++
 gcc/tree-data-ref.c                         |   34 ++++-
 gcc/tree-data-ref.h                         |   12 ++
 gcc/tree-loop-distribution.c                |  223 +++++++++++----------------
 7 files changed, 201 insertions(+), 132 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/ldist-pr43023.f90

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index abecb83..dfe45ed 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,25 @@
+2010-12-10  Sebastian Pop  <sebastian.pop@amd.com>
+
+	PR tree-optimization/43023
+	* tree-data-ref.c (mem_write_stride_of_same_size_as_unit_type_p):
+	Removed.
+	(stores_zero_from_loop): Call stmt_stores_zero.
+	(stmt_with_adjacent_zero_store_dr_p): New.
+	* tree-data-ref.h (stmt_with_adjacent_zero_store_dr_p): Declared.
+	(stride_of_unit_type_p): New.
+	* tree-loop-distribution.c (generate_memset_zero): Do not return a
+	boolean.  Call gcc_assert on stride_of_unit_type_p.
+	(generate_builtin): Call stmt_stores_zero.
+	(rdg_flag_all_uses): Removed.
+	(rdg_flag_similar_memory_accesses): Removed.
+	(build_rdg_partition_for_component): Removed parameter
+	other_stores.  Removed call to rdg_flag_similar_memory_accesses.
+	(can_generate_builtin): New.
+	(similar_memory_accesses): New.
+	(fuse_partitions_with_similar_memory_accesses): New.
+	(rdg_build_partitions): Call
+	fuse_partitions_with_similar_memory_accesses.
+
 2010-12-07  Jakub Jelinek  <jakub@redhat.com>
 
 	Backport from mainline
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index e3a9d24..36fd59d 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,9 @@
+2010-12-10  Sebastian Pop  <sebastian.pop@amd.com>
+
+	PR tree-optimization/43023
+	* gfortran.dg/ldist-1.f90: Adjust pattern.
+	* gfortran.dg/ldist-pr43023.f90: New.
+
 2010-12-07  Sebastian Pop  <sebastian.pop@amd.com>
 
 	PR tree-optimization/44676
diff --git a/gcc/testsuite/gfortran.dg/ldist-1.f90 b/gcc/testsuite/gfortran.dg/ldist-1.f90
index dd1f02a..bbce2f3 100644
--- a/gcc/testsuite/gfortran.dg/ldist-1.f90
+++ b/gcc/testsuite/gfortran.dg/ldist-1.f90
@@ -29,5 +29,8 @@ Subroutine PADEC(DKS,DKDS,HVAR,WM,WG,FN,NS,AN,BN,CN,IT)
   return
 end Subroutine PADEC
 
-! { dg-final { scan-tree-dump-times "distributed: split to 4 loops" 1 "ldist" } }
+! There are 5 legal partitions in this code.  Based on the data
+! locality heuristic, this loop should not be split.
+
+! { dg-final { scan-tree-dump-not "distributed: split to" "ldist" } }
 ! { dg-final { cleanup-tree-dump "ldist" } }
diff --git a/gcc/testsuite/gfortran.dg/ldist-pr43023.f90 b/gcc/testsuite/gfortran.dg/ldist-pr43023.f90
new file mode 100644
index 0000000..3e2d04c
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/ldist-pr43023.f90
@@ -0,0 +1,31 @@
+! { dg-do compile }
+! { dg-options "-O2 -ftree-loop-distribution" }
+
+MODULE NFT_mod
+
+implicit none
+integer :: Nangle
+real:: Z0
+real, dimension(:,:), allocatable :: Angle
+real, dimension(:), allocatable :: exth, ezth, hxth, hyth, hyphi
+
+CONTAINS
+
+SUBROUTINE NFT_Init()
+
+real :: th, fi
+integer :: n
+
+do n = 1,Nangle
+  th = Angle(n,1)
+  fi = Angle(n,2)
+
+  exth(n) =  cos(fi)*cos(th)
+  ezth(n) = -sin(th)
+  hxth(n) = -sin(fi)
+  hyth(n) =  cos(fi)
+  hyphi(n) = -sin(fi)
+end do
+END SUBROUTINE NFT_Init
+
+END MODULE NFT_mod
diff --git a/gcc/tree-data-ref.c b/gcc/tree-data-ref.c
index a89d151..dab0177 100644
--- a/gcc/tree-data-ref.c
+++ b/gcc/tree-data-ref.c
@@ -4594,7 +4594,7 @@ dump_rdg_vertex (FILE *file, struct graph *rdg, int i)
     for (e = v->succ; e; e = e->succ_next)
       fprintf (file, " %d", e->dest);
 
-  fprintf (file, ") \n");
+  fprintf (file, ")\n");
   print_gimple_stmt (file, RDGV_STMT (v), 0, TDF_VOPS|TDF_MEMSYMS);
   fprintf (file, ")\n");
 }
@@ -4991,6 +4991,38 @@ stores_from_loop (struct loop *loop, VEC (gimple, heap) **stmts)
   free (bbs);
 }
 
+/* Returns true when the statement at STMT is of the form "A[i] = 0"
+   that contains a data reference on its LHS with a stride of the same
+   size as its unit type.  */
+
+bool
+stmt_with_adjacent_zero_store_dr_p (gimple stmt)
+{
+  tree op0, op1;
+  bool res;
+  struct data_reference *dr;
+
+  if (!stmt
+      || !gimple_vdef (stmt)
+      || !is_gimple_assign (stmt)
+      || !gimple_assign_single_p (stmt)
+      || !(op1 = gimple_assign_rhs1 (stmt))
+      || !(integer_zerop (op1) || real_zerop (op1)))
+    return false;
+
+  dr = XCNEW (struct data_reference);
+  op0 = gimple_assign_lhs (stmt);
+
+  DR_STMT (dr) = stmt;
+  DR_REF (dr) = op0;
+
+  res = dr_analyze_innermost (dr)
+    && stride_of_unit_type_p (DR_STEP (dr), TREE_TYPE (op0));
+
+  free_data_ref (dr);
+  return res;
+}
+
 /* For a data reference REF, return the declaration of its base
    address or NULL_TREE if the base is not determined.  */
 
diff --git a/gcc/tree-data-ref.h b/gcc/tree-data-ref.h
index 678eb10..90cbca1 100644
--- a/gcc/tree-data-ref.h
+++ b/gcc/tree-data-ref.h
@@ -567,6 +567,18 @@ void stores_from_loop (struct loop *, VEC (gimple, heap) **);
 void remove_similar_memory_refs (VEC (gimple, heap) **);
 bool rdg_defs_used_in_other_loops_p (struct graph *, int);
 bool have_similar_memory_accesses (gimple, gimple);
+bool stmt_with_adjacent_zero_store_dr_p (gimple);
+
+/* Returns true when STRIDE is equal in absolute value to the size of
+   the unit type of TYPE.  */
+
+static inline bool
+stride_of_unit_type_p (tree stride, tree type)
+{
+  return tree_int_cst_equal (fold_unary (ABS_EXPR, TREE_TYPE (stride),
+					 stride),
+			     TYPE_SIZE_UNIT (type));
+}
 
 /* Determines whether RDG vertices V1 and V2 access to similar memory
    locations, in which case they have to be in the same partition.  */
diff --git a/gcc/tree-loop-distribution.c b/gcc/tree-loop-distribution.c
index a328860..2e420ea 100644
--- a/gcc/tree-loop-distribution.c
+++ b/gcc/tree-loop-distribution.c
@@ -251,7 +251,7 @@ build_size_arg_loc (location_t loc, tree nb_iter, tree op,
 
 /* Generate a call to memset.  Return true when the operation succeeded.  */
 
-static bool
+static void
 generate_memset_zero (gimple stmt, tree op0, tree nb_iter,
 		      gimple_stmt_iterator bsi)
 {
@@ -265,45 +265,27 @@ generate_memset_zero (gimple stmt, tree op0, tree nb_iter,
 
   DR_STMT (dr) = stmt;
   DR_REF (dr) = op0;
-  if (!dr_analyze_innermost (dr))
-    goto end;
+  res = dr_analyze_innermost (dr);
+  gcc_assert (res && stride_of_unit_type_p (DR_STEP (dr), TREE_TYPE (op0)));
 
-  /* Test for a positive stride, iterating over every element.  */
-  if (integer_zerop (size_binop (MINUS_EXPR,
-				 fold_convert (sizetype, DR_STEP (dr)),
-				 TYPE_SIZE_UNIT (TREE_TYPE (op0)))))
-    {
-      addr_base = fold_convert_loc (loc, sizetype,
-				    size_binop_loc (loc, PLUS_EXPR,
-						    DR_OFFSET (dr),
-						    DR_INIT (dr)));
-      addr_base = fold_build2_loc (loc, POINTER_PLUS_EXPR,
-				   TREE_TYPE (DR_BASE_ADDRESS (dr)),
-				   DR_BASE_ADDRESS (dr), addr_base);
-
-      nb_bytes = build_size_arg_loc (loc, nb_iter, op0, &stmt_list);
-    }
+  nb_bytes = build_size_arg_loc (loc, nb_iter, op0, &stmt_list);
+  addr_base = size_binop_loc (loc, PLUS_EXPR, DR_OFFSET (dr), DR_INIT (dr));
+  addr_base = fold_convert_loc (loc, sizetype, addr_base);
 
   /* Test for a negative stride, iterating over every element.  */
-  else if (integer_zerop (size_binop (PLUS_EXPR,
-				      TYPE_SIZE_UNIT (TREE_TYPE (op0)),
-				      fold_convert (sizetype, DR_STEP (dr)))))
+  if (integer_zerop (size_binop (PLUS_EXPR,
+				 TYPE_SIZE_UNIT (TREE_TYPE (op0)),
+				 fold_convert (sizetype, DR_STEP (dr)))))
     {
-      nb_bytes = build_size_arg_loc (loc, nb_iter, op0, &stmt_list);
-
-      addr_base = size_binop_loc (loc, PLUS_EXPR, DR_OFFSET (dr), DR_INIT (dr));
-      addr_base = fold_convert_loc (loc, sizetype, addr_base);
       addr_base = size_binop_loc (loc, MINUS_EXPR, addr_base,
 				  fold_convert_loc (loc, sizetype, nb_bytes));
       addr_base = size_binop_loc (loc, PLUS_EXPR, addr_base,
 				  TYPE_SIZE_UNIT (TREE_TYPE (op0)));
-      addr_base = fold_build2_loc (loc, POINTER_PLUS_EXPR,
-				   TREE_TYPE (DR_BASE_ADDRESS (dr)),
-				   DR_BASE_ADDRESS (dr), addr_base);
     }
-  else
-    goto end;
 
+  addr_base = fold_build2_loc (loc, POINTER_PLUS_EXPR,
+			       TREE_TYPE (DR_BASE_ADDRESS (dr)),
+			       DR_BASE_ADDRESS (dr), addr_base);
   mem = force_gimple_operand (addr_base, &stmts, true, NULL);
   gimple_seq_add_seq (&stmt_list, stmts);
 
@@ -311,14 +293,11 @@ generate_memset_zero (gimple stmt, tree op0, tree nb_iter,
   fn_call = gimple_build_call (fn, 3, mem, integer_zero_node, nb_bytes);
   gimple_seq_add_stmt (&stmt_list, fn_call);
   gsi_insert_seq_after (&bsi, stmt_list, GSI_CONTINUE_LINKING);
-  res = true;
 
   if (dump_file && (dump_flags & TDF_DETAILS))
     fprintf (dump_file, "generated memset zero\n");
 
- end:
   free_data_ref (dr);
-  return res;
 }
 
 /* Tries to generate a builtin function for the instructions of LOOP
@@ -332,7 +311,6 @@ generate_builtin (struct loop *loop, bitmap partition, bool copy_p)
   unsigned i, x = 0;
   basic_block *bbs;
   gimple write = NULL;
-  tree op0, op1;
   gimple_stmt_iterator bsi;
   tree nb_iter = number_of_exit_cond_executions (loop);
 
@@ -368,26 +346,17 @@ generate_builtin (struct loop *loop, bitmap partition, bool copy_p)
 	}
     }
 
-  if (!write)
-    goto end;
-
-  op0 = gimple_assign_lhs (write);
-  op1 = gimple_assign_rhs1 (write);
-
-  if (!(TREE_CODE (op0) == ARRAY_REF
-	|| TREE_CODE (op0) == INDIRECT_REF))
+  if (!stmt_with_adjacent_zero_store_dr_p (write))
     goto end;
 
   /* The new statements will be placed before LOOP.  */
   bsi = gsi_last_bb (loop_preheader_edge (loop)->src);
-
-  if (gimple_assign_rhs_code (write) == INTEGER_CST
-      && (integer_zerop (op1) || real_zerop (op1)))
-    res = generate_memset_zero (write, op0, nb_iter, bsi);
+  generate_memset_zero (write, gimple_assign_lhs (write), nb_iter, bsi);
+  res = true;
 
   /* If this is the last partition for which we generate code, we have
      to destroy the loop.  */
-  if (res && !copy_p)
+  if (!copy_p)
     {
       unsigned nbbs = loop->num_nodes;
       edge exit = single_exit (loop);
@@ -531,24 +500,6 @@ has_upstream_mem_writes (int u)
 static void rdg_flag_vertex_and_dependent (struct graph *, int, bitmap, bitmap,
 					   bitmap, bool *);
 
-/* Flag all the uses of U.  */
-
-static void
-rdg_flag_all_uses (struct graph *rdg, int u, bitmap partition, bitmap loops,
-		   bitmap processed, bool *part_has_writes)
-{
-  struct graph_edge *e;
-
-  for (e = rdg->vertices[u].succ; e; e = e->succ_next)
-    if (!bitmap_bit_p (processed, e->dest))
-      {
-	rdg_flag_vertex_and_dependent (rdg, e->dest, partition, loops,
-				       processed, part_has_writes);
-	rdg_flag_all_uses (rdg, e->dest, partition, loops, processed,
-			   part_has_writes);
-      }
-}
-
 /* Flag the uses of U stopping following the information from
    upstream_mem_writes.  */
 
@@ -720,68 +671,13 @@ rdg_flag_loop_exits (struct graph *rdg, bitmap loops, bitmap partition,
     }
 }
 
-/* Flag all the nodes of RDG containing memory accesses that could
-   potentially belong to arrays already accessed in the current
-   PARTITION.  */
-
-static void
-rdg_flag_similar_memory_accesses (struct graph *rdg, bitmap partition,
-				  bitmap loops, bitmap processed,
-				  VEC (int, heap) **other_stores)
-{
-  bool foo;
-  unsigned i, n;
-  int j, k, kk;
-  bitmap_iterator ii;
-  struct graph_edge *e;
-
-  EXECUTE_IF_SET_IN_BITMAP (partition, 0, i, ii)
-    if (RDG_MEM_WRITE_STMT (rdg, i)
-	|| RDG_MEM_READS_STMT (rdg, i))
-      {
-	for (j = 0; j < rdg->n_vertices; j++)
-	  if (!bitmap_bit_p (processed, j)
-	      && (RDG_MEM_WRITE_STMT (rdg, j)
-		  || RDG_MEM_READS_STMT (rdg, j))
-	      && rdg_has_similar_memory_accesses (rdg, i, j))
-	    {
-	      /* Flag first the node J itself, and all the nodes that
-		 are needed to compute J.  */
-	      rdg_flag_vertex_and_dependent (rdg, j, partition, loops,
-					     processed, &foo);
-
-	      /* When J is a read, we want to coalesce in the same
-		 PARTITION all the nodes that are using J: this is
-		 needed for better cache locality.  */
-	      rdg_flag_all_uses (rdg, j, partition, loops, processed, &foo);
-
-	      /* Remove from OTHER_STORES the vertex that we flagged.  */
-	      if (RDG_MEM_WRITE_STMT (rdg, j))
-		for (k = 0; VEC_iterate (int, *other_stores, k, kk); k++)
-		  if (kk == j)
-		    {
-		      VEC_unordered_remove (int, *other_stores, k);
-		      break;
-		    }
-	    }
-
-	/* If the node I has two uses, then keep these together in the
-	   same PARTITION.  */
-	for (n = 0, e = rdg->vertices[i].succ; e; e = e->succ_next, n++);
-
-	if (n > 1)
-	  rdg_flag_all_uses (rdg, i, partition, loops, processed, &foo);
-      }
-}
-
 /* Returns a bitmap in which all the statements needed for computing
    the strongly connected component C of the RDG are flagged, also
    including the loop exit conditions.  */
 
 static bitmap
 build_rdg_partition_for_component (struct graph *rdg, rdgc c,
-				   bool *part_has_writes,
-				   VEC (int, heap) **other_stores)
+				   bool *part_has_writes)
 {
   int i, v;
   bitmap partition = BITMAP_ALLOC (NULL);
@@ -793,13 +689,6 @@ build_rdg_partition_for_component (struct graph *rdg, rdgc c,
       rdg_flag_vertex_and_dependent (rdg, v, partition, loops, processed,
 				     part_has_writes);
 
-  /* Also iterate on the array of stores not in the starting vertices,
-     and determine those vertices that have some memory affinity with
-     the current nodes in the component: these are stores to the same
-     arrays, i.e. we're taking care of cache locality.  */
-  rdg_flag_similar_memory_accesses (rdg, partition, loops, processed,
-				    other_stores);
-
   rdg_flag_loop_exits (rdg, loops, partition, processed, part_has_writes);
 
   BITMAP_FREE (processed);
@@ -863,6 +752,79 @@ rdg_build_components (struct graph *rdg, VEC (int, heap) *starting_vertices,
   BITMAP_FREE (saved_components);
 }
 
+/* Returns true when it is possible to generate a builtin pattern for
+   the PARTITION of RDG.  For the moment we detect only the memset
+   zero pattern.  */
+
+static bool
+can_generate_builtin (struct graph *rdg, bitmap partition)
+{
+  unsigned i;
+  bitmap_iterator bi;
+  int nb_reads = 0;
+  int nb_writes = 0;
+  int stores_zero = 0;
+
+  EXECUTE_IF_SET_IN_BITMAP (partition, 0, i, bi)
+    if (RDG_MEM_READS_STMT (rdg, i))
+      nb_reads++;
+    else if (RDG_MEM_WRITE_STMT (rdg, i))
+      {
+	nb_writes++;
+	if (stmt_with_adjacent_zero_store_dr_p (RDG_STMT (rdg, i)))
+	  stores_zero++;
+      }
+
+  return stores_zero == 1 && nb_writes == 1 && nb_reads == 0;
+}
+
+/* Returns true when PARTITION1 and PARTITION2 have similar memory
+   accesses in RDG.  */
+
+static bool
+similar_memory_accesses (struct graph *rdg, bitmap partition1,
+			 bitmap partition2)
+{
+  unsigned i, j;
+  bitmap_iterator bi, bj;
+
+  EXECUTE_IF_SET_IN_BITMAP (partition1, 0, i, bi)
+    if (RDG_MEM_WRITE_STMT (rdg, i)
+	|| RDG_MEM_READS_STMT (rdg, i))
+      EXECUTE_IF_SET_IN_BITMAP (partition2, 0, j, bj)
+	if (RDG_MEM_WRITE_STMT (rdg, j)
+	    || RDG_MEM_READS_STMT (rdg, j))
+	  if (rdg_has_similar_memory_accesses (rdg, i, j))
+	    return true;
+
+  return false;
+}
+
+/* Fuse all the partitions from PARTITIONS that contain similar memory
+   references, i.e., we're taking care of cache locality.  This
+   function does not fuse those partitions that contain patterns that
+   can be code generated with builtins.  */
+
+static void
+fuse_partitions_with_similar_memory_accesses (struct graph *rdg,
+					      VEC (bitmap, heap) **partitions)
+{
+  int p1, p2;
+  bitmap partition1, partition2;
+
+  for (p1 = 0; VEC_iterate (bitmap, *partitions, p1, partition1); p1++)
+    if (!can_generate_builtin (rdg, partition1))
+      for (p2 = 0; VEC_iterate (bitmap, *partitions, p2, partition2); p2++)
+	if (p1 != p2
+	    && !can_generate_builtin (rdg, partition2)
+	    && similar_memory_accesses (rdg, partition1, partition2))
+	  {
+	    bitmap_ior_into (partition1, partition2);
+	    VEC_ordered_remove (bitmap, *partitions, p2);
+	    p2--;
+	  }
+}
+
 /* Aggregate several components into a useful partition that is
    registered in the PARTITIONS vector.  Partitions will be
    distributed in different loops.  */
@@ -885,8 +847,7 @@ rdg_build_partitions (struct graph *rdg, VEC (rdgc, heap) *components,
       if (bitmap_bit_p (processed, v))
 	continue;
 
-      np = build_rdg_partition_for_component (rdg, x, &part_has_writes,
-					      other_stores);
+      np = build_rdg_partition_for_component (rdg, x, &part_has_writes);
       bitmap_ior_into (partition, np);
       bitmap_ior_into (processed, np);
       BITMAP_FREE (np);
@@ -932,6 +893,8 @@ rdg_build_partitions (struct graph *rdg, VEC (rdgc, heap) *components,
     VEC_safe_push (bitmap, heap, *partitions, partition);
   else
     BITMAP_FREE (partition);
+
+  fuse_partitions_with_similar_memory_accesses (rdg, partitions);
 }
 
 /* Dump to FILE the PARTITIONS.  */
-- 
1.7.1

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] Fix PR43023: fuse_partitions_with_similar_memory_accesses.
  2010-12-10 19:53     ` Sebastian Pop
@ 2010-12-13 21:06       ` Sebastian Pop
  2010-12-16 21:56         ` Sebastian Pop
  0 siblings, 1 reply; 7+ messages in thread
From: Sebastian Pop @ 2010-12-13 21:06 UTC (permalink / raw)
  To: gcc-patches; +Cc: rguenther, Sebastian Pop

On Fri, Dec 10, 2010 at 13:45, Sebastian Pop <sebpop@gmail.com> wrote:
> Hi,
>
> here is the backport of the same patch for the gcc4.5 branch.  Please
> let me know when you want to commit this patch to the branch.  For now
> I sent this out for test on amd64-linux.

This patch passed regression test.

Sebastian

>
> Sebastian
>
> 2010-12-10  Sebastian Pop  <sebastian.pop@amd.com>
>
>        PR tree-optimization/43023
>        * tree-data-ref.c (mem_write_stride_of_same_size_as_unit_type_p):
>        Removed.
>        (stores_zero_from_loop): Call stmt_stores_zero.
>        (stmt_with_adjacent_zero_store_dr_p): New.
>        * tree-data-ref.h (stmt_with_adjacent_zero_store_dr_p): Declared.
>        (stride_of_unit_type_p): New.
>        * tree-loop-distribution.c (generate_memset_zero): Do not return a
>        boolean.  Call gcc_assert on stride_of_unit_type_p.
>        (generate_builtin): Call stmt_stores_zero.
>        (rdg_flag_all_uses): Removed.
>        (rdg_flag_similar_memory_accesses): Removed.
>        (build_rdg_partition_for_component): Removed parameter
>        other_stores.  Removed call to rdg_flag_similar_memory_accesses.
>        (can_generate_builtin): New.
>        (similar_memory_accesses): New.
>        (fuse_partitions_with_similar_memory_accesses): New.
>        (rdg_build_partitions): Call
>        fuse_partitions_with_similar_memory_accesses.
>
>        * gfortran.dg/ldist-1.f90: Adjust pattern.
>        * gfortran.dg/ldist-pr43023.f90: New.
> ---
>  gcc/ChangeLog                               |   22 +++
>  gcc/testsuite/ChangeLog                     |    6 +
>  gcc/testsuite/gfortran.dg/ldist-1.f90       |    5 +-
>  gcc/testsuite/gfortran.dg/ldist-pr43023.f90 |   31 ++++
>  gcc/tree-data-ref.c                         |   34 ++++-
>  gcc/tree-data-ref.h                         |   12 ++
>  gcc/tree-loop-distribution.c                |  223 +++++++++++----------------
>  7 files changed, 201 insertions(+), 132 deletions(-)
>  create mode 100644 gcc/testsuite/gfortran.dg/ldist-pr43023.f90
>
> diff --git a/gcc/ChangeLog b/gcc/ChangeLog
> index abecb83..dfe45ed 100644
> --- a/gcc/ChangeLog
> +++ b/gcc/ChangeLog
> @@ -1,3 +1,25 @@
> +2010-12-10  Sebastian Pop  <sebastian.pop@amd.com>
> +
> +       PR tree-optimization/43023
> +       * tree-data-ref.c (mem_write_stride_of_same_size_as_unit_type_p):
> +       Removed.
> +       (stores_zero_from_loop): Call stmt_stores_zero.
> +       (stmt_with_adjacent_zero_store_dr_p): New.
> +       * tree-data-ref.h (stmt_with_adjacent_zero_store_dr_p): Declared.
> +       (stride_of_unit_type_p): New.
> +       * tree-loop-distribution.c (generate_memset_zero): Do not return a
> +       boolean.  Call gcc_assert on stride_of_unit_type_p.
> +       (generate_builtin): Call stmt_stores_zero.
> +       (rdg_flag_all_uses): Removed.
> +       (rdg_flag_similar_memory_accesses): Removed.
> +       (build_rdg_partition_for_component): Removed parameter
> +       other_stores.  Removed call to rdg_flag_similar_memory_accesses.
> +       (can_generate_builtin): New.
> +       (similar_memory_accesses): New.
> +       (fuse_partitions_with_similar_memory_accesses): New.
> +       (rdg_build_partitions): Call
> +       fuse_partitions_with_similar_memory_accesses.
> +
>  2010-12-07  Jakub Jelinek  <jakub@redhat.com>
>
>        Backport from mainline
> diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
> index e3a9d24..36fd59d 100644
> --- a/gcc/testsuite/ChangeLog
> +++ b/gcc/testsuite/ChangeLog
> @@ -1,3 +1,9 @@
> +2010-12-10  Sebastian Pop  <sebastian.pop@amd.com>
> +
> +       PR tree-optimization/43023
> +       * gfortran.dg/ldist-1.f90: Adjust pattern.
> +       * gfortran.dg/ldist-pr43023.f90: New.
> +
>  2010-12-07  Sebastian Pop  <sebastian.pop@amd.com>
>
>        PR tree-optimization/44676
> diff --git a/gcc/testsuite/gfortran.dg/ldist-1.f90 b/gcc/testsuite/gfortran.dg/ldist-1.f90
> index dd1f02a..bbce2f3 100644
> --- a/gcc/testsuite/gfortran.dg/ldist-1.f90
> +++ b/gcc/testsuite/gfortran.dg/ldist-1.f90
> @@ -29,5 +29,8 @@ Subroutine PADEC(DKS,DKDS,HVAR,WM,WG,FN,NS,AN,BN,CN,IT)
>   return
>  end Subroutine PADEC
>
> -! { dg-final { scan-tree-dump-times "distributed: split to 4 loops" 1 "ldist" } }
> +! There are 5 legal partitions in this code.  Based on the data
> +! locality heuristic, this loop should not be split.
> +
> +! { dg-final { scan-tree-dump-not "distributed: split to" "ldist" } }
>  ! { dg-final { cleanup-tree-dump "ldist" } }
> diff --git a/gcc/testsuite/gfortran.dg/ldist-pr43023.f90 b/gcc/testsuite/gfortran.dg/ldist-pr43023.f90
> new file mode 100644
> index 0000000..3e2d04c
> --- /dev/null
> +++ b/gcc/testsuite/gfortran.dg/ldist-pr43023.f90
> @@ -0,0 +1,31 @@
> +! { dg-do compile }
> +! { dg-options "-O2 -ftree-loop-distribution" }
> +
> +MODULE NFT_mod
> +
> +implicit none
> +integer :: Nangle
> +real:: Z0
> +real, dimension(:,:), allocatable :: Angle
> +real, dimension(:), allocatable :: exth, ezth, hxth, hyth, hyphi
> +
> +CONTAINS
> +
> +SUBROUTINE NFT_Init()
> +
> +real :: th, fi
> +integer :: n
> +
> +do n = 1,Nangle
> +  th = Angle(n,1)
> +  fi = Angle(n,2)
> +
> +  exth(n) =  cos(fi)*cos(th)
> +  ezth(n) = -sin(th)
> +  hxth(n) = -sin(fi)
> +  hyth(n) =  cos(fi)
> +  hyphi(n) = -sin(fi)
> +end do
> +END SUBROUTINE NFT_Init
> +
> +END MODULE NFT_mod
> diff --git a/gcc/tree-data-ref.c b/gcc/tree-data-ref.c
> index a89d151..dab0177 100644
> --- a/gcc/tree-data-ref.c
> +++ b/gcc/tree-data-ref.c
> @@ -4594,7 +4594,7 @@ dump_rdg_vertex (FILE *file, struct graph *rdg, int i)
>     for (e = v->succ; e; e = e->succ_next)
>       fprintf (file, " %d", e->dest);
>
> -  fprintf (file, ") \n");
> +  fprintf (file, ")\n");
>   print_gimple_stmt (file, RDGV_STMT (v), 0, TDF_VOPS|TDF_MEMSYMS);
>   fprintf (file, ")\n");
>  }
> @@ -4991,6 +4991,38 @@ stores_from_loop (struct loop *loop, VEC (gimple, heap) **stmts)
>   free (bbs);
>  }
>
> +/* Returns true when the statement at STMT is of the form "A[i] = 0"
> +   that contains a data reference on its LHS with a stride of the same
> +   size as its unit type.  */
> +
> +bool
> +stmt_with_adjacent_zero_store_dr_p (gimple stmt)
> +{
> +  tree op0, op1;
> +  bool res;
> +  struct data_reference *dr;
> +
> +  if (!stmt
> +      || !gimple_vdef (stmt)
> +      || !is_gimple_assign (stmt)
> +      || !gimple_assign_single_p (stmt)
> +      || !(op1 = gimple_assign_rhs1 (stmt))
> +      || !(integer_zerop (op1) || real_zerop (op1)))
> +    return false;
> +
> +  dr = XCNEW (struct data_reference);
> +  op0 = gimple_assign_lhs (stmt);
> +
> +  DR_STMT (dr) = stmt;
> +  DR_REF (dr) = op0;
> +
> +  res = dr_analyze_innermost (dr)
> +    && stride_of_unit_type_p (DR_STEP (dr), TREE_TYPE (op0));
> +
> +  free_data_ref (dr);
> +  return res;
> +}
> +
>  /* For a data reference REF, return the declaration of its base
>    address or NULL_TREE if the base is not determined.  */
>
> diff --git a/gcc/tree-data-ref.h b/gcc/tree-data-ref.h
> index 678eb10..90cbca1 100644
> --- a/gcc/tree-data-ref.h
> +++ b/gcc/tree-data-ref.h
> @@ -567,6 +567,18 @@ void stores_from_loop (struct loop *, VEC (gimple, heap) **);
>  void remove_similar_memory_refs (VEC (gimple, heap) **);
>  bool rdg_defs_used_in_other_loops_p (struct graph *, int);
>  bool have_similar_memory_accesses (gimple, gimple);
> +bool stmt_with_adjacent_zero_store_dr_p (gimple);
> +
> +/* Returns true when STRIDE is equal in absolute value to the size of
> +   the unit type of TYPE.  */
> +
> +static inline bool
> +stride_of_unit_type_p (tree stride, tree type)
> +{
> +  return tree_int_cst_equal (fold_unary (ABS_EXPR, TREE_TYPE (stride),
> +                                        stride),
> +                            TYPE_SIZE_UNIT (type));
> +}
>
>  /* Determines whether RDG vertices V1 and V2 access to similar memory
>    locations, in which case they have to be in the same partition.  */
> diff --git a/gcc/tree-loop-distribution.c b/gcc/tree-loop-distribution.c
> index a328860..2e420ea 100644
> --- a/gcc/tree-loop-distribution.c
> +++ b/gcc/tree-loop-distribution.c
> @@ -251,7 +251,7 @@ build_size_arg_loc (location_t loc, tree nb_iter, tree op,
>
>  /* Generate a call to memset.  Return true when the operation succeeded.  */
>
> -static bool
> +static void
>  generate_memset_zero (gimple stmt, tree op0, tree nb_iter,
>                      gimple_stmt_iterator bsi)
>  {
> @@ -265,45 +265,27 @@ generate_memset_zero (gimple stmt, tree op0, tree nb_iter,
>
>   DR_STMT (dr) = stmt;
>   DR_REF (dr) = op0;
> -  if (!dr_analyze_innermost (dr))
> -    goto end;
> +  res = dr_analyze_innermost (dr);
> +  gcc_assert (res && stride_of_unit_type_p (DR_STEP (dr), TREE_TYPE (op0)));
>
> -  /* Test for a positive stride, iterating over every element.  */
> -  if (integer_zerop (size_binop (MINUS_EXPR,
> -                                fold_convert (sizetype, DR_STEP (dr)),
> -                                TYPE_SIZE_UNIT (TREE_TYPE (op0)))))
> -    {
> -      addr_base = fold_convert_loc (loc, sizetype,
> -                                   size_binop_loc (loc, PLUS_EXPR,
> -                                                   DR_OFFSET (dr),
> -                                                   DR_INIT (dr)));
> -      addr_base = fold_build2_loc (loc, POINTER_PLUS_EXPR,
> -                                  TREE_TYPE (DR_BASE_ADDRESS (dr)),
> -                                  DR_BASE_ADDRESS (dr), addr_base);
> -
> -      nb_bytes = build_size_arg_loc (loc, nb_iter, op0, &stmt_list);
> -    }
> +  nb_bytes = build_size_arg_loc (loc, nb_iter, op0, &stmt_list);
> +  addr_base = size_binop_loc (loc, PLUS_EXPR, DR_OFFSET (dr), DR_INIT (dr));
> +  addr_base = fold_convert_loc (loc, sizetype, addr_base);
>
>   /* Test for a negative stride, iterating over every element.  */
> -  else if (integer_zerop (size_binop (PLUS_EXPR,
> -                                     TYPE_SIZE_UNIT (TREE_TYPE (op0)),
> -                                     fold_convert (sizetype, DR_STEP (dr)))))
> +  if (integer_zerop (size_binop (PLUS_EXPR,
> +                                TYPE_SIZE_UNIT (TREE_TYPE (op0)),
> +                                fold_convert (sizetype, DR_STEP (dr)))))
>     {
> -      nb_bytes = build_size_arg_loc (loc, nb_iter, op0, &stmt_list);
> -
> -      addr_base = size_binop_loc (loc, PLUS_EXPR, DR_OFFSET (dr), DR_INIT (dr));
> -      addr_base = fold_convert_loc (loc, sizetype, addr_base);
>       addr_base = size_binop_loc (loc, MINUS_EXPR, addr_base,
>                                  fold_convert_loc (loc, sizetype, nb_bytes));
>       addr_base = size_binop_loc (loc, PLUS_EXPR, addr_base,
>                                  TYPE_SIZE_UNIT (TREE_TYPE (op0)));
> -      addr_base = fold_build2_loc (loc, POINTER_PLUS_EXPR,
> -                                  TREE_TYPE (DR_BASE_ADDRESS (dr)),
> -                                  DR_BASE_ADDRESS (dr), addr_base);
>     }
> -  else
> -    goto end;
>
> +  addr_base = fold_build2_loc (loc, POINTER_PLUS_EXPR,
> +                              TREE_TYPE (DR_BASE_ADDRESS (dr)),
> +                              DR_BASE_ADDRESS (dr), addr_base);
>   mem = force_gimple_operand (addr_base, &stmts, true, NULL);
>   gimple_seq_add_seq (&stmt_list, stmts);
>
> @@ -311,14 +293,11 @@ generate_memset_zero (gimple stmt, tree op0, tree nb_iter,
>   fn_call = gimple_build_call (fn, 3, mem, integer_zero_node, nb_bytes);
>   gimple_seq_add_stmt (&stmt_list, fn_call);
>   gsi_insert_seq_after (&bsi, stmt_list, GSI_CONTINUE_LINKING);
> -  res = true;
>
>   if (dump_file && (dump_flags & TDF_DETAILS))
>     fprintf (dump_file, "generated memset zero\n");
>
> - end:
>   free_data_ref (dr);
> -  return res;
>  }
>
>  /* Tries to generate a builtin function for the instructions of LOOP
> @@ -332,7 +311,6 @@ generate_builtin (struct loop *loop, bitmap partition, bool copy_p)
>   unsigned i, x = 0;
>   basic_block *bbs;
>   gimple write = NULL;
> -  tree op0, op1;
>   gimple_stmt_iterator bsi;
>   tree nb_iter = number_of_exit_cond_executions (loop);
>
> @@ -368,26 +346,17 @@ generate_builtin (struct loop *loop, bitmap partition, bool copy_p)
>        }
>     }
>
> -  if (!write)
> -    goto end;
> -
> -  op0 = gimple_assign_lhs (write);
> -  op1 = gimple_assign_rhs1 (write);
> -
> -  if (!(TREE_CODE (op0) == ARRAY_REF
> -       || TREE_CODE (op0) == INDIRECT_REF))
> +  if (!stmt_with_adjacent_zero_store_dr_p (write))
>     goto end;
>
>   /* The new statements will be placed before LOOP.  */
>   bsi = gsi_last_bb (loop_preheader_edge (loop)->src);
> -
> -  if (gimple_assign_rhs_code (write) == INTEGER_CST
> -      && (integer_zerop (op1) || real_zerop (op1)))
> -    res = generate_memset_zero (write, op0, nb_iter, bsi);
> +  generate_memset_zero (write, gimple_assign_lhs (write), nb_iter, bsi);
> +  res = true;
>
>   /* If this is the last partition for which we generate code, we have
>      to destroy the loop.  */
> -  if (res && !copy_p)
> +  if (!copy_p)
>     {
>       unsigned nbbs = loop->num_nodes;
>       edge exit = single_exit (loop);
> @@ -531,24 +500,6 @@ has_upstream_mem_writes (int u)
>  static void rdg_flag_vertex_and_dependent (struct graph *, int, bitmap, bitmap,
>                                           bitmap, bool *);
>
> -/* Flag all the uses of U.  */
> -
> -static void
> -rdg_flag_all_uses (struct graph *rdg, int u, bitmap partition, bitmap loops,
> -                  bitmap processed, bool *part_has_writes)
> -{
> -  struct graph_edge *e;
> -
> -  for (e = rdg->vertices[u].succ; e; e = e->succ_next)
> -    if (!bitmap_bit_p (processed, e->dest))
> -      {
> -       rdg_flag_vertex_and_dependent (rdg, e->dest, partition, loops,
> -                                      processed, part_has_writes);
> -       rdg_flag_all_uses (rdg, e->dest, partition, loops, processed,
> -                          part_has_writes);
> -      }
> -}
> -
>  /* Flag the uses of U stopping following the information from
>    upstream_mem_writes.  */
>
> @@ -720,68 +671,13 @@ rdg_flag_loop_exits (struct graph *rdg, bitmap loops, bitmap partition,
>     }
>  }
>
> -/* Flag all the nodes of RDG containing memory accesses that could
> -   potentially belong to arrays already accessed in the current
> -   PARTITION.  */
> -
> -static void
> -rdg_flag_similar_memory_accesses (struct graph *rdg, bitmap partition,
> -                                 bitmap loops, bitmap processed,
> -                                 VEC (int, heap) **other_stores)
> -{
> -  bool foo;
> -  unsigned i, n;
> -  int j, k, kk;
> -  bitmap_iterator ii;
> -  struct graph_edge *e;
> -
> -  EXECUTE_IF_SET_IN_BITMAP (partition, 0, i, ii)
> -    if (RDG_MEM_WRITE_STMT (rdg, i)
> -       || RDG_MEM_READS_STMT (rdg, i))
> -      {
> -       for (j = 0; j < rdg->n_vertices; j++)
> -         if (!bitmap_bit_p (processed, j)
> -             && (RDG_MEM_WRITE_STMT (rdg, j)
> -                 || RDG_MEM_READS_STMT (rdg, j))
> -             && rdg_has_similar_memory_accesses (rdg, i, j))
> -           {
> -             /* Flag first the node J itself, and all the nodes that
> -                are needed to compute J.  */
> -             rdg_flag_vertex_and_dependent (rdg, j, partition, loops,
> -                                            processed, &foo);
> -
> -             /* When J is a read, we want to coalesce in the same
> -                PARTITION all the nodes that are using J: this is
> -                needed for better cache locality.  */
> -             rdg_flag_all_uses (rdg, j, partition, loops, processed, &foo);
> -
> -             /* Remove from OTHER_STORES the vertex that we flagged.  */
> -             if (RDG_MEM_WRITE_STMT (rdg, j))
> -               for (k = 0; VEC_iterate (int, *other_stores, k, kk); k++)
> -                 if (kk == j)
> -                   {
> -                     VEC_unordered_remove (int, *other_stores, k);
> -                     break;
> -                   }
> -           }
> -
> -       /* If the node I has two uses, then keep these together in the
> -          same PARTITION.  */
> -       for (n = 0, e = rdg->vertices[i].succ; e; e = e->succ_next, n++);
> -
> -       if (n > 1)
> -         rdg_flag_all_uses (rdg, i, partition, loops, processed, &foo);
> -      }
> -}
> -
>  /* Returns a bitmap in which all the statements needed for computing
>    the strongly connected component C of the RDG are flagged, also
>    including the loop exit conditions.  */
>
>  static bitmap
>  build_rdg_partition_for_component (struct graph *rdg, rdgc c,
> -                                  bool *part_has_writes,
> -                                  VEC (int, heap) **other_stores)
> +                                  bool *part_has_writes)
>  {
>   int i, v;
>   bitmap partition = BITMAP_ALLOC (NULL);
> @@ -793,13 +689,6 @@ build_rdg_partition_for_component (struct graph *rdg, rdgc c,
>       rdg_flag_vertex_and_dependent (rdg, v, partition, loops, processed,
>                                     part_has_writes);
>
> -  /* Also iterate on the array of stores not in the starting vertices,
> -     and determine those vertices that have some memory affinity with
> -     the current nodes in the component: these are stores to the same
> -     arrays, i.e. we're taking care of cache locality.  */
> -  rdg_flag_similar_memory_accesses (rdg, partition, loops, processed,
> -                                   other_stores);
> -
>   rdg_flag_loop_exits (rdg, loops, partition, processed, part_has_writes);
>
>   BITMAP_FREE (processed);
> @@ -863,6 +752,79 @@ rdg_build_components (struct graph *rdg, VEC (int, heap) *starting_vertices,
>   BITMAP_FREE (saved_components);
>  }
>
> +/* Returns true when it is possible to generate a builtin pattern for
> +   the PARTITION of RDG.  For the moment we detect only the memset
> +   zero pattern.  */
> +
> +static bool
> +can_generate_builtin (struct graph *rdg, bitmap partition)
> +{
> +  unsigned i;
> +  bitmap_iterator bi;
> +  int nb_reads = 0;
> +  int nb_writes = 0;
> +  int stores_zero = 0;
> +
> +  EXECUTE_IF_SET_IN_BITMAP (partition, 0, i, bi)
> +    if (RDG_MEM_READS_STMT (rdg, i))
> +      nb_reads++;
> +    else if (RDG_MEM_WRITE_STMT (rdg, i))
> +      {
> +       nb_writes++;
> +       if (stmt_with_adjacent_zero_store_dr_p (RDG_STMT (rdg, i)))
> +         stores_zero++;
> +      }
> +
> +  return stores_zero == 1 && nb_writes == 1 && nb_reads == 0;
> +}
> +
> +/* Returns true when PARTITION1 and PARTITION2 have similar memory
> +   accesses in RDG.  */
> +
> +static bool
> +similar_memory_accesses (struct graph *rdg, bitmap partition1,
> +                        bitmap partition2)
> +{
> +  unsigned i, j;
> +  bitmap_iterator bi, bj;
> +
> +  EXECUTE_IF_SET_IN_BITMAP (partition1, 0, i, bi)
> +    if (RDG_MEM_WRITE_STMT (rdg, i)
> +       || RDG_MEM_READS_STMT (rdg, i))
> +      EXECUTE_IF_SET_IN_BITMAP (partition2, 0, j, bj)
> +       if (RDG_MEM_WRITE_STMT (rdg, j)
> +           || RDG_MEM_READS_STMT (rdg, j))
> +         if (rdg_has_similar_memory_accesses (rdg, i, j))
> +           return true;
> +
> +  return false;
> +}
> +
> +/* Fuse all the partitions from PARTITIONS that contain similar memory
> +   references, i.e., we're taking care of cache locality.  This
> +   function does not fuse those partitions that contain patterns that
> +   can be code generated with builtins.  */
> +
> +static void
> +fuse_partitions_with_similar_memory_accesses (struct graph *rdg,
> +                                             VEC (bitmap, heap) **partitions)
> +{
> +  int p1, p2;
> +  bitmap partition1, partition2;
> +
> +  for (p1 = 0; VEC_iterate (bitmap, *partitions, p1, partition1); p1++)
> +    if (!can_generate_builtin (rdg, partition1))
> +      for (p2 = 0; VEC_iterate (bitmap, *partitions, p2, partition2); p2++)
> +       if (p1 != p2
> +           && !can_generate_builtin (rdg, partition2)
> +           && similar_memory_accesses (rdg, partition1, partition2))
> +         {
> +           bitmap_ior_into (partition1, partition2);
> +           VEC_ordered_remove (bitmap, *partitions, p2);
> +           p2--;
> +         }
> +}
> +
>  /* Aggregate several components into a useful partition that is
>    registered in the PARTITIONS vector.  Partitions will be
>    distributed in different loops.  */
> @@ -885,8 +847,7 @@ rdg_build_partitions (struct graph *rdg, VEC (rdgc, heap) *components,
>       if (bitmap_bit_p (processed, v))
>        continue;
>
> -      np = build_rdg_partition_for_component (rdg, x, &part_has_writes,
> -                                             other_stores);
> +      np = build_rdg_partition_for_component (rdg, x, &part_has_writes);
>       bitmap_ior_into (partition, np);
>       bitmap_ior_into (processed, np);
>       BITMAP_FREE (np);
> @@ -932,6 +893,8 @@ rdg_build_partitions (struct graph *rdg, VEC (rdgc, heap) *components,
>     VEC_safe_push (bitmap, heap, *partitions, partition);
>   else
>     BITMAP_FREE (partition);
> +
> +  fuse_partitions_with_similar_memory_accesses (rdg, partitions);
>  }
>
>  /* Dump to FILE the PARTITIONS.  */
> --
> 1.7.1
>
>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] Fix PR43023: fuse_partitions_with_similar_memory_accesses.
  2010-12-13 21:06       ` Sebastian Pop
@ 2010-12-16 21:56         ` Sebastian Pop
  2010-12-18 20:50           ` Richard Guenther
  0 siblings, 1 reply; 7+ messages in thread
From: Sebastian Pop @ 2010-12-16 21:56 UTC (permalink / raw)
  To: gcc-patches; +Cc: rguenther, Sebastian Pop

On Mon, Dec 13, 2010 at 13:37, Sebastian Pop <sebpop@gmail.com> wrote:
>> Hi,
>>
>> here is the backport of the same patch for the gcc4.5 branch.  Please
>> let me know when you want to commit this patch to the branch.  For now
>> I sent this out for test on amd64-linux.
>
> This patch passed regression test.
>

Ping.  Ok for the 4.5 branch now that it is open?

Thanks.

> Sebastian
>
>>
>> Sebastian
>>
>> 2010-12-10  Sebastian Pop  <sebastian.pop@amd.com>
>>
>>        PR tree-optimization/43023
>>        * tree-data-ref.c (mem_write_stride_of_same_size_as_unit_type_p):
>>        Removed.
>>        (stores_zero_from_loop): Call stmt_stores_zero.
>>        (stmt_with_adjacent_zero_store_dr_p): New.
>>        * tree-data-ref.h (stmt_with_adjacent_zero_store_dr_p): Declared.
>>        (stride_of_unit_type_p): New.
>>        * tree-loop-distribution.c (generate_memset_zero): Do not return a
>>        boolean.  Call gcc_assert on stride_of_unit_type_p.
>>        (generate_builtin): Call stmt_stores_zero.
>>        (rdg_flag_all_uses): Removed.
>>        (rdg_flag_similar_memory_accesses): Removed.
>>        (build_rdg_partition_for_component): Removed parameter
>>        other_stores.  Removed call to rdg_flag_similar_memory_accesses.
>>        (can_generate_builtin): New.
>>        (similar_memory_accesses): New.
>>        (fuse_partitions_with_similar_memory_accesses): New.
>>        (rdg_build_partitions): Call
>>        fuse_partitions_with_similar_memory_accesses.
>>
>>        * gfortran.dg/ldist-1.f90: Adjust pattern.
>>        * gfortran.dg/ldist-pr43023.f90: New.
>> ---
>>  gcc/ChangeLog                               |   22 +++
>>  gcc/testsuite/ChangeLog                     |    6 +
>>  gcc/testsuite/gfortran.dg/ldist-1.f90       |    5 +-
>>  gcc/testsuite/gfortran.dg/ldist-pr43023.f90 |   31 ++++
>>  gcc/tree-data-ref.c                         |   34 ++++-
>>  gcc/tree-data-ref.h                         |   12 ++
>>  gcc/tree-loop-distribution.c                |  223 +++++++++++----------------
>>  7 files changed, 201 insertions(+), 132 deletions(-)
>>  create mode 100644 gcc/testsuite/gfortran.dg/ldist-pr43023.f90
>>
>> diff --git a/gcc/ChangeLog b/gcc/ChangeLog
>> index abecb83..dfe45ed 100644
>> --- a/gcc/ChangeLog
>> +++ b/gcc/ChangeLog
>> @@ -1,3 +1,25 @@
>> +2010-12-10  Sebastian Pop  <sebastian.pop@amd.com>
>> +
>> +       PR tree-optimization/43023
>> +       * tree-data-ref.c (mem_write_stride_of_same_size_as_unit_type_p):
>> +       Removed.
>> +       (stores_zero_from_loop): Call stmt_stores_zero.
>> +       (stmt_with_adjacent_zero_store_dr_p): New.
>> +       * tree-data-ref.h (stmt_with_adjacent_zero_store_dr_p): Declared.
>> +       (stride_of_unit_type_p): New.
>> +       * tree-loop-distribution.c (generate_memset_zero): Do not return a
>> +       boolean.  Call gcc_assert on stride_of_unit_type_p.
>> +       (generate_builtin): Call stmt_stores_zero.
>> +       (rdg_flag_all_uses): Removed.
>> +       (rdg_flag_similar_memory_accesses): Removed.
>> +       (build_rdg_partition_for_component): Removed parameter
>> +       other_stores.  Removed call to rdg_flag_similar_memory_accesses.
>> +       (can_generate_builtin): New.
>> +       (similar_memory_accesses): New.
>> +       (fuse_partitions_with_similar_memory_accesses): New.
>> +       (rdg_build_partitions): Call
>> +       fuse_partitions_with_similar_memory_accesses.
>> +
>>  2010-12-07  Jakub Jelinek  <jakub@redhat.com>
>>
>>        Backport from mainline
>> diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
>> index e3a9d24..36fd59d 100644
>> --- a/gcc/testsuite/ChangeLog
>> +++ b/gcc/testsuite/ChangeLog
>> @@ -1,3 +1,9 @@
>> +2010-12-10  Sebastian Pop  <sebastian.pop@amd.com>
>> +
>> +       PR tree-optimization/43023
>> +       * gfortran.dg/ldist-1.f90: Adjust pattern.
>> +       * gfortran.dg/ldist-pr43023.f90: New.
>> +
>>  2010-12-07  Sebastian Pop  <sebastian.pop@amd.com>
>>
>>        PR tree-optimization/44676
>> diff --git a/gcc/testsuite/gfortran.dg/ldist-1.f90 b/gcc/testsuite/gfortran.dg/ldist-1.f90
>> index dd1f02a..bbce2f3 100644
>> --- a/gcc/testsuite/gfortran.dg/ldist-1.f90
>> +++ b/gcc/testsuite/gfortran.dg/ldist-1.f90
>> @@ -29,5 +29,8 @@ Subroutine PADEC(DKS,DKDS,HVAR,WM,WG,FN,NS,AN,BN,CN,IT)
>>   return
>>  end Subroutine PADEC
>>
>> -! { dg-final { scan-tree-dump-times "distributed: split to 4 loops" 1 "ldist" } }
>> +! There are 5 legal partitions in this code.  Based on the data
>> +! locality heuristic, this loop should not be split.
>> +
>> +! { dg-final { scan-tree-dump-not "distributed: split to" "ldist" } }
>>  ! { dg-final { cleanup-tree-dump "ldist" } }
>> diff --git a/gcc/testsuite/gfortran.dg/ldist-pr43023.f90 b/gcc/testsuite/gfortran.dg/ldist-pr43023.f90
>> new file mode 100644
>> index 0000000..3e2d04c
>> --- /dev/null
>> +++ b/gcc/testsuite/gfortran.dg/ldist-pr43023.f90
>> @@ -0,0 +1,31 @@
>> +! { dg-do compile }
>> +! { dg-options "-O2 -ftree-loop-distribution" }
>> +
>> +MODULE NFT_mod
>> +
>> +implicit none
>> +integer :: Nangle
>> +real:: Z0
>> +real, dimension(:,:), allocatable :: Angle
>> +real, dimension(:), allocatable :: exth, ezth, hxth, hyth, hyphi
>> +
>> +CONTAINS
>> +
>> +SUBROUTINE NFT_Init()
>> +
>> +real :: th, fi
>> +integer :: n
>> +
>> +do n = 1,Nangle
>> +  th = Angle(n,1)
>> +  fi = Angle(n,2)
>> +
>> +  exth(n) =  cos(fi)*cos(th)
>> +  ezth(n) = -sin(th)
>> +  hxth(n) = -sin(fi)
>> +  hyth(n) =  cos(fi)
>> +  hyphi(n) = -sin(fi)
>> +end do
>> +END SUBROUTINE NFT_Init
>> +
>> +END MODULE NFT_mod
>> diff --git a/gcc/tree-data-ref.c b/gcc/tree-data-ref.c
>> index a89d151..dab0177 100644
>> --- a/gcc/tree-data-ref.c
>> +++ b/gcc/tree-data-ref.c
>> @@ -4594,7 +4594,7 @@ dump_rdg_vertex (FILE *file, struct graph *rdg, int i)
>>     for (e = v->succ; e; e = e->succ_next)
>>       fprintf (file, " %d", e->dest);
>>
>> -  fprintf (file, ") \n");
>> +  fprintf (file, ")\n");
>>   print_gimple_stmt (file, RDGV_STMT (v), 0, TDF_VOPS|TDF_MEMSYMS);
>>   fprintf (file, ")\n");
>>  }
>> @@ -4991,6 +4991,38 @@ stores_from_loop (struct loop *loop, VEC (gimple, heap) **stmts)
>>   free (bbs);
>>  }
>>
>> +/* Returns true when the statement at STMT is of the form "A[i] = 0"
>> +   that contains a data reference on its LHS with a stride of the same
>> +   size as its unit type.  */
>> +
>> +bool
>> +stmt_with_adjacent_zero_store_dr_p (gimple stmt)
>> +{
>> +  tree op0, op1;
>> +  bool res;
>> +  struct data_reference *dr;
>> +
>> +  if (!stmt
>> +      || !gimple_vdef (stmt)
>> +      || !is_gimple_assign (stmt)
>> +      || !gimple_assign_single_p (stmt)
>> +      || !(op1 = gimple_assign_rhs1 (stmt))
>> +      || !(integer_zerop (op1) || real_zerop (op1)))
>> +    return false;
>> +
>> +  dr = XCNEW (struct data_reference);
>> +  op0 = gimple_assign_lhs (stmt);
>> +
>> +  DR_STMT (dr) = stmt;
>> +  DR_REF (dr) = op0;
>> +
>> +  res = dr_analyze_innermost (dr)
>> +    && stride_of_unit_type_p (DR_STEP (dr), TREE_TYPE (op0));
>> +
>> +  free_data_ref (dr);
>> +  return res;
>> +}
>> +
>>  /* For a data reference REF, return the declaration of its base
>>    address or NULL_TREE if the base is not determined.  */
>>
>> diff --git a/gcc/tree-data-ref.h b/gcc/tree-data-ref.h
>> index 678eb10..90cbca1 100644
>> --- a/gcc/tree-data-ref.h
>> +++ b/gcc/tree-data-ref.h
>> @@ -567,6 +567,18 @@ void stores_from_loop (struct loop *, VEC (gimple, heap) **);
>>  void remove_similar_memory_refs (VEC (gimple, heap) **);
>>  bool rdg_defs_used_in_other_loops_p (struct graph *, int);
>>  bool have_similar_memory_accesses (gimple, gimple);
>> +bool stmt_with_adjacent_zero_store_dr_p (gimple);
>> +
>> +/* Returns true when STRIDE is equal in absolute value to the size of
>> +   the unit type of TYPE.  */
>> +
>> +static inline bool
>> +stride_of_unit_type_p (tree stride, tree type)
>> +{
>> +  return tree_int_cst_equal (fold_unary (ABS_EXPR, TREE_TYPE (stride),
>> +                                        stride),
>> +                            TYPE_SIZE_UNIT (type));
>> +}
>>
>>  /* Determines whether RDG vertices V1 and V2 access to similar memory
>>    locations, in which case they have to be in the same partition.  */
>> diff --git a/gcc/tree-loop-distribution.c b/gcc/tree-loop-distribution.c
>> index a328860..2e420ea 100644
>> --- a/gcc/tree-loop-distribution.c
>> +++ b/gcc/tree-loop-distribution.c
>> @@ -251,7 +251,7 @@ build_size_arg_loc (location_t loc, tree nb_iter, tree op,
>>
>>  /* Generate a call to memset.  Return true when the operation succeeded.  */
>>
>> -static bool
>> +static void
>>  generate_memset_zero (gimple stmt, tree op0, tree nb_iter,
>>                      gimple_stmt_iterator bsi)
>>  {
>> @@ -265,45 +265,27 @@ generate_memset_zero (gimple stmt, tree op0, tree nb_iter,
>>
>>   DR_STMT (dr) = stmt;
>>   DR_REF (dr) = op0;
>> -  if (!dr_analyze_innermost (dr))
>> -    goto end;
>> +  res = dr_analyze_innermost (dr);
>> +  gcc_assert (res && stride_of_unit_type_p (DR_STEP (dr), TREE_TYPE (op0)));
>>
>> -  /* Test for a positive stride, iterating over every element.  */
>> -  if (integer_zerop (size_binop (MINUS_EXPR,
>> -                                fold_convert (sizetype, DR_STEP (dr)),
>> -                                TYPE_SIZE_UNIT (TREE_TYPE (op0)))))
>> -    {
>> -      addr_base = fold_convert_loc (loc, sizetype,
>> -                                   size_binop_loc (loc, PLUS_EXPR,
>> -                                                   DR_OFFSET (dr),
>> -                                                   DR_INIT (dr)));
>> -      addr_base = fold_build2_loc (loc, POINTER_PLUS_EXPR,
>> -                                  TREE_TYPE (DR_BASE_ADDRESS (dr)),
>> -                                  DR_BASE_ADDRESS (dr), addr_base);
>> -
>> -      nb_bytes = build_size_arg_loc (loc, nb_iter, op0, &stmt_list);
>> -    }
>> +  nb_bytes = build_size_arg_loc (loc, nb_iter, op0, &stmt_list);
>> +  addr_base = size_binop_loc (loc, PLUS_EXPR, DR_OFFSET (dr), DR_INIT (dr));
>> +  addr_base = fold_convert_loc (loc, sizetype, addr_base);
>>
>>   /* Test for a negative stride, iterating over every element.  */
>> -  else if (integer_zerop (size_binop (PLUS_EXPR,
>> -                                     TYPE_SIZE_UNIT (TREE_TYPE (op0)),
>> -                                     fold_convert (sizetype, DR_STEP (dr)))))
>> +  if (integer_zerop (size_binop (PLUS_EXPR,
>> +                                TYPE_SIZE_UNIT (TREE_TYPE (op0)),
>> +                                fold_convert (sizetype, DR_STEP (dr)))))
>>     {
>> -      nb_bytes = build_size_arg_loc (loc, nb_iter, op0, &stmt_list);
>> -
>> -      addr_base = size_binop_loc (loc, PLUS_EXPR, DR_OFFSET (dr), DR_INIT (dr));
>> -      addr_base = fold_convert_loc (loc, sizetype, addr_base);
>>       addr_base = size_binop_loc (loc, MINUS_EXPR, addr_base,
>>                                  fold_convert_loc (loc, sizetype, nb_bytes));
>>       addr_base = size_binop_loc (loc, PLUS_EXPR, addr_base,
>>                                  TYPE_SIZE_UNIT (TREE_TYPE (op0)));
>> -      addr_base = fold_build2_loc (loc, POINTER_PLUS_EXPR,
>> -                                  TREE_TYPE (DR_BASE_ADDRESS (dr)),
>> -                                  DR_BASE_ADDRESS (dr), addr_base);
>>     }
>> -  else
>> -    goto end;
>>
>> +  addr_base = fold_build2_loc (loc, POINTER_PLUS_EXPR,
>> +                              TREE_TYPE (DR_BASE_ADDRESS (dr)),
>> +                              DR_BASE_ADDRESS (dr), addr_base);
>>   mem = force_gimple_operand (addr_base, &stmts, true, NULL);
>>   gimple_seq_add_seq (&stmt_list, stmts);
>>
>> @@ -311,14 +293,11 @@ generate_memset_zero (gimple stmt, tree op0, tree nb_iter,
>>   fn_call = gimple_build_call (fn, 3, mem, integer_zero_node, nb_bytes);
>>   gimple_seq_add_stmt (&stmt_list, fn_call);
>>   gsi_insert_seq_after (&bsi, stmt_list, GSI_CONTINUE_LINKING);
>> -  res = true;
>>
>>   if (dump_file && (dump_flags & TDF_DETAILS))
>>     fprintf (dump_file, "generated memset zero\n");
>>
>> - end:
>>   free_data_ref (dr);
>> -  return res;
>>  }
>>
>>  /* Tries to generate a builtin function for the instructions of LOOP
>> @@ -332,7 +311,6 @@ generate_builtin (struct loop *loop, bitmap partition, bool copy_p)
>>   unsigned i, x = 0;
>>   basic_block *bbs;
>>   gimple write = NULL;
>> -  tree op0, op1;
>>   gimple_stmt_iterator bsi;
>>   tree nb_iter = number_of_exit_cond_executions (loop);
>>
>> @@ -368,26 +346,17 @@ generate_builtin (struct loop *loop, bitmap partition, bool copy_p)
>>        }
>>     }
>>
>> -  if (!write)
>> -    goto end;
>> -
>> -  op0 = gimple_assign_lhs (write);
>> -  op1 = gimple_assign_rhs1 (write);
>> -
>> -  if (!(TREE_CODE (op0) == ARRAY_REF
>> -       || TREE_CODE (op0) == INDIRECT_REF))
>> +  if (!stmt_with_adjacent_zero_store_dr_p (write))
>>     goto end;
>>
>>   /* The new statements will be placed before LOOP.  */
>>   bsi = gsi_last_bb (loop_preheader_edge (loop)->src);
>> -
>> -  if (gimple_assign_rhs_code (write) == INTEGER_CST
>> -      && (integer_zerop (op1) || real_zerop (op1)))
>> -    res = generate_memset_zero (write, op0, nb_iter, bsi);
>> +  generate_memset_zero (write, gimple_assign_lhs (write), nb_iter, bsi);
>> +  res = true;
>>
>>   /* If this is the last partition for which we generate code, we have
>>      to destroy the loop.  */
>> -  if (res && !copy_p)
>> +  if (!copy_p)
>>     {
>>       unsigned nbbs = loop->num_nodes;
>>       edge exit = single_exit (loop);
>> @@ -531,24 +500,6 @@ has_upstream_mem_writes (int u)
>>  static void rdg_flag_vertex_and_dependent (struct graph *, int, bitmap, bitmap,
>>                                           bitmap, bool *);
>>
>> -/* Flag all the uses of U.  */
>> -
>> -static void
>> -rdg_flag_all_uses (struct graph *rdg, int u, bitmap partition, bitmap loops,
>> -                  bitmap processed, bool *part_has_writes)
>> -{
>> -  struct graph_edge *e;
>> -
>> -  for (e = rdg->vertices[u].succ; e; e = e->succ_next)
>> -    if (!bitmap_bit_p (processed, e->dest))
>> -      {
>> -       rdg_flag_vertex_and_dependent (rdg, e->dest, partition, loops,
>> -                                      processed, part_has_writes);
>> -       rdg_flag_all_uses (rdg, e->dest, partition, loops, processed,
>> -                          part_has_writes);
>> -      }
>> -}
>> -
>>  /* Flag the uses of U stopping following the information from
>>    upstream_mem_writes.  */
>>
>> @@ -720,68 +671,13 @@ rdg_flag_loop_exits (struct graph *rdg, bitmap loops, bitmap partition,
>>     }
>>  }
>>
>> -/* Flag all the nodes of RDG containing memory accesses that could
>> -   potentially belong to arrays already accessed in the current
>> -   PARTITION.  */
>> -
>> -static void
>> -rdg_flag_similar_memory_accesses (struct graph *rdg, bitmap partition,
>> -                                 bitmap loops, bitmap processed,
>> -                                 VEC (int, heap) **other_stores)
>> -{
>> -  bool foo;
>> -  unsigned i, n;
>> -  int j, k, kk;
>> -  bitmap_iterator ii;
>> -  struct graph_edge *e;
>> -
>> -  EXECUTE_IF_SET_IN_BITMAP (partition, 0, i, ii)
>> -    if (RDG_MEM_WRITE_STMT (rdg, i)
>> -       || RDG_MEM_READS_STMT (rdg, i))
>> -      {
>> -       for (j = 0; j < rdg->n_vertices; j++)
>> -         if (!bitmap_bit_p (processed, j)
>> -             && (RDG_MEM_WRITE_STMT (rdg, j)
>> -                 || RDG_MEM_READS_STMT (rdg, j))
>> -             && rdg_has_similar_memory_accesses (rdg, i, j))
>> -           {
>> -             /* Flag first the node J itself, and all the nodes that
>> -                are needed to compute J.  */
>> -             rdg_flag_vertex_and_dependent (rdg, j, partition, loops,
>> -                                            processed, &foo);
>> -
>> -             /* When J is a read, we want to coalesce in the same
>> -                PARTITION all the nodes that are using J: this is
>> -                needed for better cache locality.  */
>> -             rdg_flag_all_uses (rdg, j, partition, loops, processed, &foo);
>> -
>> -             /* Remove from OTHER_STORES the vertex that we flagged.  */
>> -             if (RDG_MEM_WRITE_STMT (rdg, j))
>> -               for (k = 0; VEC_iterate (int, *other_stores, k, kk); k++)
>> -                 if (kk == j)
>> -                   {
>> -                     VEC_unordered_remove (int, *other_stores, k);
>> -                     break;
>> -                   }
>> -           }
>> -
>> -       /* If the node I has two uses, then keep these together in the
>> -          same PARTITION.  */
>> -       for (n = 0, e = rdg->vertices[i].succ; e; e = e->succ_next, n++);
>> -
>> -       if (n > 1)
>> -         rdg_flag_all_uses (rdg, i, partition, loops, processed, &foo);
>> -      }
>> -}
>> -
>>  /* Returns a bitmap in which all the statements needed for computing
>>    the strongly connected component C of the RDG are flagged, also
>>    including the loop exit conditions.  */
>>
>>  static bitmap
>>  build_rdg_partition_for_component (struct graph *rdg, rdgc c,
>> -                                  bool *part_has_writes,
>> -                                  VEC (int, heap) **other_stores)
>> +                                  bool *part_has_writes)
>>  {
>>   int i, v;
>>   bitmap partition = BITMAP_ALLOC (NULL);
>> @@ -793,13 +689,6 @@ build_rdg_partition_for_component (struct graph *rdg, rdgc c,
>>       rdg_flag_vertex_and_dependent (rdg, v, partition, loops, processed,
>>                                     part_has_writes);
>>
>> -  /* Also iterate on the array of stores not in the starting vertices,
>> -     and determine those vertices that have some memory affinity with
>> -     the current nodes in the component: these are stores to the same
>> -     arrays, i.e. we're taking care of cache locality.  */
>> -  rdg_flag_similar_memory_accesses (rdg, partition, loops, processed,
>> -                                   other_stores);
>> -
>>   rdg_flag_loop_exits (rdg, loops, partition, processed, part_has_writes);
>>
>>   BITMAP_FREE (processed);
>> @@ -863,6 +752,79 @@ rdg_build_components (struct graph *rdg, VEC (int, heap) *starting_vertices,
>>   BITMAP_FREE (saved_components);
>>  }
>>
>> +/* Returns true when it is possible to generate a builtin pattern for
>> +   the PARTITION of RDG.  For the moment we detect only the memset
>> +   zero pattern.  */
>> +
>> +static bool
>> +can_generate_builtin (struct graph *rdg, bitmap partition)
>> +{
>> +  unsigned i;
>> +  bitmap_iterator bi;
>> +  int nb_reads = 0;
>> +  int nb_writes = 0;
>> +  int stores_zero = 0;
>> +
>> +  EXECUTE_IF_SET_IN_BITMAP (partition, 0, i, bi)
>> +    if (RDG_MEM_READS_STMT (rdg, i))
>> +      nb_reads++;
>> +    else if (RDG_MEM_WRITE_STMT (rdg, i))
>> +      {
>> +       nb_writes++;
>> +       if (stmt_with_adjacent_zero_store_dr_p (RDG_STMT (rdg, i)))
>> +         stores_zero++;
>> +      }
>> +
>> +  return stores_zero == 1 && nb_writes == 1 && nb_reads == 0;
>> +}
>> +
>> +/* Returns true when PARTITION1 and PARTITION2 have similar memory
>> +   accesses in RDG.  */
>> +
>> +static bool
>> +similar_memory_accesses (struct graph *rdg, bitmap partition1,
>> +                        bitmap partition2)
>> +{
>> +  unsigned i, j;
>> +  bitmap_iterator bi, bj;
>> +
>> +  EXECUTE_IF_SET_IN_BITMAP (partition1, 0, i, bi)
>> +    if (RDG_MEM_WRITE_STMT (rdg, i)
>> +       || RDG_MEM_READS_STMT (rdg, i))
>> +      EXECUTE_IF_SET_IN_BITMAP (partition2, 0, j, bj)
>> +       if (RDG_MEM_WRITE_STMT (rdg, j)
>> +           || RDG_MEM_READS_STMT (rdg, j))
>> +         if (rdg_has_similar_memory_accesses (rdg, i, j))
>> +           return true;
>> +
>> +  return false;
>> +}
>> +
>> +/* Fuse all the partitions from PARTITIONS that contain similar memory
>> +   references, i.e., we're taking care of cache locality.  This
>> +   function does not fuse those partitions that contain patterns that
>> +   can be code generated with builtins.  */
>> +
>> +static void
>> +fuse_partitions_with_similar_memory_accesses (struct graph *rdg,
>> +                                             VEC (bitmap, heap) **partitions)
>> +{
>> +  int p1, p2;
>> +  bitmap partition1, partition2;
>> +
>> +  for (p1 = 0; VEC_iterate (bitmap, *partitions, p1, partition1); p1++)
>> +    if (!can_generate_builtin (rdg, partition1))
>> +      for (p2 = 0; VEC_iterate (bitmap, *partitions, p2, partition2); p2++)
>> +       if (p1 != p2
>> +           && !can_generate_builtin (rdg, partition2)
>> +           && similar_memory_accesses (rdg, partition1, partition2))
>> +         {
>> +           bitmap_ior_into (partition1, partition2);
>> +           VEC_ordered_remove (bitmap, *partitions, p2);
>> +           p2--;
>> +         }
>> +}
>> +
>>  /* Aggregate several components into a useful partition that is
>>    registered in the PARTITIONS vector.  Partitions will be
>>    distributed in different loops.  */
>> @@ -885,8 +847,7 @@ rdg_build_partitions (struct graph *rdg, VEC (rdgc, heap) *components,
>>       if (bitmap_bit_p (processed, v))
>>        continue;
>>
>> -      np = build_rdg_partition_for_component (rdg, x, &part_has_writes,
>> -                                             other_stores);
>> +      np = build_rdg_partition_for_component (rdg, x, &part_has_writes);
>>       bitmap_ior_into (partition, np);
>>       bitmap_ior_into (processed, np);
>>       BITMAP_FREE (np);
>> @@ -932,6 +893,8 @@ rdg_build_partitions (struct graph *rdg, VEC (rdgc, heap) *components,
>>     VEC_safe_push (bitmap, heap, *partitions, partition);
>>   else
>>     BITMAP_FREE (partition);
>> +
>> +  fuse_partitions_with_similar_memory_accesses (rdg, partitions);
>>  }
>>
>>  /* Dump to FILE the PARTITIONS.  */
>> --
>> 1.7.1
>>
>>
>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] Fix PR43023: fuse_partitions_with_similar_memory_accesses.
  2010-12-16 21:56         ` Sebastian Pop
@ 2010-12-18 20:50           ` Richard Guenther
  0 siblings, 0 replies; 7+ messages in thread
From: Richard Guenther @ 2010-12-18 20:50 UTC (permalink / raw)
  To: Sebastian Pop; +Cc: gcc-patches, rguenther

On Thu, Dec 16, 2010 at 9:37 PM, Sebastian Pop <sebpop@gmail.com> wrote:
> On Mon, Dec 13, 2010 at 13:37, Sebastian Pop <sebpop@gmail.com> wrote:
>>> Hi,
>>>
>>> here is the backport of the same patch for the gcc4.5 branch.  Please
>>> let me know when you want to commit this patch to the branch.  For now
>>> I sent this out for test on amd64-linux.
>>
>> This patch passed regression test.
>>
>
> Ping.  Ok for the 4.5 branch now that it is open?

The patch is a little big large, so please wait another week to see
if HJ blames it for some new regression.  Ok after that.

Thanks,
Richard.

> Thanks.
>
>> Sebastian
>>
>>>
>>> Sebastian
>>>
>>> 2010-12-10  Sebastian Pop  <sebastian.pop@amd.com>
>>>
>>>        PR tree-optimization/43023
>>>        * tree-data-ref.c (mem_write_stride_of_same_size_as_unit_type_p):
>>>        Removed.
>>>        (stores_zero_from_loop): Call stmt_stores_zero.
>>>        (stmt_with_adjacent_zero_store_dr_p): New.
>>>        * tree-data-ref.h (stmt_with_adjacent_zero_store_dr_p): Declared.
>>>        (stride_of_unit_type_p): New.
>>>        * tree-loop-distribution.c (generate_memset_zero): Do not return a
>>>        boolean.  Call gcc_assert on stride_of_unit_type_p.
>>>        (generate_builtin): Call stmt_stores_zero.
>>>        (rdg_flag_all_uses): Removed.
>>>        (rdg_flag_similar_memory_accesses): Removed.
>>>        (build_rdg_partition_for_component): Removed parameter
>>>        other_stores.  Removed call to rdg_flag_similar_memory_accesses.
>>>        (can_generate_builtin): New.
>>>        (similar_memory_accesses): New.
>>>        (fuse_partitions_with_similar_memory_accesses): New.
>>>        (rdg_build_partitions): Call
>>>        fuse_partitions_with_similar_memory_accesses.
>>>
>>>        * gfortran.dg/ldist-1.f90: Adjust pattern.
>>>        * gfortran.dg/ldist-pr43023.f90: New.
>>> ---
>>>  gcc/ChangeLog                               |   22 +++
>>>  gcc/testsuite/ChangeLog                     |    6 +
>>>  gcc/testsuite/gfortran.dg/ldist-1.f90       |    5 +-
>>>  gcc/testsuite/gfortran.dg/ldist-pr43023.f90 |   31 ++++
>>>  gcc/tree-data-ref.c                         |   34 ++++-
>>>  gcc/tree-data-ref.h                         |   12 ++
>>>  gcc/tree-loop-distribution.c                |  223 +++++++++++----------------
>>>  7 files changed, 201 insertions(+), 132 deletions(-)
>>>  create mode 100644 gcc/testsuite/gfortran.dg/ldist-pr43023.f90
>>>
>>> diff --git a/gcc/ChangeLog b/gcc/ChangeLog
>>> index abecb83..dfe45ed 100644
>>> --- a/gcc/ChangeLog
>>> +++ b/gcc/ChangeLog
>>> @@ -1,3 +1,25 @@
>>> +2010-12-10  Sebastian Pop  <sebastian.pop@amd.com>
>>> +
>>> +       PR tree-optimization/43023
>>> +       * tree-data-ref.c (mem_write_stride_of_same_size_as_unit_type_p):
>>> +       Removed.
>>> +       (stores_zero_from_loop): Call stmt_stores_zero.
>>> +       (stmt_with_adjacent_zero_store_dr_p): New.
>>> +       * tree-data-ref.h (stmt_with_adjacent_zero_store_dr_p): Declared.
>>> +       (stride_of_unit_type_p): New.
>>> +       * tree-loop-distribution.c (generate_memset_zero): Do not return a
>>> +       boolean.  Call gcc_assert on stride_of_unit_type_p.
>>> +       (generate_builtin): Call stmt_stores_zero.
>>> +       (rdg_flag_all_uses): Removed.
>>> +       (rdg_flag_similar_memory_accesses): Removed.
>>> +       (build_rdg_partition_for_component): Removed parameter
>>> +       other_stores.  Removed call to rdg_flag_similar_memory_accesses.
>>> +       (can_generate_builtin): New.
>>> +       (similar_memory_accesses): New.
>>> +       (fuse_partitions_with_similar_memory_accesses): New.
>>> +       (rdg_build_partitions): Call
>>> +       fuse_partitions_with_similar_memory_accesses.
>>> +
>>>  2010-12-07  Jakub Jelinek  <jakub@redhat.com>
>>>
>>>        Backport from mainline
>>> diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
>>> index e3a9d24..36fd59d 100644
>>> --- a/gcc/testsuite/ChangeLog
>>> +++ b/gcc/testsuite/ChangeLog
>>> @@ -1,3 +1,9 @@
>>> +2010-12-10  Sebastian Pop  <sebastian.pop@amd.com>
>>> +
>>> +       PR tree-optimization/43023
>>> +       * gfortran.dg/ldist-1.f90: Adjust pattern.
>>> +       * gfortran.dg/ldist-pr43023.f90: New.
>>> +
>>>  2010-12-07  Sebastian Pop  <sebastian.pop@amd.com>
>>>
>>>        PR tree-optimization/44676
>>> diff --git a/gcc/testsuite/gfortran.dg/ldist-1.f90 b/gcc/testsuite/gfortran.dg/ldist-1.f90
>>> index dd1f02a..bbce2f3 100644
>>> --- a/gcc/testsuite/gfortran.dg/ldist-1.f90
>>> +++ b/gcc/testsuite/gfortran.dg/ldist-1.f90
>>> @@ -29,5 +29,8 @@ Subroutine PADEC(DKS,DKDS,HVAR,WM,WG,FN,NS,AN,BN,CN,IT)
>>>   return
>>>  end Subroutine PADEC
>>>
>>> -! { dg-final { scan-tree-dump-times "distributed: split to 4 loops" 1 "ldist" } }
>>> +! There are 5 legal partitions in this code.  Based on the data
>>> +! locality heuristic, this loop should not be split.
>>> +
>>> +! { dg-final { scan-tree-dump-not "distributed: split to" "ldist" } }
>>>  ! { dg-final { cleanup-tree-dump "ldist" } }
>>> diff --git a/gcc/testsuite/gfortran.dg/ldist-pr43023.f90 b/gcc/testsuite/gfortran.dg/ldist-pr43023.f90
>>> new file mode 100644
>>> index 0000000..3e2d04c
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gfortran.dg/ldist-pr43023.f90
>>> @@ -0,0 +1,31 @@
>>> +! { dg-do compile }
>>> +! { dg-options "-O2 -ftree-loop-distribution" }
>>> +
>>> +MODULE NFT_mod
>>> +
>>> +implicit none
>>> +integer :: Nangle
>>> +real:: Z0
>>> +real, dimension(:,:), allocatable :: Angle
>>> +real, dimension(:), allocatable :: exth, ezth, hxth, hyth, hyphi
>>> +
>>> +CONTAINS
>>> +
>>> +SUBROUTINE NFT_Init()
>>> +
>>> +real :: th, fi
>>> +integer :: n
>>> +
>>> +do n = 1,Nangle
>>> +  th = Angle(n,1)
>>> +  fi = Angle(n,2)
>>> +
>>> +  exth(n) =  cos(fi)*cos(th)
>>> +  ezth(n) = -sin(th)
>>> +  hxth(n) = -sin(fi)
>>> +  hyth(n) =  cos(fi)
>>> +  hyphi(n) = -sin(fi)
>>> +end do
>>> +END SUBROUTINE NFT_Init
>>> +
>>> +END MODULE NFT_mod
>>> diff --git a/gcc/tree-data-ref.c b/gcc/tree-data-ref.c
>>> index a89d151..dab0177 100644
>>> --- a/gcc/tree-data-ref.c
>>> +++ b/gcc/tree-data-ref.c
>>> @@ -4594,7 +4594,7 @@ dump_rdg_vertex (FILE *file, struct graph *rdg, int i)
>>>     for (e = v->succ; e; e = e->succ_next)
>>>       fprintf (file, " %d", e->dest);
>>>
>>> -  fprintf (file, ") \n");
>>> +  fprintf (file, ")\n");
>>>   print_gimple_stmt (file, RDGV_STMT (v), 0, TDF_VOPS|TDF_MEMSYMS);
>>>   fprintf (file, ")\n");
>>>  }
>>> @@ -4991,6 +4991,38 @@ stores_from_loop (struct loop *loop, VEC (gimple, heap) **stmts)
>>>   free (bbs);
>>>  }
>>>
>>> +/* Returns true when the statement at STMT is of the form "A[i] = 0"
>>> +   that contains a data reference on its LHS with a stride of the same
>>> +   size as its unit type.  */
>>> +
>>> +bool
>>> +stmt_with_adjacent_zero_store_dr_p (gimple stmt)
>>> +{
>>> +  tree op0, op1;
>>> +  bool res;
>>> +  struct data_reference *dr;
>>> +
>>> +  if (!stmt
>>> +      || !gimple_vdef (stmt)
>>> +      || !is_gimple_assign (stmt)
>>> +      || !gimple_assign_single_p (stmt)
>>> +      || !(op1 = gimple_assign_rhs1 (stmt))
>>> +      || !(integer_zerop (op1) || real_zerop (op1)))
>>> +    return false;
>>> +
>>> +  dr = XCNEW (struct data_reference);
>>> +  op0 = gimple_assign_lhs (stmt);
>>> +
>>> +  DR_STMT (dr) = stmt;
>>> +  DR_REF (dr) = op0;
>>> +
>>> +  res = dr_analyze_innermost (dr)
>>> +    && stride_of_unit_type_p (DR_STEP (dr), TREE_TYPE (op0));
>>> +
>>> +  free_data_ref (dr);
>>> +  return res;
>>> +}
>>> +
>>>  /* For a data reference REF, return the declaration of its base
>>>    address or NULL_TREE if the base is not determined.  */
>>>
>>> diff --git a/gcc/tree-data-ref.h b/gcc/tree-data-ref.h
>>> index 678eb10..90cbca1 100644
>>> --- a/gcc/tree-data-ref.h
>>> +++ b/gcc/tree-data-ref.h
>>> @@ -567,6 +567,18 @@ void stores_from_loop (struct loop *, VEC (gimple, heap) **);
>>>  void remove_similar_memory_refs (VEC (gimple, heap) **);
>>>  bool rdg_defs_used_in_other_loops_p (struct graph *, int);
>>>  bool have_similar_memory_accesses (gimple, gimple);
>>> +bool stmt_with_adjacent_zero_store_dr_p (gimple);
>>> +
>>> +/* Returns true when STRIDE is equal in absolute value to the size of
>>> +   the unit type of TYPE.  */
>>> +
>>> +static inline bool
>>> +stride_of_unit_type_p (tree stride, tree type)
>>> +{
>>> +  return tree_int_cst_equal (fold_unary (ABS_EXPR, TREE_TYPE (stride),
>>> +                                        stride),
>>> +                            TYPE_SIZE_UNIT (type));
>>> +}
>>>
>>>  /* Determines whether RDG vertices V1 and V2 access to similar memory
>>>    locations, in which case they have to be in the same partition.  */
>>> diff --git a/gcc/tree-loop-distribution.c b/gcc/tree-loop-distribution.c
>>> index a328860..2e420ea 100644
>>> --- a/gcc/tree-loop-distribution.c
>>> +++ b/gcc/tree-loop-distribution.c
>>> @@ -251,7 +251,7 @@ build_size_arg_loc (location_t loc, tree nb_iter, tree op,
>>>
>>>  /* Generate a call to memset.  Return true when the operation succeeded.  */
>>>
>>> -static bool
>>> +static void
>>>  generate_memset_zero (gimple stmt, tree op0, tree nb_iter,
>>>                      gimple_stmt_iterator bsi)
>>>  {
>>> @@ -265,45 +265,27 @@ generate_memset_zero (gimple stmt, tree op0, tree nb_iter,
>>>
>>>   DR_STMT (dr) = stmt;
>>>   DR_REF (dr) = op0;
>>> -  if (!dr_analyze_innermost (dr))
>>> -    goto end;
>>> +  res = dr_analyze_innermost (dr);
>>> +  gcc_assert (res && stride_of_unit_type_p (DR_STEP (dr), TREE_TYPE (op0)));
>>>
>>> -  /* Test for a positive stride, iterating over every element.  */
>>> -  if (integer_zerop (size_binop (MINUS_EXPR,
>>> -                                fold_convert (sizetype, DR_STEP (dr)),
>>> -                                TYPE_SIZE_UNIT (TREE_TYPE (op0)))))
>>> -    {
>>> -      addr_base = fold_convert_loc (loc, sizetype,
>>> -                                   size_binop_loc (loc, PLUS_EXPR,
>>> -                                                   DR_OFFSET (dr),
>>> -                                                   DR_INIT (dr)));
>>> -      addr_base = fold_build2_loc (loc, POINTER_PLUS_EXPR,
>>> -                                  TREE_TYPE (DR_BASE_ADDRESS (dr)),
>>> -                                  DR_BASE_ADDRESS (dr), addr_base);
>>> -
>>> -      nb_bytes = build_size_arg_loc (loc, nb_iter, op0, &stmt_list);
>>> -    }
>>> +  nb_bytes = build_size_arg_loc (loc, nb_iter, op0, &stmt_list);
>>> +  addr_base = size_binop_loc (loc, PLUS_EXPR, DR_OFFSET (dr), DR_INIT (dr));
>>> +  addr_base = fold_convert_loc (loc, sizetype, addr_base);
>>>
>>>   /* Test for a negative stride, iterating over every element.  */
>>> -  else if (integer_zerop (size_binop (PLUS_EXPR,
>>> -                                     TYPE_SIZE_UNIT (TREE_TYPE (op0)),
>>> -                                     fold_convert (sizetype, DR_STEP (dr)))))
>>> +  if (integer_zerop (size_binop (PLUS_EXPR,
>>> +                                TYPE_SIZE_UNIT (TREE_TYPE (op0)),
>>> +                                fold_convert (sizetype, DR_STEP (dr)))))
>>>     {
>>> -      nb_bytes = build_size_arg_loc (loc, nb_iter, op0, &stmt_list);
>>> -
>>> -      addr_base = size_binop_loc (loc, PLUS_EXPR, DR_OFFSET (dr), DR_INIT (dr));
>>> -      addr_base = fold_convert_loc (loc, sizetype, addr_base);
>>>       addr_base = size_binop_loc (loc, MINUS_EXPR, addr_base,
>>>                                  fold_convert_loc (loc, sizetype, nb_bytes));
>>>       addr_base = size_binop_loc (loc, PLUS_EXPR, addr_base,
>>>                                  TYPE_SIZE_UNIT (TREE_TYPE (op0)));
>>> -      addr_base = fold_build2_loc (loc, POINTER_PLUS_EXPR,
>>> -                                  TREE_TYPE (DR_BASE_ADDRESS (dr)),
>>> -                                  DR_BASE_ADDRESS (dr), addr_base);
>>>     }
>>> -  else
>>> -    goto end;
>>>
>>> +  addr_base = fold_build2_loc (loc, POINTER_PLUS_EXPR,
>>> +                              TREE_TYPE (DR_BASE_ADDRESS (dr)),
>>> +                              DR_BASE_ADDRESS (dr), addr_base);
>>>   mem = force_gimple_operand (addr_base, &stmts, true, NULL);
>>>   gimple_seq_add_seq (&stmt_list, stmts);
>>>
>>> @@ -311,14 +293,11 @@ generate_memset_zero (gimple stmt, tree op0, tree nb_iter,
>>>   fn_call = gimple_build_call (fn, 3, mem, integer_zero_node, nb_bytes);
>>>   gimple_seq_add_stmt (&stmt_list, fn_call);
>>>   gsi_insert_seq_after (&bsi, stmt_list, GSI_CONTINUE_LINKING);
>>> -  res = true;
>>>
>>>   if (dump_file && (dump_flags & TDF_DETAILS))
>>>     fprintf (dump_file, "generated memset zero\n");
>>>
>>> - end:
>>>   free_data_ref (dr);
>>> -  return res;
>>>  }
>>>
>>>  /* Tries to generate a builtin function for the instructions of LOOP
>>> @@ -332,7 +311,6 @@ generate_builtin (struct loop *loop, bitmap partition, bool copy_p)
>>>   unsigned i, x = 0;
>>>   basic_block *bbs;
>>>   gimple write = NULL;
>>> -  tree op0, op1;
>>>   gimple_stmt_iterator bsi;
>>>   tree nb_iter = number_of_exit_cond_executions (loop);
>>>
>>> @@ -368,26 +346,17 @@ generate_builtin (struct loop *loop, bitmap partition, bool copy_p)
>>>        }
>>>     }
>>>
>>> -  if (!write)
>>> -    goto end;
>>> -
>>> -  op0 = gimple_assign_lhs (write);
>>> -  op1 = gimple_assign_rhs1 (write);
>>> -
>>> -  if (!(TREE_CODE (op0) == ARRAY_REF
>>> -       || TREE_CODE (op0) == INDIRECT_REF))
>>> +  if (!stmt_with_adjacent_zero_store_dr_p (write))
>>>     goto end;
>>>
>>>   /* The new statements will be placed before LOOP.  */
>>>   bsi = gsi_last_bb (loop_preheader_edge (loop)->src);
>>> -
>>> -  if (gimple_assign_rhs_code (write) == INTEGER_CST
>>> -      && (integer_zerop (op1) || real_zerop (op1)))
>>> -    res = generate_memset_zero (write, op0, nb_iter, bsi);
>>> +  generate_memset_zero (write, gimple_assign_lhs (write), nb_iter, bsi);
>>> +  res = true;
>>>
>>>   /* If this is the last partition for which we generate code, we have
>>>      to destroy the loop.  */
>>> -  if (res && !copy_p)
>>> +  if (!copy_p)
>>>     {
>>>       unsigned nbbs = loop->num_nodes;
>>>       edge exit = single_exit (loop);
>>> @@ -531,24 +500,6 @@ has_upstream_mem_writes (int u)
>>>  static void rdg_flag_vertex_and_dependent (struct graph *, int, bitmap, bitmap,
>>>                                           bitmap, bool *);
>>>
>>> -/* Flag all the uses of U.  */
>>> -
>>> -static void
>>> -rdg_flag_all_uses (struct graph *rdg, int u, bitmap partition, bitmap loops,
>>> -                  bitmap processed, bool *part_has_writes)
>>> -{
>>> -  struct graph_edge *e;
>>> -
>>> -  for (e = rdg->vertices[u].succ; e; e = e->succ_next)
>>> -    if (!bitmap_bit_p (processed, e->dest))
>>> -      {
>>> -       rdg_flag_vertex_and_dependent (rdg, e->dest, partition, loops,
>>> -                                      processed, part_has_writes);
>>> -       rdg_flag_all_uses (rdg, e->dest, partition, loops, processed,
>>> -                          part_has_writes);
>>> -      }
>>> -}
>>> -
>>>  /* Flag the uses of U stopping following the information from
>>>    upstream_mem_writes.  */
>>>
>>> @@ -720,68 +671,13 @@ rdg_flag_loop_exits (struct graph *rdg, bitmap loops, bitmap partition,
>>>     }
>>>  }
>>>
>>> -/* Flag all the nodes of RDG containing memory accesses that could
>>> -   potentially belong to arrays already accessed in the current
>>> -   PARTITION.  */
>>> -
>>> -static void
>>> -rdg_flag_similar_memory_accesses (struct graph *rdg, bitmap partition,
>>> -                                 bitmap loops, bitmap processed,
>>> -                                 VEC (int, heap) **other_stores)
>>> -{
>>> -  bool foo;
>>> -  unsigned i, n;
>>> -  int j, k, kk;
>>> -  bitmap_iterator ii;
>>> -  struct graph_edge *e;
>>> -
>>> -  EXECUTE_IF_SET_IN_BITMAP (partition, 0, i, ii)
>>> -    if (RDG_MEM_WRITE_STMT (rdg, i)
>>> -       || RDG_MEM_READS_STMT (rdg, i))
>>> -      {
>>> -       for (j = 0; j < rdg->n_vertices; j++)
>>> -         if (!bitmap_bit_p (processed, j)
>>> -             && (RDG_MEM_WRITE_STMT (rdg, j)
>>> -                 || RDG_MEM_READS_STMT (rdg, j))
>>> -             && rdg_has_similar_memory_accesses (rdg, i, j))
>>> -           {
>>> -             /* Flag first the node J itself, and all the nodes that
>>> -                are needed to compute J.  */
>>> -             rdg_flag_vertex_and_dependent (rdg, j, partition, loops,
>>> -                                            processed, &foo);
>>> -
>>> -             /* When J is a read, we want to coalesce in the same
>>> -                PARTITION all the nodes that are using J: this is
>>> -                needed for better cache locality.  */
>>> -             rdg_flag_all_uses (rdg, j, partition, loops, processed, &foo);
>>> -
>>> -             /* Remove from OTHER_STORES the vertex that we flagged.  */
>>> -             if (RDG_MEM_WRITE_STMT (rdg, j))
>>> -               for (k = 0; VEC_iterate (int, *other_stores, k, kk); k++)
>>> -                 if (kk == j)
>>> -                   {
>>> -                     VEC_unordered_remove (int, *other_stores, k);
>>> -                     break;
>>> -                   }
>>> -           }
>>> -
>>> -       /* If the node I has two uses, then keep these together in the
>>> -          same PARTITION.  */
>>> -       for (n = 0, e = rdg->vertices[i].succ; e; e = e->succ_next, n++);
>>> -
>>> -       if (n > 1)
>>> -         rdg_flag_all_uses (rdg, i, partition, loops, processed, &foo);
>>> -      }
>>> -}
>>> -
>>>  /* Returns a bitmap in which all the statements needed for computing
>>>    the strongly connected component C of the RDG are flagged, also
>>>    including the loop exit conditions.  */
>>>
>>>  static bitmap
>>>  build_rdg_partition_for_component (struct graph *rdg, rdgc c,
>>> -                                  bool *part_has_writes,
>>> -                                  VEC (int, heap) **other_stores)
>>> +                                  bool *part_has_writes)
>>>  {
>>>   int i, v;
>>>   bitmap partition = BITMAP_ALLOC (NULL);
>>> @@ -793,13 +689,6 @@ build_rdg_partition_for_component (struct graph *rdg, rdgc c,
>>>       rdg_flag_vertex_and_dependent (rdg, v, partition, loops, processed,
>>>                                     part_has_writes);
>>>
>>> -  /* Also iterate on the array of stores not in the starting vertices,
>>> -     and determine those vertices that have some memory affinity with
>>> -     the current nodes in the component: these are stores to the same
>>> -     arrays, i.e. we're taking care of cache locality.  */
>>> -  rdg_flag_similar_memory_accesses (rdg, partition, loops, processed,
>>> -                                   other_stores);
>>> -
>>>   rdg_flag_loop_exits (rdg, loops, partition, processed, part_has_writes);
>>>
>>>   BITMAP_FREE (processed);
>>> @@ -863,6 +752,79 @@ rdg_build_components (struct graph *rdg, VEC (int, heap) *starting_vertices,
>>>   BITMAP_FREE (saved_components);
>>>  }
>>>
>>> +/* Returns true when it is possible to generate a builtin pattern for
>>> +   the PARTITION of RDG.  For the moment we detect only the memset
>>> +   zero pattern.  */
>>> +
>>> +static bool
>>> +can_generate_builtin (struct graph *rdg, bitmap partition)
>>> +{
>>> +  unsigned i;
>>> +  bitmap_iterator bi;
>>> +  int nb_reads = 0;
>>> +  int nb_writes = 0;
>>> +  int stores_zero = 0;
>>> +
>>> +  EXECUTE_IF_SET_IN_BITMAP (partition, 0, i, bi)
>>> +    if (RDG_MEM_READS_STMT (rdg, i))
>>> +      nb_reads++;
>>> +    else if (RDG_MEM_WRITE_STMT (rdg, i))
>>> +      {
>>> +       nb_writes++;
>>> +       if (stmt_with_adjacent_zero_store_dr_p (RDG_STMT (rdg, i)))
>>> +         stores_zero++;
>>> +      }
>>> +
>>> +  return stores_zero == 1 && nb_writes == 1 && nb_reads == 0;
>>> +}
>>> +
>>> +/* Returns true when PARTITION1 and PARTITION2 have similar memory
>>> +   accesses in RDG.  */
>>> +
>>> +static bool
>>> +similar_memory_accesses (struct graph *rdg, bitmap partition1,
>>> +                        bitmap partition2)
>>> +{
>>> +  unsigned i, j;
>>> +  bitmap_iterator bi, bj;
>>> +
>>> +  EXECUTE_IF_SET_IN_BITMAP (partition1, 0, i, bi)
>>> +    if (RDG_MEM_WRITE_STMT (rdg, i)
>>> +       || RDG_MEM_READS_STMT (rdg, i))
>>> +      EXECUTE_IF_SET_IN_BITMAP (partition2, 0, j, bj)
>>> +       if (RDG_MEM_WRITE_STMT (rdg, j)
>>> +           || RDG_MEM_READS_STMT (rdg, j))
>>> +         if (rdg_has_similar_memory_accesses (rdg, i, j))
>>> +           return true;
>>> +
>>> +  return false;
>>> +}
>>> +
>>> +/* Fuse all the partitions from PARTITIONS that contain similar memory
>>> +   references, i.e., we're taking care of cache locality.  This
>>> +   function does not fuse those partitions that contain patterns that
>>> +   can be code generated with builtins.  */
>>> +
>>> +static void
>>> +fuse_partitions_with_similar_memory_accesses (struct graph *rdg,
>>> +                                             VEC (bitmap, heap) **partitions)
>>> +{
>>> +  int p1, p2;
>>> +  bitmap partition1, partition2;
>>> +
>>> +  for (p1 = 0; VEC_iterate (bitmap, *partitions, p1, partition1); p1++)
>>> +    if (!can_generate_builtin (rdg, partition1))
>>> +      for (p2 = 0; VEC_iterate (bitmap, *partitions, p2, partition2); p2++)
>>> +       if (p1 != p2
>>> +           && !can_generate_builtin (rdg, partition2)
>>> +           && similar_memory_accesses (rdg, partition1, partition2))
>>> +         {
>>> +           bitmap_ior_into (partition1, partition2);
>>> +           VEC_ordered_remove (bitmap, *partitions, p2);
>>> +           p2--;
>>> +         }
>>> +}
>>> +
>>>  /* Aggregate several components into a useful partition that is
>>>    registered in the PARTITIONS vector.  Partitions will be
>>>    distributed in different loops.  */
>>> @@ -885,8 +847,7 @@ rdg_build_partitions (struct graph *rdg, VEC (rdgc, heap) *components,
>>>       if (bitmap_bit_p (processed, v))
>>>        continue;
>>>
>>> -      np = build_rdg_partition_for_component (rdg, x, &part_has_writes,
>>> -                                             other_stores);
>>> +      np = build_rdg_partition_for_component (rdg, x, &part_has_writes);
>>>       bitmap_ior_into (partition, np);
>>>       bitmap_ior_into (processed, np);
>>>       BITMAP_FREE (np);
>>> @@ -932,6 +893,8 @@ rdg_build_partitions (struct graph *rdg, VEC (rdgc, heap) *components,
>>>     VEC_safe_push (bitmap, heap, *partitions, partition);
>>>   else
>>>     BITMAP_FREE (partition);
>>> +
>>> +  fuse_partitions_with_similar_memory_accesses (rdg, partitions);
>>>  }
>>>
>>>  /* Dump to FILE the PARTITIONS.  */
>>> --
>>> 1.7.1
>>>
>>>
>>
>

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2010-12-18 20:27 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-12-10  0:38 [PATCH] Fix PR43023: fuse_partitions_with_similar_memory_accesses Sebastian Pop
2010-12-10 13:13 ` Richard Guenther
2010-12-10 19:45   ` Sebastian Pop
2010-12-10 19:53     ` Sebastian Pop
2010-12-13 21:06       ` Sebastian Pop
2010-12-16 21:56         ` Sebastian Pop
2010-12-18 20:50           ` Richard Guenther

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).