public inbox for gcc-patches@gcc.gnu.org
* [PATCH 0/2] Loop distribution for memset zero
@ 2010-07-30 20:41 Sebastian Pop
  2010-07-30 20:43 ` [PATCH 2/2] Enable flag_tree_loop_distribute_memset_zero at -O3 Sebastian Pop
                   ` (2 more replies)
  0 siblings, 3 replies; 24+ messages in thread
From: Sebastian Pop @ 2010-07-30 20:41 UTC (permalink / raw)
  To: gcc-patches; +Cc: matz, Sebastian Pop

Hi,

Michael Matz suggested that some CPU2006 benchmarks would benefit from
a separate heuristic in the loop distribution pass for the memset zero
pattern, and that enabling it at -O3 would also exercise the loop
distribution code.  The following two patches implement this heuristic
on top of the current loop distribution pass and enable it at -O3.

The new pass starts by seeding the partition work list with the stores
that initialize memory to zero.  Each of these partitions is then code
generated as a separate loop, in which the existing loop distribution
code detects the memset zero pattern.
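
As a rough illustration of the intended effect, here is a hand-written
C sketch (not output from the pass; the names init_fused and
init_distributed are made up for this example, and the arrays are
assumed not to overlap).  The fused loop and its distributed form
produce the same arrays:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Original loop: the zero store and the dependent computation share
   one loop body.  */
void
init_fused (int *a, int *b, size_t n)
{
  size_t i;

  assert (a != NULL && b != NULL);

  for (i = 0; i < n; i++)
    {
      a[i] = 0;
      b[i] = a[i] + (int) i;
    }
}

/* After distribution: the "a[i] = 0" partition becomes its own loop,
   written here directly as the memset call it can be replaced with;
   the rest of the body stays in the remaining loop.  Assumes that A
   and B do not overlap.  */
void
init_distributed (int *a, int *b, size_t n)
{
  size_t i;

  assert (a != NULL && b != NULL);

  memset (a, 0, n * sizeof (int));

  for (i = 0; i < n; i++)
    b[i] = a[i] + (int) i;
}
```

Calling both functions on separate buffers of the same size should
leave identical contents in the A and B arrays.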

Regstrapped on amd64-linux.

SPEC2006 passed with -O3 (except the dealII compile fail that I
haven't fixed in my sources yet...).

Bootstrap failed with BOOT_CFLAGS="-g -O3", but it fails in the same
way, with the same miscompiled files, without these two patches, so
bootstrap of trunk is broken at -O3; see
http://gcc.gnu.org/PR45146

Ok for trunk?

Thanks,
Sebastian

  Add pass_loop_distribute_memset_zero.
  Enable flag_tree_loop_distribute_memset_zero at -O3.

 gcc/common.opt               |    4 ++
 gcc/doc/invoke.texi          |   23 ++++++++++++++-
 gcc/opts.c                   |    1 +
 gcc/passes.c                 |    1 +
 gcc/tree-data-ref.c          |   26 +++++++++++++++++
 gcc/tree-data-ref.h          |    1 +
 gcc/tree-loop-distribution.c |   63 ++++++++++++++++++++++++++++++++++++++++++
 gcc/tree-pass.h              |    1 +
 8 files changed, 119 insertions(+), 1 deletions(-)

* [PATCH 2/2] Enable flag_tree_loop_distribute_memset_zero at -O3.
  2010-07-30 20:41 [PATCH 0/2] Loop distribution for memset zero Sebastian Pop
@ 2010-07-30 20:43 ` Sebastian Pop
  2010-07-30 20:52 ` [PATCH 1/2] Add pass_loop_distribute_memset_zero Sebastian Pop
  2010-07-31 11:35 ` [PATCH 0/2] Loop distribution for memset zero Richard Guenther
  2 siblings, 0 replies; 24+ messages in thread
From: Sebastian Pop @ 2010-07-30 20:43 UTC (permalink / raw)
  To: gcc-patches; +Cc: matz, Sebastian Pop

---
 gcc/doc/invoke.texi |    1 +
 gcc/opts.c          |    1 +
 2 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 2d61382..ca3238c 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -6944,6 +6944,7 @@ DO I = 1, N
 ENDDO
 @end smallexample
 and the initialization loop is transformed into a call to memset zero.
+This flag is enabled by default at @option{-O3}.
 
 @item -ftree-loop-im
 @opindex ftree-loop-im
diff --git a/gcc/opts.c b/gcc/opts.c
index 07d7a23..16a337c 100644
--- a/gcc/opts.c
+++ b/gcc/opts.c
@@ -862,6 +862,7 @@ decode_options (unsigned int argc, const char **argv,
 
   /* -O3 optimizations.  */
   opt3 = (optimize >= 3);
+  flag_tree_loop_distribute_memset_zero = opt3;
   flag_predictive_commoning = opt3;
   flag_inline_functions = opt3;
   flag_unswitch_loops = opt3;
-- 
1.7.0.4

* [PATCH 1/2] Add pass_loop_distribute_memset_zero.
  2010-07-30 20:41 [PATCH 0/2] Loop distribution for memset zero Sebastian Pop
  2010-07-30 20:43 ` [PATCH 2/2] Enable flag_tree_loop_distribute_memset_zero at -O3 Sebastian Pop
@ 2010-07-30 20:52 ` Sebastian Pop
  2010-07-31 11:35 ` [PATCH 0/2] Loop distribution for memset zero Richard Guenther
  2 siblings, 0 replies; 24+ messages in thread
From: Sebastian Pop @ 2010-07-30 20:52 UTC (permalink / raw)
  To: gcc-patches; +Cc: matz, Sebastian Pop

---
 gcc/common.opt               |    4 ++
 gcc/doc/invoke.texi          |   22 ++++++++++++++-
 gcc/passes.c                 |    1 +
 gcc/tree-data-ref.c          |   26 +++++++++++++++++
 gcc/tree-data-ref.h          |    1 +
 gcc/tree-loop-distribution.c |   63 ++++++++++++++++++++++++++++++++++++++++++
 gcc/tree-pass.h              |    1 +
 7 files changed, 117 insertions(+), 1 deletions(-)

diff --git a/gcc/common.opt b/gcc/common.opt
index 41a9838..77cf58e 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -1337,6 +1337,10 @@ ftree-loop-distribution
 Common Report Var(flag_tree_loop_distribution) Optimization
 Enable loop distribution on trees
 
+ftree-loop-distribute-memset-zero
+Common Report Var(flag_tree_loop_distribute_memset_zero) Optimization
+Enable loop distribution of initialization loops using memset zero
+
 ftree-loop-im
 Common Report Var(flag_tree_loop_im) Init(1) Optimization
 Enable loop invariant motion on trees
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 73051de..2d61382 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -384,7 +384,7 @@ Objective-C and Objective-C++ Dialects}.
 -ftree-builtin-call-dce -ftree-ccp -ftree-ch -ftree-copy-prop @gol
 -ftree-copyrename -ftree-dce -ftree-dominator-opts -ftree-dse @gol
 -ftree-forwprop -ftree-fre -ftree-loop-if-convert -ftree-loop-im @gol
--ftree-phiprop -ftree-loop-distribution @gol
+-ftree-phiprop -ftree-loop-distribution -ftree-loop-distribute-memset-zero @gol
 -ftree-loop-ivcanon -ftree-loop-linear -ftree-loop-optimize @gol
 -ftree-parallelize-loops=@var{n} -ftree-pre -ftree-pta -ftree-reassoc @gol
 -ftree-sink -ftree-sra -ftree-switch-conversion @gol
@@ -6925,6 +6925,26 @@ DO I = 1, N
 ENDDO
 @end smallexample
 
+@item -ftree-loop-distribute-memset-zero
+Perform loop distribution of initialization loops and code generate
+them with a call to memset zero.  For example, the loop
+@smallexample
+DO I = 1, N
+  A(I) = 0
+  B(I) = A(I) + I
+ENDDO
+@end smallexample
+is transformed to
+@smallexample
+DO I = 1, N
+   A(I) = 0
+ENDDO
+DO I = 1, N
+   B(I) = A(I) + I
+ENDDO
+@end smallexample
+and the initialization loop is transformed into a call to memset zero.
+
 @item -ftree-loop-im
 @opindex ftree-loop-im
 Perform loop invariant motion on trees.  This pass moves only invariants that
diff --git a/gcc/passes.c b/gcc/passes.c
index 72e9b5a..7aed7e2 100644
--- a/gcc/passes.c
+++ b/gcc/passes.c
@@ -897,6 +897,7 @@ init_optimization_passes (void)
 	  NEXT_PASS (pass_scev_cprop);
 	  NEXT_PASS (pass_record_bounds);
 	  NEXT_PASS (pass_check_data_deps);
+	  NEXT_PASS (pass_loop_distribute_memset_zero);
 	  NEXT_PASS (pass_loop_distribution);
 	  NEXT_PASS (pass_linear_transform);
 	  NEXT_PASS (pass_graphite_transforms);
diff --git a/gcc/tree-data-ref.c b/gcc/tree-data-ref.c
index e7aa277..2656350 100644
--- a/gcc/tree-data-ref.c
+++ b/gcc/tree-data-ref.c
@@ -5038,6 +5038,32 @@ stores_from_loop (struct loop *loop, VEC (gimple, heap) **stmts)
   free (bbs);
 }
 
+/* Initialize STMTS with all the statements of LOOP that contain a
+   store to memory of the form "A[i] = 0".  */
+
+void
+stores_zero_from_loop (struct loop *loop, VEC (gimple, heap) **stmts)
+{
+  unsigned int i;
+  basic_block bb;
+  gimple_stmt_iterator si;
+  gimple stmt;
+  tree op;
+  basic_block *bbs = get_loop_body_in_dom_order (loop);
+
+  for (i = 0; i < loop->num_nodes; i++)
+    for (bb = bbs[i], si = gsi_start_bb (bb); !gsi_end_p (si); gsi_next (&si))
+      if ((stmt = gsi_stmt (si))
+	  && gimple_vdef (stmt)
+	  && is_gimple_assign (stmt)
+	  && gimple_assign_rhs_code (stmt) == INTEGER_CST
+	  && (op = gimple_assign_rhs1 (stmt))
+	  && (integer_zerop (op) || real_zerop (op)))
+	VEC_safe_push (gimple, heap, *stmts, gsi_stmt (si));
+
+  free (bbs);
+}
+
 /* For a data reference REF, return the declaration of its base
    address or NULL_TREE if the base is not determined.  */
 
diff --git a/gcc/tree-data-ref.h b/gcc/tree-data-ref.h
index eff5348..9e18e26 100644
--- a/gcc/tree-data-ref.h
+++ b/gcc/tree-data-ref.h
@@ -564,6 +564,7 @@ index_in_loop_nest (int var, VEC (loop_p, heap) *loop_nest)
 }
 
 void stores_from_loop (struct loop *, VEC (gimple, heap) **);
+void stores_zero_from_loop (struct loop *, VEC (gimple, heap) **);
 void remove_similar_memory_refs (VEC (gimple, heap) **);
 bool rdg_defs_used_in_other_loops_p (struct graph *, int);
 bool have_similar_memory_accesses (gimple, gimple);
diff --git a/gcc/tree-loop-distribution.c b/gcc/tree-loop-distribution.c
index 099a7fe..abc2ac9 100644
--- a/gcc/tree-loop-distribution.c
+++ b/gcc/tree-loop-distribution.c
@@ -1238,3 +1238,66 @@ struct gimple_opt_pass pass_loop_distribution =
   TODO_dump_func                /* todo_flags_finish */
  }
 };
+
+/* Distribute all the loops containing initializations to zero.  */
+
+static unsigned int
+tree_loop_distribute_memset_zero (void)
+{
+  struct loop *loop;
+  loop_iterator li;
+
+  FOR_EACH_LOOP (li, loop, 0)
+    {
+      VEC (gimple, heap) *work_list = VEC_alloc (gimple, heap, 3);
+      int nb_generated_loops = 0;
+
+      /* With the following working list, we're asking distribute_loop
+	 to separate from the rest of the loop the stores of the form
+	 "A[i] = 0".  */
+      stores_zero_from_loop (loop, &work_list);
+
+      if (VEC_length (gimple, work_list) > 0)
+	nb_generated_loops = distribute_loop (loop, work_list);
+
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	{
+	  if (nb_generated_loops > 1)
+	    fprintf (dump_file, "Loop %d distributed: split to %d loops.\n",
+		     loop->num, nb_generated_loops);
+	  else
+	    fprintf (dump_file, "Loop %d is the same.\n", loop->num);
+	}
+
+      verify_loop_structure ();
+
+      VEC_free (gimple, heap, work_list);
+    }
+
+  return 0;
+}
+
+static bool
+gate_ldist_memset_zero (void)
+{
+  return flag_tree_loop_distribute_memset_zero != 0;
+}
+
+struct gimple_opt_pass pass_loop_distribute_memset_zero =
+{
+ {
+  GIMPLE_PASS,
+  "ldist-memset-zero",		/* name */
+  gate_ldist_memset_zero,  	/* gate */
+  tree_loop_distribute_memset_zero,       /* execute */
+  NULL,				/* sub */
+  NULL,				/* next */
+  0,				/* static_pass_number */
+  TV_TREE_LOOP_DISTRIBUTION,    /* tv_id */
+  PROP_cfg | PROP_ssa,		/* properties_required */
+  0,				/* properties_provided */
+  0,				/* properties_destroyed */
+  0,				/* todo_flags_start */
+  TODO_dump_func                /* todo_flags_finish */
+ }
+};
diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
index c72d7cf..3384aad 100644
--- a/gcc/tree-pass.h
+++ b/gcc/tree-pass.h
@@ -375,6 +375,7 @@ extern struct gimple_opt_pass pass_empty_loop;
 extern struct gimple_opt_pass pass_record_bounds;
 extern struct gimple_opt_pass pass_graphite_transforms;
 extern struct gimple_opt_pass pass_if_conversion;
+extern struct gimple_opt_pass pass_loop_distribute_memset_zero;
 extern struct gimple_opt_pass pass_loop_distribution;
 extern struct gimple_opt_pass pass_vectorize;
 extern struct gimple_opt_pass pass_slp_vectorize;
-- 
1.7.0.4

* Re: [PATCH 0/2] Loop distribution for memset zero
  2010-07-30 20:41 [PATCH 0/2] Loop distribution for memset zero Sebastian Pop
  2010-07-30 20:43 ` [PATCH 2/2] Enable flag_tree_loop_distribute_memset_zero at -O3 Sebastian Pop
  2010-07-30 20:52 ` [PATCH 1/2] Add pass_loop_distribute_memset_zero Sebastian Pop
@ 2010-07-31 11:35 ` Richard Guenther
  2010-07-31 15:28   ` Sebastian Pop
  2 siblings, 1 reply; 24+ messages in thread
From: Richard Guenther @ 2010-07-31 11:35 UTC (permalink / raw)
  To: Sebastian Pop; +Cc: gcc-patches, matz

On Fri, Jul 30, 2010 at 10:40 PM, Sebastian Pop <sebpop@gmail.com> wrote:
> Hi,
>
> Michael Matz proposed that it would be a good idea for some CPU2006
> benchmarks to add a separate heuristic for the loop distribution pass
> for the memset zero pattern, and to enable that at -O3 in order to
> exercise the loop distribution code.  The following two patches
> implement on top of the current loop distribution pass the heuristic,
> and enable it at -O3.
>
> The new pass starts by adding to the partitions working list the data
> references that are initialized to zero.  These partitions are then
> code generated in different loops, and the current loop distribution
> detects the memset zero pattern.
>
> Regstrapped on amd64-linux.
>
> SPEC2006 passed with -O3 (except the dealII compile fail that I
> haven't fixed in my sources yet...).
>
> Bootstrap failed with BOOT_CFLAGS="-g -O3", but then when I tried also
> without these two patches it also failed with the same miscompiled
> files, so bootstrap of trunk is broken at -O3, see
> http://gcc.gnu.org/PR45146
>
> Ok for trunk?

The new pass should be disabled when loop-distribution is enabled, no?
Thus, I think it would make more sense to fold it into the existing pass
which then runs in different modes depending on the flags used.

The flag should have a more general name, like
-ftree-loop-distribute-patterns, as we will probably want to add memcpy
or array sin/cos operations here as well.

Now the code looks very specific at the moment, with
stores_zero_from_loop.  I suppose we can't ask loop distribution
to separate stores as is but then only generate separate code for
the memset and ask it to keep the other pieces together?

Thanks,
Richard.

> Thanks,
> Sebastian
>
>  Add pass_loop_distribute_memset_zero.
>  Enable flag_tree_loop_distribute_memset_zero at -O3.
>
>  gcc/common.opt               |    4 ++
>  gcc/doc/invoke.texi          |   23 ++++++++++++++-
>  gcc/opts.c                   |    1 +
>  gcc/passes.c                 |    1 +
>  gcc/tree-data-ref.c          |   26 +++++++++++++++++
>  gcc/tree-data-ref.h          |    1 +
>  gcc/tree-loop-distribution.c |   63 ++++++++++++++++++++++++++++++++++++++++++
>  gcc/tree-pass.h              |    1 +
>  8 files changed, 119 insertions(+), 1 deletions(-)
>
>

* [PATCH 1/3] Add pass_loop_distribute_memset_zero.
  2010-07-31 15:28   ` Sebastian Pop
@ 2010-07-31 15:28     ` Sebastian Pop
  2010-07-31 15:28     ` [PATCH 0/2] Loop distribution for memset zero Sebastian Pop
                       ` (4 subsequent siblings)
  5 siblings, 0 replies; 24+ messages in thread
From: Sebastian Pop @ 2010-07-31 15:28 UTC (permalink / raw)
  To: Richard Guenther; +Cc: gcc-patches, Sebastian Pop

---
 gcc/common.opt               |    4 +++
 gcc/doc/invoke.texi          |   22 +++++++++++++++++++-
 gcc/tree-data-ref.c          |   26 ++++++++++++++++++++++++
 gcc/tree-data-ref.h          |    1 +
 gcc/tree-loop-distribution.c |   45 +++++++++++++++++++++++++++++------------
 5 files changed, 84 insertions(+), 14 deletions(-)

diff --git a/gcc/common.opt b/gcc/common.opt
index 41a9838..77cf58e 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -1337,6 +1337,10 @@ ftree-loop-distribution
 Common Report Var(flag_tree_loop_distribution) Optimization
 Enable loop distribution on trees
 
+ftree-loop-distribute-memset-zero
+Common Report Var(flag_tree_loop_distribute_memset_zero) Optimization
+Enable loop distribution of initialization loops using memset zero
+
 ftree-loop-im
 Common Report Var(flag_tree_loop_im) Init(1) Optimization
 Enable loop invariant motion on trees
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 73051de..2d61382 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -384,7 +384,7 @@ Objective-C and Objective-C++ Dialects}.
 -ftree-builtin-call-dce -ftree-ccp -ftree-ch -ftree-copy-prop @gol
 -ftree-copyrename -ftree-dce -ftree-dominator-opts -ftree-dse @gol
 -ftree-forwprop -ftree-fre -ftree-loop-if-convert -ftree-loop-im @gol
--ftree-phiprop -ftree-loop-distribution @gol
+-ftree-phiprop -ftree-loop-distribution -ftree-loop-distribute-memset-zero @gol
 -ftree-loop-ivcanon -ftree-loop-linear -ftree-loop-optimize @gol
 -ftree-parallelize-loops=@var{n} -ftree-pre -ftree-pta -ftree-reassoc @gol
 -ftree-sink -ftree-sra -ftree-switch-conversion @gol
@@ -6925,6 +6925,26 @@ DO I = 1, N
 ENDDO
 @end smallexample
 
+@item -ftree-loop-distribute-memset-zero
+Perform loop distribution of initialization loops and code generate
+them with a call to memset zero.  For example, the loop
+@smallexample
+DO I = 1, N
+  A(I) = 0
+  B(I) = A(I) + I
+ENDDO
+@end smallexample
+is transformed to
+@smallexample
+DO I = 1, N
+   A(I) = 0
+ENDDO
+DO I = 1, N
+   B(I) = A(I) + I
+ENDDO
+@end smallexample
+and the initialization loop is transformed into a call to memset zero.
+
 @item -ftree-loop-im
 @opindex ftree-loop-im
 Perform loop invariant motion on trees.  This pass moves only invariants that
diff --git a/gcc/tree-data-ref.c b/gcc/tree-data-ref.c
index e7aa277..2656350 100644
--- a/gcc/tree-data-ref.c
+++ b/gcc/tree-data-ref.c
@@ -5038,6 +5038,32 @@ stores_from_loop (struct loop *loop, VEC (gimple, heap) **stmts)
   free (bbs);
 }
 
+/* Initialize STMTS with all the statements of LOOP that contain a
+   store to memory of the form "A[i] = 0".  */
+
+void
+stores_zero_from_loop (struct loop *loop, VEC (gimple, heap) **stmts)
+{
+  unsigned int i;
+  basic_block bb;
+  gimple_stmt_iterator si;
+  gimple stmt;
+  tree op;
+  basic_block *bbs = get_loop_body_in_dom_order (loop);
+
+  for (i = 0; i < loop->num_nodes; i++)
+    for (bb = bbs[i], si = gsi_start_bb (bb); !gsi_end_p (si); gsi_next (&si))
+      if ((stmt = gsi_stmt (si))
+	  && gimple_vdef (stmt)
+	  && is_gimple_assign (stmt)
+	  && gimple_assign_rhs_code (stmt) == INTEGER_CST
+	  && (op = gimple_assign_rhs1 (stmt))
+	  && (integer_zerop (op) || real_zerop (op)))
+	VEC_safe_push (gimple, heap, *stmts, gsi_stmt (si));
+
+  free (bbs);
+}
+
 /* For a data reference REF, return the declaration of its base
    address or NULL_TREE if the base is not determined.  */
 
diff --git a/gcc/tree-data-ref.h b/gcc/tree-data-ref.h
index eff5348..9e18e26 100644
--- a/gcc/tree-data-ref.h
+++ b/gcc/tree-data-ref.h
@@ -564,6 +564,7 @@ index_in_loop_nest (int var, VEC (loop_p, heap) *loop_nest)
 }
 
 void stores_from_loop (struct loop *, VEC (gimple, heap) **);
+void stores_zero_from_loop (struct loop *, VEC (gimple, heap) **);
 void remove_similar_memory_refs (VEC (gimple, heap) **);
 bool rdg_defs_used_in_other_loops_p (struct graph *, int);
 bool have_similar_memory_accesses (gimple, gimple);
diff --git a/gcc/tree-loop-distribution.c b/gcc/tree-loop-distribution.c
index 099a7fe..920f744 100644
--- a/gcc/tree-loop-distribution.c
+++ b/gcc/tree-loop-distribution.c
@@ -1184,18 +1184,36 @@ tree_loop_distribution (void)
     {
       VEC (gimple, heap) *work_list = VEC_alloc (gimple, heap, 3);
 
-      /* With the following working list, we're asking distribute_loop
-	 to separate the stores of the loop: when dependences allow,
-	 it will end on having one store per loop.  */
-      stores_from_loop (loop, &work_list);
-
-      /* A simple heuristic for cache locality is to not split stores
-	 to the same array.  Without this call, an unrolled loop would
-	 be split into as many loops as unroll factor, each loop
-	 storing in the same array.  */
-      remove_similar_memory_refs (&work_list);
-
-      nb_generated_loops = distribute_loop (loop, work_list);
+      /* If both flag_tree_loop_distribute_memset_zero and
+	 flag_tree_loop_distribution are set, then only memset_zero is
+	 executed.  */
+      if (flag_tree_loop_distribute_memset_zero)
+	{
+	  /* With the following working list, we're asking
+	     distribute_loop to separate from the rest of the loop the
+	     stores of the form "A[i] = 0".  */
+	  stores_zero_from_loop (loop, &work_list);
+
+	  /* Distribute only when the work list is not empty.  */
+	  if (VEC_length (gimple, work_list) > 0)
+	    nb_generated_loops = distribute_loop (loop, work_list);
+	}
+      else if (flag_tree_loop_distribution)
+	{
+	  /* With the following working list, we're asking
+	     distribute_loop to separate the stores of the loop: when
+	     dependences allow, it will end on having one store per
+	     loop.  */
+	  stores_from_loop (loop, &work_list);
+
+	  /* A simple heuristic for cache locality is to not split
+	     stores to the same array.  Without this call, an unrolled
+	     loop would be split into as many loops as unroll factor,
+	     each loop storing in the same array.  */
+	  remove_similar_memory_refs (&work_list);
+
+	  nb_generated_loops = distribute_loop (loop, work_list);
+	}
 
       if (dump_file && (dump_flags & TDF_DETAILS))
 	{
@@ -1217,7 +1235,8 @@ tree_loop_distribution (void)
 static bool
 gate_tree_loop_distribution (void)
 {
-  return flag_tree_loop_distribution != 0;
+  return flag_tree_loop_distribution
+    || flag_tree_loop_distribute_memset_zero;
 }
 
 struct gimple_opt_pass pass_loop_distribution =
-- 
1.7.0.4

* Re: [PATCH 0/2] Loop distribution for memset zero
  2010-07-31 15:28   ` Sebastian Pop
  2010-07-31 15:28     ` [PATCH 1/3] Add pass_loop_distribute_memset_zero Sebastian Pop
@ 2010-07-31 15:28     ` Sebastian Pop
  2010-07-31 15:31     ` [PATCH 3/3] Add -ftree-loop-distribute-patterns Sebastian Pop
                       ` (3 subsequent siblings)
  5 siblings, 0 replies; 24+ messages in thread
From: Sebastian Pop @ 2010-07-31 15:28 UTC (permalink / raw)
  To: Richard Guenther; +Cc: gcc-patches, Sebastian Pop

Hi,

Here is the updated patch set.  Ok for trunk after regstrap?

Thanks,
Sebastian

  Add pass_loop_distribute_memset_zero.
  Enable flag_tree_loop_distribute_memset_zero at -O3.
  Add -ftree-loop-distribute-patterns.

 gcc/common.opt               |    8 +++++++
 gcc/doc/invoke.texi          |   28 ++++++++++++++++++++++++-
 gcc/opts.c                   |    1 +
 gcc/tree-data-ref.c          |   26 +++++++++++++++++++++++
 gcc/tree-data-ref.h          |    1 +
 gcc/tree-loop-distribution.c |   46 ++++++++++++++++++++++++++++++-----------
 6 files changed, 96 insertions(+), 14 deletions(-)

* Re: [PATCH 0/2] Loop distribution for memset zero
  2010-07-31 11:35 ` [PATCH 0/2] Loop distribution for memset zero Richard Guenther
@ 2010-07-31 15:28   ` Sebastian Pop
  2010-07-31 15:28     ` [PATCH 1/3] Add pass_loop_distribute_memset_zero Sebastian Pop
                       ` (5 more replies)
  0 siblings, 6 replies; 24+ messages in thread
From: Sebastian Pop @ 2010-07-31 15:28 UTC (permalink / raw)
  To: Richard Guenther; +Cc: gcc-patches, matz

On Sat, Jul 31, 2010 at 05:01, Richard Guenther
<richard.guenther@gmail.com> wrote:
> The new pass should be disabled when loop-distribution is enabled, no?
> Thus, I think it would make more sense to fold it into the existing pass
> which then runs in different modes depending on the flags used.

I will do that and resubmit the patch.

>
> The flag should be named more general, like -ftree-loop-distribute-patterns

What about adding both -ftree-loop-distribute-patterns and
-ftree-loop-distribute-memset-zero?  The individual flags would let us
control which patterns are detected, and
-ftree-loop-distribute-patterns would enable all of them together.

> as we probably want to add memcpy or array sin/cos operations as well
> here.

I can imagine the memcpy pattern, but could you please provide an
example for sin/cos patterns?

> Now the code looks very specific at the moment, with
> stores_zero_from_loop.  I suppose we can't ask loop distribution
> to separate stores as is but then only generate separate code for
> the memset and ask it to keep the other pieces together?

That is the intent of the work list: it contains the roots of the
partitions that have to be generated.  Each partition is then
augmented with only those dependences that form a cycle (SCC) around
its root.  For A[i] = 0, nothing else can be aggregated around the
store.  The rest of the loop is generated in the default remaining
partition.
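
To make the role of the work list concrete, here is a toy C model.  It
is not GCC code: struct toy_stmt, seed_zero_stores and the flat
statement array are assumptions invented for this sketch, whereas the
real pass walks GIMPLE statements and builds partitions from the
dependence graph.  Each statement that stores a constant zero becomes
a root; every other statement is left for the default remaining
partition:

```c
#include <assert.h>
#include <stddef.h>

/* Toy statement, standing in for one statement of a loop body.  */
struct toy_stmt
{
  int is_store;        /* Nonzero if the statement writes memory.  */
  int rhs_is_constant; /* Nonzero if the right-hand side is a literal.  */
  double value;        /* Literal value; meaningful only if constant.  */
};

/* Record in ROOTS the indices of the statements of the form
   "A[i] = 0" and return how many were found.  ROOTS must have room
   for N entries.  */
size_t
seed_zero_stores (const struct toy_stmt *stmts, size_t n, size_t *roots)
{
  size_t i, count = 0;

  assert (stmts != NULL && roots != NULL);

  for (i = 0; i < n; i++)
    if (stmts[i].is_store
        && stmts[i].rhs_is_constant
        && stmts[i].value == 0.0)
      roots[count++] = i;

  return count;
}
```

For a body modelling "A[i] = 0; B[i] = A[i] + i", only the first
statement is returned as a root; the second one is a store, but its
right-hand side is not a constant.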

Sebastian

* [PATCH 3/3] Add -ftree-loop-distribute-patterns.
  2010-07-31 15:28   ` Sebastian Pop
  2010-07-31 15:28     ` [PATCH 1/3] Add pass_loop_distribute_memset_zero Sebastian Pop
  2010-07-31 15:28     ` [PATCH 0/2] Loop distribution for memset zero Sebastian Pop
@ 2010-07-31 15:31     ` Sebastian Pop
  2010-07-31 15:45       ` Sebastian Pop
  2010-07-31 15:34     ` [PATCH 2/3] Enable flag_tree_loop_distribute_memset_zero at -O3 Sebastian Pop
                       ` (2 subsequent siblings)
  5 siblings, 1 reply; 24+ messages in thread
From: Sebastian Pop @ 2010-07-31 15:31 UTC (permalink / raw)
  To: Richard Guenther; +Cc: gcc-patches, Sebastian Pop

---
 gcc/common.opt               |    4 ++++
 gcc/doc/invoke.texi          |    5 +++++
 gcc/tree-loop-distribution.c |    3 ++-
 3 files changed, 11 insertions(+), 1 deletions(-)

diff --git a/gcc/common.opt b/gcc/common.opt
index 77cf58e..a9fcdd2 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -1341,6 +1341,10 @@ ftree-loop-distribute-memset-zero
 Common Report Var(flag_tree_loop_distribute_memset_zero) Optimization
 Enable loop distribution of initialization loops using memset zero
 
+ftree-loop-distribute-patterns
+Common Report Var(flag_tree_loop_distribute_patterns) Optimization
+Enable loop distribution of patterns code generated with calls to a library
+
 ftree-loop-im
 Common Report Var(flag_tree_loop_im) Init(1) Optimization
 Enable loop invariant motion on trees
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index ca3238c..b9b8b22 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -385,6 +385,7 @@ Objective-C and Objective-C++ Dialects}.
 -ftree-copyrename -ftree-dce -ftree-dominator-opts -ftree-dse @gol
 -ftree-forwprop -ftree-fre -ftree-loop-if-convert -ftree-loop-im @gol
 -ftree-phiprop -ftree-loop-distribution -ftree-loop-distribute-memset-zero @gol
+-ftree-loop-distribute-patterns @gol
 -ftree-loop-ivcanon -ftree-loop-linear -ftree-loop-optimize @gol
 -ftree-parallelize-loops=@var{n} -ftree-pre -ftree-pta -ftree-reassoc @gol
 -ftree-sink -ftree-sra -ftree-switch-conversion @gol
@@ -6946,6 +6947,10 @@ ENDDO
 and the initialization loop is transformed into a call to memset zero.
 This flag is enabled by default at @option{-O3}.
 
+@item -ftree-loop-distribute-patterns
+Perform loop distribution of patterns that can be code generated with
+calls to a library.  This enables @option{-ftree-loop-distribute-memset-zero}.
+
 @item -ftree-loop-im
 @opindex ftree-loop-im
 Perform loop invariant motion on trees.  This pass moves only invariants that
diff --git a/gcc/tree-loop-distribution.c b/gcc/tree-loop-distribution.c
index 920f744..c677ecb 100644
--- a/gcc/tree-loop-distribution.c
+++ b/gcc/tree-loop-distribution.c
@@ -1187,7 +1187,8 @@ tree_loop_distribution (void)
       /* If both flag_tree_loop_distribute_memset_zero and
 	 flag_tree_loop_distribution are set, then only memset_zero is
 	 executed.  */
-      if (flag_tree_loop_distribute_memset_zero)
+      if (flag_tree_loop_distribute_memset_zero
+	  || flag_tree_loop_distribute_patterns)
 	{
 	  /* With the following working list, we're asking
 	     distribute_loop to separate from the rest of the loop the
-- 
1.7.0.4

* [PATCH 2/3] Enable flag_tree_loop_distribute_memset_zero at -O3.
  2010-07-31 15:28   ` Sebastian Pop
                       ` (2 preceding siblings ...)
  2010-07-31 15:31     ` [PATCH 3/3] Add -ftree-loop-distribute-patterns Sebastian Pop
@ 2010-07-31 15:34     ` Sebastian Pop
  2010-07-31 17:21     ` [PATCH 0/2] Loop distribution for memset zero Michael Matz
  2010-08-01 12:19     ` Richard Guenther
  5 siblings, 0 replies; 24+ messages in thread
From: Sebastian Pop @ 2010-07-31 15:34 UTC (permalink / raw)
  To: Richard Guenther; +Cc: gcc-patches, Sebastian Pop

---
 gcc/doc/invoke.texi |    1 +
 gcc/opts.c          |    1 +
 2 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 2d61382..ca3238c 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -6944,6 +6944,7 @@ DO I = 1, N
 ENDDO
 @end smallexample
 and the initialization loop is transformed into a call to memset zero.
+This flag is enabled by default at @option{-O3}.
 
 @item -ftree-loop-im
 @opindex ftree-loop-im
diff --git a/gcc/opts.c b/gcc/opts.c
index 07d7a23..16a337c 100644
--- a/gcc/opts.c
+++ b/gcc/opts.c
@@ -862,6 +862,7 @@ decode_options (unsigned int argc, const char **argv,
 
   /* -O3 optimizations.  */
   opt3 = (optimize >= 3);
+  flag_tree_loop_distribute_memset_zero = opt3;
   flag_predictive_commoning = opt3;
   flag_inline_functions = opt3;
   flag_unswitch_loops = opt3;
-- 
1.7.0.4

* Re: [PATCH 3/3] Add -ftree-loop-distribute-patterns.
  2010-07-31 15:31     ` [PATCH 3/3] Add -ftree-loop-distribute-patterns Sebastian Pop
@ 2010-07-31 15:45       ` Sebastian Pop
  2010-08-01 12:06         ` Richard Guenther
  0 siblings, 1 reply; 24+ messages in thread
From: Sebastian Pop @ 2010-07-31 15:45 UTC (permalink / raw)
  To: Richard Guenther; +Cc: gcc-patches, Sebastian Pop

I forgot this part in the patch below:

diff --git a/gcc/tree-loop-distribution.c b/gcc/tree-loop-distribution.c
index c677ecb..34d6e21 100644
--- a/gcc/tree-loop-distribution.c
+++ b/gcc/tree-loop-distribution.c
@@ -1237,7 +1237,8 @@ static bool
 gate_tree_loop_distribution (void)
 {
   return flag_tree_loop_distribution
-    || flag_tree_loop_distribute_memset_zero;
+    || flag_tree_loop_distribute_memset_zero
+    || flag_tree_loop_distribute_patterns;
 }

 struct gimple_opt_pass pass_loop_distribution =


On Sat, Jul 31, 2010 at 10:27, Sebastian Pop <sebpop@gmail.com> wrote:
> ---
>  gcc/common.opt               |    4 ++++
>  gcc/doc/invoke.texi          |    5 +++++
>  gcc/tree-loop-distribution.c |    3 ++-
>  3 files changed, 11 insertions(+), 1 deletions(-)
>
> diff --git a/gcc/common.opt b/gcc/common.opt
> index 77cf58e..a9fcdd2 100644
> --- a/gcc/common.opt
> +++ b/gcc/common.opt
> @@ -1341,6 +1341,10 @@ ftree-loop-distribute-memset-zero
>  Common Report Var(flag_tree_loop_distribute_memset_zero) Optimization
>  Enable loop distribution of initialization loops using memset zero
>
> +ftree-loop-distribute-patterns
> +Common Report Var(flag_tree_loop_distribute_patterns) Optimization
> +Enable loop distribution of patterns code generated with calls to a library
> +
>  ftree-loop-im
>  Common Report Var(flag_tree_loop_im) Init(1) Optimization
>  Enable loop invariant motion on trees
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index ca3238c..b9b8b22 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -385,6 +385,7 @@ Objective-C and Objective-C++ Dialects}.
>  -ftree-copyrename -ftree-dce -ftree-dominator-opts -ftree-dse @gol
>  -ftree-forwprop -ftree-fre -ftree-loop-if-convert -ftree-loop-im @gol
>  -ftree-phiprop -ftree-loop-distribution -ftree-loop-distribute-memset-zero @gol
> +-ftree-loop-distribute-patterns @gol
>  -ftree-loop-ivcanon -ftree-loop-linear -ftree-loop-optimize @gol
>  -ftree-parallelize-loops=@var{n} -ftree-pre -ftree-pta -ftree-reassoc @gol
>  -ftree-sink -ftree-sra -ftree-switch-conversion @gol
> @@ -6946,6 +6947,10 @@ ENDDO
>  and the initialization loop is transformed into a call to memset zero.
>  This flag is enabled by default at @option{-O3}.
>
> +@item -ftree-loop-distribute-patterns
> +Perform loop distribution of patterns that can be code generated with
> +calls to a library.  This enables @option{-ftree-loop-distribute-memset-zero}.
> +
>  @item -ftree-loop-im
>  @opindex ftree-loop-im
>  Perform loop invariant motion on trees.  This pass moves only invariants that
> diff --git a/gcc/tree-loop-distribution.c b/gcc/tree-loop-distribution.c
> index 920f744..c677ecb 100644
> --- a/gcc/tree-loop-distribution.c
> +++ b/gcc/tree-loop-distribution.c
> @@ -1187,7 +1187,8 @@ tree_loop_distribution (void)
>       /* If both flag_tree_loop_distribute_memset_zero and
>         flag_tree_loop_distribution are set, then only memset_zero is
>         executed.  */
> -      if (flag_tree_loop_distribute_memset_zero)
> +      if (flag_tree_loop_distribute_memset_zero
> +         || flag_tree_loop_distribute_patterns)
>        {
>          /* With the following working list, we're asking
>             distribute_loop to separate from the rest of the loop the
> --
> 1.7.0.4
>
>


* Re: [PATCH 0/2] Loop distribution for memset zero
  2010-07-31 15:28   ` Sebastian Pop
                       ` (3 preceding siblings ...)
  2010-07-31 15:34     ` [PATCH 2/3] Enable flag_tree_loop_distribute_memset_zero at -O3 Sebastian Pop
@ 2010-07-31 17:21     ` Michael Matz
  2010-07-31 17:36       ` Sebastian Pop
  2010-07-31 20:01       ` Joseph S. Myers
  2010-08-01 12:19     ` Richard Guenther
  5 siblings, 2 replies; 24+ messages in thread
From: Michael Matz @ 2010-07-31 17:21 UTC (permalink / raw)
  To: Sebastian Pop; +Cc: Richard Guenther, gcc-patches

Hi,

On Sat, 31 Jul 2010, Sebastian Pop wrote:

> > as we probably want to add memcpy or array sin/cos operations as well 
> > here.
> 
> I can imagine the memcpy pattern, but could you please provide an 
> example for sin/cos patterns?

Some math libraries (the one from AMD at least for instance) provide not 
only vectorized intrinsics for a fixed vector size (e.g. 4 float 
elements), but also for a generic arbitrarily sized array.  
For instance:

  void vrsa_expf(int n, float *src, float *dest);

is equivalent to:

  for (i = 0; i < n; i++)
    dest[i] = expf (src[i]);


Ciao,
Michael.


* Re: [PATCH 0/2] Loop distribution for memset zero
  2010-07-31 17:21     ` [PATCH 0/2] Loop distribution for memset zero Michael Matz
@ 2010-07-31 17:36       ` Sebastian Pop
  2010-08-01 12:10         ` Richard Guenther
  2010-07-31 20:01       ` Joseph S. Myers
  1 sibling, 1 reply; 24+ messages in thread
From: Sebastian Pop @ 2010-07-31 17:36 UTC (permalink / raw)
  To: Michael Matz; +Cc: Richard Guenther, gcc-patches

On Sat, Jul 31, 2010 at 11:43, Michael Matz <matz@suse.de> wrote:
> Some math libraries (the one from AMD at least for instance) provide not
> only vectorized intrinsics for a fixed vector size (e.g. 4 float
> elements), but also for a generic arbitrarily sized array.
> For instance:
>
>  void vrsa_expf(int n, float *src, float *dest);
>
> is equivalent to:
>
>  for (i = 0; i < n; i++)
>    dest[i] = expf (src[i]);
>

I see.  I think it would not be difficult to detect this kind of
pattern as well.

I would need some help on the code generation part, as I don't know
how to generate the calls.  For the memset and memcpy we have
BUILT_IN_MEMSET and BUILT_IN_MEMCPY.  How should I
generate code for the vrsa_expf function?

Thanks,
Sebastian


* Re: [PATCH 0/2] Loop distribution for memset zero
  2010-07-31 17:21     ` [PATCH 0/2] Loop distribution for memset zero Michael Matz
  2010-07-31 17:36       ` Sebastian Pop
@ 2010-07-31 20:01       ` Joseph S. Myers
  2010-08-01 16:19         ` Michael Matz
  1 sibling, 1 reply; 24+ messages in thread
From: Joseph S. Myers @ 2010-07-31 20:01 UTC (permalink / raw)
  To: Michael Matz; +Cc: Sebastian Pop, Richard Guenther, gcc-patches

On Sat, 31 Jul 2010, Michael Matz wrote:

> Some math libraries (the one from AMD at least for instance) provide not 
> only vectorized intrinsics for a fixed vector size (e.g. 4 float 
> elements), but also for a generic arbitrarily sized array.  
> For instance:
> 
>   void vrsa_expf(int n, float *src, float *dest);
> 
> is equivalent to:
> 
>   for (i = 0; i < n; i++)
>     dest[i] = expf (src[i]);

Exactly equivalent to that C code even with overlaps, or is it really (int 
n, const float *restrict src, float *restrict dest) with no overlap 
permitted?

-- 
Joseph S. Myers
joseph@codesourcery.com


* Re: [PATCH 3/3] Add -ftree-loop-distribute-patterns.
  2010-07-31 15:45       ` Sebastian Pop
@ 2010-08-01 12:06         ` Richard Guenther
  2010-08-02  9:21           ` Richard Guenther
  0 siblings, 1 reply; 24+ messages in thread
From: Richard Guenther @ 2010-08-01 12:06 UTC (permalink / raw)
  To: Sebastian Pop; +Cc: gcc-patches

On Sat, Jul 31, 2010 at 5:30 PM, Sebastian Pop <sebpop@gmail.com> wrote:
> I forgot this part in the patch below:

I thought of renaming -ftree-loop-distribute-memset-zero to
-ftree-loop-distribute-patterns, not adding an unused option.

Richard.

> diff --git a/gcc/tree-loop-distribution.c b/gcc/tree-loop-distribution.c
> index c677ecb..34d6e21 100644
> --- a/gcc/tree-loop-distribution.c
> +++ b/gcc/tree-loop-distribution.c
> @@ -1237,7 +1237,8 @@ static bool
>  gate_tree_loop_distribution (void)
>  {
>   return flag_tree_loop_distribution
> -    || flag_tree_loop_distribute_memset_zero;
> +    || flag_tree_loop_distribute_memset_zero
> +    || flag_tree_loop_distribute_patterns;
>  }
>
>  struct gimple_opt_pass pass_loop_distribution =
>
>
> On Sat, Jul 31, 2010 at 10:27, Sebastian Pop <sebpop@gmail.com> wrote:
>> ---
>>  gcc/common.opt               |    4 ++++
>>  gcc/doc/invoke.texi          |    5 +++++
>>  gcc/tree-loop-distribution.c |    3 ++-
>>  3 files changed, 11 insertions(+), 1 deletions(-)
>>
>> diff --git a/gcc/common.opt b/gcc/common.opt
>> index 77cf58e..a9fcdd2 100644
>> --- a/gcc/common.opt
>> +++ b/gcc/common.opt
>> @@ -1341,6 +1341,10 @@ ftree-loop-distribute-memset-zero
>>  Common Report Var(flag_tree_loop_distribute_memset_zero) Optimization
>>  Enable loop distribution of initialization loops using memset zero
>>
>> +ftree-loop-distribute-patterns
>> +Common Report Var(flag_tree_loop_distribute_patterns) Optimization
>> +Enable loop distribution of patterns code generated with calls to a library
>> +
>>  ftree-loop-im
>>  Common Report Var(flag_tree_loop_im) Init(1) Optimization
>>  Enable loop invariant motion on trees
>> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
>> index ca3238c..b9b8b22 100644
>> --- a/gcc/doc/invoke.texi
>> +++ b/gcc/doc/invoke.texi
>> @@ -385,6 +385,7 @@ Objective-C and Objective-C++ Dialects}.
>>  -ftree-copyrename -ftree-dce -ftree-dominator-opts -ftree-dse @gol
>>  -ftree-forwprop -ftree-fre -ftree-loop-if-convert -ftree-loop-im @gol
>>  -ftree-phiprop -ftree-loop-distribution -ftree-loop-distribute-memset-zero @gol
>> +-ftree-loop-distribute-patterns @gol
>>  -ftree-loop-ivcanon -ftree-loop-linear -ftree-loop-optimize @gol
>>  -ftree-parallelize-loops=@var{n} -ftree-pre -ftree-pta -ftree-reassoc @gol
>>  -ftree-sink -ftree-sra -ftree-switch-conversion @gol
>> @@ -6946,6 +6947,10 @@ ENDDO
>>  and the initialization loop is transformed into a call to memset zero.
>>  This flag is enabled by default at @option{-O3}.
>>
>> +@item -ftree-loop-distribute-patterns
>> +Perform loop distribution of patterns that can be code generated with
>> +calls to a library.  This enables @option{-ftree-loop-distribute-memset-zero}.
>> +
>>  @item -ftree-loop-im
>>  @opindex ftree-loop-im
>>  Perform loop invariant motion on trees.  This pass moves only invariants that
>> diff --git a/gcc/tree-loop-distribution.c b/gcc/tree-loop-distribution.c
>> index 920f744..c677ecb 100644
>> --- a/gcc/tree-loop-distribution.c
>> +++ b/gcc/tree-loop-distribution.c
>> @@ -1187,7 +1187,8 @@ tree_loop_distribution (void)
>>       /* If both flag_tree_loop_distribute_memset_zero and
>>         flag_tree_loop_distribution are set, then only memset_zero is
>>         executed.  */
>> -      if (flag_tree_loop_distribute_memset_zero)
>> +      if (flag_tree_loop_distribute_memset_zero
>> +         || flag_tree_loop_distribute_patterns)
>>        {
>>          /* With the following working list, we're asking
>>             distribute_loop to separate from the rest of the loop the
>> --
>> 1.7.0.4
>>
>>
>


* Re: [PATCH 0/2] Loop distribution for memset zero
  2010-07-31 17:36       ` Sebastian Pop
@ 2010-08-01 12:10         ` Richard Guenther
  0 siblings, 0 replies; 24+ messages in thread
From: Richard Guenther @ 2010-08-01 12:10 UTC (permalink / raw)
  To: Sebastian Pop; +Cc: Michael Matz, gcc-patches

On Sat, Jul 31, 2010 at 6:55 PM, Sebastian Pop <sebpop@gmail.com> wrote:
> On Sat, Jul 31, 2010 at 11:43, Michael Matz <matz@suse.de> wrote:
>> Some math libraries (the one from AMD at least for instance) provide not
>> only vectorized intrinsics for a fixed vector size (e.g. 4 float
>> elements), but also for a generic arbitrarily sized array.
>> For instance:
>>
>>  void vrsa_expf(int n, float *src, float *dest);
>>
>> is equivalent to:
>>
>>  for (i = 0; i < n; i++)
>>    dest[i] = expf (src[i]);
>>
>
> I see.  I think it would not be difficult to detect this kind of
> pattern as well.
>
> I would need some help on the code generation part, as I don't know
> how to generate the calls.  For the memset and memcpy we have
> BUILT_IN_MEMSET and BUILT_IN_MEMCPY.  How should I
> generate code for the vrsa_expf function?

Conditional on -mveclibabi=acml you simply assume the availability
and ABI of acml (you need to build a function type and decl, look
at ix86_veclibabi_acml in i386.c).

Richard.

> Thanks,
> Sebastian
>


* Re: [PATCH 0/2] Loop distribution for memset zero
  2010-07-31 15:28   ` Sebastian Pop
                       ` (4 preceding siblings ...)
  2010-07-31 17:21     ` [PATCH 0/2] Loop distribution for memset zero Michael Matz
@ 2010-08-01 12:19     ` Richard Guenther
  2010-08-01 15:21       ` Sebastian Pop
  5 siblings, 1 reply; 24+ messages in thread
From: Richard Guenther @ 2010-08-01 12:19 UTC (permalink / raw)
  To: Sebastian Pop; +Cc: gcc-patches, matz

On Sat, Jul 31, 2010 at 5:00 PM, Sebastian Pop <sebpop@gmail.com> wrote:
> On Sat, Jul 31, 2010 at 05:01, Richard Guenther
> <richard.guenther@gmail.com> wrote:
>> The new pass should be disabled when loop-distribution is enabled, no?
>> Thus, I think it would make more sense to fold it into the existing pass
>> which then runs in different modes depending on the flags used.
>
> I will do that and resubmit the patch.
>
>>
>> The flag should be named more general, like -ftree-loop-distribute-patterns
>
> What about adding both -ftree-loop-distribute-patterns and
> -ftree-loop-distribute-memset-zero, such that we can control what
> patterns are detected, and to enable all these patterns together we'll
> have -ftree-loop-distribute-patterns.

Hm.  I don't like inflation of command-line arguments too much, but
it might make sense ...

>> as we probably want to add memcpy or array sin/cos operations as well
>> here.
>
> I can imagine the memcpy pattern, but could you please provide an
> example for sin/cos patterns?

See other responses.  Can we detect for example daxpy?

for (i=0; i<n; ++i)
  dy[i] = dy[i] + da * dx[i];

?  In principle all the blas routines have one destination, so we'd need
to distribute all stores, like with regular loop distribution but then
after analyzing the partitions and detecting which ones we recognize
we need to fuse the unhandled parts together again.  Can we do
that from inside the loop distribution machinery?
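
To make the daxpy case concrete, here is a minimal C sketch of what a
recognizer would have to match.  The names daxpy_loop and ref_daxpy are
made up for illustration; ref_daxpy is only a local stand-in modeled on
the usual BLAS daxpy argument order (n, alpha, x, incx, y, incy), not a
call into a real library, and it assumes positive strides.

```c
#include <assert.h>

/* The loop from the message above, written as a function.  */
void daxpy_loop (int n, double da, const double *dx, double *dy)
{
  assert (n >= 0);
  for (int i = 0; i < n; ++i)
    dy[i] = dy[i] + da * dx[i];
}

/* Hypothetical stand-in for a library daxpy: one destination (dy),
   explicit strides.  Only positive strides are handled here.  */
void ref_daxpy (int n, double da, const double *dx, int incx,
		double *dy, int incy)
{
  assert (n >= 0 && incx > 0 && incy > 0);
  for (int i = 0; i < n; ++i)
    dy[i * incy] += da * dx[i * incx];
}
```

With incx = incy = 1 the stand-in computes the same values as the loop,
which is the correspondence a pattern matcher would need to establish
before replacing the partition by a call.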

The Fortran frontend already has -fexternal-blas which it uses to
implement matmul on large arrays.

>> Now the code looks very specific at the moment, with
>> stores_zero_from_loop.  I suppose we can't ask loop distribution
>> to separate stores as is but then only generate separate code for
>> the memset and ask it to keep the other pieces together?
>
> That is the intent of the worklist: the worklist contains the roots of
> the partitions that have to be generated.  The partitions are then
> augmented with only those dependences that form a cycle (SCC) around
> the root.  For A[i] = 0, there is no way to aggregate around it
> anything else.  The rest of the loop is generated in the default
> remaining partition.

Ok, so we'd need to do the pattern recognition before distributing
the loop?  But we need to make sure that the partition only contains
side-effects the replacement function has.  Consider

 for (i = 0; i < n ; ++i)
   {
     dx[i] = i;
     dy[i] = dy[i] + da * dx[i];
   }

will loop distribution include the assignment to dx[i] in the partition
if the worklist contains dy[i]?

Richard.

>
> Sebastian
>


* Re: [PATCH 0/2] Loop distribution for memset zero
  2010-08-01 12:19     ` Richard Guenther
@ 2010-08-01 15:21       ` Sebastian Pop
  0 siblings, 0 replies; 24+ messages in thread
From: Sebastian Pop @ 2010-08-01 15:21 UTC (permalink / raw)
  To: Richard Guenther; +Cc: gcc-patches, matz

On Sun, Aug 1, 2010 at 07:19, Richard Guenther
<richard.guenther@gmail.com> wrote:
> Hm.  I don't like inflation of command-line arguments too much, but

Ok, so for now let's have only -ftree-loop-distribute-patterns.

> See other responses.  Can we detect for example daxpy?
>
> for (i=0; i<n; ++i)
>  dy[i] = dy[i] + da * dx[i];
>

Yes we could detect these patterns.

> ?  In principle all the blas routines have one destination, so we'd need
> to distribute all stores, like with regular loop distribution but then
> after analyzing the partitions and detecting which ones we recognize
> we need to fuse the unhandled parts together again.  Can we do
> that from inside the loop distribution machinery?
>

Loop distribution currently does not fuse the partitions back when,
because of other dependences, a partition has to pull in more data
references than the analysis started from, so that the kernel to be
code generated is no longer exactly the one for which we have the
lib function.

I think it would not be difficult to implement the fusion of the
partitions that do not match the patterns with the default partition.

> Ok, so we'd need to do the pattern recognition before distributing
> the loop?

We have to detect the pattern twice: first before we create the
partitions, as the match provides the initial root of the partition,
and then again after the partition has been built by aggregating
other data references, to make sure the pattern still matches.

>  But we need to make sure that the partition only contains
> side-effects the replacement function has.  Consider
>
>  for (i = 0; i < n ; ++i)
>   {
>     dx[i] = i;
>     dy[i] = dy[i] + da * dx[i];
>   }
>
> will loop distribution include the assignment to dx[i] in the partition
> if the worklist contains dy[i]?

I think that in this case you would get two partitions, because there
is no SCC in the data dep graph that would require the write to dx to
be in the same partition as the write to dy, and the distribution
would lead to:

for (i = 0; i < n ; ++i)
  dx[i] = i;

for (i = 0; i < n ; ++i)
  dy[i] = dy[i] + da * dx[i];
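
As a sanity check of that claim, here is a small C sketch comparing the
fused loop with the hand-distributed form.  The function names fused and
distributed are made up for illustration, and the equivalence assumes
that dx and dy do not overlap.

```c
#include <assert.h>

/* The original loop: both stores in one loop body.  */
void fused (int n, double da, double *dx, double *dy)
{
  assert (n >= 0);
  for (int i = 0; i < n; ++i)
    {
      dx[i] = i;
      dy[i] = dy[i] + da * dx[i];
    }
}

/* Hand-distributed form: the write to dx gets its own loop, and the
   daxpy-like update of dy runs afterwards on the finished dx.  */
void distributed (int n, double da, double *dx, double *dy)
{
  assert (n >= 0);
  for (int i = 0; i < n; ++i)
    dx[i] = i;

  for (int i = 0; i < n; ++i)
    dy[i] = dy[i] + da * dx[i];
}
```

The only dependence between the two statements goes from the write of
dx[i] to its read in the same iteration, so running all writes first
preserves it.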

Sebastian


* Re: [PATCH 0/2] Loop distribution for memset zero
  2010-07-31 20:01       ` Joseph S. Myers
@ 2010-08-01 16:19         ` Michael Matz
  0 siblings, 0 replies; 24+ messages in thread
From: Michael Matz @ 2010-08-01 16:19 UTC (permalink / raw)
  To: Joseph S. Myers; +Cc: Sebastian Pop, Richard Guenther, gcc-patches

Hi,

On Sat, 31 Jul 2010, Joseph S. Myers wrote:

> On Sat, 31 Jul 2010, Michael Matz wrote:
> 
> > Some math libraries (the one from AMD at least for instance) provide not 
> > only vectorized intrinsics for a fixed vector size (e.g. 4 float 
> > elements), but also for a generic arbitrarily sized array.  
> > For instance:
> > 
> >   void vrsa_expf(int n, float *src, float *dest);
> > 
> > is equivalent to:
> > 
> >   for (i = 0; i < n; i++)
> >     dest[i] = expf (src[i]);
> 
> Exactly equivalent to that C code even with overlaps, or is it really (int 
> n, const float *restrict src, float *restrict dest) with no overlap 
> permitted?

The current vrsa_expf happens to be implemented with a forward walk 
through the two arrays without checking for overlap.  I think it's safe to 
assume that nobody thought about this aspect and hence the specification 
should include that no partial overlap is permitted (it would work with 
exact overlap).
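
To illustrate the difference between exact and partial overlap, here is
a minimal C sketch of such a forward walk.  It is not the ACML code:
forward_map and elem_op are made-up names, and elem_op doubles its
argument as a stand-in for expf so that the sketch needs no libm.

```c
#include <assert.h>

/* Stand-in for expf (hypothetical, chosen so results are exact).  */
static float elem_op (float x)
{
  return x * 2.0f;
}

/* Forward walk through both arrays without any overlap check.  */
void forward_map (int n, float *src, float *dest)
{
  assert (n >= 0);
  for (int i = 0; i < n; i++)
    dest[i] = elem_op (src[i]);
}
```

With dest == src every element is read before it is overwritten, so the
result is the expected one.  With dest == src + 1 each store clobbers
the next input: starting from {1, 2, 4, 8}... rather, starting from
{1, 2, 3, 4} and n = 3 the walk produces {1, 2, 4, 8}, whereas a
no-overlap reading of the same call would give {1, 2, 4, 6}.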


Ciao,
Michael.


* Re: [PATCH 3/3] Add -ftree-loop-distribute-patterns.
  2010-08-01 12:06         ` Richard Guenther
@ 2010-08-02  9:21           ` Richard Guenther
  2010-08-02 15:26             ` [PATCH] Add -ftree-loop-distribute-patterns enabled at -O3 Sebastian Pop
  2010-08-02 16:22             ` [PATCH 3/3] Add -ftree-loop-distribute-patterns Sebastian Pop
  0 siblings, 2 replies; 24+ messages in thread
From: Richard Guenther @ 2010-08-02  9:21 UTC (permalink / raw)
  To: Sebastian Pop; +Cc: gcc-patches

On Sun, Aug 1, 2010 at 2:05 PM, Richard Guenther
<richard.guenther@gmail.com> wrote:
> On Sat, Jul 31, 2010 at 5:30 PM, Sebastian Pop <sebpop@gmail.com> wrote:
>> I forgot this part in the patch below:
>
> I thought of renaming -ftree-loop-distribute-memset-zero to
> -ftree-loop-distribute-patterns, not adding an unused option.

Btw, the patchset is ok with that change, -ftree-loop-distribute-memset-zero
removed and -ftree-loop-distribute-patterns enabled at -O3.

Thanks,
Richard.

> Richard.
>
>> diff --git a/gcc/tree-loop-distribution.c b/gcc/tree-loop-distribution.c
>> index c677ecb..34d6e21 100644
>> --- a/gcc/tree-loop-distribution.c
>> +++ b/gcc/tree-loop-distribution.c
>> @@ -1237,7 +1237,8 @@ static bool
>>  gate_tree_loop_distribution (void)
>>  {
>>   return flag_tree_loop_distribution
>> -    || flag_tree_loop_distribute_memset_zero;
>> +    || flag_tree_loop_distribute_memset_zero
>> +    || flag_tree_loop_distribute_patterns;
>>  }
>>
>>  struct gimple_opt_pass pass_loop_distribution =
>>
>>
>> On Sat, Jul 31, 2010 at 10:27, Sebastian Pop <sebpop@gmail.com> wrote:
>>> ---
>>>  gcc/common.opt               |    4 ++++
>>>  gcc/doc/invoke.texi          |    5 +++++
>>>  gcc/tree-loop-distribution.c |    3 ++-
>>>  3 files changed, 11 insertions(+), 1 deletions(-)
>>>
>>> diff --git a/gcc/common.opt b/gcc/common.opt
>>> index 77cf58e..a9fcdd2 100644
>>> --- a/gcc/common.opt
>>> +++ b/gcc/common.opt
>>> @@ -1341,6 +1341,10 @@ ftree-loop-distribute-memset-zero
>>>  Common Report Var(flag_tree_loop_distribute_memset_zero) Optimization
>>>  Enable loop distribution of initialization loops using memset zero
>>>
>>> +ftree-loop-distribute-patterns
>>> +Common Report Var(flag_tree_loop_distribute_patterns) Optimization
>>> +Enable loop distribution of patterns code generated with calls to a library
>>> +
>>>  ftree-loop-im
>>>  Common Report Var(flag_tree_loop_im) Init(1) Optimization
>>>  Enable loop invariant motion on trees
>>> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
>>> index ca3238c..b9b8b22 100644
>>> --- a/gcc/doc/invoke.texi
>>> +++ b/gcc/doc/invoke.texi
>>> @@ -385,6 +385,7 @@ Objective-C and Objective-C++ Dialects}.
>>>  -ftree-copyrename -ftree-dce -ftree-dominator-opts -ftree-dse @gol
>>>  -ftree-forwprop -ftree-fre -ftree-loop-if-convert -ftree-loop-im @gol
>>>  -ftree-phiprop -ftree-loop-distribution -ftree-loop-distribute-memset-zero @gol
>>> +-ftree-loop-distribute-patterns @gol
>>>  -ftree-loop-ivcanon -ftree-loop-linear -ftree-loop-optimize @gol
>>>  -ftree-parallelize-loops=@var{n} -ftree-pre -ftree-pta -ftree-reassoc @gol
>>>  -ftree-sink -ftree-sra -ftree-switch-conversion @gol
>>> @@ -6946,6 +6947,10 @@ ENDDO
>>>  and the initialization loop is transformed into a call to memset zero.
>>>  This flag is enabled by default at @option{-O3}.
>>>
>>> +@item -ftree-loop-distribute-patterns
>>> +Perform loop distribution of patterns that can be code generated with
>>> +calls to a library.  This enables @option{-ftree-loop-distribute-memset-zero}.
>>> +
>>>  @item -ftree-loop-im
>>>  @opindex ftree-loop-im
>>>  Perform loop invariant motion on trees.  This pass moves only invariants that
>>> diff --git a/gcc/tree-loop-distribution.c b/gcc/tree-loop-distribution.c
>>> index 920f744..c677ecb 100644
>>> --- a/gcc/tree-loop-distribution.c
>>> +++ b/gcc/tree-loop-distribution.c
>>> @@ -1187,7 +1187,8 @@ tree_loop_distribution (void)
>>>       /* If both flag_tree_loop_distribute_memset_zero and
>>>         flag_tree_loop_distribution are set, then only memset_zero is
>>>         executed.  */
>>> -      if (flag_tree_loop_distribute_memset_zero)
>>> +      if (flag_tree_loop_distribute_memset_zero
>>> +         || flag_tree_loop_distribute_patterns)
>>>        {
>>>          /* With the following working list, we're asking
>>>             distribute_loop to separate from the rest of the loop the
>>> --
>>> 1.7.0.4
>>>
>>>
>>
>


* [PATCH] Add -ftree-loop-distribute-patterns enabled at -O3.
  2010-08-02  9:21           ` Richard Guenther
@ 2010-08-02 15:26             ` Sebastian Pop
  2010-08-07 17:49               ` Gerald Pfeifer
  2010-10-21  0:30               ` H.J. Lu
  2010-08-02 16:22             ` [PATCH 3/3] Add -ftree-loop-distribute-patterns Sebastian Pop
  1 sibling, 2 replies; 24+ messages in thread
From: Sebastian Pop @ 2010-08-02 15:26 UTC (permalink / raw)
  To: Richard Guenther; +Cc: gcc-patches, Sebastian Pop

Hi,

Here is the patch that I am testing on amd64-linux.  I will commit
this patch to trunk after regstrap.

Sebastian

---
 gcc/common.opt               |    4 +++
 gcc/doc/invoke.texi          |   25 ++++++++++++++++++++++-
 gcc/opts.c                   |    1 +
 gcc/tree-data-ref.c          |   26 ++++++++++++++++++++++++
 gcc/tree-data-ref.h          |    1 +
 gcc/tree-loop-distribution.c |   45 +++++++++++++++++++++++++++++------------
 6 files changed, 88 insertions(+), 14 deletions(-)

diff --git a/gcc/common.opt b/gcc/common.opt
index 41a9838..8cb09ab 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -1337,6 +1337,10 @@ ftree-loop-distribution
 Common Report Var(flag_tree_loop_distribution) Optimization
 Enable loop distribution on trees
 
+ftree-loop-distribute-patterns
+Common Report Var(flag_tree_loop_distribute_patterns) Optimization
+Enable loop distribution for patterns transformed into a library call
+
 ftree-loop-im
 Common Report Var(flag_tree_loop_im) Init(1) Optimization
 Enable loop invariant motion on trees
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 73051de..68b64db 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -384,7 +384,7 @@ Objective-C and Objective-C++ Dialects}.
 -ftree-builtin-call-dce -ftree-ccp -ftree-ch -ftree-copy-prop @gol
 -ftree-copyrename -ftree-dce -ftree-dominator-opts -ftree-dse @gol
 -ftree-forwprop -ftree-fre -ftree-loop-if-convert -ftree-loop-im @gol
--ftree-phiprop -ftree-loop-distribution @gol
+-ftree-phiprop -ftree-loop-distribution -ftree-loop-distribute-patterns @gol
 -ftree-loop-ivcanon -ftree-loop-linear -ftree-loop-optimize @gol
 -ftree-parallelize-loops=@var{n} -ftree-pre -ftree-pta -ftree-reassoc @gol
 -ftree-sink -ftree-sra -ftree-switch-conversion @gol
@@ -6925,6 +6925,29 @@ DO I = 1, N
 ENDDO
 @end smallexample
 
+@item -ftree-loop-distribute-patterns
+Perform loop distribution of patterns that can be code generated with
+calls to a library.  This flag is enabled by default at @option{-O3}.
+
+This pass distributes the initialization loops and generates a call to
+memset zero.  For example, the loop
+@smallexample
+DO I = 1, N
+  A(I) = 0
+  B(I) = A(I) + I
+ENDDO
+@end smallexample
+is transformed to
+@smallexample
+DO I = 1, N
+   A(I) = 0
+ENDDO
+DO I = 1, N
+   B(I) = A(I) + I
+ENDDO
+@end smallexample
+and the initialization loop is transformed into a call to memset zero.
+
 @item -ftree-loop-im
 @opindex ftree-loop-im
 Perform loop invariant motion on trees.  This pass moves only invariants that
diff --git a/gcc/opts.c b/gcc/opts.c
index 07d7a23..2579e9f 100644
--- a/gcc/opts.c
+++ b/gcc/opts.c
@@ -862,6 +862,7 @@ decode_options (unsigned int argc, const char **argv,
 
   /* -O3 optimizations.  */
   opt3 = (optimize >= 3);
+  flag_tree_loop_distribute_patterns = opt3;
   flag_predictive_commoning = opt3;
   flag_inline_functions = opt3;
   flag_unswitch_loops = opt3;
diff --git a/gcc/tree-data-ref.c b/gcc/tree-data-ref.c
index e7aa277..2656350 100644
--- a/gcc/tree-data-ref.c
+++ b/gcc/tree-data-ref.c
@@ -5038,6 +5038,32 @@ stores_from_loop (struct loop *loop, VEC (gimple, heap) **stmts)
   free (bbs);
 }
 
+/* Initialize STMTS with all the statements of LOOP that contain a
+   store to memory of the form "A[i] = 0".  */
+
+void
+stores_zero_from_loop (struct loop *loop, VEC (gimple, heap) **stmts)
+{
+  unsigned int i;
+  basic_block bb;
+  gimple_stmt_iterator si;
+  gimple stmt;
+  tree op;
+  basic_block *bbs = get_loop_body_in_dom_order (loop);
+
+  for (i = 0; i < loop->num_nodes; i++)
+    for (bb = bbs[i], si = gsi_start_bb (bb); !gsi_end_p (si); gsi_next (&si))
+      if ((stmt = gsi_stmt (si))
+	  && gimple_vdef (stmt)
+	  && is_gimple_assign (stmt)
+	  && gimple_assign_rhs_code (stmt) == INTEGER_CST
+	  && (op = gimple_assign_rhs1 (stmt))
+	  && (integer_zerop (op) || real_zerop (op)))
+	VEC_safe_push (gimple, heap, *stmts, gsi_stmt (si));
+
+  free (bbs);
+}
+
 /* For a data reference REF, return the declaration of its base
    address or NULL_TREE if the base is not determined.  */
 
diff --git a/gcc/tree-data-ref.h b/gcc/tree-data-ref.h
index eff5348..9e18e26 100644
--- a/gcc/tree-data-ref.h
+++ b/gcc/tree-data-ref.h
@@ -564,6 +564,7 @@ index_in_loop_nest (int var, VEC (loop_p, heap) *loop_nest)
 }
 
 void stores_from_loop (struct loop *, VEC (gimple, heap) **);
+void stores_zero_from_loop (struct loop *, VEC (gimple, heap) **);
 void remove_similar_memory_refs (VEC (gimple, heap) **);
 bool rdg_defs_used_in_other_loops_p (struct graph *, int);
 bool have_similar_memory_accesses (gimple, gimple);
diff --git a/gcc/tree-loop-distribution.c b/gcc/tree-loop-distribution.c
index 099a7fe..5905406 100644
--- a/gcc/tree-loop-distribution.c
+++ b/gcc/tree-loop-distribution.c
@@ -1184,18 +1184,36 @@ tree_loop_distribution (void)
     {
       VEC (gimple, heap) *work_list = VEC_alloc (gimple, heap, 3);
 
-      /* With the following working list, we're asking distribute_loop
-	 to separate the stores of the loop: when dependences allow,
-	 it will end on having one store per loop.  */
-      stores_from_loop (loop, &work_list);
-
-      /* A simple heuristic for cache locality is to not split stores
-	 to the same array.  Without this call, an unrolled loop would
-	 be split into as many loops as unroll factor, each loop
-	 storing in the same array.  */
-      remove_similar_memory_refs (&work_list);
-
-      nb_generated_loops = distribute_loop (loop, work_list);
+      /* If both flag_tree_loop_distribute_patterns and
+	 flag_tree_loop_distribution are set, then only
+	 distribute_patterns is executed.  */
+      if (flag_tree_loop_distribute_patterns)
+	{
+	  /* With the following working list, we're asking
+	     distribute_loop to separate from the rest of the loop the
+	     stores of the form "A[i] = 0".  */
+	  stores_zero_from_loop (loop, &work_list);
+
+	  /* Do nothing if there are no patterns to be distributed.  */
+	  if (VEC_length (gimple, work_list) > 0)
+	    nb_generated_loops = distribute_loop (loop, work_list);
+	}
+      else if (flag_tree_loop_distribution)
+	{
+	  /* With the following working list, we're asking
+	     distribute_loop to separate the stores of the loop: when
+	     dependences allow, it will end on having one store per
+	     loop.  */
+	  stores_from_loop (loop, &work_list);
+
+	  /* A simple heuristic for cache locality is to not split
+	     stores to the same array.  Without this call, an unrolled
+	     loop would be split into as many loops as unroll factor,
+	     each loop storing in the same array.  */
+	  remove_similar_memory_refs (&work_list);
+
+	  nb_generated_loops = distribute_loop (loop, work_list);
+	}
 
       if (dump_file && (dump_flags & TDF_DETAILS))
 	{
@@ -1217,7 +1235,8 @@ tree_loop_distribution (void)
 static bool
 gate_tree_loop_distribution (void)
 {
-  return flag_tree_loop_distribution != 0;
+  return flag_tree_loop_distribution
+    || flag_tree_loop_distribute_patterns;
 }
 
 struct gimple_opt_pass pass_loop_distribution =
-- 
1.7.0.4
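
For readers who prefer C to the Fortran in the documentation hunk, here
is a hand-written sketch of the intended effect, with zero-based
indices.  The function names init_fused and init_distributed are made up
for illustration; this shows the source-level equivalent of the
transformation, not the GIMPLE the pass emits, and it assumes a and b do
not overlap.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Before: initialization and use in one loop.  */
void init_fused (int n, int *a, int *b)
{
  assert (n >= 0);
  for (int i = 0; i < n; i++)
    {
      a[i] = 0;
      b[i] = a[i] + i;
    }
}

/* After: the "a[i] = 0" partition becomes a memset call and the rest
   of the loop stays in the remaining partition.  For int elements,
   all-zero bytes represent the value 0.  */
void init_distributed (int n, int *a, int *b)
{
  assert (n >= 0);
  memset (a, 0, (size_t) n * sizeof *a);
  for (int i = 0; i < n; i++)
    b[i] = a[i] + i;
}
```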


* Re: [PATCH 3/3] Add -ftree-loop-distribute-patterns.
  2010-08-02  9:21           ` Richard Guenther
  2010-08-02 15:26             ` [PATCH] Add -ftree-loop-distribute-patterns enabled at -O3 Sebastian Pop
@ 2010-08-02 16:22             ` Sebastian Pop
  1 sibling, 0 replies; 24+ messages in thread
From: Sebastian Pop @ 2010-08-02 16:22 UTC (permalink / raw)
  To: Richard Guenther; +Cc: gcc-patches

On Mon, Aug 2, 2010 at 04:21, Richard Guenther
<richard.guenther@gmail.com> wrote:
> Btw, the patchset is ok with that change, -ftree-loop-distribute-memset-zero
> removed and -ftree-loop-distribute-patterns enabled at -O3.
>

Committed r162822.


* Re: [PATCH] Add -ftree-loop-distribute-patterns enabled at -O3.
  2010-08-02 15:26             ` [PATCH] Add -ftree-loop-distribute-patterns enabled at -O3 Sebastian Pop
@ 2010-08-07 17:49               ` Gerald Pfeifer
  2010-10-21  0:30               ` H.J. Lu
  1 sibling, 0 replies; 24+ messages in thread
From: Gerald Pfeifer @ 2010-08-07 17:49 UTC (permalink / raw)
  To: Sebastian Pop; +Cc: gcc-patches

On Mon, 2 Aug 2010, Sebastian Pop wrote:
> Here is the patch that I am testing on amd64-linux.  I will commit
> this patch to trunk after regstrap.

I think this would be worthwhile to add to 
http://gcc.gnu.org/gcc-4.6/changes.html ?

Gerald


* Re: [PATCH] Add -ftree-loop-distribute-patterns enabled at -O3.
  2010-08-02 15:26             ` [PATCH] Add -ftree-loop-distribute-patterns enabled at -O3 Sebastian Pop
  2010-08-07 17:49               ` Gerald Pfeifer
@ 2010-10-21  0:30               ` H.J. Lu
  2010-12-18 17:48                 ` H.J. Lu
  1 sibling, 1 reply; 24+ messages in thread
From: H.J. Lu @ 2010-10-21  0:30 UTC (permalink / raw)
  To: Sebastian Pop; +Cc: Richard Guenther, gcc-patches

On Mon, Aug 2, 2010 at 8:25 AM, Sebastian Pop <sebpop@gmail.com> wrote:
> Hi,
>
> Here is the patch that I am testing on amd64-linux.  I will commit
> this patch to trunk after regstrap.
>
> Sebastian
>

This new option caused:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46107

-- 
H.J.


* Re: [PATCH] Add -ftree-loop-distribute-patterns enabled at -O3.
  2010-10-21  0:30               ` H.J. Lu
@ 2010-12-18 17:48                 ` H.J. Lu
  0 siblings, 0 replies; 24+ messages in thread
From: H.J. Lu @ 2010-12-18 17:48 UTC (permalink / raw)
  To: Sebastian Pop; +Cc: Richard Guenther, gcc-patches

On Wed, Oct 20, 2010 at 4:06 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Mon, Aug 2, 2010 at 8:25 AM, Sebastian Pop <sebpop@gmail.com> wrote:
>> Hi,
>>
>> Here is the patch that I am testing on amd64-linux.  I will commit
>> this patch to trunk after regstrap.
>>
>> Sebastian
>>
>
> This new option caused:
>
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46107
>

This also caused:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47002

-- 
H.J.

