[RFC, PR66873] Use graphite for parloops

public inbox for gcc-patches@gcc.gnu.org

* [RFC, PR66873] Use graphite for parloops
@ 2015-07-15 22:18 Tom de Vries
  2015-07-16  8:48 ` Richard Biener
  2015-07-20 18:54 ` Sebastian Pop
  0 siblings, 2 replies; 27+ messages in thread
From: Tom de Vries @ 2015-07-15 22:18 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 2258 bytes --]

Hi,

I tried to parallelize this Fortran test case (based on
autopar/outer-1.c), specifically the outer loop of the first loop nest,
using -ftree-parallelize-loops=2:
...
program main
   implicit none
   integer, parameter         :: n = 500
   integer, dimension (0:n-1, 0:n-1) :: x
   integer                    :: i, j, ii, jj


   do ii = 0, n - 1
      do jj = 0, n - 1
         x(jj, ii) = ii + jj + 3
      end do
   end do

   do i = 0, n - 1
      do j = 0, n - 1
         if (x(j, i) .ne. i + j + 3) call abort
      end do
   end do

end program main
...

But autopar fails to parallelize it because its dependency analysis fails.

I then tried to add -floop-parallelize-all, and found that the graphite 
dependency analysis did manage to decide that the iterations are 
independent.
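
For readers more at home in C, a rough analogue of the two Fortran loop
nests above might look as follows. This is only a sketch written for
illustration, not the contents of autopar/outer-1.c; the names init_grid
and check_grid are made up, and the check returns a status instead of
calling abort. No claim is made here about how autopar treats this C
version.

```c
#define N 500

/* Illustrative array; indexed x[ii][jj] so that the inner loop walks
   contiguous memory, as x(jj, ii) does in the Fortran version.  */
static int x[N][N];

/* First loop nest: each (ii, jj) pair writes a distinct element, so no
   iteration of the outer loop depends on any other iteration.  */
void
init_grid (void)
{
  for (int ii = 0; ii < N; ii++)
    for (int jj = 0; jj < N; jj++)
      x[ii][jj] = ii + jj + 3;
}

/* Second loop nest: return 1 if every element holds the expected value,
   0 as soon as a mismatch is found.  */
int
check_grid (void)
{
  for (int i = 0; i < N; i++)
    for (int j = 0; j < N; j++)
      if (x[i][j] != i + j + 3)
	return 0;
  return 1;
}
```

Calling init_grid followed by check_grid from a small main, and compiling
with -O2 -ftree-parallelize-loops=2 -fdump-tree-parloops-details, is one
way to inspect what the parloops pass reports for the outer loop.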

At https://gcc.gnu.org/wiki/Graphite/Parallelization I read:
...
In GCC there already exists an auto-parallelization pass
(tree-parloops.c), which is based on the lambda framework originally
developed by Sebastian. Since Lambda framework is limited to some cases
(e.g. triangle loops, loops with 'if' conditions), Graphite was
developed to handle the loops that lambda was not able to handle.
...

So I wondered: why not always use the graphite dependency analysis in
parloops? (Of course you could use -floop-parallelize-all, but that also
changes the heuristic.) So I wrote a patch for parloops to use graphite
dependency analysis by default (that is, without -floop-parallelize-all),
but while testing I found that all the reduction test cases started
failing, because the modifications graphite makes to the code mess up
the parloops reduction analysis.

Then I came up with this patch, which:
- first runs a parloops pass, restricted to reduction loops only,
- then runs the graphite dependency analysis, and
- finally runs a normal parloops pass.

This way, we get to both:
- compile the reduction testcases as before, and
- profit from the better graphite dependency analysis otherwise.
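
To make the split concrete, here is a minimal C sketch of the two kinds
of loop involved. It is modelled loosely on the "variable bound,
reduction" and "variable bound, vector addition" test descriptions, but
the function names sum_array and add_arrays are hypothetical and the
code is not copied from the testsuite. Per the description above, a loop
like the first would be a candidate for the reduction-only parloops run,
and a loop like the second for the later, normal parloops run.

```c
/* Reduction loop: every iteration updates the same scalar `sum`, so the
   loop can only be parallelized if the reduction is recognized and the
   per-thread partial sums are combined afterwards.  */
unsigned int
sum_array (unsigned int n, const unsigned int *a)
{
  unsigned int sum = 0;
  for (unsigned int i = 0; i < n; i++)
    sum += a[i];
  return sum;
}

/* Independent loop: each iteration writes a distinct c[i] and reads only
   a[i] and b[i]; `restrict` promises that the arrays do not overlap.  */
void
add_arrays (unsigned int n, const unsigned int *restrict a,
	    const unsigned int *restrict b, unsigned int *restrict c)
{
  for (unsigned int i = 0; i < n; i++)
    c[i] = a[i] + b[i];
}
```

In the updated tests, the reduction-related dump lines are looked up in
the parloopsred dump, while the vector-addition tests keep scanning the
parloops dump.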

A point worth noting is that I stopped running pass_iv_canon before 
parloops (only in case of -ftree-parallelize-loops > 1) because running 
it before graphite makes the graphite scop detection fail.

Bootstrapped and reg-tested on x86_64.

Any comments?

Thanks,
- Tom

[-- Attachment #2: 0001-Use-graphite-for-parloops.patch --]
[-- Type: text/x-patch, Size: 39954 bytes --]

Use graphite for parloops

2015-07-15  Tom de Vries  <tom@codesourcery.com>

	PR tree-optimization/66873
	* graphite-isl-ast-to-gimple.c (translate_isl_ast_for_loop)
	(scop_to_isl_ast): Handle flag_tree_parallelize_loops.
	* graphite-poly.c (apply_poly_transforms): Same.
	* graphite.c (gate_graphite_transforms): Remove static.
	(pass_graphite_parloops): New pass.
	(make_pass_graphite_parloops): New function.
	(pass_graphite_transforms2): New pass.
	(make_pass_graphite_transforms2): New function.
	* omp-low.c (pass_expand_omp_ssa::clone): Same.
	* passes.def: Add pass groups pass_parallelize_reductions and
	pass_graphite_parloops.
	* tree-parloops.c (gen_parallel_loop): Add debug print for alternative
	exit-first loop transform.
	(parallelize_loops): Add reductions_only parameter.
	(pass_parallelize_loops::execute): Call parallelize_loops with extra
	argument.
	(pass_parallelize_reductions): New pass.
	(pass_parallelize_reductions::execute)
	(make_pass_parallelize_reductions): New function.
	* tree-pass.h (make_pass_graphite_parloops)
	(make_pass_parallelize_reductions, make_pass_graphite_transforms2)
	(gate_graphite_transforms): Declare.
	* tree-ssa-loop-ivcanon.c (pass_iv_canon::gate): Return false if
	flag_tree_parallelize_loops > 1.

	* gcc.dg/autopar/outer-6.c: Update for new pass parloopsred.
	* gcc.dg/autopar/reduc-1.c: Same.
	* gcc.dg/autopar/reduc-1char.c: Same.
	* gcc.dg/autopar/reduc-1short.c: Same.
	* gcc.dg/autopar/reduc-2.c: Same.
	* gcc.dg/autopar/reduc-2char.c: Same.
	* gcc.dg/autopar/reduc-2short.c: Same.
	* gcc.dg/autopar/reduc-3.c: Same.
	* gcc.dg/autopar/reduc-6.c: Same.
	* gcc.dg/autopar/reduc-7.c: Same.
	* gcc.dg/autopar/reduc-8.c: Same.
	* gcc.dg/autopar/reduc-9.c: Same.
	* gcc.dg/parloops-exit-first-loop-alt-2.c: Same.
	* gcc.dg/parloops-exit-first-loop-alt-3.c: Same.
	* gcc.dg/parloops-exit-first-loop-alt-4.c: Same.
	* gcc.dg/parloops-exit-first-loop-alt-5.c: Same.
	* gcc.dg/parloops-exit-first-loop-alt-6.c: Same.
	* gcc.dg/parloops-exit-first-loop-alt-7.c: Same.
	* gcc.dg/parloops-exit-first-loop-alt-pr66652.c: Same.
	* gcc.dg/parloops-exit-first-loop-alt.c: Same.
	* gfortran.dg/parloops-exit-first-loop-alt-2.f95: Same.
	* gfortran.dg/parloops-exit-first-loop-alt.f95: Same.
	* gfortran.dg/parloops-outer-1.f95: New test.
---
 gcc/graphite-isl-ast-to-gimple.c                   |  6 +-
 gcc/graphite-poly.c                                |  3 +-
 gcc/graphite.c                                     | 83 ++++++++++++++++++-
 gcc/omp-low.c                                      |  1 +
 gcc/passes.def                                     | 11 +++
 gcc/testsuite/gcc.dg/autopar/outer-6.c             |  6 +-
 gcc/testsuite/gcc.dg/autopar/reduc-1.c             |  7 +-
 gcc/testsuite/gcc.dg/autopar/reduc-1char.c         |  7 +-
 gcc/testsuite/gcc.dg/autopar/reduc-1short.c        |  7 +-
 gcc/testsuite/gcc.dg/autopar/reduc-2.c             |  7 +-
 gcc/testsuite/gcc.dg/autopar/reduc-2char.c         |  7 +-
 gcc/testsuite/gcc.dg/autopar/reduc-2short.c        |  7 +-
 gcc/testsuite/gcc.dg/autopar/reduc-3.c             |  5 +-
 gcc/testsuite/gcc.dg/autopar/reduc-6.c             |  6 +-
 gcc/testsuite/gcc.dg/autopar/reduc-7.c             |  7 +-
 gcc/testsuite/gcc.dg/autopar/reduc-8.c             |  7 +-
 gcc/testsuite/gcc.dg/autopar/reduc-9.c             |  7 +-
 .../gcc.dg/parloops-exit-first-loop-alt-2.c        |  9 +--
 .../gcc.dg/parloops-exit-first-loop-alt-3.c        |  9 +--
 .../gcc.dg/parloops-exit-first-loop-alt-4.c        |  9 +--
 .../gcc.dg/parloops-exit-first-loop-alt-5.c        |  9 +--
 .../gcc.dg/parloops-exit-first-loop-alt-6.c        |  9 +--
 .../gcc.dg/parloops-exit-first-loop-alt-7.c        |  9 +--
 .../gcc.dg/parloops-exit-first-loop-alt-pr66652.c  | 11 +--
 .../gcc.dg/parloops-exit-first-loop-alt.c          | 10 +--
 .../gfortran.dg/parloops-exit-first-loop-alt-2.f95 |  9 +--
 .../gfortran.dg/parloops-exit-first-loop-alt.f95   | 10 +--
 gcc/testsuite/gfortran.dg/parloops-outer-1.f95     | 37 +++++++++
 gcc/tree-parloops.c                                | 93 ++++++++++++++++++++--
 gcc/tree-pass.h                                    |  5 ++
 gcc/tree-ssa-loop-ivcanon.c                        |  6 +-
 31 files changed, 303 insertions(+), 116 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/parloops-outer-1.f95

diff --git a/gcc/graphite-isl-ast-to-gimple.c b/gcc/graphite-isl-ast-to-gimple.c
index b32781a5..bdafd40 100644
--- a/gcc/graphite-isl-ast-to-gimple.c
+++ b/gcc/graphite-isl-ast-to-gimple.c
@@ -442,7 +442,8 @@ translate_isl_ast_for_loop (loop_p context_loop,
   redirect_edge_succ_nodup (next_e, after);
   set_immediate_dominator (CDI_DOMINATORS, next_e->dest, next_e->src);
 
-  if (flag_loop_parallelize_all)
+  if (flag_loop_parallelize_all
+      || flag_tree_parallelize_loops > 1)
   {
     isl_id *id = isl_ast_node_get_annotation (node_for);
     gcc_assert (id);
@@ -995,7 +996,8 @@ scop_to_isl_ast (scop_p scop, ivs_params &ip)
   context_isl = set_options (context_isl, schedule_isl, options_luj);
 
   isl_union_map *dependences = NULL;
-  if (flag_loop_parallelize_all)
+  if (flag_loop_parallelize_all
+      || flag_tree_parallelize_loops > 1)
   {
     dependences = scop_get_dependences (scop);
     context_isl =
diff --git a/gcc/graphite-poly.c b/gcc/graphite-poly.c
index bcd08d8..e32325e 100644
--- a/gcc/graphite-poly.c
+++ b/gcc/graphite-poly.c
@@ -241,7 +241,8 @@ apply_poly_transforms (scop_p scop)
   if (flag_graphite_identity)
     transform_done = true;
 
-  if (flag_loop_parallelize_all)
+  if (flag_loop_parallelize_all
+      || flag_tree_parallelize_loops > 1)
     transform_done = true;
 
   if (flag_loop_block)
diff --git a/gcc/graphite.c b/gcc/graphite.c
index a81ef6a..6ba58c0 100644
--- a/gcc/graphite.c
+++ b/gcc/graphite.c
@@ -319,7 +319,7 @@ graphite_transforms (struct function *fun)
   return 0;
 }
 
-static bool
+bool
 gate_graphite_transforms (void)
 {
   /* Enable -fgraphite pass if any one of the graphite optimization flags
@@ -373,6 +373,45 @@ make_pass_graphite (gcc::context *ctxt)
 
 namespace {
 
+const pass_data pass_data_graphite_parloops =
+{
+  GIMPLE_PASS, /* type */
+  "graphite_parloops", /* name */
+  OPTGROUP_LOOP, /* optinfo_flags */
+  TV_GRAPHITE, /* tv_id */
+  ( PROP_cfg | PROP_ssa ), /* properties_required */
+  0, /* properties_provided */
+  0, /* properties_destroyed */
+  0, /* todo_flags_start */
+  0, /* todo_flags_finish */
+};
+
+class pass_graphite_parloops : public gimple_opt_pass
+{
+public:
+  pass_graphite_parloops (gcc::context *ctxt)
+    : gimple_opt_pass (pass_data_graphite_parloops, ctxt)
+  {}
+
+  /* opt_pass methods: */
+  virtual bool gate (function *)
+  {
+    return (flag_tree_parallelize_loops > 1
+	    && !gate_graphite_transforms ());
+  }
+
+}; // class pass_graphite_parloops
+
+} // anon namespace
+
+gimple_opt_pass *
+make_pass_graphite_parloops (gcc::context *ctxt)
+{
+  return new pass_graphite_parloops (ctxt);
+}
+
+namespace {
+
 const pass_data pass_data_graphite_transforms =
 {
   GIMPLE_PASS, /* type */
@@ -407,4 +446,46 @@ make_pass_graphite_transforms (gcc::context *ctxt)
   return new pass_graphite_transforms (ctxt);
 }
 
+/* It would be preferable to use a clone of pass_data_graphite_transforms rather
+   than declare a new pass.  But when using a clone of
+   pass_data_graphite_transforms (and changing the gate to trigger for
+   flag_tree_parallelize_loops > 1 as well) in pass group
+   pass_graphite_parloops, the pass is not executed.  */
+
+namespace {
+
+const pass_data pass_data_graphite_transforms2 =
+{
+  GIMPLE_PASS, /* type */
+  "graphite2", /* name */
+  OPTGROUP_LOOP, /* optinfo_flags */
+  TV_GRAPHITE_TRANSFORMS, /* tv_id */
+  ( PROP_cfg | PROP_ssa ), /* properties_required */
+  0, /* properties_provided */
+  0, /* properties_destroyed */
+  0, /* todo_flags_start */
+  0, /* todo_flags_finish */
+};
+
+class pass_graphite_transforms2 : public gimple_opt_pass
+{
+public:
+  pass_graphite_transforms2 (gcc::context *ctxt)
+    : gimple_opt_pass (pass_data_graphite_transforms2, ctxt)
+  {}
 
+  /* opt_pass methods: */
+  virtual bool gate (function *)
+  {
+    return (flag_tree_parallelize_loops > 1);
+  }
+  virtual unsigned int execute (function *fun) { return graphite_transforms (fun); }
+}; // class pass_graphite_transforms2
+
+} // anon namespace
+
+gimple_opt_pass *
+make_pass_graphite_transforms2 (gcc::context *ctxt)
+{
+  return new pass_graphite_transforms2 (ctxt);
+}
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 3135606..8cbee3a 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -9576,6 +9576,7 @@ public:
       return !(fun->curr_properties & PROP_gimple_eomp);
     }
   virtual unsigned int execute (function *) { return execute_expand_omp (); }
+  opt_pass *clone () { return new pass_expand_omp_ssa (m_ctxt); }
 
 }; // class pass_expand_omp_ssa
 
diff --git a/gcc/passes.def b/gcc/passes.def
index 5cd07ae..aa1d1a1 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -244,6 +244,17 @@ along with GCC; see the file COPYING3.  If not see
 	      NEXT_PASS (pass_dce);
 	  POP_INSERT_PASSES ()
 	  NEXT_PASS (pass_iv_canon);
+	  NEXT_PASS (pass_parallelize_reductions);
+	  PUSH_INSERT_PASSES_WITHIN (pass_parallelize_reductions)
+	      NEXT_PASS (pass_expand_omp_ssa);
+	  POP_INSERT_PASSES ()
+	  NEXT_PASS (pass_graphite_parloops);
+	  PUSH_INSERT_PASSES_WITHIN (pass_graphite_parloops)
+	      NEXT_PASS (pass_graphite_transforms2);
+	      NEXT_PASS (pass_lim);
+	      NEXT_PASS (pass_copy_prop);
+	      NEXT_PASS (pass_dce);
+	  POP_INSERT_PASSES ()
 	  NEXT_PASS (pass_parallelize_loops);
 	  PUSH_INSERT_PASSES_WITHIN (pass_parallelize_loops)
 	      NEXT_PASS (pass_expand_omp_ssa);
diff --git a/gcc/testsuite/gcc.dg/autopar/outer-6.c b/gcc/testsuite/gcc.dg/autopar/outer-6.c
index 6bef7cc..0f01bd5 100644
--- a/gcc/testsuite/gcc.dg/autopar/outer-6.c
+++ b/gcc/testsuite/gcc.dg/autopar/outer-6.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -ftree-parallelize-loops=4 -fdump-tree-parloops-details -fdump-tree-optimized" } */
+/* { dg-options "-O2 -ftree-parallelize-loops=4 -fdump-tree-parloopsred-details -fdump-tree-optimized" } */
 
 void abort (void);
 
@@ -44,6 +44,6 @@ int main(void)
 
 
 /* Check that outer loop is parallelized.  */
-/* { dg-final { scan-tree-dump-times "parallelizing outer loop" 1 "parloops" } } */
-/* { dg-final { scan-tree-dump-times "parallelizing inner loop" 0 "parloops" } } */
+/* { dg-final { scan-tree-dump-times "parallelizing outer loop" 1 "parloopsred" } } */
+/* { dg-final { scan-tree-dump-times "parallelizing inner loop" 0 "parloopsred" } } */
 /* { dg-final { scan-tree-dump-times "loopfn" 4 "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/autopar/reduc-1.c b/gcc/testsuite/gcc.dg/autopar/reduc-1.c
index 6e9a280..4fc9b31 100644
--- a/gcc/testsuite/gcc.dg/autopar/reduc-1.c
+++ b/gcc/testsuite/gcc.dg/autopar/reduc-1.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -ftree-parallelize-loops=4 -fdump-tree-parloops-details -fdump-tree-optimized" } */
+/* { dg-options "-O2 -ftree-parallelize-loops=4 -fdump-tree-parloopsred-details -fdump-tree-parloops-details -fdump-tree-optimized" } */
 
 #include <stdarg.h>
 #include <stdlib.h>
@@ -66,6 +66,7 @@ int main (void)
 }
 
 
-/* { dg-final { scan-tree-dump-times "Detected reduction" 3 "parloops" } } */
-/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 4 "parloops" } } */
+/* { dg-final { scan-tree-dump-times "Detected reduction" 3 "parloopsred" } } */
+/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 3 "parloopsred" } } */
+/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 1 "parloops" } } */
 
diff --git a/gcc/testsuite/gcc.dg/autopar/reduc-1char.c b/gcc/testsuite/gcc.dg/autopar/reduc-1char.c
index 48ead88..497b7e0 100644
--- a/gcc/testsuite/gcc.dg/autopar/reduc-1char.c
+++ b/gcc/testsuite/gcc.dg/autopar/reduc-1char.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -ftree-parallelize-loops=4 -fdump-tree-parloops-details -fdump-tree-optimized" } */
+/* { dg-options "-O2 -ftree-parallelize-loops=4 -fdump-tree-parloopsred-details -fdump-tree-parloops-details -fdump-tree-optimized" } */
 
 #include <stdarg.h>
 #include <stdlib.h>
@@ -60,6 +60,7 @@ int main (void)
 }
 
 
-/* { dg-final { scan-tree-dump-times "Detected reduction" 3 "parloops" } } */
-/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 4 "parloops" } } */
+/* { dg-final { scan-tree-dump-times "Detected reduction" 3 "parloopsred" } } */
+/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 3 "parloopsred" } } */
+/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 1 "parloops" } } */
 
diff --git a/gcc/testsuite/gcc.dg/autopar/reduc-1short.c b/gcc/testsuite/gcc.dg/autopar/reduc-1short.c
index f3f547c..6af8e4b 100644
--- a/gcc/testsuite/gcc.dg/autopar/reduc-1short.c
+++ b/gcc/testsuite/gcc.dg/autopar/reduc-1short.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -ftree-parallelize-loops=4 -fdump-tree-parloops-details -fdump-tree-optimized" } */
+/* { dg-options "-O2 -ftree-parallelize-loops=4 -fdump-tree-parloopsred-details -fdump-tree-parloops-details -fdump-tree-optimized" } */
 
 #include <stdarg.h>
 #include <stdlib.h>
@@ -59,6 +59,7 @@ int main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "Detected reduction" 3 "parloops" } } */
-/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 4 "parloops" } } */
+/* { dg-final { scan-tree-dump-times "Detected reduction" 3 "parloopsred" } } */
+/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 3 "parloopsred" } } */
+/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 1 "parloops" } } */
 
diff --git a/gcc/testsuite/gcc.dg/autopar/reduc-2.c b/gcc/testsuite/gcc.dg/autopar/reduc-2.c
index 3ad16e4..2d0b2a1 100644
--- a/gcc/testsuite/gcc.dg/autopar/reduc-2.c
+++ b/gcc/testsuite/gcc.dg/autopar/reduc-2.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -ftree-parallelize-loops=4 -fdump-tree-parloops-details -fdump-tree-optimized" } */
+/* { dg-options "-O2 -ftree-parallelize-loops=4 -fdump-tree-parloopsred-details -fdump-tree-parloops-details -fdump-tree-optimized" } */
 
 #include <stdarg.h>
 #include <stdlib.h>
@@ -63,6 +63,7 @@ int main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "Detected reduction" 3 "parloops" } } */
-/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 4 "parloops" } } */
+/* { dg-final { scan-tree-dump-times "Detected reduction" 3 "parloopsred" } } */
+/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 3 "parloopsred" } } */
+/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 1 "parloops" } } */
 
diff --git a/gcc/testsuite/gcc.dg/autopar/reduc-2char.c b/gcc/testsuite/gcc.dg/autopar/reduc-2char.c
index 072489f..49ef16d 100644
--- a/gcc/testsuite/gcc.dg/autopar/reduc-2char.c
+++ b/gcc/testsuite/gcc.dg/autopar/reduc-2char.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -ftree-parallelize-loops=4 -fdump-tree-parloops-details -fdump-tree-optimized" } */
+/* { dg-options "-O2 -ftree-parallelize-loops=4 -fdump-tree-parloopsred-details -fdump-tree-parloops-details -fdump-tree-optimized" } */
 
 #include <stdarg.h>
 #include <stdlib.h>
@@ -60,7 +60,8 @@ int main (void)
 }
 
 
-/* { dg-final { scan-tree-dump-times "Detected reduction" 2 "parloops" } } */
-/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 3 "parloops" } } */
+/* { dg-final { scan-tree-dump-times "Detected reduction" 2 "parloopsred" } } */
+/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 2 "parloopsred" } } */
+/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 1 "parloops" } } */
 
 
diff --git a/gcc/testsuite/gcc.dg/autopar/reduc-2short.c b/gcc/testsuite/gcc.dg/autopar/reduc-2short.c
index 4dbbc8a..3ec1c2a 100644
--- a/gcc/testsuite/gcc.dg/autopar/reduc-2short.c
+++ b/gcc/testsuite/gcc.dg/autopar/reduc-2short.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -ftree-parallelize-loops=4 -fdump-tree-parloops-details -fdump-tree-optimized" } */
+/* { dg-options "-O2 -ftree-parallelize-loops=4 -fdump-tree-parloopsred-details -fdump-tree-parloops-details -fdump-tree-optimized" } */
 
 #include <stdarg.h>
 #include <stdlib.h>
@@ -59,6 +59,7 @@ int main (void)
 }
 
 
-/* { dg-final { scan-tree-dump-times "Detected reduction" 2 "parloops" } } */
-/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 3 "parloops" } } */
+/* { dg-final { scan-tree-dump-times "Detected reduction" 2 "parloopsred" } } */
+/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 2 "parloopsred" } } */
+/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 1 "parloops" } } */
 
diff --git a/gcc/testsuite/gcc.dg/autopar/reduc-3.c b/gcc/testsuite/gcc.dg/autopar/reduc-3.c
index 0d4baef..e7ca82b 100644
--- a/gcc/testsuite/gcc.dg/autopar/reduc-3.c
+++ b/gcc/testsuite/gcc.dg/autopar/reduc-3.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -ftree-parallelize-loops=4 -fdump-tree-parloops-details -fdump-tree-optimized" } */
+/* { dg-options "-O2 -ftree-parallelize-loops=4 -fdump-tree-parloopsred-details -fdump-tree-parloops-details -fdump-tree-optimized" } */
 
 #include <stdarg.h>
 #include <stdlib.h>
@@ -50,6 +50,7 @@ int main (void)
 }
 
 
-/* { dg-final { scan-tree-dump-times "Detected reduction" 1 "parloops" } } */
+/* { dg-final { scan-tree-dump-times "Detected reduction" 1 "parloopsred" } } */
+/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 1 "parloopsred" } } */
 /* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 2 "parloops" } } */
 
diff --git a/gcc/testsuite/gcc.dg/autopar/reduc-6.c b/gcc/testsuite/gcc.dg/autopar/reduc-6.c
index 91f679e..6c5ec7b 100644
--- a/gcc/testsuite/gcc.dg/autopar/reduc-6.c
+++ b/gcc/testsuite/gcc.dg/autopar/reduc-6.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -ftree-parallelize-loops=4 -fdump-tree-parloops-details -fdump-tree-optimized" } */
+/* { dg-options "-O2 -ftree-parallelize-loops=4 -fdump-tree-parloopsred-details -fdump-tree-parloops-details -fdump-tree-optimized" } */
 
 #include <stdarg.h>
 #include <stdlib.h>
@@ -56,6 +56,6 @@ int main (void)
 
 
 /* need -ffast-math to  parallelize these loops.  */
-/* { dg-final { scan-tree-dump-times "Detected reduction" 0 "parloops" } } */
+/* { dg-final { scan-tree-dump-times "Detected reduction" 0 "parloopsred" } } */
 /* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 1 "parloops" } } */
-/* { dg-final { scan-tree-dump-times "FAILED: it is not a part of reduction" 3 "parloops" } } */
+/* { dg-final { scan-tree-dump-times "FAILED: it is not a part of reduction" 3 "parloopsred" } } */
diff --git a/gcc/testsuite/gcc.dg/autopar/reduc-7.c b/gcc/testsuite/gcc.dg/autopar/reduc-7.c
index 77b99e1..dccf2a5 100644
--- a/gcc/testsuite/gcc.dg/autopar/reduc-7.c
+++ b/gcc/testsuite/gcc.dg/autopar/reduc-7.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -ftree-parallelize-loops=4 -fdump-tree-parloops-details -fdump-tree-optimized" } */
+/* { dg-options "-O2 -ftree-parallelize-loops=4 -fdump-tree-parloopsred-details -fdump-tree-parloops-details -fdump-tree-optimized" } */
 
 #include <stdlib.h>
 
@@ -84,6 +84,7 @@ int main (void)
 }
 
 
-/* { dg-final { scan-tree-dump-times "Detected reduction" 2 "parloops" } } */
-/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 3 "parloops" } } */
+/* { dg-final { scan-tree-dump-times "Detected reduction" 2 "parloopsred" } } */
+/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 2 "parloopsred" } } */
+/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 1 "parloops" } } */
 
diff --git a/gcc/testsuite/gcc.dg/autopar/reduc-8.c b/gcc/testsuite/gcc.dg/autopar/reduc-8.c
index 16fb954..466bcc5 100644
--- a/gcc/testsuite/gcc.dg/autopar/reduc-8.c
+++ b/gcc/testsuite/gcc.dg/autopar/reduc-8.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -ftree-parallelize-loops=4 -fdump-tree-parloops-details -fdump-tree-optimized" } */
+/* { dg-options "-O2 -ftree-parallelize-loops=4 -fdump-tree-parloopsred-details -fdump-tree-parloops-details -fdump-tree-optimized" } */
 
 #include <stdlib.h>
 
@@ -84,5 +84,6 @@ int main (void)
 }
 
 
-/* { dg-final { scan-tree-dump-times "Detected reduction" 2 "parloops" } } */
-/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 3 "parloops" } } */
+/* { dg-final { scan-tree-dump-times "Detected reduction" 2 "parloopsred" } } */
+/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 2 "parloopsred" } } */
+/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 1 "parloops" } } */
diff --git a/gcc/testsuite/gcc.dg/autopar/reduc-9.c b/gcc/testsuite/gcc.dg/autopar/reduc-9.c
index 90f4db2..11556d7 100644
--- a/gcc/testsuite/gcc.dg/autopar/reduc-9.c
+++ b/gcc/testsuite/gcc.dg/autopar/reduc-9.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -ftree-parallelize-loops=4 -fdump-tree-parloops-details -fdump-tree-optimized" } */
+/* { dg-options "-O2 -ftree-parallelize-loops=4 -fdump-tree-parloopsred-details -fdump-tree-parloops-details -fdump-tree-optimized" } */
 
 #include <stdlib.h>
 
@@ -84,5 +84,6 @@ int main (void)
 }
 
 
-/* { dg-final { scan-tree-dump-times "Detected reduction" 2 "parloops" } } */
-/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 3 "parloops" } } */
+/* { dg-final { scan-tree-dump-times "Detected reduction" 2 "parloopsred" } } */
+/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 2 "parloopsred" } } */
+/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 1 "parloops" } } */
diff --git a/gcc/testsuite/gcc.dg/parloops-exit-first-loop-alt-2.c b/gcc/testsuite/gcc.dg/parloops-exit-first-loop-alt-2.c
index 24e605a..f1cf75f 100644
--- a/gcc/testsuite/gcc.dg/parloops-exit-first-loop-alt-2.c
+++ b/gcc/testsuite/gcc.dg/parloops-exit-first-loop-alt-2.c
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
 /* { dg-require-effective-target pthread } */
-/* { dg-options "-O2 -ftree-parallelize-loops=2 -fdump-tree-parloops" } */
+/* { dg-options "-O2 -ftree-parallelize-loops=2 -fdump-tree-parloops-details" } */
 
 /* Constant bound, vector addition.  */
 
@@ -19,9 +19,4 @@ f (void)
       c[i] = a[i] + b[i];
 }
 
-/* Three times three array accesses:
-   - three in f._loopfn.0
-   - three in the parallel
-   - three in the low iteration count loop
-   Crucially, none for a peeled off last iteration following the parallel.  */
-/* { dg-final { scan-tree-dump-times "(?n)\\\[i" 9 "parloops" } } */
+/* { dg-final { scan-tree-dump-times "alternative exit-first loop transform succeeded" 1 "parloops" } } */
diff --git a/gcc/testsuite/gcc.dg/parloops-exit-first-loop-alt-3.c b/gcc/testsuite/gcc.dg/parloops-exit-first-loop-alt-3.c
index fec53a1..6c34084 100644
--- a/gcc/testsuite/gcc.dg/parloops-exit-first-loop-alt-3.c
+++ b/gcc/testsuite/gcc.dg/parloops-exit-first-loop-alt-3.c
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
 /* { dg-require-effective-target pthread } */
-/* { dg-options "-O2 -ftree-parallelize-loops=2 -fdump-tree-parloops" } */
+/* { dg-options "-O2 -ftree-parallelize-loops=2 -fdump-tree-parloopsred-details" } */
 
 /* Variable bound, reduction.  */
 
@@ -18,9 +18,4 @@ f (unsigned int n, unsigned int *__restrict__ a)
   return sum;
 }
 
-/* Three array accesses:
-   - one in f._loopfn.0
-   - one in the parallel
-   - one in the low iteration count loop
-   Crucially, none for a peeled off last iteration following the parallel.  */
-/* { dg-final { scan-tree-dump-times "(?n)\\\* 4" 3 "parloops" } } */
+/* { dg-final { scan-tree-dump-times "alternative exit-first loop transform succeeded" 1 "parloopsred" } } */
diff --git a/gcc/testsuite/gcc.dg/parloops-exit-first-loop-alt-4.c b/gcc/testsuite/gcc.dg/parloops-exit-first-loop-alt-4.c
index 2b8d289..f051ed4 100644
--- a/gcc/testsuite/gcc.dg/parloops-exit-first-loop-alt-4.c
+++ b/gcc/testsuite/gcc.dg/parloops-exit-first-loop-alt-4.c
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
 /* { dg-require-effective-target pthread } */
-/* { dg-options "-O2 -ftree-parallelize-loops=2 -fdump-tree-parloops" } */
+/* { dg-options "-O2 -ftree-parallelize-loops=2 -fdump-tree-parloopsred-details" } */
 
 /* Constant bound, reduction.  */
 
@@ -20,9 +20,4 @@ f (void)
   return sum;
 }
 
-/* Three array accesses:
-   - one in f._loopfn.0
-   - one in the parallel
-   - one in the low iteration count loop
-   Crucially, none for a peeled off last iteration following the parallel.  */
-/* { dg-final { scan-tree-dump-times "(?n)\\\* 4" 3 "parloops" } } */
+/* { dg-final { scan-tree-dump-times "alternative exit-first loop transform succeeded" 1 "parloopsred" } } */
diff --git a/gcc/testsuite/gcc.dg/parloops-exit-first-loop-alt-5.c b/gcc/testsuite/gcc.dg/parloops-exit-first-loop-alt-5.c
index 3f799cf..3c1e99b 100644
--- a/gcc/testsuite/gcc.dg/parloops-exit-first-loop-alt-5.c
+++ b/gcc/testsuite/gcc.dg/parloops-exit-first-loop-alt-5.c
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
 /* { dg-require-effective-target pthread } */
-/* { dg-options "-O2 -ftree-parallelize-loops=2 -fdump-tree-parloops" } */
+/* { dg-options "-O2 -ftree-parallelize-loops=2 -fdump-tree-parloops-details" } */
 
 /* Variable bound, vector addition, unsigned loop counter, unsigned bound.  */
 
@@ -14,9 +14,4 @@ f (unsigned int n, unsigned int *__restrict__ a, unsigned int *__restrict__ b,
     c[i] = a[i] + b[i];
 }
 
-/* Three times a store:
-   - one in f._loopfn.0
-   - one in the parallel
-   - one in the low iteration count loop
-   Crucially, none for a peeled off last iteration following the parallel.  */
-/* { dg-final { scan-tree-dump-times "(?n)^  \\*_\[0-9\]*" 3 "parloops" } } */
+/* { dg-final { scan-tree-dump-times "alternative exit-first loop transform succeeded" 1 "parloops" } } */
diff --git a/gcc/testsuite/gcc.dg/parloops-exit-first-loop-alt-6.c b/gcc/testsuite/gcc.dg/parloops-exit-first-loop-alt-6.c
index ee19a55..edc60ba 100644
--- a/gcc/testsuite/gcc.dg/parloops-exit-first-loop-alt-6.c
+++ b/gcc/testsuite/gcc.dg/parloops-exit-first-loop-alt-6.c
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
 /* { dg-require-effective-target pthread } */
-/* { dg-options "-O2 -ftree-parallelize-loops=2 -fdump-tree-parloops" } */
+/* { dg-options "-O2 -ftree-parallelize-loops=2 -fdump-tree-parloops-details" } */
 
 /* Variable bound, vector addition, unsigned loop counter, signed bound.  */
 
@@ -14,9 +14,4 @@ f (int n, unsigned int *__restrict__ a, unsigned int *__restrict__ b,
     c[i] = a[i] + b[i];
 }
 
-/* Three times a store:
-   - one in f._loopfn.0
-   - one in the parallel
-   - one in the low iteration count loop
-   Crucially, none for a peeled off last iteration following the parallel.  */
-/* { dg-final { scan-tree-dump-times "(?n)^  \\*_\[0-9\]*" 3 "parloops" } } */
+/* { dg-final { scan-tree-dump-times "alternative exit-first loop transform succeeded" 1 "parloops" } } */
diff --git a/gcc/testsuite/gcc.dg/parloops-exit-first-loop-alt-7.c b/gcc/testsuite/gcc.dg/parloops-exit-first-loop-alt-7.c
index c337342..38be2e8 100644
--- a/gcc/testsuite/gcc.dg/parloops-exit-first-loop-alt-7.c
+++ b/gcc/testsuite/gcc.dg/parloops-exit-first-loop-alt-7.c
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
 /* { dg-require-effective-target pthread } */
-/* { dg-options "-O2 -ftree-parallelize-loops=2 -fdump-tree-parloops" } */
+/* { dg-options "-O2 -ftree-parallelize-loops=2 -fdump-tree-parloops-details" } */
 
 /* Variable bound, vector addition, signed loop counter, signed bound.  */
 
@@ -14,9 +14,4 @@ f (int n, unsigned int *__restrict__ a, unsigned int *__restrict__ b,
     c[i] = a[i] + b[i];
 }
 
-/* Three times a store:
-   - one in f._loopfn.0
-   - one in the parallel
-   - one in the low iteration count loop
-   Crucially, none for a peeled off last iteration following the parallel.  */
-/* { dg-final { scan-tree-dump-times "(?n)^  \\*_\[0-9\]*" 3 "parloops" } } */
+/* { dg-final { scan-tree-dump-times "alternative exit-first loop transform succeeded" 1 "parloops" } } */
diff --git a/gcc/testsuite/gcc.dg/parloops-exit-first-loop-alt-pr66652.c b/gcc/testsuite/gcc.dg/parloops-exit-first-loop-alt-pr66652.c
index 2ea097d..7b64368 100644
--- a/gcc/testsuite/gcc.dg/parloops-exit-first-loop-alt-pr66652.c
+++ b/gcc/testsuite/gcc.dg/parloops-exit-first-loop-alt-pr66652.c
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
 /* { dg-require-effective-target pthread } */
-/* { dg-options "-O2 -ftree-parallelize-loops=2 -fdump-tree-parloops" } */
+/* { dg-options "-O2 -ftree-parallelize-loops=2 -fdump-tree-parloopsred-details" } */
 
 #include <stdio.h>
 #include <stdlib.h>
@@ -22,10 +22,5 @@ f (unsigned int n, unsigned int sum)
   return sum;
 }
 
-/* Four times % 13:
-   - once in f._loopfn.0
-   - once in the parallel
-   - once in the low iteration count loop
-   - once for a peeled off last iteration following the parallel.
-   In other words, we want try_transform_to_exit_first_loop_alt to fail.  */
-/* { dg-final { scan-tree-dump-times "(?n)% 13" 4 "parloops" } } */
+/* { dg-final { scan-tree-dump-times "parallelizing inner loop" 1 "parloopsred" } } */
+/* { dg-final { scan-tree-dump-times "alternative exit-first loop transform succeeded" 0 "parloopsred" } } */
diff --git a/gcc/testsuite/gcc.dg/parloops-exit-first-loop-alt.c b/gcc/testsuite/gcc.dg/parloops-exit-first-loop-alt.c
index 0b69165..44596e3 100644
--- a/gcc/testsuite/gcc.dg/parloops-exit-first-loop-alt.c
+++ b/gcc/testsuite/gcc.dg/parloops-exit-first-loop-alt.c
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
 /* { dg-require-effective-target pthread } */
-/* { dg-options "-O2 -ftree-parallelize-loops=2 -fdump-tree-parloops" } */
+/* { dg-options "-O2 -ftree-parallelize-loops=2 -fdump-tree-parloops-details" } */
 
 /* Variable bound, vector addition, signed loop counter, unsigned bound.  */
 
@@ -14,9 +14,5 @@ f (unsigned int n, unsigned int *__restrict__ a, unsigned int *__restrict__ b,
     c[i] = a[i] + b[i];
 }
 
-/* Three times a store:
-   - one in f._loopfn.0
-   - one in the parallel
-   - one in the low iteration count loop
-   Crucially, none for a peeled off last iteration following the parallel.  */
-/* { dg-final { scan-tree-dump-times "(?n)^  \\*_\[0-9\]*" 3 "parloops" } } */
+/* { dg-final { scan-tree-dump-times "alternative exit-first loop transform succeeded" 1 "parloops" } } */
+
diff --git a/gcc/testsuite/gfortran.dg/parloops-exit-first-loop-alt-2.f95 b/gcc/testsuite/gfortran.dg/parloops-exit-first-loop-alt-2.f95
index f26a6e3..52434f2 100644
--- a/gcc/testsuite/gfortran.dg/parloops-exit-first-loop-alt-2.f95
+++ b/gcc/testsuite/gfortran.dg/parloops-exit-first-loop-alt-2.f95
@@ -1,7 +1,7 @@
 ! { dg-additional-options "-O2" }
 ! { dg-require-effective-target pthread }
 ! { dg-additional-options "-ftree-parallelize-loops=2" }
-! { dg-additional-options "-fdump-tree-parloops" }
+! { dg-additional-options "-fdump-tree-parloops-details" }
 
 ! Constant bound, vector addition.
 
@@ -16,9 +16,4 @@ subroutine foo ()
   end do
 end subroutine foo
 
-! Three times plus 25:
-! - once in f._loopfn.0
-! - once in the parallel
-! - once in the low iteration count loop
-! Crucially, none for a peeled off last iteration following the parallel.
-! { dg-final { scan-tree-dump-times "(?n) \\+ 25;" 3 "parloops" } }
+! { dg-final { scan-tree-dump-times "alternative exit-first loop transform succeeded" 1 "parloops" } }
diff --git a/gcc/testsuite/gfortran.dg/parloops-exit-first-loop-alt.f95 b/gcc/testsuite/gfortran.dg/parloops-exit-first-loop-alt.f95
index 6dc8a38..1eb9dfd 100644
--- a/gcc/testsuite/gfortran.dg/parloops-exit-first-loop-alt.f95
+++ b/gcc/testsuite/gfortran.dg/parloops-exit-first-loop-alt.f95
@@ -1,7 +1,7 @@
 ! { dg-additional-options "-O2" }
 ! { dg-require-effective-target pthread }
 ! { dg-additional-options "-ftree-parallelize-loops=2" }
-! { dg-additional-options "-fdump-tree-parloops" }
+! { dg-additional-options "-fdump-tree-parloops-details" }
 
 ! Variable bound, vector addition.
 
@@ -17,9 +17,5 @@ subroutine foo (nr)
   end do
 end subroutine foo
 
-! Three times plus 25:
-! - once in f._loopfn.0
-! - once in the parallel
-! - once in the low iteration count loop
-! Crucially, none for a peeled off last iteration following the parallel.
-! { dg-final { scan-tree-dump-times "(?n) \\+ 25;" 3 "parloops" } }
+! { dg-final { scan-tree-dump-times "alternative exit-first loop transform succeeded" 1 "parloops" } }
+
diff --git a/gcc/testsuite/gfortran.dg/parloops-outer-1.f95 b/gcc/testsuite/gfortran.dg/parloops-outer-1.f95
new file mode 100644
index 0000000..144e4e8
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/parloops-outer-1.f95
@@ -0,0 +1,37 @@
+! { dg-do compile }
+! { dg-additional-options "-O2" }
+! { dg-additional-options "-ftree-parallelize-loops=2" }
+! { dg-additional-options "-fdump-tree-parloops-all" }
+! { dg-additional-options "-fdump-tree-optimized" }
+
+! Based on autopar/outer-1.c.
+
+program main
+  implicit none
+  integer, parameter         :: n = 500
+  integer, dimension (0:n-1, 0:n-1) :: x
+  integer                    :: i, j, ii, jj
+
+
+  do ii = 0, n - 1
+     do jj = 0, n - 1
+        x(jj, ii) = ii + jj + 3
+     end do
+  end do
+
+  do i = 0, n - 1
+     do j = 0, n - 1
+        if (x(j, i) .ne. i + j + 3) call abort
+     end do
+  end do
+
+end program main
+
+! Check that only one loop is analyzed, and that it can be parallelized.
+! { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 1 "parloops" } }
+! { dg-final { scan-tree-dump-not "FAILED:" "parloops" } }
+! { dg-final { scan-tree-dump-times "parallelizing outer loop" 1 "parloops" } }
+
+! Check that the loop has been split off into a function.
+! { dg-final { scan-tree-dump-times "(?n);; Function main._loopfn.0 " 1 "optimized" } }
+
diff --git a/gcc/tree-parloops.c b/gcc/tree-parloops.c
index 036677b..4bfe588 100644
--- a/gcc/tree-parloops.c
+++ b/gcc/tree-parloops.c
@@ -2238,7 +2238,15 @@ gen_parallel_loop (struct loop *loop,
      increment) and immediately follows the loop exit test.  Attempt to move the
      entry of the loop directly before the exit check and increase the number of
      iterations of the loop by one.  */
-  if (!try_transform_to_exit_first_loop_alt (loop, reduction_list, nit))
+  if (try_transform_to_exit_first_loop_alt (loop, reduction_list, nit))
+    {
+      if (dump_file
+	  && (dump_flags & TDF_DETAILS))
+	fprintf (dump_file,
+		 "alternative exit-first loop transform succeeded"
+		 " for loop %d\n", loop->num);
+    }
+  else
     {
       /* Fall back on the method that handles more cases, but duplicates the
 	 loop body: move the exit condition of LOOP to the beginning of its
@@ -2508,7 +2516,7 @@ try_create_reduction_list (loop_p loop,
    otherwise.  */
 
 static bool
-parallelize_loops (void)
+parallelize_loops (bool reductions_only)
 {
   unsigned n_threads = flag_tree_parallelize_loops;
   bool changed = false;
@@ -2584,10 +2592,31 @@ parallelize_loops (void)
       if (!try_create_reduction_list (loop, &reduction_list))
 	continue;
 
-      if (!flag_loop_parallelize_all
-	  && !loop_parallel_p (loop, &parloop_obstack))
+      if (reductions_only
+	  && reduction_list.elements () == 0)
 	continue;
 
+      if (!flag_loop_parallelize_all)
+	{
+	  bool independent = false;
+
+	  if (!independent
+	      && loop->can_be_parallel)
+	    {
+	      if (dump_file
+		  && (dump_flags & TDF_DETAILS))
+		fprintf (dump_file,
+			 "  SUCCESS: may be parallelized, graphite analysis\n");
+	      independent = true;
+	    }
+
+	  if (!independent)
+	    independent = loop_parallel_p (loop, &parloop_obstack);
+
+	  if (!independent)
+	    continue;
+	}
+
       changed = true;
       if (dump_file && (dump_flags & TDF_DETAILS))
       {
@@ -2652,7 +2681,7 @@ pass_parallelize_loops::execute (function *fun)
   if (number_of_loops (fun) <= 1)
     return 0;
 
-  if (parallelize_loops ())
+  if (parallelize_loops (false))
     {
       fun->curr_properties &= ~(PROP_gimple_eomp);
       return TODO_update_ssa;
@@ -2668,3 +2697,57 @@ make_pass_parallelize_loops (gcc::context *ctxt)
 {
   return new pass_parallelize_loops (ctxt);
 }
+
+namespace {
+
+const pass_data pass_data_parallelize_reductions =
+{
+  GIMPLE_PASS, /* type */
+  "parloopsred", /* name */
+  OPTGROUP_LOOP, /* optinfo_flags */
+  TV_TREE_PARALLELIZE_LOOPS, /* tv_id */
+  ( PROP_cfg | PROP_ssa ), /* properties_required */
+  0, /* properties_provided */
+  0, /* properties_destroyed */
+  0, /* todo_flags_start */
+  0, /* todo_flags_finish */
+};
+
+class pass_parallelize_reductions : public gimple_opt_pass
+{
+public:
+  pass_parallelize_reductions (gcc::context *ctxt)
+    : gimple_opt_pass (pass_data_parallelize_reductions, ctxt)
+  {}
+
+  /* opt_pass methods: */
+  virtual bool gate (function *)
+  {
+    return (flag_tree_parallelize_loops > 1
+	    && !gate_graphite_transforms ());
+  }
+  virtual unsigned int execute (function *);
+}; // class pass_parallelize_reductions
+
+unsigned
+pass_parallelize_reductions::execute (function *fun)
+{
+  if (number_of_loops (fun) <= 1)
+    return 0;
+
+  if (parallelize_loops (true))
+    {
+      fun->curr_properties &= ~(PROP_gimple_eomp);
+      return TODO_update_ssa;
+    }
+
+  return 0;
+}
+
+} // anon namespace
+
+gimple_opt_pass *
+make_pass_parallelize_reductions (gcc::context *ctxt)
+{
+  return new pass_parallelize_reductions (ctxt);
+}
diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
index c47b22e..f0a7017 100644
--- a/gcc/tree-pass.h
+++ b/gcc/tree-pass.h
@@ -368,7 +368,9 @@ extern gimple_opt_pass *make_pass_scev_cprop (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_empty_loop (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_record_bounds (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_graphite (gcc::context *ctxt);
+extern gimple_opt_pass *make_pass_graphite_parloops (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_graphite_transforms (gcc::context *ctxt);
+extern gimple_opt_pass *make_pass_graphite_transforms2 (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_if_conversion (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_loop_distribution (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_vectorize (gcc::context *ctxt);
@@ -377,6 +379,7 @@ extern gimple_opt_pass *make_pass_slp_vectorize (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_complete_unroll (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_complete_unrolli (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_parallelize_loops (gcc::context *ctxt);
+extern gimple_opt_pass *make_pass_parallelize_reductions (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_loop_prefetch (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_iv_optimize (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_tree_loop_done (gcc::context *ctxt);
@@ -595,6 +598,8 @@ extern gimple_opt_pass *make_pass_update_address_taken (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_convert_switch (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_lower_vaarg (gcc::context *ctxt);
 
+extern bool gate_graphite_transforms (void);
+
 /* Current optimization pass.  */
 extern opt_pass *current_pass;
 
diff --git a/gcc/tree-ssa-loop-ivcanon.c b/gcc/tree-ssa-loop-ivcanon.c
index eca70a9..43724ed 100644
--- a/gcc/tree-ssa-loop-ivcanon.c
+++ b/gcc/tree-ssa-loop-ivcanon.c
@@ -1421,7 +1421,11 @@ public:
   {}
 
   /* opt_pass methods: */
-  virtual bool gate (function *) { return flag_tree_loop_ivcanon != 0; }
+  virtual bool gate (function *)
+  {
+    return (flag_tree_loop_ivcanon != 0
+	    && flag_tree_parallelize_loops <= 1);
+  }
   virtual unsigned int execute (function *fun);
 
 }; // class pass_iv_canon
-- 
1.9.1



* Re: [RFC, PR66873] Use graphite for parloops
  2015-07-15 22:18 [RFC, PR66873] Use graphite for parloops Tom de Vries
@ 2015-07-16  8:48 ` Richard Biener
  2015-07-16 10:25   ` Thomas Schwinge
  2015-07-20 18:54 ` Sebastian Pop
  1 sibling, 1 reply; 27+ messages in thread
From: Richard Biener @ 2015-07-16  8:48 UTC (permalink / raw)
  To: Tom de Vries; +Cc: gcc-patches

On Wed, Jul 15, 2015 at 10:26 PM, Tom de Vries <Tom_deVries@mentor.com> wrote:
> Hi,
>
> I tried to parallelize this fortran test-case (based on autopar/outer-1.c),
> specifically the outer loop of the first loop nest using
> -ftree-parallelize-loops=2:
> ...
> program main
>   implicit none
>   integer, parameter         :: n = 500
>   integer, dimension (0:n-1, 0:n-1) :: x
>   integer                    :: i, j, ii, jj
>
>
>   do ii = 0, n - 1
>      do jj = 0, n - 1
>         x(jj, ii) = ii + jj + 3
>      end do
>   end do
>
>   do i = 0, n - 1
>      do j = 0, n - 1
>         if (x(j, i) .ne. i + j + 3) call abort
>      end do
>   end do
>
> end program main
> ...
>
> But autopar fails to parallelize due to failing dependency analysis.
>
> I then tried to add -floop-parallelize-all, and found that the graphite
> dependency analysis did manage to decide that the iterations are
> independent.
>
> At https://gcc.gnu.org/wiki/Graphite/Parallelization I read:
> ...
> In GCC there already exists an auto-parallelization pass (tree-parloops.c),
> which is base on the lambda framework originally developed by Sebastian.
> Since Lambda framework is limited to some cases (e.g. triangle loops, loops
> with 'if' conditions), Graphite was developed to handle the loops that
> lambda was not able to handle .
> ...
>
> So I wondered, why not always use the graphite dependency analysis in
> parloops. (Of course you could use -floop-parallelize-all, but that also
> changes the heuristic). So I wrote a patch for parloops to use graphite
> dependency analysis by default (so without -floop-parallelize-all), but
> while testing found out that all the reduction test-cases started failing
> because the modifications graphite makes to the code messes up the parloops
> reduction analysis.
>
> Then I came up with this patch, which:
> - first runs a parloops pass, restricted to reduction loops only,
> - then runs graphite dependency analysis
> - followed by a normal parloops pass run.
>
> This way, we get to both:
> - compile the reduction testcases as before, and
> - profit from the better graphite dependency analysis otherwise.
>
> A point worth noting is that I stopped running pass_iv_canon before parloops
> (only in case of -ftree-parallelize-loops > 1) because running it before
> graphite makes the graphite scop detection fail.
>
> Bootstrapped and reg-tested on x86_64.
>
> Any comments?

Graphite dependence analysis is too slow to be enabled unconditionally
(read: hours in some simple cases; see Bugzilla).

Richard.

> Thanks,
> - Tom


* Re: [RFC, PR66873] Use graphite for parloops
  2015-07-16  8:48 ` Richard Biener
@ 2015-07-16 10:25   ` Thomas Schwinge
  2015-07-16 10:28     ` Richard Biener
  0 siblings, 1 reply; 27+ messages in thread
From: Thomas Schwinge @ 2015-07-16 10:25 UTC (permalink / raw)
  To: Tom de Vries; +Cc: gcc-patches, Richard Biener


Hi Tom!

On Thu, 16 Jul 2015 10:46:00 +0200, Richard Biener <richard.guenther@gmail.com> wrote:
> On Wed, Jul 15, 2015 at 10:26 PM, Tom de Vries <Tom_deVries@mentor.com> wrote:
> > I tried to parallelize this fortran test-case (based on autopar/outer-1.c),
> > [...]

> > So I wondered, why not always use the graphite dependency analysis in
> > parloops. (Of course you could use -floop-parallelize-all, but that also
> > changes the heuristic). So I wrote a patch for parloops to use graphite
> > dependency analysis by default (so without -floop-parallelize-all), but
> > while testing found out that all the reduction test-cases started failing
> > because the modifications graphite makes to the code messes up the parloops
> > reduction analysis.
> >
> > Then I came up with this patch, which:
> > - first runs a parloops pass, restricted to reduction loops only,
> > - then runs graphite dependency analysis
> > - followed by a normal parloops pass run.
> >
> > This way, we get to both:
> > - compile the reduction testcases as before, and
> > - profit from the better graphite dependency analysis otherwise.

> graphite dependence analysis is too slow to be enabled unconditionally.
> (read: hours in some simple cases - see bugzilla)

Haha, "cool"!  ;-)

Maybe it is still reasonable to use graphite to analyze the code inside
OpenACC kernels regions -- maybe such code can reasonably be expected to
not have the properties that make its analysis lengthy?  So, Tom, could
you please identify and check such PRs, to get an understanding of what
these properties are?


Regards,
 Thomas



* Re: [RFC, PR66873] Use graphite for parloops
  2015-07-16 10:25   ` Thomas Schwinge
@ 2015-07-16 10:28     ` Richard Biener
  2015-07-16 10:41       ` Richard Biener
  2015-07-16 11:41       ` Tom de Vries
  0 siblings, 2 replies; 27+ messages in thread
From: Richard Biener @ 2015-07-16 10:28 UTC (permalink / raw)
  To: Thomas Schwinge; +Cc: Tom de Vries, gcc-patches

On Thu, Jul 16, 2015 at 12:19 PM, Thomas Schwinge
<thomas@codesourcery.com> wrote:
> Hi Tom!
>
> On Thu, 16 Jul 2015 10:46:00 +0200, Richard Biener <richard.guenther@gmail.com> wrote:
>> On Wed, Jul 15, 2015 at 10:26 PM, Tom de Vries <Tom_deVries@mentor.com> wrote:
>> > I tried to parallelize this fortran test-case (based on autopar/outer-1.c),
>> > [...]
>
>> > So I wondered, why not always use the graphite dependency analysis in
>> > parloops. (Of course you could use -floop-parallelize-all, but that also
>> > changes the heuristic). So I wrote a patch for parloops to use graphite
>> > dependency analysis by default (so without -floop-parallelize-all), but
>> > while testing found out that all the reduction test-cases started failing
>> > because the modifications graphite makes to the code messes up the parloops
>> > reduction analysis.
>> >
>> > Then I came up with this patch, which:
>> > - first runs a parloops pass, restricted to reduction loops only,
>> > - then runs graphite dependency analysis
>> > - followed by a normal parloops pass run.
>> >
>> > This way, we get to both:
>> > - compile the reduction testcases as before, and
>> > - profit from the better graphite dependency analysis otherwise.
>
>> graphite dependence analysis is too slow to be enabled unconditionally.
>> (read: hours in some simple cases - see bugzilla)
>
> Haha, "cool"!  ;-)
>
> Maybe it is still reasonable to use graphite to analyze the code inside
> OpenACC kernels regions -- maybe such code can reasonably be expected to
> not have the properties that make its analysis lengthy?  So, Tom, could
> you please identify and check such PRs, to get an understanding of what
> these properties are?

Like the ones in PR62113, PR53852, or PR59121.

>
> Regards,
>  Thomas


* Re: [RFC, PR66873] Use graphite for parloops
  2015-07-16 10:28     ` Richard Biener
@ 2015-07-16 10:41       ` Richard Biener
  2015-07-26 22:54         ` Tom de Vries
  2015-07-16 11:41       ` Tom de Vries
  1 sibling, 1 reply; 27+ messages in thread
From: Richard Biener @ 2015-07-16 10:41 UTC (permalink / raw)
  To: Thomas Schwinge; +Cc: Tom de Vries, gcc-patches

On Thu, Jul 16, 2015 at 12:23 PM, Richard Biener
<richard.guenther@gmail.com> wrote:
> On Thu, Jul 16, 2015 at 12:19 PM, Thomas Schwinge
> <thomas@codesourcery.com> wrote:
>> Hi Tom!
>>
>> On Thu, 16 Jul 2015 10:46:00 +0200, Richard Biener <richard.guenther@gmail.com> wrote:
>>> On Wed, Jul 15, 2015 at 10:26 PM, Tom de Vries <Tom_deVries@mentor.com> wrote:
>>> > I tried to parallelize this fortran test-case (based on autopar/outer-1.c),
>>> > [...]
>>
>>> > So I wondered, why not always use the graphite dependency analysis in
>>> > parloops. (Of course you could use -floop-parallelize-all, but that also
>>> > changes the heuristic). So I wrote a patch for parloops to use graphite
>>> > dependency analysis by default (so without -floop-parallelize-all), but
>>> > while testing found out that all the reduction test-cases started failing
>>> > because the modifications graphite makes to the code messes up the parloops
>>> > reduction analysis.
>>> >
>>> > Then I came up with this patch, which:
>>> > - first runs a parloops pass, restricted to reduction loops only,
>>> > - then runs graphite dependency analysis
>>> > - followed by a normal parloops pass run.
>>> >
>>> > This way, we get to both:
>>> > - compile the reduction testcases as before, and
>>> > - profit from the better graphite dependency analysis otherwise.
>>
>>> graphite dependence analysis is too slow to be enabled unconditionally.
>>> (read: hours in some simple cases - see bugzilla)
>>
>> Haha, "cool"!  ;-)
>>
>> Maybe it is still reasonable to use graphite to analyze the code inside
>> OpenACC kernels regions -- maybe such code can reasonably be expected to
>> not have the properties that make its analysis lengthy?  So, Tom, could
>> you please identify and check such PRs, to get an understanding of what
>> these properties are?
>
> Like the one in PR62113 or 53852 or 59121.

Btw, it would be nice to handle this case (or at least figure out why we can't)
in GCC's dependence analysis.
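
For reference, here is a minimal C sketch of the loop nest in question,
transliterated from the Fortran test case quoted earlier in the thread.  It is
not copied from autopar/outer-1.c, and the function names fill and
count_mismatches are made up for illustration.  It only shows why the outer
iterations are independent; it makes no claim about where the existing
analysis gives up.

```c
#define N 500

/* Row-major C array; x[ii][jj] here plays the role of the Fortran
   x(jj, ii), so the inner loop still walks contiguous memory.  */
static int x[N][N];

/* First loop nest of the test case.  Outer iteration ii writes only
   row ii, i.e. the flat offsets ii*N .. ii*N + N-1, which no other
   outer iteration touches.  */
void
fill (void)
{
  for (int ii = 0; ii < N; ii++)
    for (int jj = 0; jj < N; jj++)
      x[ii][jj] = ii + jj + 3;
}

/* Second loop nest, returning a mismatch count instead of calling
   abort so that the result can be inspected.  */
int
count_mismatches (void)
{
  int bad = 0;
  for (int i = 0; i < N; i++)
    for (int j = 0; j < N; j++)
      if (x[i][j] != i + j + 3)
	bad++;
  return bad;
}
```

After fill () has run, count_mismatches () returns 0.  Since every outer
iteration of fill writes a disjoint row and reads nothing written by another
iteration, the rows can be filled in any order.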

Richard.

>>
>> Regards,
>>  Thomas


* Re: [RFC, PR66873] Use graphite for parloops
  2015-07-16 10:28     ` Richard Biener
  2015-07-16 10:41       ` Richard Biener
@ 2015-07-16 11:41       ` Tom de Vries
  2015-07-20 18:53         ` Sebastian Pop
  1 sibling, 1 reply; 27+ messages in thread
From: Tom de Vries @ 2015-07-16 11:41 UTC (permalink / raw)
  To: Richard Biener, Thomas Schwinge; +Cc: gcc-patches

On 16/07/15 12:23, Richard Biener wrote:
> On Thu, Jul 16, 2015 at 12:19 PM, Thomas Schwinge
> <thomas@codesourcery.com> wrote:
>> Hi Tom!
>>
>> On Thu, 16 Jul 2015 10:46:00 +0200, Richard Biener <richard.guenther@gmail.com> wrote:
>>> On Wed, Jul 15, 2015 at 10:26 PM, Tom de Vries <Tom_deVries@mentor.com> wrote:
>>>> I tried to parallelize this fortran test-case (based on autopar/outer-1.c),
>>>> [...]
>>
>>>> So I wondered, why not always use the graphite dependency analysis in
>>>> parloops. (Of course you could use -floop-parallelize-all, but that also
>>>> changes the heuristic). So I wrote a patch for parloops to use graphite
>>>> dependency analysis by default (so without -floop-parallelize-all), but
>>>> while testing found out that all the reduction test-cases started failing
>>>> because the modifications graphite makes to the code messes up the parloops
>>>> reduction analysis.
>>>>
>>>> Then I came up with this patch, which:
>>>> - first runs a parloops pass, restricted to reduction loops only,
>>>> - then runs graphite dependency analysis
>>>> - followed by a normal parloops pass run.
>>>>
>>>> This way, we get to both:
>>>> - compile the reduction testcases as before, and
>>>> - profit from the better graphite dependency analysis otherwise.
>>
>>> graphite dependence analysis is too slow to be enabled unconditionally.
>>> (read: hours in some simple cases - see bugzilla)
>>
>> Haha, "cool"!  ;-)
>>
>> Maybe it is still reasonable to use graphite to analyze the code inside
>> OpenACC kernels regions -- maybe such code can reasonably be expected to
>> not have the properties that make its analysis lengthy?  So, Tom, could
>> you please identify and check such PRs, to get an understanding of what
>> these properties are?
>
> Like the one in PR62113 or 53852 or 59121.

PR62113 and PR59121 do not reproduce for me on trunk.

PR53852 does reproduce for me (to the point that I had to reset my laptop).

Thanks,
- Tom


* Re: [RFC, PR66873] Use graphite for parloops
  2015-07-16 11:41       ` Tom de Vries
@ 2015-07-20 18:53         ` Sebastian Pop
  2015-07-21  0:22           ` Tom de Vries
  0 siblings, 1 reply; 27+ messages in thread
From: Sebastian Pop @ 2015-07-20 18:53 UTC (permalink / raw)
  To: Tom de Vries; +Cc: Richard Biener, Thomas Schwinge, gcc-patches

Tom de Vries wrote:
> >>>graphite dependence analysis is too slow to be enabled unconditionally.
> >>>(read: hours in some simple cases - see bugzilla)
> >>
> >>Haha, "cool"!  ;-)
> >>
> >>Maybe it is still reasonable to use graphite to analyze the code inside
> >>OpenACC kernels regions -- maybe such code can reasonably be expected to
> >>not have the properties that make its analysis lengthy?  So, Tom, could
> >>you please identify and check such PRs, to get an understanding of what
> >>these properties are?
> >
> >Like the one in PR62113 or 53852 or 59121.
> 
> PR62113 and PR59121 do not reproduce for me on trunk.
> 
> PR53852 does reproduce for me (to the point that I had to reset my laptop).

ISL has a way to count the number of operations: once the count passes a
watermark, it reports an error code that we can use to leave graphite; see the
documentation of isl_ctx_set_max_operations().  With that mechanism we can set
a limit for graphite of at most (say) 10% overhead on whole compilation time.
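
The bail-out idea can be sketched in plain C.  The code below does not call
ISL; struct budget, budget_tick and check_independent are hypothetical names
that only model "count operations, give up past a quota" on a toy pairwise
dependence test.

```c
#include <stdbool.h>

/* Hypothetical stand-in for an analysis context with an operation
   quota.  This is NOT the ISL API; it only models the bail-out.  */
struct budget
{
  unsigned long max_ops;	/* 0 means unlimited.  */
  unsigned long ops;		/* Operations counted so far.  */
  bool exceeded;		/* Set once the quota has been hit.  */
};

/* Count one unit of work.  Returns false once the quota is used up.  */
static bool
budget_tick (struct budget *b)
{
  if (b->exceeded)
    return false;
  b->ops++;
  if (b->max_ops != 0 && b->ops > b->max_ops)
    b->exceeded = true;
  return !b->exceeded;
}

enum dep_result { DEP_INDEPENDENT, DEP_DEPENDENT, DEP_GAVE_UP };

/* Toy "expensive" dependence test: compare every pair of the N write
   offsets and report a dependence if two of them coincide.  Each
   comparison costs one operation from the budget B.  */
enum dep_result
check_independent (const int *offsets, int n, struct budget *b)
{
  for (int i = 0; i < n; i++)
    for (int j = i + 1; j < n; j++)
      {
	if (!budget_tick (b))
	  return DEP_GAVE_UP;
	if (offsets[i] == offsets[j])
	  return DEP_DEPENDENT;
      }
  return DEP_INDEPENDENT;
}
```

A caller would treat DEP_GAVE_UP like a failed analysis: skip the loop, or
fall back to a cheaper test.  How the quota would be derived from a
compile-time target such as 10% is left open here.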




* Re: [RFC, PR66873] Use graphite for parloops
  2015-07-15 22:18 [RFC, PR66873] Use graphite for parloops Tom de Vries
  2015-07-16  8:48 ` Richard Biener
@ 2015-07-20 18:54 ` Sebastian Pop
  2015-07-21  5:59   ` Tom de Vries
  1 sibling, 1 reply; 27+ messages in thread
From: Sebastian Pop @ 2015-07-20 18:54 UTC (permalink / raw)
  To: Tom de Vries; +Cc: gcc-patches

Tom de Vries wrote:
> So I wondered, why not always use the graphite dependency analysis
> in parloops. (Of course you could use -floop-parallelize-all, but
> that also changes the heuristic). So I wrote a patch for parloops to
> use graphite dependency analysis by default (so without
> -floop-parallelize-all), but while testing found out that all the
> reduction test-cases started failing because the modifications
> graphite makes to the code messes up the parloops reduction
> analysis.
> 
> Then I came up with this patch, which:
> - first runs a parloops pass, restricted to reduction loops only,

I would prefer to fix graphite to catch the reduction loop and avoid running an
extra pass before graphite for that case.  Can you please specify which files
fail to be parallelized?  Are they all the testcases whose flags you update?

Also, it seems to me that you are missing -ffast-math to parallelize all these
loops: without that flag graphite would not mark reductions as
associative/commutative operations, and they would not be recognized as parallel.
Is that something the current parloops detection is less strict about?
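
As a small illustration of why the flag matters for floating-point
reductions, here is a C sketch with made-up function names.  It sums the same
array once sequentially and once as two partial sums, the way two threads
would, and the two groupings can round differently.

```c
/* Sum the N elements of V front to back, as a sequential loop would.  */
double
sum_sequential (const double *v, int n)
{
  double s = 0.0;
  for (int i = 0; i < n; i++)
    s += v[i];
  return s;
}

/* Sum V as two threads might: each half gets its own partial sum, and
   the partial sums are combined at the end.  */
double
sum_two_halves (const double *v, int n)
{
  double lo = 0.0, hi = 0.0;
  int mid = n / 2;
  for (int i = 0; i < mid; i++)
    lo += v[i];
  for (int i = mid; i < n; i++)
    hi += v[i];
  return lo + hi;
}
```

For the input { 1e20, 1.0, -1e20, 1.0 } with IEEE double arithmetic,
sum_sequential returns 1.0 and sum_two_halves returns 0.0, because each 1.0
added to a value of magnitude 1e20 is rounded away.  Unsigned integer
reductions wrap around and give the same result under either grouping.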

Thanks,
Sebastian

> - then runs graphite dependency analysis
> - followed by a normal parloops pass run.
> 
> This way, we get to both:
> - compile the reduction testcases as before, and
> - profit from the better graphite dependency analysis otherwise.
> 
> A point worth noting is that I stopped running pass_iv_canon before
> parloops (only in case of -ftree-parallelize-loops > 1) because
> running it before graphite makes the graphite scop detection fail.
> 
> Bootstrapped and reg-tested on x86_64.
> 
> Any comments?
> 
> Thanks,
> - Tom

> Use graphite for parloops
> 
> 2015-07-15  Tom de Vries  <tom@codesourcery.com>
> 
> 	PR tree-optimization/66873
> 	* graphite-isl-ast-to-gimple.c (translate_isl_ast_for_loop)
> 	(scop_to_isl_ast): Handle flag_tree_parallelize_loops.
> 	* graphite-poly.c (apply_poly_transforms): Same.
> 	* graphite.c (gate_graphite_transforms): Remove static.
> 	(pass_graphite_parloops): New pass.
> 	(make_pass_graphite_parloops): New function.
> 	(pass_graphite_transforms2): New pass.
> 	(make_pass_graphite_transforms2): New function.
> 	* omp-low.c (pass_expand_omp_ssa::clone): Same.
> 	* passes.def: Add pass groups pass_parallelize_reductions and
> 	pass_graphite_parloops.
> 	* tree-parloops.c (gen_parallel_loop): Add debug print for alternative
> 	exit-first loop transform.
> 	(parallelize_loops): Add reductions_only parameter.
> 	(pass_parallelize_loops::execute): Call parallelize_loops with extra
> 	argument.
> 	(pass_parallelize_reductions): New pass.
> 	(pass_parallelize_reductions::execute)
> 	(make_pass_parallelize_reductions): New function.
> 	* tree-pass.h (make_pass_graphite_parloops)
> 	(make_pass_parallelize_reductions, make_pass_graphite_transforms2)
> 	(gate_graphite_transforms): Declare.
> 	* tree-ssa-loop-ivcanon.c (pass_iv_canon::gate): Return false if
> 	flag_tree_parallelize_loops > 1.
> 
> 	* gcc.dg/autopar/outer-6.c: Update for new pass parloopsred.
> 	* gcc.dg/autopar/reduc-1.c: Same.
> 	* gcc.dg/autopar/reduc-1char.c: Same.
> 	* gcc.dg/autopar/reduc-1short.c: Same.
> 	* gcc.dg/autopar/reduc-2.c: Same.
> 	* gcc.dg/autopar/reduc-2char.c: Same.
> 	* gcc.dg/autopar/reduc-2short.c: Same.
> 	* gcc.dg/autopar/reduc-3.c: Same.
> 	* gcc.dg/autopar/reduc-6.c: Same.
> 	* gcc.dg/autopar/reduc-7.c: Same.
> 	* gcc.dg/autopar/reduc-8.c: Same.
> 	* gcc.dg/autopar/reduc-9.c: Same.
> 	* gcc.dg/parloops-exit-first-loop-alt-2.c: Same.
> 	* gcc.dg/parloops-exit-first-loop-alt-3.c: Same.
> 	* gcc.dg/parloops-exit-first-loop-alt-4.c: Same.
> 	* gcc.dg/parloops-exit-first-loop-alt-5.c: Same.
> 	* gcc.dg/parloops-exit-first-loop-alt-6.c: Same.
> 	* gcc.dg/parloops-exit-first-loop-alt-7.c: Same.
> 	* gcc.dg/parloops-exit-first-loop-alt-pr66652.c: Same.
> 	* gcc.dg/parloops-exit-first-loop-alt.c: Same.
> 	* gfortran.dg/parloops-exit-first-loop-alt-2.f95: Same.
> 	* gfortran.dg/parloops-exit-first-loop-alt.f95: Same.
> 	* gfortran.dg/parloops-outer-1.f95: New test.
> ---
>  gcc/graphite-isl-ast-to-gimple.c                   |  6 +-
>  gcc/graphite-poly.c                                |  3 +-
>  gcc/graphite.c                                     | 83 ++++++++++++++++++-
>  gcc/omp-low.c                                      |  1 +
>  gcc/passes.def                                     | 11 +++
>  gcc/testsuite/gcc.dg/autopar/outer-6.c             |  6 +-
>  gcc/testsuite/gcc.dg/autopar/reduc-1.c             |  7 +-
>  gcc/testsuite/gcc.dg/autopar/reduc-1char.c         |  7 +-
>  gcc/testsuite/gcc.dg/autopar/reduc-1short.c        |  7 +-
>  gcc/testsuite/gcc.dg/autopar/reduc-2.c             |  7 +-
>  gcc/testsuite/gcc.dg/autopar/reduc-2char.c         |  7 +-
>  gcc/testsuite/gcc.dg/autopar/reduc-2short.c        |  7 +-
>  gcc/testsuite/gcc.dg/autopar/reduc-3.c             |  5 +-
>  gcc/testsuite/gcc.dg/autopar/reduc-6.c             |  6 +-
>  gcc/testsuite/gcc.dg/autopar/reduc-7.c             |  7 +-
>  gcc/testsuite/gcc.dg/autopar/reduc-8.c             |  7 +-
>  gcc/testsuite/gcc.dg/autopar/reduc-9.c             |  7 +-
>  .../gcc.dg/parloops-exit-first-loop-alt-2.c        |  9 +--
>  .../gcc.dg/parloops-exit-first-loop-alt-3.c        |  9 +--
>  .../gcc.dg/parloops-exit-first-loop-alt-4.c        |  9 +--
>  .../gcc.dg/parloops-exit-first-loop-alt-5.c        |  9 +--
>  .../gcc.dg/parloops-exit-first-loop-alt-6.c        |  9 +--
>  .../gcc.dg/parloops-exit-first-loop-alt-7.c        |  9 +--
>  .../gcc.dg/parloops-exit-first-loop-alt-pr66652.c  | 11 +--
>  .../gcc.dg/parloops-exit-first-loop-alt.c          | 10 +--
>  .../gfortran.dg/parloops-exit-first-loop-alt-2.f95 |  9 +--
>  .../gfortran.dg/parloops-exit-first-loop-alt.f95   | 10 +--
>  gcc/testsuite/gfortran.dg/parloops-outer-1.f95     | 37 +++++++++
>  gcc/tree-parloops.c                                | 93 ++++++++++++++++++++--
>  gcc/tree-pass.h                                    |  5 ++
>  gcc/tree-ssa-loop-ivcanon.c                        |  6 +-
>  31 files changed, 303 insertions(+), 116 deletions(-)
>  create mode 100644 gcc/testsuite/gfortran.dg/parloops-outer-1.f95
> 
> diff --git a/gcc/graphite-isl-ast-to-gimple.c b/gcc/graphite-isl-ast-to-gimple.c
> index b32781a5..bdafd40 100644
> --- a/gcc/graphite-isl-ast-to-gimple.c
> +++ b/gcc/graphite-isl-ast-to-gimple.c
> @@ -442,7 +442,8 @@ translate_isl_ast_for_loop (loop_p context_loop,
>    redirect_edge_succ_nodup (next_e, after);
>    set_immediate_dominator (CDI_DOMINATORS, next_e->dest, next_e->src);
>  
> -  if (flag_loop_parallelize_all)
> +  if (flag_loop_parallelize_all
> +      || flag_tree_parallelize_loops > 1)
>    {
>      isl_id *id = isl_ast_node_get_annotation (node_for);
>      gcc_assert (id);
> @@ -995,7 +996,8 @@ scop_to_isl_ast (scop_p scop, ivs_params &ip)
>    context_isl = set_options (context_isl, schedule_isl, options_luj);
>  
>    isl_union_map *dependences = NULL;
> -  if (flag_loop_parallelize_all)
> +  if (flag_loop_parallelize_all
> +      || flag_tree_parallelize_loops > 1)
>    {
>      dependences = scop_get_dependences (scop);
>      context_isl =
> diff --git a/gcc/graphite-poly.c b/gcc/graphite-poly.c
> index bcd08d8..e32325e 100644
> --- a/gcc/graphite-poly.c
> +++ b/gcc/graphite-poly.c
> @@ -241,7 +241,8 @@ apply_poly_transforms (scop_p scop)
>    if (flag_graphite_identity)
>      transform_done = true;
>  
> -  if (flag_loop_parallelize_all)
> +  if (flag_loop_parallelize_all
> +      || flag_tree_parallelize_loops > 1)
>      transform_done = true;
>  
>    if (flag_loop_block)
> diff --git a/gcc/graphite.c b/gcc/graphite.c
> index a81ef6a..6ba58c0 100644
> --- a/gcc/graphite.c
> +++ b/gcc/graphite.c
> @@ -319,7 +319,7 @@ graphite_transforms (struct function *fun)
>    return 0;
>  }
>  
> -static bool
> +bool
>  gate_graphite_transforms (void)
>  {
>    /* Enable -fgraphite pass if any one of the graphite optimization flags
> @@ -373,6 +373,45 @@ make_pass_graphite (gcc::context *ctxt)
>  
>  namespace {
>  
> +const pass_data pass_data_graphite_parloops =
> +{
> +  GIMPLE_PASS, /* type */
> +  "graphite_parloops", /* name */
> +  OPTGROUP_LOOP, /* optinfo_flags */
> +  TV_GRAPHITE, /* tv_id */
> +  ( PROP_cfg | PROP_ssa ), /* properties_required */
> +  0, /* properties_provided */
> +  0, /* properties_destroyed */
> +  0, /* todo_flags_start */
> +  0, /* todo_flags_finish */
> +};
> +
> +class pass_graphite_parloops : public gimple_opt_pass
> +{
> +public:
> +  pass_graphite_parloops (gcc::context *ctxt)
> +    : gimple_opt_pass (pass_data_graphite_parloops, ctxt)
> +  {}
> +
> +  /* opt_pass methods: */
> +  virtual bool gate (function *)
> +  {
> +    return (flag_tree_parallelize_loops > 1
> +	    && !gate_graphite_transforms ());
> +  }
> +
> +}; // class pass_graphite_parloops
> +
> +} // anon namespace
> +
> +gimple_opt_pass *
> +make_pass_graphite_parloops (gcc::context *ctxt)
> +{
> +  return new pass_graphite_parloops (ctxt);
> +}
> +
> +namespace {
> +
>  const pass_data pass_data_graphite_transforms =
>  {
>    GIMPLE_PASS, /* type */
> @@ -407,4 +446,46 @@ make_pass_graphite_transforms (gcc::context *ctxt)
>    return new pass_graphite_transforms (ctxt);
>  }
>  
> +/* It would be preferable to use a clone of pass_data_graphite_transforms rather
> +   than declare a new pass.  But when using a clone of
> +   pass_data_graphite_transforms (and changing the gate to trigger for
> +   flag_tree_parallelize_loops > 1 as well) in pass group
> +   pass_graphite_parloops, the pass is not executed.  */
> +
> +namespace {
> +
> +const pass_data pass_data_graphite_transforms2 =
> +{
> +  GIMPLE_PASS, /* type */
> +  "graphite2", /* name */
> +  OPTGROUP_LOOP, /* optinfo_flags */
> +  TV_GRAPHITE_TRANSFORMS, /* tv_id */
> +  ( PROP_cfg | PROP_ssa ), /* properties_required */
> +  0, /* properties_provided */
> +  0, /* properties_destroyed */
> +  0, /* todo_flags_start */
> +  0, /* todo_flags_finish */
> +};
> +
> +class pass_graphite_transforms2 : public gimple_opt_pass
> +{
> +public:
> +  pass_graphite_transforms2 (gcc::context *ctxt)
> +    : gimple_opt_pass (pass_data_graphite_transforms2, ctxt)
> +  {}
>  
> +  /* opt_pass methods: */
> +  virtual bool gate (function *)
> +  {
> +    return (flag_tree_parallelize_loops > 1);
> +  }
> +  virtual unsigned int execute (function *fun) { return graphite_transforms (fun); }
> +}; // class pass_graphite_transforms2
> +
> +} // anon namespace
> +
> +gimple_opt_pass *
> +make_pass_graphite_transforms2 (gcc::context *ctxt)
> +{
> +  return new pass_graphite_transforms2 (ctxt);
> +}
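
For anyone trying to follow which of the new passes actually fire, here is a minimal standalone sketch of the gate conditions as I read them from the hunks in this patch. It is not GCC code: NewPassGates and new_pass_gates are hypothetical names, and graphite_gate simply stands in for whatever gate_graphite_transforms () returns (its body is not shown here, so the sketch takes it as an input rather than guessing). It also assumes the usual pass-manager behaviour that a sub-pass only runs when its enclosing pass's gate succeeds.

```cpp
// Hypothetical summary of which passes added by the patch would be gated on.
struct NewPassGates {
  bool parloopsred;        // pass_parallelize_reductions
  bool graphite_parloops;  // pass_graphite_parloops (container pass)
  bool graphite2;          // pass_graphite_transforms2, nested in the container
};

// parallelize_loops_n models flag_tree_parallelize_loops;
// graphite_gate models the result of gate_graphite_transforms ().
NewPassGates new_pass_gates(int parallelize_loops_n, bool graphite_gate)
{
  const bool want_parloops = parallelize_loops_n > 1;

  NewPassGates g;
  // Both gates in the patch read: flag_tree_parallelize_loops > 1
  //                               && !gate_graphite_transforms ().
  g.parloopsred = want_parloops && !graphite_gate;
  g.graphite_parloops = want_parloops && !graphite_gate;
  // graphite2's own gate is just flag_tree_parallelize_loops > 1, but it is
  // a sub-pass, so it is only reached when the container gate also holds.
  g.graphite2 = g.graphite_parloops && want_parloops;
  return g;
}
```

If that reading is right, with -ftree-parallelize-loops=2 alone all three are enabled, and as soon as the regular graphite gate is on (graphite_gate true) none of them are, which leaves the existing graphite pass to do the work.
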
> diff --git a/gcc/omp-low.c b/gcc/omp-low.c
> index 3135606..8cbee3a 100644
> --- a/gcc/omp-low.c
> +++ b/gcc/omp-low.c
> @@ -9576,6 +9576,7 @@ public:
>        return !(fun->curr_properties & PROP_gimple_eomp);
>      }
>    virtual unsigned int execute (function *) { return execute_expand_omp (); }
> +  opt_pass *clone () { return new pass_expand_omp_ssa (m_ctxt); }
>  
>  }; // class pass_expand_omp_ssa
>  
> diff --git a/gcc/passes.def b/gcc/passes.def
> index 5cd07ae..aa1d1a1 100644
> --- a/gcc/passes.def
> +++ b/gcc/passes.def
> @@ -244,6 +244,17 @@ along with GCC; see the file COPYING3.  If not see
>  	      NEXT_PASS (pass_dce);
>  	  POP_INSERT_PASSES ()
>  	  NEXT_PASS (pass_iv_canon);
> +	  NEXT_PASS (pass_parallelize_reductions);
> +	  PUSH_INSERT_PASSES_WITHIN (pass_parallelize_reductions)
> +	      NEXT_PASS (pass_expand_omp_ssa);
> +	  POP_INSERT_PASSES ()
> +	  NEXT_PASS (pass_graphite_parloops);
> +	  PUSH_INSERT_PASSES_WITHIN (pass_graphite_parloops)
> +	      NEXT_PASS (pass_graphite_transforms2);
> +	      NEXT_PASS (pass_lim);
> +	      NEXT_PASS (pass_copy_prop);
> +	      NEXT_PASS (pass_dce);
> +	  POP_INSERT_PASSES ()
>  	  NEXT_PASS (pass_parallelize_loops);
>  	  PUSH_INSERT_PASSES_WITHIN (pass_parallelize_loops)
>  	      NEXT_PASS (pass_expand_omp_ssa);
> diff --git a/gcc/testsuite/gcc.dg/autopar/outer-6.c b/gcc/testsuite/gcc.dg/autopar/outer-6.c
> index 6bef7cc..0f01bd5 100644
> --- a/gcc/testsuite/gcc.dg/autopar/outer-6.c
> +++ b/gcc/testsuite/gcc.dg/autopar/outer-6.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O2 -ftree-parallelize-loops=4 -fdump-tree-parloops-details -fdump-tree-optimized" } */
> +/* { dg-options "-O2 -ftree-parallelize-loops=4 -fdump-tree-parloopsred-details -fdump-tree-optimized" } */
>  
>  void abort (void);
>  
> @@ -44,6 +44,6 @@ int main(void)
>  
>  
>  /* Check that outer loop is parallelized.  */
> -/* { dg-final { scan-tree-dump-times "parallelizing outer loop" 1 "parloops" } } */
> -/* { dg-final { scan-tree-dump-times "parallelizing inner loop" 0 "parloops" } } */
> +/* { dg-final { scan-tree-dump-times "parallelizing outer loop" 1 "parloopsred" } } */
> +/* { dg-final { scan-tree-dump-times "parallelizing inner loop" 0 "parloopsred" } } */
>  /* { dg-final { scan-tree-dump-times "loopfn" 4 "optimized" } } */
> diff --git a/gcc/testsuite/gcc.dg/autopar/reduc-1.c b/gcc/testsuite/gcc.dg/autopar/reduc-1.c
> index 6e9a280..4fc9b31 100644
> --- a/gcc/testsuite/gcc.dg/autopar/reduc-1.c
> +++ b/gcc/testsuite/gcc.dg/autopar/reduc-1.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O2 -ftree-parallelize-loops=4 -fdump-tree-parloops-details -fdump-tree-optimized" } */
> +/* { dg-options "-O2 -ftree-parallelize-loops=4 -fdump-tree-parloopsred-details -fdump-tree-parloops-details -fdump-tree-optimized" } */
>  
>  #include <stdarg.h>
>  #include <stdlib.h>
> @@ -66,6 +66,7 @@ int main (void)
>  }
>  
>  
> -/* { dg-final { scan-tree-dump-times "Detected reduction" 3 "parloops" } } */
> -/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 4 "parloops" } } */
> +/* { dg-final { scan-tree-dump-times "Detected reduction" 3 "parloopsred" } } */
> +/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 3 "parloopsred" } } */
> +/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 1 "parloops" } } */
>  
> diff --git a/gcc/testsuite/gcc.dg/autopar/reduc-1char.c b/gcc/testsuite/gcc.dg/autopar/reduc-1char.c
> index 48ead88..497b7e0 100644
> --- a/gcc/testsuite/gcc.dg/autopar/reduc-1char.c
> +++ b/gcc/testsuite/gcc.dg/autopar/reduc-1char.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O2 -ftree-parallelize-loops=4 -fdump-tree-parloops-details -fdump-tree-optimized" } */
> +/* { dg-options "-O2 -ftree-parallelize-loops=4 -fdump-tree-parloopsred-details -fdump-tree-parloops-details -fdump-tree-optimized" } */
>  
>  #include <stdarg.h>
>  #include <stdlib.h>
> @@ -60,6 +60,7 @@ int main (void)
>  }
>  
>  
> -/* { dg-final { scan-tree-dump-times "Detected reduction" 3 "parloops" } } */
> -/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 4 "parloops" } } */
> +/* { dg-final { scan-tree-dump-times "Detected reduction" 3 "parloopsred" } } */
> +/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 3 "parloopsred" } } */
> +/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 1 "parloops" } } */
>  
> diff --git a/gcc/testsuite/gcc.dg/autopar/reduc-1short.c b/gcc/testsuite/gcc.dg/autopar/reduc-1short.c
> index f3f547c..6af8e4b 100644
> --- a/gcc/testsuite/gcc.dg/autopar/reduc-1short.c
> +++ b/gcc/testsuite/gcc.dg/autopar/reduc-1short.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O2 -ftree-parallelize-loops=4 -fdump-tree-parloops-details -fdump-tree-optimized" } */
> +/* { dg-options "-O2 -ftree-parallelize-loops=4 -fdump-tree-parloopsred-details -fdump-tree-parloops-details -fdump-tree-optimized" } */
>  
>  #include <stdarg.h>
>  #include <stdlib.h>
> @@ -59,6 +59,7 @@ int main (void)
>    return 0;
>  }
>  
> -/* { dg-final { scan-tree-dump-times "Detected reduction" 3 "parloops" } } */
> -/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 4 "parloops" } } */
> +/* { dg-final { scan-tree-dump-times "Detected reduction" 3 "parloopsred" } } */
> +/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 3 "parloopsred" } } */
> +/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 1 "parloops" } } */
>  
> diff --git a/gcc/testsuite/gcc.dg/autopar/reduc-2.c b/gcc/testsuite/gcc.dg/autopar/reduc-2.c
> index 3ad16e4..2d0b2a1 100644
> --- a/gcc/testsuite/gcc.dg/autopar/reduc-2.c
> +++ b/gcc/testsuite/gcc.dg/autopar/reduc-2.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O2 -ftree-parallelize-loops=4 -fdump-tree-parloops-details -fdump-tree-optimized" } */
> +/* { dg-options "-O2 -ftree-parallelize-loops=4 -fdump-tree-parloopsred-details -fdump-tree-parloops-details -fdump-tree-optimized" } */
>  
>  #include <stdarg.h>
>  #include <stdlib.h>
> @@ -63,6 +63,7 @@ int main (void)
>    return 0;
>  }
>  
> -/* { dg-final { scan-tree-dump-times "Detected reduction" 3 "parloops" } } */
> -/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 4 "parloops" } } */
> +/* { dg-final { scan-tree-dump-times "Detected reduction" 3 "parloopsred" } } */
> +/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 3 "parloopsred" } } */
> +/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 1 "parloops" } } */
>  
> diff --git a/gcc/testsuite/gcc.dg/autopar/reduc-2char.c b/gcc/testsuite/gcc.dg/autopar/reduc-2char.c
> index 072489f..49ef16d 100644
> --- a/gcc/testsuite/gcc.dg/autopar/reduc-2char.c
> +++ b/gcc/testsuite/gcc.dg/autopar/reduc-2char.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O2 -ftree-parallelize-loops=4 -fdump-tree-parloops-details -fdump-tree-optimized" } */
> +/* { dg-options "-O2 -ftree-parallelize-loops=4 -fdump-tree-parloopsred-details -fdump-tree-parloops-details -fdump-tree-optimized" } */
>  
>  #include <stdarg.h>
>  #include <stdlib.h>
> @@ -60,7 +60,8 @@ int main (void)
>  }
>  
>  
> -/* { dg-final { scan-tree-dump-times "Detected reduction" 2 "parloops" } } */
> -/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 3 "parloops" } } */
> +/* { dg-final { scan-tree-dump-times "Detected reduction" 2 "parloopsred" } } */
> +/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 2 "parloopsred" } } */
> +/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 1 "parloops" } } */
>  
>  
> diff --git a/gcc/testsuite/gcc.dg/autopar/reduc-2short.c b/gcc/testsuite/gcc.dg/autopar/reduc-2short.c
> index 4dbbc8a..3ec1c2a 100644
> --- a/gcc/testsuite/gcc.dg/autopar/reduc-2short.c
> +++ b/gcc/testsuite/gcc.dg/autopar/reduc-2short.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O2 -ftree-parallelize-loops=4 -fdump-tree-parloops-details -fdump-tree-optimized" } */
> +/* { dg-options "-O2 -ftree-parallelize-loops=4 -fdump-tree-parloopsred-details -fdump-tree-parloops-details -fdump-tree-optimized" } */
>  
>  #include <stdarg.h>
>  #include <stdlib.h>
> @@ -59,6 +59,7 @@ int main (void)
>  }
>  
>  
> -/* { dg-final { scan-tree-dump-times "Detected reduction" 2 "parloops" } } */
> -/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 3 "parloops" } } */
> +/* { dg-final { scan-tree-dump-times "Detected reduction" 2 "parloopsred" } } */
> +/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 2 "parloopsred" } } */
> +/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 1 "parloops" } } */
>  
> diff --git a/gcc/testsuite/gcc.dg/autopar/reduc-3.c b/gcc/testsuite/gcc.dg/autopar/reduc-3.c
> index 0d4baef..e7ca82b 100644
> --- a/gcc/testsuite/gcc.dg/autopar/reduc-3.c
> +++ b/gcc/testsuite/gcc.dg/autopar/reduc-3.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O2 -ftree-parallelize-loops=4 -fdump-tree-parloops-details -fdump-tree-optimized" } */
> +/* { dg-options "-O2 -ftree-parallelize-loops=4 -fdump-tree-parloopsred-details -fdump-tree-parloops-details -fdump-tree-optimized" } */
>  
>  #include <stdarg.h>
>  #include <stdlib.h>
> @@ -50,6 +50,7 @@ int main (void)
>  }
>  
>  
> -/* { dg-final { scan-tree-dump-times "Detected reduction" 1 "parloops" } } */
> +/* { dg-final { scan-tree-dump-times "Detected reduction" 1 "parloopsred" } } */
> +/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 1 "parloopsred" } } */
>  /* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 2 "parloops" } } */
>  
> diff --git a/gcc/testsuite/gcc.dg/autopar/reduc-6.c b/gcc/testsuite/gcc.dg/autopar/reduc-6.c
> index 91f679e..6c5ec7b 100644
> --- a/gcc/testsuite/gcc.dg/autopar/reduc-6.c
> +++ b/gcc/testsuite/gcc.dg/autopar/reduc-6.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O2 -ftree-parallelize-loops=4 -fdump-tree-parloops-details -fdump-tree-optimized" } */
> +/* { dg-options "-O2 -ftree-parallelize-loops=4 -fdump-tree-parloopsred-details -fdump-tree-parloops-details -fdump-tree-optimized" } */
>  
>  #include <stdarg.h>
>  #include <stdlib.h>
> @@ -56,6 +56,6 @@ int main (void)
>  
>  
>  /* need -ffast-math to  parallelize these loops.  */
> -/* { dg-final { scan-tree-dump-times "Detected reduction" 0 "parloops" } } */
> +/* { dg-final { scan-tree-dump-times "Detected reduction" 0 "parloopsred" } } */
>  /* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 1 "parloops" } } */
> -/* { dg-final { scan-tree-dump-times "FAILED: it is not a part of reduction" 3 "parloops" } } */
> +/* { dg-final { scan-tree-dump-times "FAILED: it is not a part of reduction" 3 "parloopsred" } } */
> diff --git a/gcc/testsuite/gcc.dg/autopar/reduc-7.c b/gcc/testsuite/gcc.dg/autopar/reduc-7.c
> index 77b99e1..dccf2a5 100644
> --- a/gcc/testsuite/gcc.dg/autopar/reduc-7.c
> +++ b/gcc/testsuite/gcc.dg/autopar/reduc-7.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O2 -ftree-parallelize-loops=4 -fdump-tree-parloops-details -fdump-tree-optimized" } */
> +/* { dg-options "-O2 -ftree-parallelize-loops=4 -fdump-tree-parloopsred-details -fdump-tree-parloops-details -fdump-tree-optimized" } */
>  
>  #include <stdlib.h>
>  
> @@ -84,6 +84,7 @@ int main (void)
>  }
>  
>  
> -/* { dg-final { scan-tree-dump-times "Detected reduction" 2 "parloops" } } */
> -/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 3 "parloops" } } */
> +/* { dg-final { scan-tree-dump-times "Detected reduction" 2 "parloopsred" } } */
> +/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 2 "parloopsred" } } */
> +/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 1 "parloops" } } */
>  
> diff --git a/gcc/testsuite/gcc.dg/autopar/reduc-8.c b/gcc/testsuite/gcc.dg/autopar/reduc-8.c
> index 16fb954..466bcc5 100644
> --- a/gcc/testsuite/gcc.dg/autopar/reduc-8.c
> +++ b/gcc/testsuite/gcc.dg/autopar/reduc-8.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O2 -ftree-parallelize-loops=4 -fdump-tree-parloops-details -fdump-tree-optimized" } */
> +/* { dg-options "-O2 -ftree-parallelize-loops=4 -fdump-tree-parloopsred-details -fdump-tree-parloops-details -fdump-tree-optimized" } */
>  
>  #include <stdlib.h>
>  
> @@ -84,5 +84,6 @@ int main (void)
>  }
>  
>  
> -/* { dg-final { scan-tree-dump-times "Detected reduction" 2 "parloops" } } */
> -/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 3 "parloops" } } */
> +/* { dg-final { scan-tree-dump-times "Detected reduction" 2 "parloopsred" } } */
> +/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 2 "parloopsred" } } */
> +/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 1 "parloops" } } */
> diff --git a/gcc/testsuite/gcc.dg/autopar/reduc-9.c b/gcc/testsuite/gcc.dg/autopar/reduc-9.c
> index 90f4db2..11556d7 100644
> --- a/gcc/testsuite/gcc.dg/autopar/reduc-9.c
> +++ b/gcc/testsuite/gcc.dg/autopar/reduc-9.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O2 -ftree-parallelize-loops=4 -fdump-tree-parloops-details -fdump-tree-optimized" } */
> +/* { dg-options "-O2 -ftree-parallelize-loops=4 -fdump-tree-parloopsred-details -fdump-tree-parloops-details -fdump-tree-optimized" } */
>  
>  #include <stdlib.h>
>  
> @@ -84,5 +84,6 @@ int main (void)
>  }
>  
>  
> -/* { dg-final { scan-tree-dump-times "Detected reduction" 2 "parloops" } } */
> -/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 3 "parloops" } } */
> +/* { dg-final { scan-tree-dump-times "Detected reduction" 2 "parloopsred" } } */
> +/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 2 "parloopsred" } } */
> +/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 1 "parloops" } } */
> diff --git a/gcc/testsuite/gcc.dg/parloops-exit-first-loop-alt-2.c b/gcc/testsuite/gcc.dg/parloops-exit-first-loop-alt-2.c
> index 24e605a..f1cf75f 100644
> --- a/gcc/testsuite/gcc.dg/parloops-exit-first-loop-alt-2.c
> +++ b/gcc/testsuite/gcc.dg/parloops-exit-first-loop-alt-2.c
> @@ -1,6 +1,6 @@
>  /* { dg-do compile } */
>  /* { dg-require-effective-target pthread } */
> -/* { dg-options "-O2 -ftree-parallelize-loops=2 -fdump-tree-parloops" } */
> +/* { dg-options "-O2 -ftree-parallelize-loops=2 -fdump-tree-parloops-details" } */
>  
>  /* Constant bound, vector addition.  */
>  
> @@ -19,9 +19,4 @@ f (void)
>        c[i] = a[i] + b[i];
>  }
>  
> -/* Three times three array accesses:
> -   - three in f._loopfn.0
> -   - three in the parallel
> -   - three in the low iteration count loop
> -   Crucially, none for a peeled off last iteration following the parallel.  */
> -/* { dg-final { scan-tree-dump-times "(?n)\\\[i" 9 "parloops" } } */
> +/* { dg-final { scan-tree-dump-times "alternative exit-first loop transform succeeded" 1 "parloops" } } */
> diff --git a/gcc/testsuite/gcc.dg/parloops-exit-first-loop-alt-3.c b/gcc/testsuite/gcc.dg/parloops-exit-first-loop-alt-3.c
> index fec53a1..6c34084 100644
> --- a/gcc/testsuite/gcc.dg/parloops-exit-first-loop-alt-3.c
> +++ b/gcc/testsuite/gcc.dg/parloops-exit-first-loop-alt-3.c
> @@ -1,6 +1,6 @@
>  /* { dg-do compile } */
>  /* { dg-require-effective-target pthread } */
> -/* { dg-options "-O2 -ftree-parallelize-loops=2 -fdump-tree-parloops" } */
> +/* { dg-options "-O2 -ftree-parallelize-loops=2 -fdump-tree-parloopsred-details" } */
>  
>  /* Variable bound, reduction.  */
>  
> @@ -18,9 +18,4 @@ f (unsigned int n, unsigned int *__restrict__ a)
>    return sum;
>  }
>  
> -/* Three array accesses:
> -   - one in f._loopfn.0
> -   - one in the parallel
> -   - one in the low iteration count loop
> -   Crucially, none for a peeled off last iteration following the parallel.  */
> -/* { dg-final { scan-tree-dump-times "(?n)\\\* 4" 3 "parloops" } } */
> +/* { dg-final { scan-tree-dump-times "alternative exit-first loop transform succeeded" 1 "parloopsred" } } */
> diff --git a/gcc/testsuite/gcc.dg/parloops-exit-first-loop-alt-4.c b/gcc/testsuite/gcc.dg/parloops-exit-first-loop-alt-4.c
> index 2b8d289..f051ed4 100644
> --- a/gcc/testsuite/gcc.dg/parloops-exit-first-loop-alt-4.c
> +++ b/gcc/testsuite/gcc.dg/parloops-exit-first-loop-alt-4.c
> @@ -1,6 +1,6 @@
>  /* { dg-do compile } */
>  /* { dg-require-effective-target pthread } */
> -/* { dg-options "-O2 -ftree-parallelize-loops=2 -fdump-tree-parloops" } */
> +/* { dg-options "-O2 -ftree-parallelize-loops=2 -fdump-tree-parloopsred-details" } */
>  
>  /* Constant bound, reduction.  */
>  
> @@ -20,9 +20,4 @@ f (void)
>    return sum;
>  }
>  
> -/* Three array accesses:
> -   - one in f._loopfn.0
> -   - one in the parallel
> -   - one in the low iteration count loop
> -   Crucially, none for a peeled off last iteration following the parallel.  */
> -/* { dg-final { scan-tree-dump-times "(?n)\\\* 4" 3 "parloops" } } */
> +/* { dg-final { scan-tree-dump-times "alternative exit-first loop transform succeeded" 1 "parloopsred" } } */
> diff --git a/gcc/testsuite/gcc.dg/parloops-exit-first-loop-alt-5.c b/gcc/testsuite/gcc.dg/parloops-exit-first-loop-alt-5.c
> index 3f799cf..3c1e99b 100644
> --- a/gcc/testsuite/gcc.dg/parloops-exit-first-loop-alt-5.c
> +++ b/gcc/testsuite/gcc.dg/parloops-exit-first-loop-alt-5.c
> @@ -1,6 +1,6 @@
>  /* { dg-do compile } */
>  /* { dg-require-effective-target pthread } */
> -/* { dg-options "-O2 -ftree-parallelize-loops=2 -fdump-tree-parloops" } */
> +/* { dg-options "-O2 -ftree-parallelize-loops=2 -fdump-tree-parloops-details" } */
>  
>  /* Variable bound, vector addition, unsigned loop counter, unsigned bound.  */
>  
> @@ -14,9 +14,4 @@ f (unsigned int n, unsigned int *__restrict__ a, unsigned int *__restrict__ b,
>      c[i] = a[i] + b[i];
>  }
>  
> -/* Three times a store:
> -   - one in f._loopfn.0
> -   - one in the parallel
> -   - one in the low iteration count loop
> -   Crucially, none for a peeled off last iteration following the parallel.  */
> -/* { dg-final { scan-tree-dump-times "(?n)^  \\*_\[0-9\]*" 3 "parloops" } } */
> +/* { dg-final { scan-tree-dump-times "alternative exit-first loop transform succeeded" 1 "parloops" } } */
> diff --git a/gcc/testsuite/gcc.dg/parloops-exit-first-loop-alt-6.c b/gcc/testsuite/gcc.dg/parloops-exit-first-loop-alt-6.c
> index ee19a55..edc60ba 100644
> --- a/gcc/testsuite/gcc.dg/parloops-exit-first-loop-alt-6.c
> +++ b/gcc/testsuite/gcc.dg/parloops-exit-first-loop-alt-6.c
> @@ -1,6 +1,6 @@
>  /* { dg-do compile } */
>  /* { dg-require-effective-target pthread } */
> -/* { dg-options "-O2 -ftree-parallelize-loops=2 -fdump-tree-parloops" } */
> +/* { dg-options "-O2 -ftree-parallelize-loops=2 -fdump-tree-parloops-details" } */
>  
>  /* Variable bound, vector addition, unsigned loop counter, signed bound.  */
>  
> @@ -14,9 +14,4 @@ f (int n, unsigned int *__restrict__ a, unsigned int *__restrict__ b,
>      c[i] = a[i] + b[i];
>  }
>  
> -/* Three times a store:
> -   - one in f._loopfn.0
> -   - one in the parallel
> -   - one in the low iteration count loop
> -   Crucially, none for a peeled off last iteration following the parallel.  */
> -/* { dg-final { scan-tree-dump-times "(?n)^  \\*_\[0-9\]*" 3 "parloops" } } */
> +/* { dg-final { scan-tree-dump-times "alternative exit-first loop transform succeeded" 1 "parloops" } } */
> diff --git a/gcc/testsuite/gcc.dg/parloops-exit-first-loop-alt-7.c b/gcc/testsuite/gcc.dg/parloops-exit-first-loop-alt-7.c
> index c337342..38be2e8 100644
> --- a/gcc/testsuite/gcc.dg/parloops-exit-first-loop-alt-7.c
> +++ b/gcc/testsuite/gcc.dg/parloops-exit-first-loop-alt-7.c
> @@ -1,6 +1,6 @@
>  /* { dg-do compile } */
>  /* { dg-require-effective-target pthread } */
> -/* { dg-options "-O2 -ftree-parallelize-loops=2 -fdump-tree-parloops" } */
> +/* { dg-options "-O2 -ftree-parallelize-loops=2 -fdump-tree-parloops-details" } */
>  
>  /* Variable bound, vector addition, signed loop counter, signed bound.  */
>  
> @@ -14,9 +14,4 @@ f (int n, unsigned int *__restrict__ a, unsigned int *__restrict__ b,
>      c[i] = a[i] + b[i];
>  }
>  
> -/* Three times a store:
> -   - one in f._loopfn.0
> -   - one in the parallel
> -   - one in the low iteration count loop
> -   Crucially, none for a peeled off last iteration following the parallel.  */
> -/* { dg-final { scan-tree-dump-times "(?n)^  \\*_\[0-9\]*" 3 "parloops" } } */
> +/* { dg-final { scan-tree-dump-times "alternative exit-first loop transform succeeded" 1 "parloops" } } */
> diff --git a/gcc/testsuite/gcc.dg/parloops-exit-first-loop-alt-pr66652.c b/gcc/testsuite/gcc.dg/parloops-exit-first-loop-alt-pr66652.c
> index 2ea097d..7b64368 100644
> --- a/gcc/testsuite/gcc.dg/parloops-exit-first-loop-alt-pr66652.c
> +++ b/gcc/testsuite/gcc.dg/parloops-exit-first-loop-alt-pr66652.c
> @@ -1,6 +1,6 @@
>  /* { dg-do compile } */
>  /* { dg-require-effective-target pthread } */
> -/* { dg-options "-O2 -ftree-parallelize-loops=2 -fdump-tree-parloops" } */
> +/* { dg-options "-O2 -ftree-parallelize-loops=2 -fdump-tree-parloopsred-details" } */
>  
>  #include <stdio.h>
>  #include <stdlib.h>
> @@ -22,10 +22,5 @@ f (unsigned int n, unsigned int sum)
>    return sum;
>  }
>  
> -/* Four times % 13:
> -   - once in f._loopfn.0
> -   - once in the parallel
> -   - once in the low iteration count loop
> -   - once for a peeled off last iteration following the parallel.
> -   In other words, we want try_transform_to_exit_first_loop_alt to fail.  */
> -/* { dg-final { scan-tree-dump-times "(?n)% 13" 4 "parloops" } } */
> +/* { dg-final { scan-tree-dump-times "parallelizing inner loop" 1 "parloopsred" } } */
> +/* { dg-final { scan-tree-dump-times "alternative exit-first loop transform succeeded" 0 "parloopsred" } } */
> diff --git a/gcc/testsuite/gcc.dg/parloops-exit-first-loop-alt.c b/gcc/testsuite/gcc.dg/parloops-exit-first-loop-alt.c
> index 0b69165..44596e3 100644
> --- a/gcc/testsuite/gcc.dg/parloops-exit-first-loop-alt.c
> +++ b/gcc/testsuite/gcc.dg/parloops-exit-first-loop-alt.c
> @@ -1,6 +1,6 @@
>  /* { dg-do compile } */
>  /* { dg-require-effective-target pthread } */
> -/* { dg-options "-O2 -ftree-parallelize-loops=2 -fdump-tree-parloops" } */
> +/* { dg-options "-O2 -ftree-parallelize-loops=2 -fdump-tree-parloops-details" } */
>  
>  /* Variable bound, vector addition, signed loop counter, unsigned bound.  */
>  
> @@ -14,9 +14,5 @@ f (unsigned int n, unsigned int *__restrict__ a, unsigned int *__restrict__ b,
>      c[i] = a[i] + b[i];
>  }
>  
> -/* Three times a store:
> -   - one in f._loopfn.0
> -   - one in the parallel
> -   - one in the low iteration count loop
> -   Crucially, none for a peeled off last iteration following the parallel.  */
> -/* { dg-final { scan-tree-dump-times "(?n)^  \\*_\[0-9\]*" 3 "parloops" } } */
> +/* { dg-final { scan-tree-dump-times "alternative exit-first loop transform succeeded" 1 "parloops" } } */
> +
> diff --git a/gcc/testsuite/gfortran.dg/parloops-exit-first-loop-alt-2.f95 b/gcc/testsuite/gfortran.dg/parloops-exit-first-loop-alt-2.f95
> index f26a6e3..52434f2 100644
> --- a/gcc/testsuite/gfortran.dg/parloops-exit-first-loop-alt-2.f95
> +++ b/gcc/testsuite/gfortran.dg/parloops-exit-first-loop-alt-2.f95
> @@ -1,7 +1,7 @@
>  ! { dg-additional-options "-O2" }
>  ! { dg-require-effective-target pthread }
>  ! { dg-additional-options "-ftree-parallelize-loops=2" }
> -! { dg-additional-options "-fdump-tree-parloops" }
> +! { dg-additional-options "-fdump-tree-parloops-details" }
>  
>  ! Constant bound, vector addition.
>  
> @@ -16,9 +16,4 @@ subroutine foo ()
>    end do
>  end subroutine foo
>  
> -! Three times plus 25:
> -! - once in f._loopfn.0
> -! - once in the parallel
> -! - once in the low iteration count loop
> -! Crucially, none for a peeled off last iteration following the parallel.
> -! { dg-final { scan-tree-dump-times "(?n) \\+ 25;" 3 "parloops" } }
> +! { dg-final { scan-tree-dump-times "alternative exit-first loop transform succeeded" 1 "parloops" } }
> diff --git a/gcc/testsuite/gfortran.dg/parloops-exit-first-loop-alt.f95 b/gcc/testsuite/gfortran.dg/parloops-exit-first-loop-alt.f95
> index 6dc8a38..1eb9dfd 100644
> --- a/gcc/testsuite/gfortran.dg/parloops-exit-first-loop-alt.f95
> +++ b/gcc/testsuite/gfortran.dg/parloops-exit-first-loop-alt.f95
> @@ -1,7 +1,7 @@
>  ! { dg-additional-options "-O2" }
>  ! { dg-require-effective-target pthread }
>  ! { dg-additional-options "-ftree-parallelize-loops=2" }
> -! { dg-additional-options "-fdump-tree-parloops" }
> +! { dg-additional-options "-fdump-tree-parloops-details" }
>  
>  ! Variable bound, vector addition.
>  
> @@ -17,9 +17,5 @@ subroutine foo (nr)
>    end do
>  end subroutine foo
>  
> -! Three times plus 25:
> -! - once in f._loopfn.0
> -! - once in the parallel
> -! - once in the low iteration count loop
> -! Crucially, none for a peeled off last iteration following the parallel.
> -! { dg-final { scan-tree-dump-times "(?n) \\+ 25;" 3 "parloops" } }
> +! { dg-final { scan-tree-dump-times "alternative exit-first loop transform succeeded" 1 "parloops" } }
> +
> diff --git a/gcc/testsuite/gfortran.dg/parloops-outer-1.f95 b/gcc/testsuite/gfortran.dg/parloops-outer-1.f95
> new file mode 100644
> index 0000000..144e4e8
> --- /dev/null
> +++ b/gcc/testsuite/gfortran.dg/parloops-outer-1.f95
> @@ -0,0 +1,37 @@
> +! { dg-do compile }
> +! { dg-additional-options "-O2" }
> +! { dg-additional-options "-ftree-parallelize-loops=2" }
> +! { dg-additional-options "-fdump-tree-parloops-all" }
> +! { dg-additional-options "-fdump-tree-optimized" }
> +
> +! Based on autopar/outer-1.c.
> +
> +program main
> +  implicit none
> +  integer, parameter         :: n = 500
> +  integer, dimension (0:n-1, 0:n-1) :: x
> +  integer                    :: i, j, ii, jj
> +
> +
> +  do ii = 0, n - 1
> +     do jj = 0, n - 1
> +        x(jj, ii) = ii + jj + 3
> +     end do
> +  end do
> +
> +  do i = 0, n - 1
> +     do j = 0, n - 1
> +        if (x(j, i) .ne. i + j + 3) call abort
> +     end do
> +  end do
> +
> +end program main
> +
> +! Check that only one loop is analyzed, and that it can be parallelized.
> +! { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 1 "parloops" } }
> +! { dg-final { scan-tree-dump-not "FAILED:" "parloops" } }
> +! { dg-final { scan-tree-dump-times "parallelizing outer loop" 1 "parloops" } }
> +
> +! Check that the loop has been split off into a function.
> +! { dg-final { scan-tree-dump-times "(?n);; Function main._loopfn.0 " 1 "optimized" } }
> +
> diff --git a/gcc/tree-parloops.c b/gcc/tree-parloops.c
> index 036677b..4bfe588 100644
> --- a/gcc/tree-parloops.c
> +++ b/gcc/tree-parloops.c
> @@ -2238,7 +2238,15 @@ gen_parallel_loop (struct loop *loop,
>       increment) and immediately follows the loop exit test.  Attempt to move the
>       entry of the loop directly before the exit check and increase the number of
>       iterations of the loop by one.  */
> -  if (!try_transform_to_exit_first_loop_alt (loop, reduction_list, nit))
> +  if (try_transform_to_exit_first_loop_alt (loop, reduction_list, nit))
> +    {
> +      if (dump_file
> +	  && (dump_flags & TDF_DETAILS))
> +	fprintf (dump_file,
> +		 "alternative exit-first loop transform succeeded"
> +		 " for loop %d\n", loop->num);
> +    }
> +  else
>      {
>        /* Fall back on the method that handles more cases, but duplicates the
>  	 loop body: move the exit condition of LOOP to the beginning of its
> @@ -2508,7 +2516,7 @@ try_create_reduction_list (loop_p loop,
>     otherwise.  */
>  
>  static bool
> -parallelize_loops (void)
> +parallelize_loops (bool reductions_only)
>  {
>    unsigned n_threads = flag_tree_parallelize_loops;
>    bool changed = false;
> @@ -2584,10 +2592,31 @@ parallelize_loops (void)
>        if (!try_create_reduction_list (loop, &reduction_list))
>  	continue;
>  
> -      if (!flag_loop_parallelize_all
> -	  && !loop_parallel_p (loop, &parloop_obstack))
> +      if (reductions_only
> +	  && reduction_list.elements () == 0)
>  	continue;
>  
> +      if (!flag_loop_parallelize_all)
> +	{
> +	  bool independent = false;
> +
> +	  if (!independent
> +	      && loop->can_be_parallel)
> +	    {
> +	      if (dump_file
> +		  && (dump_flags & TDF_DETAILS))
> +		fprintf (dump_file,
> +			 "  SUCCESS: may be parallelized, graphite analysis\n");
> +	      independent = true;
> +	    }
> +
> +	  if (!independent)
> +	    independent = loop_parallel_p (loop, &parloop_obstack);
> +
> +	  if (!independent)
> +	    continue;
> +	}
> +
>        changed = true;
>        if (dump_file && (dump_flags & TDF_DETAILS))
>        {
> @@ -2652,7 +2681,7 @@ pass_parallelize_loops::execute (function *fun)
>    if (number_of_loops (fun) <= 1)
>      return 0;
>  
> -  if (parallelize_loops ())
> +  if (parallelize_loops (false))
>      {
>        fun->curr_properties &= ~(PROP_gimple_eomp);
>        return TODO_update_ssa;
> @@ -2668,3 +2697,57 @@ make_pass_parallelize_loops (gcc::context *ctxt)
>  {
>    return new pass_parallelize_loops (ctxt);
>  }
> +
> +namespace {
> +
> +const pass_data pass_data_parallelize_reductions =
> +{
> +  GIMPLE_PASS, /* type */
> +  "parloopsred", /* name */
> +  OPTGROUP_LOOP, /* optinfo_flags */
> +  TV_TREE_PARALLELIZE_LOOPS, /* tv_id */
> +  ( PROP_cfg | PROP_ssa ), /* properties_required */
> +  0, /* properties_provided */
> +  0, /* properties_destroyed */
> +  0, /* todo_flags_start */
> +  0, /* todo_flags_finish */
> +};
> +
> +class pass_parallelize_reductions : public gimple_opt_pass
> +{
> +public:
> +  pass_parallelize_reductions (gcc::context *ctxt)
> +    : gimple_opt_pass (pass_data_parallelize_reductions, ctxt)
> +  {}
> +
> +  /* opt_pass methods: */
> +  virtual bool gate (function *)
> +  {
> +    return (flag_tree_parallelize_loops > 1
> +	    && !gate_graphite_transforms ());
> +  }
> +  virtual unsigned int execute (function *);
> +}; // class pass_parallelize_reductions
> +
> +unsigned
> +pass_parallelize_reductions::execute (function *fun)
> +{
> +  if (number_of_loops (fun) <= 1)
> +    return 0;
> +
> +  if (parallelize_loops (true))
> +    {
> +      fun->curr_properties &= ~(PROP_gimple_eomp);
> +      return TODO_update_ssa;
> +    }
> +
> +  return 0;
> +}
> +
> +} // anon namespace
> +
> +gimple_opt_pass *
> +make_pass_parallelize_reductions (gcc::context *ctxt)
> +{
> +  return new pass_parallelize_reductions (ctxt);
> +}
> diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
> index c47b22e..f0a7017 100644
> --- a/gcc/tree-pass.h
> +++ b/gcc/tree-pass.h
> @@ -368,7 +368,9 @@ extern gimple_opt_pass *make_pass_scev_cprop (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_empty_loop (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_record_bounds (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_graphite (gcc::context *ctxt);
> +extern gimple_opt_pass *make_pass_graphite_parloops (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_graphite_transforms (gcc::context *ctxt);
> +extern gimple_opt_pass *make_pass_graphite_transforms2 (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_if_conversion (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_loop_distribution (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_vectorize (gcc::context *ctxt);
> @@ -377,6 +379,7 @@ extern gimple_opt_pass *make_pass_slp_vectorize (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_complete_unroll (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_complete_unrolli (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_parallelize_loops (gcc::context *ctxt);
> +extern gimple_opt_pass *make_pass_parallelize_reductions (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_loop_prefetch (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_iv_optimize (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_tree_loop_done (gcc::context *ctxt);
> @@ -595,6 +598,8 @@ extern gimple_opt_pass *make_pass_update_address_taken (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_convert_switch (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_lower_vaarg (gcc::context *ctxt);
>  
> +extern bool gate_graphite_transforms (void);
> +
>  /* Current optimization pass.  */
>  extern opt_pass *current_pass;
>  
> diff --git a/gcc/tree-ssa-loop-ivcanon.c b/gcc/tree-ssa-loop-ivcanon.c
> index eca70a9..43724ed 100644
> --- a/gcc/tree-ssa-loop-ivcanon.c
> +++ b/gcc/tree-ssa-loop-ivcanon.c
> @@ -1421,7 +1421,11 @@ public:
>    {}
>  
>    /* opt_pass methods: */
> -  virtual bool gate (function *) { return flag_tree_loop_ivcanon != 0; }
> +  virtual bool gate (function *)
> +  {
> +    return (flag_tree_loop_ivcanon != 0
> +	    && flag_tree_parallelize_loops <= 1);
> +  }
>    virtual unsigned int execute (function *fun);
>  
>  }; // class pass_iv_canon
> -- 
> 1.9.1
> 

* Re: [RFC, PR66873] Use graphite for parloops
  2015-07-20 18:53         ` Sebastian Pop
@ 2015-07-21  0:22           ` Tom de Vries
  0 siblings, 0 replies; 27+ messages in thread
From: Tom de Vries @ 2015-07-21  0:22 UTC (permalink / raw)
  To: Sebastian Pop; +Cc: Richard Biener, Thomas Schwinge, gcc-patches

On 20/07/15 20:22, Sebastian Pop wrote:
> Tom de Vries wrote:
>>>>> graphite dependence analysis is too slow to be enabled unconditionally.
>>>>> (read: hours in some simple cases - see bugzilla)
>>>>
>>>> Haha, "cool"!  ;-)
>>>>
>>>> Maybe it is still reasonable to use graphite to analyze the code inside
>>>> OpenACC kernels regions -- maybe such code can reasonably be expected to
>>>> not have the properties that make its analysis lengthy?  So, Tom, could
>>>> you please identify and check such PRs, to get an understanding of what
>>>> these properties are?
>>>
>>> Like the one in PR62113 or 53852 or 59121.
>>
>> PR62113 and PR59121 do not reproduce for me on trunk.
>>
>> PR53852 does reproduce for me (to the point that I had to reset my laptop).
>
> ISL has a way to count the number of operations, based on a watermark it will
> output an error code that we can use to leave graphite: see documentation of
> isl_ctx_set_max_operations().  With that mechanism we can set a goal for
> graphite of at max (say 10% overhead) of whole compilation time.
>

Agreed, bounding graphite to a limited runtime sounds like a good idea.

Determining the bound (in terms of isl operations) doesn't look trivial 
though. I suppose a basic version could be number of gimple operations 
in function times a constant.
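
A minimal sketch of such a basic bound, under stated assumptions: the
helper graphite_isl_budget and the per-statement constant are
hypothetical and do not exist in GCC. The result would be the value
handed to isl_ctx_set_max_operations(), which takes an unsigned long.

```c
#include <limits.h>

/* Hypothetical tuning constant: isl operations allowed per gimple
   statement.  The value is an illustration, not a measured number.  */
#define ISL_OPS_PER_GIMPLE_STMT 1000UL

/* Return an isl operation budget proportional to the function size,
   saturating at ULONG_MAX instead of wrapping around.  */
unsigned long
graphite_isl_budget (unsigned long n_gimple_stmts)
{
  if (n_gimple_stmts > ULONG_MAX / ISL_OPS_PER_GIMPLE_STMT)
    return ULONG_MAX;
  return n_gimple_stmts * ISL_OPS_PER_GIMPLE_STMT;
}
```

The constant would have to be tuned against real compile times; a
purely linear bound is only the simplest choice.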

Thanks,
- Tom

* Re: [RFC, PR66873] Use graphite for parloops
  2015-07-20 18:54 ` Sebastian Pop
@ 2015-07-21  5:59   ` Tom de Vries
  2015-07-21 14:35     ` Tom de Vries
  0 siblings, 1 reply; 27+ messages in thread
From: Tom de Vries @ 2015-07-21  5:59 UTC (permalink / raw)
  To: Sebastian Pop; +Cc: gcc-patches

On 20/07/15 20:31, Sebastian Pop wrote:
> Tom de Vries wrote:
>> So I wondered, why not always use the graphite dependency analysis
>> in parloops. (Of course you could use -floop-parallelize-all, but
>> that also changes the heuristic). So I wrote a patch for parloops to
>> use graphite dependency analysis by default (so without
>> -floop-parallelize-all), but while testing found out that all the
>> reduction test-cases started failing because the modifications
>> graphite makes to the code messes up the parloops reduction
>> analysis.
>>
>> Then I came up with this patch, which:
>> - first runs a parloops pass, restricted to reduction loops only,
>
> I would prefer to fix graphite to catch the reduction loop and avoid running an
> extra pass before graphite for that case.

> Can you please specify which file is
> failing to be parallelized?  Are they all those testcases that you update the flags?

Yep, f.i. autopar/reduc-1.c.

> Also it seems to me that you are missing -ffast-math to parallelize all these
> loops: without that flag graphite would not mark reductions as
> associative/commutative operations and they would not be recognized as parallel.

For an unsigned int reduction, we don't need -ffast-math, so we 
don't have to specify it for parloops. It seems graphite is too strict 
in that, since it won't do any reductions without -fassociative-math.
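
To illustrate why the flag matters for floating point but not for
unsigned integers, here is a small standalone sketch (the function
names are made up for this example and are not part of GCC). It
compares the two association orders of a three-term sum.

```c
/* Unsigned arithmetic is modular, so both association orders give the
   same result for all inputs.  */
int
unsigned_sum_is_associative (unsigned a, unsigned b, unsigned c)
{
  return (a + b) + c == a + (b + c);
}

/* For doubles the intermediate rounding depends on the order.  The
   volatile temporaries force each intermediate to be rounded to
   double.  Build without -ffast-math to observe the difference.  */
int
double_sum_is_associative (double a, double b, double c)
{
  volatile double left = a + b;
  left = left + c;

  volatile double right = b + c;
  right = a + right;

  return left == right;
}
```

With a = 1e20, b = -1e20 and c = 1.0, the left-to-right order yields
1.0 while the other order yields 0.0, because 1.0 is lost when added
to -1e20.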

But indeed, with -ffast-math -ftree-parallelize-loops=2 
-floop-parallelize-all we are able to parallelize the 3 reduction loops 
in autopar/reduc-1.c

> Is that something the current parloops detection is not too strict about?

Parloops uses vect_is_simple_reduction_1, which has some extensive 
testing to see if reordering of operations is allowed. The testing of 
graphite seems to be limited to testing -fassociative-math, which makes 
me suspect that tests are missing there, f.i. TYPE_OVERFLOW_TRAPS.

Thanks,
- Tom

* Re: [RFC, PR66873] Use graphite for parloops
  2015-07-21  5:59   ` Tom de Vries
@ 2015-07-21 14:35     ` Tom de Vries
  2015-07-21 19:08       ` Sebastian Pop
  0 siblings, 1 reply; 27+ messages in thread
From: Tom de Vries @ 2015-07-21 14:35 UTC (permalink / raw)
  To: Sebastian Pop; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 2436 bytes --]

On 21/07/15 02:21, Tom de Vries wrote:
> On 20/07/15 20:31, Sebastian Pop wrote:
>> Tom de Vries wrote:
>>> So I wondered, why not always use the graphite dependency analysis
>>> in parloops. (Of course you could use -floop-parallelize-all, but
>>> that also changes the heuristic). So I wrote a patch for parloops to
>>> use graphite dependency analysis by default (so without
>>> -floop-parallelize-all), but while testing found out that all the
>>> reduction test-cases started failing because the modifications
>>> graphite makes to the code messes up the parloops reduction
>>> analysis.
>>>
>>> Then I came up with this patch, which:
>>> - first runs a parloops pass, restricted to reduction loops only,
>>
>> I would prefer to fix graphite to catch the reduction loop and avoid
>> running an
>> extra pass before graphite for that case.
>
>> Can you please specify which file is
>> failing to be parallelized?  Are they all those testcases that you
>> update the flags?
>
> Yep, f.i. autopar/reduc-1.c.
>
>> Also it seems to me that you are missing -ffast-math to parallelize
>> all these
>> loops: without that flag graphite would not mark reductions as
>> associative/commutative operations and they would not be recognized as
>> parallel.
>
> For an unsigned int reduction, we don't need -ffast-math, so we
> don't have to specify it for parloops. It seems graphite is too strict
> in that, since it won't do any reductions without -fassociative-math.
>
> But indeed, with -ffast-math -ftree-parallelize-loops=2
> -floop-parallelize-all we are able to parallelize the 3 reduction loops
> in autopar/reduc-1.c
>
>> Is that something the current parloops detection is not too strict about?
>
> Parloops uses vect_is_simple_reduction_1, which has some extensive
> testing to see if reordering of operations is allowed.

Hmm, vect_is_simple_reduction_1 seems to miss the check for 
TYPE_OVERFLOW_WRAPS.

> The testing of
> graphite seems to be limited to testing -fassociative-math, which makes
> me suspect that tests are missing there, f.i. TYPE_OVERFLOW_TRAPS.

I could not enable both -fassociative-math and -ftrapv, so I guess that 
case was covered implicitly.

Attached patch:
- adds the missing TYPE_OVERFLOW_WRAPS check in
   vect_is_simple_reduction_1
- enables reductions in graphite, when safe
- introduces a macro FIXED_POINT_TYPE_OVERFLOW_WRAPS_P

Currently bootstrapping and reg-testing on x86_64.

Thanks,
- Tom


[-- Attachment #2: 0001-Fix-reduction-safety-checks.patch --]
[-- Type: text/x-patch, Size: 10665 bytes --]

Fix reduction safety checks

2015-07-21  Tom de Vries  <tom@codesourcery.com>

	* tree.h (FIXED_POINT_TYPE_OVERFLOW_WRAPS_P): Define.
	* tree-ssa-reassoc.c (can_reassociate_p): Rewrite using
	FIXED_POINT_TYPE_OVERFLOW_WRAPS_P.
	* graphite-sese-to-poly.c (is_reduction_operation_p): Limit
	flag_associative_math to SCALAR_FLOAT_TYPE_P.  Honour
	TYPE_OVERFLOW_TRAPS and TYPE_OVERFLOW_WRAPS for INTEGRAL_TYPE_P.
	Only allow wrapping fixed-point otherwise.
	(build_poly_scop): Always call
	rewrite_commutative_reductions_out_of_ssa.
	* tree-vect-loop.c (vect_is_simple_reduction_1): Honour
	TYPE_OVERFLOW_WRAPS for INTEGRAL_TYPE_P. Rewrite using
	FIXED_POINT_TYPE_OVERFLOW_WRAPS_P.

	* gcc.dg/autopar/outer-4.c: Change reduction type to unsigned.
	* gcc.dg/autopar/outer-5.c: Same.
	* gcc.dg/autopar/outer-6.c: Same.
	* gcc.dg/autopar/reduc-2.c: Add -fwrapv to dg-options.
	* gcc.dg/autopar/reduc-8.c: Same.
	* gcc.dg/autopar/reduc-2char.c: Add -fwrapv to dg-options. Update
	scan-tree-dumps.
	* gcc.dg/autopar/reduc-2short.c: Same.
---
 gcc/graphite-sese-to-poly.c                 | 21 +++++++++++++++-----
 gcc/testsuite/gcc.dg/autopar/outer-4.c      |  6 +++---
 gcc/testsuite/gcc.dg/autopar/outer-5.c      |  8 ++++----
 gcc/testsuite/gcc.dg/autopar/outer-6.c      |  8 ++++----
 gcc/testsuite/gcc.dg/autopar/reduc-2.c      |  2 +-
 gcc/testsuite/gcc.dg/autopar/reduc-2char.c  |  6 +++---
 gcc/testsuite/gcc.dg/autopar/reduc-2short.c |  6 +++---
 gcc/testsuite/gcc.dg/autopar/reduc-8.c      |  2 +-
 gcc/tree-ssa-reassoc.c                      |  3 ++-
 gcc/tree-vect-loop.c                        | 30 +++++++++++++++++++++--------
 gcc/tree.h                                  | 12 ++++++++++++
 11 files changed, 71 insertions(+), 33 deletions(-)

diff --git a/gcc/graphite-sese-to-poly.c b/gcc/graphite-sese-to-poly.c
index 8960c3f..cb2204e 100644
--- a/gcc/graphite-sese-to-poly.c
+++ b/gcc/graphite-sese-to-poly.c
@@ -2604,9 +2604,21 @@ is_reduction_operation_p (gimple stmt)
   gcc_assert (is_gimple_assign (stmt));
   code = gimple_assign_rhs_code (stmt);
 
-  return flag_associative_math
-    && commutative_tree_code (code)
-    && associative_tree_code (code);
+  if (!commutative_tree_code (code)
+      || !associative_tree_code (code))
+    return false;
+
+  tree type = TREE_TYPE (gimple_assign_lhs (stmt));
+
+  if (SCALAR_FLOAT_TYPE_P (type))
+    return flag_associative_math;
+
+  if (INTEGRAL_TYPE_P (type))
+    return (!TYPE_OVERFLOW_TRAPS (type)
+	    && TYPE_OVERFLOW_WRAPS (type));
+
+  return (FIXED_POINT_TYPE_P (type)
+	  && FIXED_POINT_TYPE_OVERFLOW_WRAPS_P (type));
 }
 
 /* Returns true when PHI contains an argument ARG.  */
@@ -3147,8 +3159,7 @@ build_poly_scop (scop_p scop)
   if (!scop_ivs_can_be_represented (scop))
     return;
 
-  if (flag_associative_math)
-    rewrite_commutative_reductions_out_of_ssa (scop);
+  rewrite_commutative_reductions_out_of_ssa (scop);
 
   build_sese_loop_nests (region);
   /* Record all conditions in REGION.  */
diff --git a/gcc/testsuite/gcc.dg/autopar/outer-4.c b/gcc/testsuite/gcc.dg/autopar/outer-4.c
index 6fd37c5..a57a0e4 100644
--- a/gcc/testsuite/gcc.dg/autopar/outer-4.c
+++ b/gcc/testsuite/gcc.dg/autopar/outer-4.c
@@ -3,14 +3,14 @@
 
 void abort (void);
 
-int g_sum=0;
-int x[500][500];
+unsigned int g_sum=0;
+unsigned int x[500][500];
 
 __attribute__((noinline))
 void parloop (int N)
 {
   int i, j;
-  int sum;
+  unsigned int sum;
 
   /* Double reduction is currently not supported, outer loop is not 
      parallelized.  Inner reduction is detected, inner loop is 
diff --git a/gcc/testsuite/gcc.dg/autopar/outer-5.c b/gcc/testsuite/gcc.dg/autopar/outer-5.c
index 6a0ae91..c1bda6a 100644
--- a/gcc/testsuite/gcc.dg/autopar/outer-5.c
+++ b/gcc/testsuite/gcc.dg/autopar/outer-5.c
@@ -3,9 +3,9 @@
 
 void abort (void);
 
-int x[500][500];
-int y[500];
-int g_sum=0;
+unsigned int x[500][500];
+unsigned int y[500];
+unsigned int g_sum=0;
 
 __attribute__((noinline))
 void init (int i, int j)
@@ -17,7 +17,7 @@ __attribute__((noinline))
 void parloop (int N)
 {
   int i, j;
-  int sum;
+  unsigned int sum;
 
   /* Inner cycle is currently not supported, outer loop is not 
      parallelized.  Inner reduction is detected, inner loop is 
diff --git a/gcc/testsuite/gcc.dg/autopar/outer-6.c b/gcc/testsuite/gcc.dg/autopar/outer-6.c
index 6bef7cc..9f90ba6 100644
--- a/gcc/testsuite/gcc.dg/autopar/outer-6.c
+++ b/gcc/testsuite/gcc.dg/autopar/outer-6.c
@@ -3,9 +3,9 @@
 
 void abort (void);
 
-int x[500][500];
-int y[500];
-int g_sum=0;
+unsigned int x[500][500];
+unsigned int y[500];
+unsigned int g_sum=0;
 
 __attribute__((noinline))
 void init (int i, int j)
@@ -17,7 +17,7 @@ __attribute__((noinline))
 void parloop (int N)
 {
   int i, j;
-  int sum;
+  unsigned int sum;
 
   /* Outer loop reduction, outerloop is parallelized.  */ 
   sum=0;
diff --git a/gcc/testsuite/gcc.dg/autopar/reduc-2.c b/gcc/testsuite/gcc.dg/autopar/reduc-2.c
index 3ad16e4..ad78241 100644
--- a/gcc/testsuite/gcc.dg/autopar/reduc-2.c
+++ b/gcc/testsuite/gcc.dg/autopar/reduc-2.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -ftree-parallelize-loops=4 -fdump-tree-parloops-details -fdump-tree-optimized" } */
+/* { dg-options "-O2 -ftree-parallelize-loops=4 -fdump-tree-parloops-details -fdump-tree-optimized -fwrapv" } */
 
 #include <stdarg.h>
 #include <stdlib.h>
diff --git a/gcc/testsuite/gcc.dg/autopar/reduc-2char.c b/gcc/testsuite/gcc.dg/autopar/reduc-2char.c
index 072489f..857a3cf 100644
--- a/gcc/testsuite/gcc.dg/autopar/reduc-2char.c
+++ b/gcc/testsuite/gcc.dg/autopar/reduc-2char.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -ftree-parallelize-loops=4 -fdump-tree-parloops-details -fdump-tree-optimized" } */
+/* { dg-options "-O2 -ftree-parallelize-loops=4 -fdump-tree-parloops-details -fdump-tree-optimized -fwrapv" } */
 
 #include <stdarg.h>
 #include <stdlib.h>
@@ -60,7 +60,7 @@ int main (void)
 }
 
 
-/* { dg-final { scan-tree-dump-times "Detected reduction" 2 "parloops" } } */
-/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 3 "parloops" } } */
+/* { dg-final { scan-tree-dump-times "Detected reduction" 3 "parloops" } } */
+/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 4 "parloops" } } */
 
 
diff --git a/gcc/testsuite/gcc.dg/autopar/reduc-2short.c b/gcc/testsuite/gcc.dg/autopar/reduc-2short.c
index 4dbbc8a..dcf537e 100644
--- a/gcc/testsuite/gcc.dg/autopar/reduc-2short.c
+++ b/gcc/testsuite/gcc.dg/autopar/reduc-2short.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -ftree-parallelize-loops=4 -fdump-tree-parloops-details -fdump-tree-optimized" } */
+/* { dg-options "-O2 -ftree-parallelize-loops=4 -fdump-tree-parloops-details -fdump-tree-optimized -fwrapv" } */
 
 #include <stdarg.h>
 #include <stdlib.h>
@@ -59,6 +59,6 @@ int main (void)
 }
 
 
-/* { dg-final { scan-tree-dump-times "Detected reduction" 2 "parloops" } } */
-/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 3 "parloops" } } */
+/* { dg-final { scan-tree-dump-times "Detected reduction" 3 "parloops" } } */
+/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 4 "parloops" } } */
 
diff --git a/gcc/testsuite/gcc.dg/autopar/reduc-8.c b/gcc/testsuite/gcc.dg/autopar/reduc-8.c
index 16fb954..05f1126 100644
--- a/gcc/testsuite/gcc.dg/autopar/reduc-8.c
+++ b/gcc/testsuite/gcc.dg/autopar/reduc-8.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -ftree-parallelize-loops=4 -fdump-tree-parloops-details -fdump-tree-optimized" } */
+/* { dg-options "-O2 -ftree-parallelize-loops=4 -fdump-tree-parloops-details -fdump-tree-optimized -fwrapv" } */
 
 #include <stdlib.h>
 
diff --git a/gcc/tree-ssa-reassoc.c b/gcc/tree-ssa-reassoc.c
index efb813c..b48ae1e 100644
--- a/gcc/tree-ssa-reassoc.c
+++ b/gcc/tree-ssa-reassoc.c
@@ -4229,7 +4229,8 @@ can_reassociate_p (tree op)
 {
   tree type = TREE_TYPE (op);
   if ((INTEGRAL_TYPE_P (type) && TYPE_OVERFLOW_WRAPS (type))
-      || NON_SAT_FIXED_POINT_TYPE_P (type)
+      || (FIXED_POINT_TYPE_P (type)
+	  && FIXED_POINT_TYPE_OVERFLOW_WRAPS_P (type))
       || (flag_associative_math && FLOAT_TYPE_P (type)))
     return true;
   return false;
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 9145dbf..e014be2 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -2613,16 +2613,30 @@ vect_is_simple_reduction_1 (loop_vec_info loop_info, gimple phi,
 			"reduction: unsafe fp math optimization: ");
       return NULL;
     }
-  else if (INTEGRAL_TYPE_P (type) && TYPE_OVERFLOW_TRAPS (type)
-	   && check_reduction)
+  else if (INTEGRAL_TYPE_P (type) && check_reduction)
     {
-      /* Changing the order of operations changes the semantics.  */
-      if (dump_enabled_p ())
-	report_vect_op (MSG_MISSED_OPTIMIZATION, def_stmt,
-			"reduction: unsafe int math optimization: ");
-      return NULL;
+      if (TYPE_OVERFLOW_TRAPS (type))
+	{
+	  /* Changing the order of operations changes the semantics.  */
+	  if (dump_enabled_p ())
+	    report_vect_op (MSG_MISSED_OPTIMIZATION, def_stmt,
+			    "reduction: unsafe int math optimization"
+			    " (overflow traps): ");
+	  return NULL;
+	}
+      if (!TYPE_OVERFLOW_WRAPS (type))
+	{
+	  /* Changing the order of operations changes the semantics.  */
+	  if (dump_enabled_p ())
+	    report_vect_op (MSG_MISSED_OPTIMIZATION, def_stmt,
+			    "reduction: unsafe int math optimization"
+			    " (overflow doesn't wrap): ");
+	  return NULL;
+	}
     }
-  else if (SAT_FIXED_POINT_TYPE_P (type) && check_reduction)
+  else if (FIXED_POINT_TYPE_P (type)
+	   && !FIXED_POINT_TYPE_OVERFLOW_WRAPS_P (type)
+	   && check_reduction)
     {
       /* Changing the order of operations changes the semantics.  */
       if (dump_enabled_p ())
diff --git a/gcc/tree.h b/gcc/tree.h
index 6df2217..f2ae669 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -501,6 +501,18 @@ extern void omp_clause_range_check_failed (const_tree, const char *, int,
 
 #define FIXED_POINT_TYPE_P(TYPE)	(TREE_CODE (TYPE) == FIXED_POINT_TYPE)
 
+/* Nonzero if fixed-point type TYPE wraps at overflow.
+
+   GCC support of fixed-point types as specified by the draft technical report
+   (N1169 draft of ISO/IEC DTR 18037) is incomplete: Pragmas to control overflow
+   and rounding behaviors are not implemented.
+
+   So, if not saturating, we assume modular wrap-around (see Annex E.4 Modwrap
+   overflow).  */
+
+#define FIXED_POINT_TYPE_OVERFLOW_WRAPS_P(TYPE) \
+  (NON_SAT_FIXED_POINT_TYPE_P (TYPE))
+
 /* Nonzero if TYPE represents a scalar floating-point type.  */
 
 #define SCALAR_FLOAT_TYPE_P(TYPE) (TREE_CODE (TYPE) == REAL_TYPE)
-- 
1.9.1


* Re: [RFC, PR66873] Use graphite for parloops
  2015-07-21 14:35     ` Tom de Vries
@ 2015-07-21 19:08       ` Sebastian Pop
  2015-07-22 11:02         ` Richard Biener
  0 siblings, 1 reply; 27+ messages in thread
From: Sebastian Pop @ 2015-07-21 19:08 UTC (permalink / raw)
  To: Tom de Vries; +Cc: gcc-patches

Tom de Vries wrote:
> Fix reduction safety checks
> 
> 	* graphite-sese-to-poly.c (is_reduction_operation_p): Limit
> 	flag_associative_math to SCALAR_FLOAT_TYPE_P.  Honour
> 	TYPE_OVERFLOW_TRAPS and TYPE_OVERFLOW_WRAPS for INTEGRAL_TYPE_P.
> 	Only allow wrapping fixed-point otherwise.
> 	(build_poly_scop): Always call
> 	rewrite_commutative_reductions_out_of_ssa.

The changes to graphite look good to me.

Thanks,
Sebastian

* Re: [RFC, PR66873] Use graphite for parloops
  2015-07-21 19:08       ` Sebastian Pop
@ 2015-07-22 11:02         ` Richard Biener
  2015-07-22 11:18           ` Richard Biener
  2015-07-22 15:33           ` [PATCH] Document ftrapv/fwrapv interaction Tom de Vries
  0 siblings, 2 replies; 27+ messages in thread
From: Richard Biener @ 2015-07-22 11:02 UTC (permalink / raw)
  To: Sebastian Pop; +Cc: Tom de Vries, gcc-patches

On Tue, Jul 21, 2015 at 8:42 PM, Sebastian Pop <sebpop@gmail.com> wrote:
> Tom de Vries wrote:
>> Fix reduction safety checks
>>
>>       * graphite-sese-to-poly.c (is_reduction_operation_p): Limit
>>       flag_associative_math to SCALAR_FLOAT_TYPE_P.  Honour
>>       TYPE_OVERFLOW_TRAPS and TYPE_OVERFLOW_WRAPS for INTEGRAL_TYPE_P.
>>       Only allow wrapping fixed-point otherwise.
>>       (build_poly_scop): Always call
>>       rewrite_commutative_reductions_out_of_ssa.
>
> The changes to graphite look good to me.

+  if (SCALAR_FLOAT_TYPE_P (type))
+    return flag_associative_math;
+

why only scalar floats?  Please use FLOAT_TYPE_P.

+  if (INTEGRAL_TYPE_P (type))
+    return (!TYPE_OVERFLOW_TRAPS (type)
+           && TYPE_OVERFLOW_WRAPS (type));

it cannot both wrap and trap thus TYPE_OVERFLOW_WRAPS is enough.

I'm sure you'll disable quite some parallelization this way... (the
routine is modeled after the vectorizer's IIRC, so it would be affected
as well).  Yeah - I see you modify autopar testcases.  Please instead
XFAIL the existing ones and add variants with unsigned reductions.
Adding -fwrapv isn't a good solution either.

Can you think of a testcase that breaks btw?

The "proper" solution (see other passes) is to rewrite the reduction
to a wrapping one (cast to unsigned for the reduction op).

+  return (FIXED_POINT_TYPE_P (type)
+         && FIXED_POINT_TYPE_OVERFLOW_WRAPS_P (type));

why?  Simply return false here instead?

diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 9145dbf..e014be2 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -2613,16 +2613,30 @@ vect_is_simple_reduction_1 (loop_vec_info loop_info, gimple phi,
                        "reduction: unsafe fp math optimization: ");
       return NULL;
     }
-  else if (INTEGRAL_TYPE_P (type) && TYPE_OVERFLOW_TRAPS (type)
-          && check_reduction)
+  else if (INTEGRAL_TYPE_P (type) && check_reduction)
     {
...

You didn't need to adjust any testcases?  That's probably because the
checking above is not always executed (see PR66623 for a related
testcase).  The code needs refactoring.  And we need a way out, that
is, we do _not_ want to not vectorize signed reductions.  So you need
to fix code generation instead.

+/* Nonzero if fixed-point type TYPE wraps at overflow.
+
+   GCC support of fixed-point types as specified by the draft technical report
+   (N1169 draft of ISO/IEC DTR 18037) is incomplete: Pragmas to control overflow
+   and rounding behaviors are not implemented.
+
+   So, if not saturating, we assume modular wrap-around (see Annex E.4 Modwrap
+   overflow).  */
+
+#define FIXED_POINT_TYPE_OVERFLOW_WRAPS_P(TYPE) \
+  (NON_SAT_FIXED_POINT_TYPE_P (TYPE))

Somebody with knowledge about fixed-point types needs to review this.
I suggest leaving the fixed-point changes out of the initial patch
submission.

Thanks,
Richard.

> Thanks,
> Sebastian

* Re: [RFC, PR66873] Use graphite for parloops
  2015-07-22 11:02         ` Richard Biener
@ 2015-07-22 11:18           ` Richard Biener
  2015-07-22 16:04             ` [PATCH] Don't allow unsafe reductions in graphite Tom de Vries
                               ` (2 more replies)
  2015-07-22 15:33           ` [PATCH] Document ftrapv/fwrapv interaction Tom de Vries
  1 sibling, 3 replies; 27+ messages in thread
From: Richard Biener @ 2015-07-22 11:18 UTC (permalink / raw)
  To: Sebastian Pop; +Cc: Tom de Vries, gcc-patches

On Wed, Jul 22, 2015 at 1:01 PM, Richard Biener
<richard.guenther@gmail.com> wrote:
> On Tue, Jul 21, 2015 at 8:42 PM, Sebastian Pop <sebpop@gmail.com> wrote:
>> Tom de Vries wrote:
>>> Fix reduction safety checks
>>>
>>>       * graphite-sese-to-poly.c (is_reduction_operation_p): Limit
>>>       flag_associative_math to SCALAR_FLOAT_TYPE_P.  Honour
>>>       TYPE_OVERFLOW_TRAPS and TYPE_OVERFLOW_WRAPS for INTEGRAL_TYPE_P.
>>>       Only allow wrapping fixed-point otherwise.
>>>       (build_poly_scop): Always call
>>>       rewrite_commutative_reductions_out_of_ssa.
>>
>> The changes to graphite look good to me.
>
> +  if (SCALAR_FLOAT_TYPE_P (type))
> +    return flag_associative_math;
> +
>
> why only scalar floats?  Please use FLOAT_TYPE_P.
>
> +  if (INTEGRAL_TYPE_P (type))
> +    return (!TYPE_OVERFLOW_TRAPS (type)
> +           && TYPE_OVERFLOW_WRAPS (type));
>
> it cannot both wrap and trap thus TYPE_OVERFLOW_WRAPS is enough.
>
> I'm sure you'll disable quite some parallelization this way... (the
> routine is modeled after
> the vectorizers IIRC, so it would be affected as well).  Yeah - I see
> you modify autopar
> testcases.  Please instead XFAIL the existing ones and add variants
> with unsigned
> reductions.  Adding -fwrapv isn't a good solution either.
>
> Can you think of a testcase that breaks btw?
>
> The "proper" solution (see other passes) is to rewrite the reduction
> to a wrapping
> one (cast to unsigned for the reduction op).
>
> +  return (FIXED_POINT_TYPE_P (type)
> +         && FIXED_POINT_TYPE_OVERFLOW_WRAPS_P (type));
>
> why?  Simply return false here instead?
>
> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
> index 9145dbf..e014be2 100644
> --- a/gcc/tree-vect-loop.c
> +++ b/gcc/tree-vect-loop.c
> @@ -2613,16 +2613,30 @@ vect_is_simple_reduction_1 (loop_vec_info
> loop_info, gimple phi,
>                         "reduction: unsafe fp math optimization: ");
>        return NULL;
>      }
> -  else if (INTEGRAL_TYPE_P (type) && TYPE_OVERFLOW_TRAPS (type)
> -          && check_reduction)
> +  else if (INTEGRAL_TYPE_P (type) && check_reduction)
>      {
> ...
>
> You didn't need to adjust any testcases?  That's probably because the
> checking above is
> not always executed (see PR66623 for a related testcase).  The code
> needs refactoring.
> And we need a way-out, that is, we do _not_ want to not vectorize
> signed reductions.
> So you need to fix code generation instead.

Btw, for the vectorizer the current "trick" is that nobody takes advantage of
overflow undefinedness for vector types.

> +/* Nonzero if fixed-point type TYPE wraps at overflow.
> +
> +   GCC support of fixed-point types as specified by the draft technical report
> +   (N1169 draft of ISO/IEC DTR 18037) is incomplete: Pragmas to
> control overflow
> +   and rounding behaviors are not implemented.
> +
> +   So, if not saturating, we assume modular wrap-around (see Annex E.4 Modwrap
> +   overflow).  */
> +
> +#define FIXED_POINT_TYPE_OVERFLOW_WRAPS_P(TYPE) \
> +  (NON_SAT_FIXED_POINT_TYPE_P (TYPE))
>
> somebody with knowledge about fixed-point types needs to review this.
> I suggest to
> leave fixed-point changes out from the initial patch submission.
>
> Thanks,
> Richard.
>
>> Thanks,
>> Sebastian

* [PATCH] Document ftrapv/fwrapv interaction
  2015-07-22 11:02         ` Richard Biener
  2015-07-22 11:18           ` Richard Biener
@ 2015-07-22 15:33           ` Tom de Vries
  2015-07-23 10:39             ` Richard Biener
  1 sibling, 1 reply; 27+ messages in thread
From: Tom de Vries @ 2015-07-22 15:33 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 574 bytes --]

[ Re: [RFC, PR66873] Use graphite for parloops ]
On 22/07/15 13:01, Richard Biener wrote:
> why only scalar floats?  Please use FLOAT_TYPE_P.
>
> +  if (INTEGRAL_TYPE_P (type))
> +    return (!TYPE_OVERFLOW_TRAPS (type)
> +           && TYPE_OVERFLOW_WRAPS (type));
>
> it cannot both wrap and trap thus TYPE_OVERFLOW_WRAPS is enough.

Hmm, indeed, when specifying both, one is quietly ignored. The 
documentation also doesn't mention this.

Attached untested patch mentions this ftrapv/fwrapv interaction in the docs.

OK for trunk, if bootstrap succeeds?

Thanks,
- Tom



[-- Attachment #2: 0001-Document-ftrapv-fwrapv-interaction.patch --]
[-- Type: text/x-patch, Size: 1676 bytes --]

Document ftrapv/fwrapv interaction

2015-07-22  Tom de Vries  <tom@codesourcery.com>

	* doc/invoke.texi (@item -ftrapv, @item -fwrapv): Document interaction.
---
 gcc/doc/invoke.texi | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 55c2659..aa0b0c0 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -23676,6 +23676,11 @@ option is used to control the temporary stack reuse optimization.
 @opindex ftrapv
 This option generates traps for signed overflow on addition, subtraction,
 multiplication operations.
+The options @option{-ftrapv} and @option{-fwrapv} override each other, so using
+@option{-ftrapv} @option{-fwrapv} on the command-line results in
+@option{-fwrapv} being effective.  Note that only active options override, so
+using @option{-ftrapv} @option{-fwrapv} @option{-fno-wrapv} on the command-line
+results in @option{-ftrapv} being effective.
 
 @item -fwrapv
 @opindex fwrapv
@@ -23684,6 +23689,11 @@ overflow of addition, subtraction and multiplication wraps around
 using twos-complement representation.  This flag enables some optimizations
 and disables others.  This option is enabled by default for the Java
 front end, as required by the Java language specification.
+The options @option{-ftrapv} and @option{-fwrapv} override each other, so using
+@option{-ftrapv} @option{-fwrapv} on the command-line results in
+@option{-fwrapv} being effective.  Note that only active options override, so
+using @option{-ftrapv} @option{-fwrapv} @option{-fno-wrapv} on the command-line
+results in @option{-ftrapv} being effective.
 
 @item -fexceptions
 @opindex fexceptions
-- 
1.9.1



* [PATCH] Don't allow unsafe reductions in graphite
  2015-07-22 11:18           ` Richard Biener
@ 2015-07-22 16:04             ` Tom de Vries
  2015-07-23 10:51               ` Richard Biener
  2015-07-22 16:38             ` [PATCH] Check TYPE_OVERFLOW_WRAPS for parloops reductions Tom de Vries
  2015-07-24 11:54             ` [PATCH] Add FIXED_POINT_TYPE_OVERFLOW_WRAPS_P Tom de Vries
  2 siblings, 1 reply; 27+ messages in thread
From: Tom de Vries @ 2015-07-22 16:04 UTC (permalink / raw)
  To: Richard Biener, Sebastian Pop; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 3025 bytes --]

[ was: Re: [RFC, PR66873] Use graphite for parloops ]

On 22/07/15 13:02, Richard Biener wrote:
> On Wed, Jul 22, 2015 at 1:01 PM, Richard Biener
> <richard.guenther@gmail.com>  wrote:
>> >On Tue, Jul 21, 2015 at 8:42 PM, Sebastian Pop<sebpop@gmail.com>  wrote:
>>> >>Tom de Vries wrote:
>>>> >>>Fix reduction safety checks
>>>> >>>
>>>> >>>       * graphite-sese-to-poly.c (is_reduction_operation_p): Limit
>>>> >>>       flag_associative_math to SCALAR_FLOAT_TYPE_P.  Honour
>>>> >>>       TYPE_OVERFLOW_TRAPS and TYPE_OVERFLOW_WRAPS for INTEGRAL_TYPE_P.
>>>> >>>       Only allow wrapping fixed-point otherwise.
>>>> >>>       (build_poly_scop): Always call
>>>> >>>       rewrite_commutative_reductions_out_of_ssa.
>>> >>
>>> >>The changes to graphite look good to me.
>> >
>> >+  if (SCALAR_FLOAT_TYPE_P (type))
>> >+    return flag_associative_math;
>> >+
>> >
>> >why only scalar floats?

Copied from the conditions in vect_is_simple_reduction_1.

 >> >Please use FLOAT_TYPE_P.

Done.

>> >
>> >+  if (INTEGRAL_TYPE_P (type))
>> >+    return (!TYPE_OVERFLOW_TRAPS (type)
>> >+           && TYPE_OVERFLOW_WRAPS (type));
>> >
>> >it cannot both wrap and trap thus TYPE_OVERFLOW_WRAPS is enough.
>> >

Done.

>> >I'm sure you'll disable quite some parallelization this way... (the
>> >routine is modeled after
>> >the vectorizers IIRC, so it would be affected as well).  Yeah - I see
>> >you modify autopar
>> >testcases.

I now split up the patch, this bit only relates to graphite, so no 
autopar testcases are affected.

>> >Please instead XFAIL the existing ones and add variants
>> >with unsigned
>> >reductions.  Adding -fwrapv isn't a good solution either.

Done.

>> >
>> >Can you think of a testcase that breaks btw?
>> >

If you mean a testcase that fails to execute properly without the fix, and 
executes correctly with the fix, then no.  The problem this patch is 
trying to fix is that we assume wrapping overflow without -fwrapv.  In 
order to run into a runtime failure, we need a target that does not do 
wrapping overflow without -fwrapv.
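
To make the concern concrete, here is a small C sketch (hypothetical names,
not part of the patch).  It shows an input whose serial signed sum stays in
range, while the partial sum of the second half, as a two-way split of the
reduction would compute it, leaves the range of int.  The check uses a
long long accumulator, which assumes long long is wider than int.

```c
#include <limits.h>
#include <stddef.h>

/* Return 1 if summing A[0..N-1] left to right, starting from 0, produces
   an intermediate value outside the range of int.  The running sum is
   kept in long long so the check itself does not overflow (assumes
   long long is wider than int and N is small).  */
int
partial_sum_leaves_int_range (const int *a, size_t n)
{
  long long acc = 0;

  for (size_t i = 0; i < n; i++)
    {
      acc += a[i];
      if (acc > INT_MAX || acc < INT_MIN)
        return 1;
    }
  return 0;
}

/* Hypothetical input: fine when summed serially, but the second half
   overflows when summed on its own.  */
const int example_input[4] = { -2, -2, INT_MAX, 2 };
```

Summing example_input serially never leaves the int range, but summing
example_input[2..3] separately does, so the reordered reduction is only
valid if overflow wraps.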

>> >The "proper" solution (see other passes) is to rewrite the reduction
>> >to a wrapping
>> >one (cast to unsigned for the reduction op).
>> >

Right.

>> >+  return (FIXED_POINT_TYPE_P (type)
>> >+         && FIXED_POINT_TYPE_OVERFLOW_WRAPS_P (type));
>> >
>> >why?

Again, copied from the conditions in vect_is_simple_reduction_1.

 >> >  Simply return false here instead?

Done.


[ Btw, looking at associative_tree_code, I realized that the
   overflow checking is only necessary for PLUS_EXPR and MULT_EXPR:
...
   switch (code)
     {
     case BIT_IOR_EXPR:
     case BIT_AND_EXPR:
     case BIT_XOR_EXPR:
     case PLUS_EXPR:
     case MULT_EXPR:
     case MIN_EXPR:
     case MAX_EXPR:
       return true;
...

The other operators cannot overflow to begin with. My guess is that it's 
better to leave this for a trunk-only follow-up patch.
]
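
As a standalone illustration of what such a follow-up might check (a sketch
only; the enum and both functions are hypothetical stand-ins, not GCC
code):

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the tree codes listed above.  */
enum reduction_code
{
  RED_BIT_IOR,
  RED_BIT_AND,
  RED_BIT_XOR,
  RED_PLUS,
  RED_MULT,
  RED_MIN,
  RED_MAX
};

/* Of the listed codes, only addition and multiplication can produce a
   value outside the range of the operand type.  */
bool
reduction_code_can_overflow (enum reduction_code code)
{
  return code == RED_PLUS || code == RED_MULT;
}

/* An integer reduction is safe to reorder if its operator cannot
   overflow, or if overflow wraps for the type.  */
bool
integral_reduction_reorderable (enum reduction_code code, bool type_wraps)
{
  return !reduction_code_can_overflow (code) || type_wraps;
}
```

Under this model a signed MAX_EXPR reduction would still be accepted,
while a signed PLUS_EXPR reduction would need a wrapping type.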

Currently bootstrapping and reg-testing on x86_64.

OK for trunk?

OK 5 and 4.9 release branches?

Thanks,
- Tom


[-- Attachment #2: 0001-Don-t-allow-unsafe-reductions-in-graphite.patch --]
[-- Type: text/x-patch, Size: 12899 bytes --]

Don't allow unsafe reductions in graphite

2015-07-21  Tom de Vries  <tom@codesourcery.com>

	* graphite-sese-to-poly.c (is_reduction_operation_p): Limit
	flag_associative_math to FLOAT_TYPE_P.  Honour
	TYPE_OVERFLOW_WRAPS for INTEGRAL_TYPE_P. Don't allow any other types.

	* gcc.dg/graphite/block-1.c: Xfail scan.
	* gcc.dg/graphite/interchange-12.c: Same.
	* gcc.dg/graphite/interchange-14.c: Same.
	* gcc.dg/graphite/interchange-15.c: Same.
	* gcc.dg/graphite/interchange-9.c: Same.
	* gcc.dg/graphite/interchange-mvt.c: Same.
	* gcc.dg/graphite/uns-block-1.c: New test.
	* gcc.dg/graphite/uns-interchange-12.c: New test.
	* gcc.dg/graphite/uns-interchange-14.c: New test.
	* gcc.dg/graphite/uns-interchange-15.c: New test.
	* gcc.dg/graphite/uns-interchange-9.c: New test.
	* gcc.dg/graphite/uns-interchange-mvt.c: New test.
---
 gcc/graphite-sese-to-poly.c                        | 14 +++--
 gcc/testsuite/gcc.dg/graphite/block-1.c            |  2 +-
 gcc/testsuite/gcc.dg/graphite/interchange-12.c     |  2 +-
 gcc/testsuite/gcc.dg/graphite/interchange-14.c     |  2 +-
 gcc/testsuite/gcc.dg/graphite/interchange-15.c     |  2 +-
 gcc/testsuite/gcc.dg/graphite/interchange-9.c      |  2 +-
 gcc/testsuite/gcc.dg/graphite/interchange-mvt.c    |  2 +-
 gcc/testsuite/gcc.dg/graphite/uns-block-1.c        | 48 +++++++++++++++++
 gcc/testsuite/gcc.dg/graphite/uns-interchange-12.c | 56 +++++++++++++++++++
 gcc/testsuite/gcc.dg/graphite/uns-interchange-14.c | 58 ++++++++++++++++++++
 gcc/testsuite/gcc.dg/graphite/uns-interchange-15.c | 53 ++++++++++++++++++
 gcc/testsuite/gcc.dg/graphite/uns-interchange-9.c  | 47 ++++++++++++++++
 .../gcc.dg/graphite/uns-interchange-mvt.c          | 63 ++++++++++++++++++++++
 13 files changed, 342 insertions(+), 9 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/graphite/uns-block-1.c
 create mode 100644 gcc/testsuite/gcc.dg/graphite/uns-interchange-12.c
 create mode 100644 gcc/testsuite/gcc.dg/graphite/uns-interchange-14.c
 create mode 100644 gcc/testsuite/gcc.dg/graphite/uns-interchange-15.c
 create mode 100644 gcc/testsuite/gcc.dg/graphite/uns-interchange-9.c
 create mode 100644 gcc/testsuite/gcc.dg/graphite/uns-interchange-mvt.c

diff --git a/gcc/graphite-sese-to-poly.c b/gcc/graphite-sese-to-poly.c
index 8960c3f..68f7df1 100644
--- a/gcc/graphite-sese-to-poly.c
+++ b/gcc/graphite-sese-to-poly.c
@@ -2604,9 +2604,17 @@ is_reduction_operation_p (gimple stmt)
   gcc_assert (is_gimple_assign (stmt));
   code = gimple_assign_rhs_code (stmt);
 
-  return flag_associative_math
-    && commutative_tree_code (code)
-    && associative_tree_code (code);
+  if (!commutative_tree_code (code)
+      || !associative_tree_code (code))
+    return false;
+
+  tree type = TREE_TYPE (gimple_assign_lhs (stmt));
+
+  if (FLOAT_TYPE_P (type))
+    return flag_associative_math;
+
+  return (INTEGRAL_TYPE_P (type)
+	  && TYPE_OVERFLOW_WRAPS (type));
 }
 
 /* Returns true when PHI contains an argument ARG.  */
diff --git a/gcc/testsuite/gcc.dg/graphite/block-1.c b/gcc/testsuite/gcc.dg/graphite/block-1.c
index a73c20f..2208eb9 100644
--- a/gcc/testsuite/gcc.dg/graphite/block-1.c
+++ b/gcc/testsuite/gcc.dg/graphite/block-1.c
@@ -45,4 +45,4 @@ main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "will be loop blocked" 3 "graphite" } } */
+/* { dg-final { scan-tree-dump-times "will be loop blocked" 3 "graphite" { xfail *-*-* } } } */
diff --git a/gcc/testsuite/gcc.dg/graphite/interchange-12.c b/gcc/testsuite/gcc.dg/graphite/interchange-12.c
index 41a8882..bf95fdd 100644
--- a/gcc/testsuite/gcc.dg/graphite/interchange-12.c
+++ b/gcc/testsuite/gcc.dg/graphite/interchange-12.c
@@ -53,4 +53,4 @@ main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "will be interchanged" 1 "graphite" } } */
+/* { dg-final { scan-tree-dump-times "will be interchanged" 1 "graphite" { xfail *-*-* } } } */
diff --git a/gcc/testsuite/gcc.dg/graphite/interchange-14.c b/gcc/testsuite/gcc.dg/graphite/interchange-14.c
index 36990ab..46f6a6d 100644
--- a/gcc/testsuite/gcc.dg/graphite/interchange-14.c
+++ b/gcc/testsuite/gcc.dg/graphite/interchange-14.c
@@ -55,4 +55,4 @@ main (void)
 }
 
 /* PRE destroys the perfect nest and we can't cope with that yet.  */
-/* { dg-final { scan-tree-dump-times "will be interchanged" 1 "graphite" } } */
+/* { dg-final { scan-tree-dump-times "will be interchanged" 1 "graphite" { xfail *-*-* } } } */
diff --git a/gcc/testsuite/gcc.dg/graphite/interchange-15.c b/gcc/testsuite/gcc.dg/graphite/interchange-15.c
index 3ddb74f..9f6b7ae 100644
--- a/gcc/testsuite/gcc.dg/graphite/interchange-15.c
+++ b/gcc/testsuite/gcc.dg/graphite/interchange-15.c
@@ -49,5 +49,5 @@ main (void)
 }
 
 /* PRE destroys the perfect nest and we can't cope with that yet.  */
-/* { dg-final { scan-tree-dump-times "will be interchanged" 1 "graphite" } } */
+/* { dg-final { scan-tree-dump-times "will be interchanged" 1 "graphite" { xfail *-*-* } } } */
 
diff --git a/gcc/testsuite/gcc.dg/graphite/interchange-9.c b/gcc/testsuite/gcc.dg/graphite/interchange-9.c
index cfec110..b023ea8 100644
--- a/gcc/testsuite/gcc.dg/graphite/interchange-9.c
+++ b/gcc/testsuite/gcc.dg/graphite/interchange-9.c
@@ -44,4 +44,4 @@ main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "will be interchanged" 1 "graphite" } } */
+/* { dg-final { scan-tree-dump-times "will be interchanged" 1 "graphite" { xfail *-*-* } } } */
diff --git a/gcc/testsuite/gcc.dg/graphite/interchange-mvt.c b/gcc/testsuite/gcc.dg/graphite/interchange-mvt.c
index 4b8f264..8c00f80 100644
--- a/gcc/testsuite/gcc.dg/graphite/interchange-mvt.c
+++ b/gcc/testsuite/gcc.dg/graphite/interchange-mvt.c
@@ -59,5 +59,5 @@ main (void)
 }
 
 /* PRE destroys the perfect nest and we can't cope with that yet.  */
-/* { dg-final { scan-tree-dump-times "will be interchanged" 1 "graphite" } } */
+/* { dg-final { scan-tree-dump-times "will be interchanged" 1 "graphite" { xfail *-*-* } } } */
 
diff --git a/gcc/testsuite/gcc.dg/graphite/uns-block-1.c b/gcc/testsuite/gcc.dg/graphite/uns-block-1.c
new file mode 100644
index 0000000..57d522b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/graphite/uns-block-1.c
@@ -0,0 +1,48 @@
+/* { dg-require-effective-target size32plus } */
+
+#define DEBUG 0
+#if DEBUG
+#include <stdio.h>
+#endif
+
+#define MAX 100
+
+extern void abort ();
+
+int
+main (void)
+{
+  int i, j;
+  int sum = 0;
+  int A[MAX * MAX];
+  int B[MAX * MAX];
+
+  /* These loops should be loop blocked.  */
+  for (i = 0; i < MAX; i++)
+    for (j = 0; j < MAX; j++)
+      {
+	A[i*MAX + j] = j;
+	B[i*MAX + j] = j;
+      }
+
+  /* These loops should be loop blocked.  */
+  for (i = 0; i < MAX; i++)
+    for (j = 0; j < MAX; j++)
+      A[i*MAX + j] += B[j*MAX + i];
+
+  /* These loops should be loop blocked.  */
+  for (i = 0; i < MAX; i++)
+    for (j = 0; j < MAX; j++)
+      sum += A[i*MAX + j];
+
+#if DEBUG
+  fprintf (stderr, "sum = %d \n", sum);
+#endif
+
+  if (sum != 990000)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "will be loop blocked" 3 "graphite" } } */
diff --git a/gcc/testsuite/gcc.dg/graphite/uns-interchange-12.c b/gcc/testsuite/gcc.dg/graphite/uns-interchange-12.c
new file mode 100644
index 0000000..dc26926
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/graphite/uns-interchange-12.c
@@ -0,0 +1,56 @@
+/* { dg-require-effective-target size32plus } */
+
+#define DEBUG 0
+#if DEBUG
+#include <stdio.h>
+#endif
+
+#define N 200
+
+int A[N][N], B[N][N], C[N][N];
+
+static int __attribute__((noinline))
+matmult (void)
+{
+  int i, j, k;
+
+  /* Loops J and K should be interchanged.  */
+  for (i = 0; i < N; i++)
+    for (j = 0; j < N; j++)
+      {
+	A[i][j] = 0;
+	for (k = 0; k < N; k++)
+	  A[i][j] += B[i][k] * C[k][j];
+      }
+
+  return A[0][0] + A[N-1][N-1];
+}
+
+extern void abort ();
+
+int
+main (void)
+{
+  int i, j, res;
+
+  for (i = 0; i < N; i++)
+    for (j = 0; j < N; j++)
+      {
+	A[i][j] = 0;
+	B[i][j] = i - j;
+	C[i][j] = i + j;
+      }
+
+  res = matmult ();
+
+#if DEBUG
+  fprintf (stderr, "res = %d \n", res);
+#endif
+
+  if (res != 2626800)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "will be interchanged" 1 "graphite" } } */
diff --git a/gcc/testsuite/gcc.dg/graphite/uns-interchange-14.c b/gcc/testsuite/gcc.dg/graphite/uns-interchange-14.c
new file mode 100644
index 0000000..36990ab
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/graphite/uns-interchange-14.c
@@ -0,0 +1,58 @@
+/* { dg-require-effective-target size32plus } */
+
+#define DEBUG 0
+#if DEBUG
+#include <stdio.h>
+#endif
+
+#define N 200
+
+int A[N][N], B[N][N], C[N][N];
+
+static void __attribute__((noinline))
+matmult (void)
+{
+  int i, j, k;
+
+  for (i = 0; i < N; i++)
+    for (j = 0; j < N; j++)
+      A[i][j] = 0;
+
+  /* Loops J and K should be interchanged.  */
+  for (i = 0; i < N; i++)
+    for (j = 0; j < N; j++)
+      for (k = 0; k < N; k++)
+	A[i][j] += B[i][k] * C[k][j];
+}
+
+extern void abort ();
+
+int
+main (void)
+{
+  int i, j, res = 0;
+
+  for (i = 0; i < N; i++)
+    for (j = 0; j < N; j++)
+      {
+	B[i][j] = j;
+	C[i][j] = i;
+      }
+
+  matmult ();
+
+  for (i = 0; i < N; i++)
+    res += A[i][i];
+
+#if DEBUG
+  fprintf (stderr, "res = %d \n", res);
+#endif
+
+  if (res != 529340000)
+    abort ();
+
+  return 0;
+}
+
+/* PRE destroys the perfect nest and we can't cope with that yet.  */
+/* { dg-final { scan-tree-dump-times "will be interchanged" 1 "graphite" } } */
diff --git a/gcc/testsuite/gcc.dg/graphite/uns-interchange-15.c b/gcc/testsuite/gcc.dg/graphite/uns-interchange-15.c
new file mode 100644
index 0000000..3ddb74f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/graphite/uns-interchange-15.c
@@ -0,0 +1,53 @@
+/* { dg-require-effective-target size32plus } */
+
+#define DEBUG 0
+#if DEBUG
+#include <stdio.h>
+#endif
+
+#define NMAX 2000
+
+static int x[NMAX], a[NMAX][NMAX];
+
+static int __attribute__((noinline))
+mvt (long N)
+{
+  int i,j;
+
+  /* These two loops should be interchanged.  */
+  for (i = 0; i < N; i++)
+    for (j = 0; j < N; j++)
+      x[i] += a[j][i];
+
+  return x[1];
+}
+
+extern void abort ();
+
+int
+main (void)
+{
+  int i, j, res;
+
+  for (i = 0; i < NMAX; i++)
+    for (j = 0; j < NMAX; j++)
+      a[i][j] = j;
+
+  for (i = 0; i < NMAX; i++)
+    x[i] = i;
+
+  res = mvt (NMAX);
+
+#if DEBUG
+  fprintf (stderr, "res = %d \n", res);
+#endif
+
+  if (res != 2001)
+    abort ();
+
+  return 0;
+}
+
+/* PRE destroys the perfect nest and we can't cope with that yet.  */
+/* { dg-final { scan-tree-dump-times "will be interchanged" 1 "graphite" } } */
+
diff --git a/gcc/testsuite/gcc.dg/graphite/uns-interchange-9.c b/gcc/testsuite/gcc.dg/graphite/uns-interchange-9.c
new file mode 100644
index 0000000..cfec110
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/graphite/uns-interchange-9.c
@@ -0,0 +1,47 @@
+/* { dg-require-effective-target size32plus } */
+
+#define DEBUG 0
+#if DEBUG
+#include <stdio.h>
+#endif
+
+#define N 111
+#define M 111
+
+static int __attribute__((noinline))
+foo (int *x)
+{
+  int i, j;
+  int sum = 0;
+
+  for (j = 0; j < M; ++j)
+    for (i = 0;  i < N; ++i)
+      sum += x[M * i + j];
+
+  return sum;
+}
+
+extern void abort ();
+
+int
+main (void)
+{
+  int A[N*M];
+  int i, res;
+
+  for (i = 0; i < N*M; i++)
+    A[i] = 2;
+
+  res = foo (A);
+
+#if DEBUG
+  fprintf (stderr, "res = %d \n", res);
+#endif
+
+  if (res != 24642)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "will be interchanged" 1 "graphite" } } */
diff --git a/gcc/testsuite/gcc.dg/graphite/uns-interchange-mvt.c b/gcc/testsuite/gcc.dg/graphite/uns-interchange-mvt.c
new file mode 100644
index 0000000..4b8f264
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/graphite/uns-interchange-mvt.c
@@ -0,0 +1,63 @@
+/* { dg-require-effective-target size32plus } */
+
+#define DEBUG 0
+#if DEBUG
+#include <stdio.h>
+#endif
+
+#define NMAX 2000
+
+static int x1[NMAX], x2[NMAX], a[NMAX][NMAX], y1[NMAX], y2[NMAX];
+
+static int __attribute__((noinline))
+mvt (long N)
+{
+
+  int i,j;
+
+  for (i = 0; i < N; i++)
+    for (j = 0; j < N; j++)
+      x1[i] = x1[i] + a[i][j] * y1[j];
+
+  /* These two loops should be interchanged.  */
+  for (i = 0; i < N; i++)
+    for (j = 0; j < N; j++)
+      x2[i] = x2[i] + a[j][i] * y2[j];
+
+  return x1[0] + x2[0];
+}
+
+extern void abort ();
+
+int
+main (void)
+{
+  int i, j, res;
+
+  for (i = 0; i < NMAX; i++)
+    for (j = 0; j < NMAX; j++)
+      a[i][j] = i + j;
+
+  for (i = 0; i < NMAX; i++)
+    {
+      x1[i] = 0;
+      x2[i] = 2*i;
+      y1[i] = 100 - i;
+      y2[i] = i;
+    }
+
+  res = mvt (NMAX);
+
+#if DEBUG
+  fprintf (stderr, "res = %d \n", res);
+#endif
+
+  if (res != 199900000)
+    abort ();
+
+  return 0;
+}
+
+/* PRE destroys the perfect nest and we can't cope with that yet.  */
+/* { dg-final { scan-tree-dump-times "will be interchanged" 1 "graphite" } } */
+
-- 
1.9.1



* [PATCH] Check TYPE_OVERFLOW_WRAPS for parloops reductions
  2015-07-22 11:18           ` Richard Biener
  2015-07-22 16:04             ` [PATCH] Don't allow unsafe reductions in graphite Tom de Vries
@ 2015-07-22 16:38             ` Tom de Vries
  2015-07-23 10:54               ` Richard Biener
  2015-07-24 10:43               ` [committed] Remove xfail in autopar/uns-outer-4.c Tom de Vries
  2015-07-24 11:54             ` [PATCH] Add FIXED_POINT_TYPE_OVERFLOW_WRAPS_P Tom de Vries
  2 siblings, 2 replies; 27+ messages in thread
From: Tom de Vries @ 2015-07-22 16:38 UTC (permalink / raw)
  To: Richard Biener, Sebastian Pop; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 1621 bytes --]

[ was: Re: [RFC, PR66873] Use graphite for parloops ]

On 22/07/15 13:02, Richard Biener wrote:
> On Wed, Jul 22, 2015 at 1:01 PM, Richard Biener
> <richard.guenther@gmail.com> wrote:
>> On Tue, Jul 21, 2015 at 8:42 PM, Sebastian Pop <sebpop@gmail.com> wrote:
>>> Tom de Vries wrote:
>>>> Fix reduction safety checks
>>>>

>> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
>> index 9145dbf..e014be2 100644
>> --- a/gcc/tree-vect-loop.c
>> +++ b/gcc/tree-vect-loop.c
>> @@ -2613,16 +2613,30 @@ vect_is_simple_reduction_1 (loop_vec_info
>> loop_info, gimple phi,
>>                          "reduction: unsafe fp math optimization: ");
>>         return NULL;
>>       }
>> -  else if (INTEGRAL_TYPE_P (type) && TYPE_OVERFLOW_TRAPS (type)
>> -          && check_reduction)
>> +  else if (INTEGRAL_TYPE_P (type) && check_reduction)
>>       {
>> ...
>>
>> You didn't need to adjust any testcases?  That's probably because the
>> checking above is not always executed (see PR66623 for a related
>> testcase).  The code needs refactoring.
>> And we need a way-out, that is, we do _not_ want to not vectorize
>> signed reductions.
>> So you need to fix code generation instead.
>
> Btw, for the vectorizer the current "trick" is that nobody takes advantage of
> overflow undefinedness for vector types.
>

AFAIU, you're saying here that there's no current bug related to 
assuming wrapping overflow in the vectorizer?

I've updated the patch accordingly, so we only bother about 
TYPE_OVERFLOW_WRAPS for parloops reductions.
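
The resulting decision for integral reductions can be summarised by the
following standalone C model.  It is a sketch, not the patch itself:
struct int_type_info and integral_reduction_ok are made-up names, and the
two fields merely stand in for TYPE_OVERFLOW_TRAPS and TYPE_OVERFLOW_WRAPS.

```c
#include <stdbool.h>

/* Hypothetical summary of an integer type's overflow behaviour.  */
struct int_type_info
{
  bool overflow_traps;   /* stands in for TYPE_OVERFLOW_TRAPS  */
  bool overflow_wraps;   /* stands in for TYPE_OVERFLOW_WRAPS  */
};

/* Return true if reordering an integral reduction of TYPE is allowed.
   Trapping overflow always rejects the reduction; non-wrapping overflow
   rejects it only when the caller asks for wrapping semantics.  */
bool
integral_reduction_ok (struct int_type_info type,
                       bool need_wrapping_integral_overflow)
{
  if (type.overflow_traps)
    return false;
  if (need_wrapping_integral_overflow && !type.overflow_wraps)
    return false;
  return true;
}
```

So a plain signed int reduction (neither trapping nor wrapping) is still
accepted for the vectorizer, which passes false, but rejected for
parloops, which passes true.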

Currently bootstrapping and reg-testing on x86_64.

OK for trunk?

Thanks,
- Tom


[-- Attachment #2: 0003-Check-TYPE_OVERFLOW_WRAPS-for-parloops-reductions.patch --]
[-- Type: text/x-patch, Size: 14920 bytes --]

Check TYPE_OVERFLOW_WRAPS for parloops reductions

2015-07-21  Tom de Vries  <tom@codesourcery.com>

	* tree-parloops.c (gather_scalar_reductions): Add arg to call to
	vect_force_simple_reduction.
	* tree-vect-loop.c (vect_analyze_scalar_cycles_1): Same.
	(vect_is_simple_reduction_1): Add and handle
	need_wrapping_integral_overflow parameter.
	(vect_is_simple_reduction, vect_force_simple_reduction): Add and pass
	need_wrapping_integral_overflow parameter.
	(vectorizable_reduction): Add arg to call to vect_is_simple_reduction.
	* tree-vectorizer.h (vect_force_simple_reduction): Add parameter to decl.

	* gcc.dg/autopar/outer-4.c: Add xfail.
	* gcc.dg/autopar/outer-5.c: Same.
	* gcc.dg/autopar/outer-6.c: Same.
	* gcc.dg/autopar/reduc-2.c: Same.
	* gcc.dg/autopar/reduc-2char.c: Same.
	* gcc.dg/autopar/reduc-2short.c: Same.
	* gcc.dg/autopar/reduc-8.c: Same.
	* gcc.dg/autopar/uns-outer-4.c: New test.
	* gcc.dg/autopar/uns-outer-5.c: New test.
	* gcc.dg/autopar/uns-outer-6.c: New test.
---
 gcc/testsuite/gcc.dg/autopar/outer-4.c      |  2 +-
 gcc/testsuite/gcc.dg/autopar/outer-5.c      |  2 +-
 gcc/testsuite/gcc.dg/autopar/outer-6.c      |  4 +--
 gcc/testsuite/gcc.dg/autopar/reduc-2.c      |  4 +--
 gcc/testsuite/gcc.dg/autopar/reduc-2char.c  |  4 +--
 gcc/testsuite/gcc.dg/autopar/reduc-2short.c |  4 +--
 gcc/testsuite/gcc.dg/autopar/reduc-8.c      |  4 +--
 gcc/testsuite/gcc.dg/autopar/uns-outer-4.c  | 36 ++++++++++++++++++++
 gcc/testsuite/gcc.dg/autopar/uns-outer-5.c  | 49 +++++++++++++++++++++++++++
 gcc/testsuite/gcc.dg/autopar/uns-outer-6.c  | 51 +++++++++++++++++++++++++++++
 gcc/tree-parloops.c                         |  6 ++--
 gcc/tree-vect-loop.c                        | 44 +++++++++++++++++--------
 gcc/tree-vectorizer.h                       |  3 +-
 13 files changed, 183 insertions(+), 30 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/autopar/uns-outer-4.c
 create mode 100644 gcc/testsuite/gcc.dg/autopar/uns-outer-5.c
 create mode 100644 gcc/testsuite/gcc.dg/autopar/uns-outer-6.c

diff --git a/gcc/testsuite/gcc.dg/autopar/outer-4.c b/gcc/testsuite/gcc.dg/autopar/outer-4.c
index 6fd37c5..2027499 100644
--- a/gcc/testsuite/gcc.dg/autopar/outer-4.c
+++ b/gcc/testsuite/gcc.dg/autopar/outer-4.c
@@ -32,4 +32,4 @@ int main(void)
 
 
 /* { dg-final { scan-tree-dump-times "parallelizing outer loop" 1 "parloops" { xfail *-*-* } } } */
-/* { dg-final { scan-tree-dump-times "loopfn" 4 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "loopfn" 4 "optimized" { xfail *-*-* } } } */
diff --git a/gcc/testsuite/gcc.dg/autopar/outer-5.c b/gcc/testsuite/gcc.dg/autopar/outer-5.c
index 6a0ae91..d6e0dd3 100644
--- a/gcc/testsuite/gcc.dg/autopar/outer-5.c
+++ b/gcc/testsuite/gcc.dg/autopar/outer-5.c
@@ -45,4 +45,4 @@ int main(void)
 }
 
 /* { dg-final { scan-tree-dump-times "parallelizing outer loop" 1 "parloops" { xfail *-*-* } } } */
-/* { dg-final { scan-tree-dump-times "loopfn" 4 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "loopfn" 4 "optimized" { xfail *-*-* } } } */
diff --git a/gcc/testsuite/gcc.dg/autopar/outer-6.c b/gcc/testsuite/gcc.dg/autopar/outer-6.c
index 6bef7cc..726794c 100644
--- a/gcc/testsuite/gcc.dg/autopar/outer-6.c
+++ b/gcc/testsuite/gcc.dg/autopar/outer-6.c
@@ -44,6 +44,6 @@ int main(void)
 
 
 /* Check that outer loop is parallelized.  */
-/* { dg-final { scan-tree-dump-times "parallelizing outer loop" 1 "parloops" } } */
+/* { dg-final { scan-tree-dump-times "parallelizing outer loop" 1 "parloops" { xfail *-*-* } } } */
 /* { dg-final { scan-tree-dump-times "parallelizing inner loop" 0 "parloops" } } */
-/* { dg-final { scan-tree-dump-times "loopfn" 4 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "loopfn" 4 "optimized" { xfail *-*-* } } } */
diff --git a/gcc/testsuite/gcc.dg/autopar/reduc-2.c b/gcc/testsuite/gcc.dg/autopar/reduc-2.c
index 3ad16e4..2f4883d 100644
--- a/gcc/testsuite/gcc.dg/autopar/reduc-2.c
+++ b/gcc/testsuite/gcc.dg/autopar/reduc-2.c
@@ -63,6 +63,6 @@ int main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "Detected reduction" 3 "parloops" } } */
-/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 4 "parloops" } } */
+/* { dg-final { scan-tree-dump-times "Detected reduction" 3 "parloops" { xfail *-*-* } } } */
+/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 4 "parloops" { xfail *-*-* } } } */
 
diff --git a/gcc/testsuite/gcc.dg/autopar/reduc-2char.c b/gcc/testsuite/gcc.dg/autopar/reduc-2char.c
index 072489f..14867f3 100644
--- a/gcc/testsuite/gcc.dg/autopar/reduc-2char.c
+++ b/gcc/testsuite/gcc.dg/autopar/reduc-2char.c
@@ -60,7 +60,7 @@ int main (void)
 }
 
 
-/* { dg-final { scan-tree-dump-times "Detected reduction" 2 "parloops" } } */
-/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 3 "parloops" } } */
+/* { dg-final { scan-tree-dump-times "Detected reduction" 2 "parloops" { xfail *-*-* } } } */
+/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 3 "parloops" { xfail *-*-* } } } */
 
 
diff --git a/gcc/testsuite/gcc.dg/autopar/reduc-2short.c b/gcc/testsuite/gcc.dg/autopar/reduc-2short.c
index 4dbbc8a..7c19cc5 100644
--- a/gcc/testsuite/gcc.dg/autopar/reduc-2short.c
+++ b/gcc/testsuite/gcc.dg/autopar/reduc-2short.c
@@ -59,6 +59,6 @@ int main (void)
 }
 
 
-/* { dg-final { scan-tree-dump-times "Detected reduction" 2 "parloops" } } */
-/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 3 "parloops" } } */
+/* { dg-final { scan-tree-dump-times "Detected reduction" 2 "parloops" { xfail *-*-* } } } */
+/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 3 "parloops" { xfail *-*-* } } } */
 
diff --git a/gcc/testsuite/gcc.dg/autopar/reduc-8.c b/gcc/testsuite/gcc.dg/autopar/reduc-8.c
index 16fb954..1d05c48 100644
--- a/gcc/testsuite/gcc.dg/autopar/reduc-8.c
+++ b/gcc/testsuite/gcc.dg/autopar/reduc-8.c
@@ -84,5 +84,5 @@ int main (void)
 }
 
 
-/* { dg-final { scan-tree-dump-times "Detected reduction" 2 "parloops" } } */
-/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 3 "parloops" } } */
+/* { dg-final { scan-tree-dump-times "Detected reduction" 2 "parloops" { xfail *-*-* } } } */
+/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 3 "parloops" { xfail *-*-* } } } */
diff --git a/gcc/testsuite/gcc.dg/autopar/uns-outer-4.c b/gcc/testsuite/gcc.dg/autopar/uns-outer-4.c
new file mode 100644
index 0000000..ef9fc2a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/autopar/uns-outer-4.c
@@ -0,0 +1,36 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-parallelize-loops=4 -fdump-tree-parloops-details -fdump-tree-optimized" } */
+
+void abort (void);
+
+unsigned int g_sum=0;
+unsigned int x[500][500];
+
+void __attribute__((noinline))
+parloop (int N)
+{
+  int i, j;
+  unsigned int sum;
+
+  /* Double reduction is currently not supported, outer loop is not
+     parallelized.  Inner reduction is detected, inner loop is
+     parallelized.  */
+  sum = 0;
+  for (i = 0; i < N; i++)
+    for (j = 0; j < N; j++)
+      sum += x[i][j];
+
+  g_sum = sum;
+}
+
+int
+main (void)
+{
+  parloop (500);
+
+  return 0;
+}
+
+
+/* { dg-final { scan-tree-dump-times "parallelizing outer loop" 1 "parloops" { xfail *-*-* } } } */
+/* { dg-final { scan-tree-dump-times "loopfn" 4 "optimized" { xfail *-*-* } } } */
diff --git a/gcc/testsuite/gcc.dg/autopar/uns-outer-5.c b/gcc/testsuite/gcc.dg/autopar/uns-outer-5.c
new file mode 100644
index 0000000..a929e5d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/autopar/uns-outer-5.c
@@ -0,0 +1,49 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-parallelize-loops=4 -fdump-tree-parloops-details -fdump-tree-optimized" } */
+
+void abort (void);
+
+unsigned int x[500][500];
+unsigned int y[500];
+unsigned int g_sum=0;
+
+void __attribute__((noinline))
+init (int i, int j)
+{
+  x[i][j]=1;
+}
+
+void __attribute__((noinline))
+parloop (int N)
+{
+  int i, j;
+  unsigned int sum;
+
+  /* Inner cycle is currently not supported, outer loop is not
+     parallelized.  Inner reduction is detected, inner loop is
+     parallelized.  */
+  for (i = 0; i < N; i++)
+    {
+      sum = 0;
+      for (j = 0; j < N; j++)
+	sum += x[i][j];
+      y[i]=sum;
+    }
+  g_sum = sum;
+}
+
+int
+main (void)
+{
+  int i, j;
+  for (i = 0; i < 500; i++)
+    for (j = 0; j < 500; j++)
+      init (i, j);
+
+  parloop (500);
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "parallelizing outer loop" 1 "parloops" { xfail *-*-* } } } */
+/* { dg-final { scan-tree-dump-times "loopfn" 4 "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/autopar/uns-outer-6.c b/gcc/testsuite/gcc.dg/autopar/uns-outer-6.c
new file mode 100644
index 0000000..5c745f8
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/autopar/uns-outer-6.c
@@ -0,0 +1,51 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-parallelize-loops=4 -fdump-tree-parloops-details -fdump-tree-optimized" } */
+
+void abort (void);
+
+unsigned int x[500][500];
+unsigned int y[500];
+unsigned int g_sum=0;
+
+
+void __attribute__((noinline))
+init (int i, int j)
+{
+  x[i][j]=1;
+}
+
+void __attribute__((noinline))
+parloop (int N)
+{
+  int i, j;
+  unsigned int sum;
+
+  /* Outer loop reduction, outerloop is parallelized.  */
+  sum=0;
+  for (i = 0; i < N; i++)
+    {
+      for (j = 0; j < N; j++)
+	y[i]=x[i][j];
+      sum += y[i];
+    }
+  g_sum = sum;
+}
+
+int
+main (void)
+{
+  int i, j;
+  for (i = 0; i < 500; i++)
+    for (j = 0; j < 500; j++)
+      init (i, j);
+
+  parloop (500);
+
+  return 0;
+}
+
+
+/* Check that outer loop is parallelized.  */
+/* { dg-final { scan-tree-dump-times "parallelizing outer loop" 1 "parloops" } } */
+/* { dg-final { scan-tree-dump-times "parallelizing inner loop" 0 "parloops" } } */
+/* { dg-final { scan-tree-dump-times "loopfn" 4 "optimized" } } */
diff --git a/gcc/tree-parloops.c b/gcc/tree-parloops.c
index ec41834..88f22e8 100644
--- a/gcc/tree-parloops.c
+++ b/gcc/tree-parloops.c
@@ -2376,9 +2376,9 @@ gather_scalar_reductions (loop_p loop, reduction_info_table_type *reduction_list
       if (!simple_iv (loop, loop, res, &iv, true)
 	&& simple_loop_info)
 	{
-           gimple reduc_stmt = vect_force_simple_reduction (simple_loop_info,
-							    phi, true,
-							    &double_reduc);
+	   gimple reduc_stmt
+	     = vect_force_simple_reduction (simple_loop_info, phi, true,
+					    &double_reduc, true);
 	   if (reduc_stmt && !double_reduc)
               build_new_reduction (reduction_list, reduc_stmt, phi);
         }
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 9145dbf..c31bfbd 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -715,7 +715,7 @@ vect_analyze_scalar_cycles_1 (loop_vec_info loop_vinfo, struct loop *loop)
 
       nested_cycle = (loop != LOOP_VINFO_LOOP (loop_vinfo));
       reduc_stmt = vect_force_simple_reduction (loop_vinfo, phi, !nested_cycle,
-						&double_reduc);
+						&double_reduc, false);
       if (reduc_stmt)
         {
           if (double_reduc)
@@ -2339,7 +2339,7 @@ vect_is_slp_reduction (loop_vec_info loop_info, gimple phi, gimple first_stmt)
 static gimple
 vect_is_simple_reduction_1 (loop_vec_info loop_info, gimple phi,
 			    bool check_reduction, bool *double_reduc,
-			    bool modify)
+			    bool modify, bool need_wrapping_integral_overflow)
 {
   struct loop *loop = (gimple_bb (phi))->loop_father;
   struct loop *vect_loop = LOOP_VINFO_LOOP (loop_info);
@@ -2613,14 +2613,26 @@ vect_is_simple_reduction_1 (loop_vec_info loop_info, gimple phi,
 			"reduction: unsafe fp math optimization: ");
       return NULL;
     }
-  else if (INTEGRAL_TYPE_P (type) && TYPE_OVERFLOW_TRAPS (type)
-	   && check_reduction)
+  else if (INTEGRAL_TYPE_P (type) && check_reduction)
     {
-      /* Changing the order of operations changes the semantics.  */
-      if (dump_enabled_p ())
-	report_vect_op (MSG_MISSED_OPTIMIZATION, def_stmt,
-			"reduction: unsafe int math optimization: ");
-      return NULL;
+      if (TYPE_OVERFLOW_TRAPS (type))
+	{
+	  /* Changing the order of operations changes the semantics.  */
+	  if (dump_enabled_p ())
+	    report_vect_op (MSG_MISSED_OPTIMIZATION, def_stmt,
+			    "reduction: unsafe int math optimization"
+			    " (overflow traps): ");
+	  return NULL;
+	}
+      if (need_wrapping_integral_overflow && !TYPE_OVERFLOW_WRAPS (type))
+	{
+	  /* Changing the order of operations changes the semantics.  */
+	  if (dump_enabled_p ())
+	    report_vect_op (MSG_MISSED_OPTIMIZATION, def_stmt,
+			    "reduction: unsafe int math optimization"
+			    " (overflow doesn't wrap): ");
+	  return NULL;
+	}
     }
   else if (SAT_FIXED_POINT_TYPE_P (type) && check_reduction)
     {
@@ -2749,10 +2761,12 @@ vect_is_simple_reduction_1 (loop_vec_info loop_info, gimple phi,
 
 static gimple
 vect_is_simple_reduction (loop_vec_info loop_info, gimple phi,
-                          bool check_reduction, bool *double_reduc)
+			  bool check_reduction, bool *double_reduc,
+			  bool need_wrapping_integral_overflow)
 {
   return vect_is_simple_reduction_1 (loop_info, phi, check_reduction,
-				     double_reduc, false);
+				     double_reduc, false,
+				     need_wrapping_integral_overflow);
 }
 
 /* Wrapper around vect_is_simple_reduction_1, which will modify code
@@ -2761,10 +2775,12 @@ vect_is_simple_reduction (loop_vec_info loop_info, gimple phi,
 
 gimple
 vect_force_simple_reduction (loop_vec_info loop_info, gimple phi,
-                          bool check_reduction, bool *double_reduc)
+			     bool check_reduction, bool *double_reduc,
+			     bool need_wrapping_integral_overflow)
 {
   return vect_is_simple_reduction_1 (loop_info, phi, check_reduction,
-				     double_reduc, true);
+				     double_reduc, true,
+				     need_wrapping_integral_overflow);
 }
 
 /* Calculate cost of peeling the loop PEEL_ITERS_PROLOGUE times.  */
@@ -5074,7 +5090,7 @@ vectorizable_reduction (gimple stmt, gimple_stmt_iterator *gsi,
     }
 
   gimple tmp = vect_is_simple_reduction (loop_vinfo, reduc_def_stmt,
-					 !nested_cycle, &dummy);
+					 !nested_cycle, &dummy, false);
   if (orig_stmt)
     gcc_assert (tmp == orig_stmt
 		|| GROUP_FIRST_ELEMENT (vinfo_for_stmt (tmp)) == orig_stmt);
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index 48c1f8d..dfa8795 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -1090,7 +1090,8 @@ extern tree vect_create_addr_base_for_vector_ref (gimple, gimple_seq *,
 /* In tree-vect-loop.c.  */
 /* FORNOW: Used in tree-parloops.c.  */
 extern void destroy_loop_vec_info (loop_vec_info, bool);
-extern gimple vect_force_simple_reduction (loop_vec_info, gimple, bool, bool *);
+extern gimple vect_force_simple_reduction (loop_vec_info, gimple, bool, bool *,
+					   bool);
 /* Drive for loop analysis stage.  */
 extern loop_vec_info vect_analyze_loop (struct loop *);
 /* Drive for loop transformation stage.  */
-- 
1.9.1


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] Document ftrapv/fwrapv interaction
  2015-07-22 15:33           ` [PATCH] Document ftrapv/fwrapv interaction Tom de Vries
@ 2015-07-23 10:39             ` Richard Biener
  2015-07-23 10:42               ` Richard Biener
  0 siblings, 1 reply; 27+ messages in thread
From: Richard Biener @ 2015-07-23 10:39 UTC (permalink / raw)
  To: Tom de Vries; +Cc: gcc-patches

On Wed, Jul 22, 2015 at 5:11 PM, Tom de Vries <Tom_deVries@mentor.com> wrote:
> [ Re: [RFC, PR66873] Use graphite for parloops ]
> On 22/07/15 13:01, Richard Biener wrote:
>>
>> why only scalar floats?  Please use FLOAT_TYPE_P.
>>
>> +  if (INTEGRAL_TYPE_P (type))
>> +    return (!TYPE_OVERFLOW_TRAPS (type)
>> +           && TYPE_OVERFLOW_WRAPS (type));
>>
>> it cannot both wrap and trap thus TYPE_OVERFLOW_WRAPS is enough.
>
>
> Hmm, indeed, when specifying both, one is quietly ignored. The documentation
> also doesn't mention this.
>
> Attached untested patch mentions this ftrapv/fwrapv interaction in the docs.
>
> OK for trunk, if bootstrap succeeds?

Ok.

Richard.

> Thanks,
> - Tom
>
>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] Document ftrapv/fwrapv interaction
  2015-07-23 10:39             ` Richard Biener
@ 2015-07-23 10:42               ` Richard Biener
  0 siblings, 0 replies; 27+ messages in thread
From: Richard Biener @ 2015-07-23 10:42 UTC (permalink / raw)
  To: Tom de Vries, Ian Lance Taylor; +Cc: gcc-patches

On Thu, Jul 23, 2015 at 12:19 PM, Richard Biener
<richard.guenther@gmail.com> wrote:
> On Wed, Jul 22, 2015 at 5:11 PM, Tom de Vries <Tom_deVries@mentor.com> wrote:
>> [ Re: [RFC, PR66873] Use graphite for parloops ]
>> On 22/07/15 13:01, Richard Biener wrote:
>>>
>>> why only scalar floats?  Please use FLOAT_TYPE_P.
>>>
>>> +  if (INTEGRAL_TYPE_P (type))
>>> +    return (!TYPE_OVERFLOW_TRAPS (type)
>>> +           && TYPE_OVERFLOW_WRAPS (type));
>>>
>>> it cannot both wrap and trap thus TYPE_OVERFLOW_WRAPS is enough.
>>
>>
>> Hmm, indeed, when specifying both, one is quietly ignored. The documentation
>> also doesn't mention this.
>>
>> Attached untested patch mentions this ftrapv/fwrapv interaction in the docs.
>>
>> OK for trunk, if bootstrap succeeds?
>
> Ok.

Btw, for consistency we probably should add

-fsigned-overflow=traps|wraps|undefined

and make -ftrapv and -fwrapv alias to the respective behavior.

Oh, and -fstrict-overflow is another beast with rather unspecified
behavior... while its positive form could be aliased to
-fsigned-overflow=undefined, its negative form is _not_ equal
to -fwrapv - it's a third state that says overflow is neither known
to wrap nor undefined (thus it allows even fewer optimizations).
Note that the behavior of -fno-strict-overflow isn't documented
(only its positive form is).  This means that at -O[10] where
-fno-strict-overflow is in effect we are in "undefined" territory.

Maybe it's time to fix that ...

Richard.

> Richard.
>
>> Thanks,
>> - Tom
>>
>>
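
[ Editorial sketch, not part of the original message.  It only illustrates
  the "traps" and "wraps" outcomes discussed above for a single signed
  addition, in portable C.  The helper names signed_add_overflows and
  signed_add_wrapping are hypothetical and do not exist in GCC.  The
  conversion back to int in the wrapping helper is implementation-defined
  in ISO C; GCC documents it as reduction modulo 2^N, which is assumed
  here. ]

```c
#include <limits.h>
#include <stdbool.h>

/* Return true if A + B does not fit in int.  This is the condition under
   which a trapping addition (as with -ftrapv) would abort.  The checks
   themselves never overflow.  */
bool
signed_add_overflows (int a, int b)
{
  return (b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b);
}

/* Return A + B with modular wrap-around (as with -fwrapv).  The addition
   is done in unsigned int, where wrap-around is well defined.  Converting
   the result back to int is implementation-defined; GCC wraps.  */
int
signed_add_wrapping (int a, int b)
{
  return (int) ((unsigned int) a + (unsigned int) b);
}
```

With these helpers, INT_MAX + 1 is reported as overflowing by the first
function and evaluates to INT_MIN in the second; the "undefined" state has
no such fixed outcome, which is why it cannot be shown in a sketch like this.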

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] Don't allow unsafe reductions in graphite
  2015-07-22 16:04             ` [PATCH] Don't allow unsafe reductions in graphite Tom de Vries
@ 2015-07-23 10:51               ` Richard Biener
  2015-07-24 20:37                 ` Sebastian Pop
  0 siblings, 1 reply; 27+ messages in thread
From: Richard Biener @ 2015-07-23 10:51 UTC (permalink / raw)
  To: Tom de Vries; +Cc: Sebastian Pop, gcc-patches

On Wed, Jul 22, 2015 at 6:00 PM, Tom de Vries <Tom_deVries@mentor.com> wrote:
> [ was: Re: [RFC, PR66873] Use graphite for parloops ]
>
> On 22/07/15 13:02, Richard Biener wrote:
>>
>> On Wed, Jul 22, 2015 at 1:01 PM, Richard Biener
>> <richard.guenther@gmail.com>  wrote:
>>>
>>> >On Tue, Jul 21, 2015 at 8:42 PM, Sebastian Pop<sebpop@gmail.com>  wrote:
>>>>
>>>> >>Tom de Vries wrote:
>>>>>
>>>>> >>>Fix reduction safety checks
>>>>> >>>
>>>>> >>>       * graphite-sese-to-poly.c (is_reduction_operation_p): Limit
>>>>> >>>       flag_associative_math to SCALAR_FLOAT_TYPE_P.  Honour
>>>>> >>>       TYPE_OVERFLOW_TRAPS and TYPE_OVERFLOW_WRAPS for
>>>>> >>> INTEGRAL_TYPE_P.
>>>>> >>>       Only allow wrapping fixed-point otherwise.
>>>>> >>>       (build_poly_scop): Always call
>>>>> >>>       rewrite_commutative_reductions_out_of_ssa.
>>>>
>>>> >>
>>>> >>The changes to graphite look good to me.
>>>
>>> >
>>> >+  if (SCALAR_FLOAT_TYPE_P (type))
>>> >+    return flag_associative_math;
>>> >+
>>> >
>>> >why only scalar floats?
>
>
> Copied from the conditions in vect_is_simple_reduction_1.
>
>>> >Please use FLOAT_TYPE_P.
>
> Done.
>
>>> >
>>> >+  if (INTEGRAL_TYPE_P (type))
>>> >+    return (!TYPE_OVERFLOW_TRAPS (type)
>>> >+           && TYPE_OVERFLOW_WRAPS (type));
>>> >
>>> >it cannot both wrap and trap thus TYPE_OVERFLOW_WRAPS is enough.
>>> >
>
>
> Done.
>
>>> >I'm sure you'll disable quite some parallelization this way... (the
>>> >routine is modeled after
>>> >the vectorizers IIRC, so it would be affected as well).  Yeah - I see
>>> >you modify autopar
>>> >testcases.
>
>
> I now split up the patch, this bit only relates to graphite, so no autopar
> testcases are affected.
>
>>> >Please instead XFAIL the existing ones and add variants
>>> >with unsigned
>>> >reductions.  Adding -fwrapv isn't a good solution either.
>
>
> Done.
>
>>> >
>>> >Can you think of a testcase that breaks btw?
>>> >
>
>
> If you mean a testcase that fails to execute properly without the fix, and
> executes correctly with the fix, then no.  The problem this patch is trying
> to fix is that we assume wrapping overflow without fwrapv. In order to run
> into a runtime failure, we need a target that does not do wrapping overflow
> without fwrapv.
>
>>> >The "proper" solution (see other passes) is to rewrite the reduction
>>> >to a wrapping
>>> >one (cast to unsigned for the reduction op).
>>> >
>
>
> Right.
>
>>> >+  return (FIXED_POINT_TYPE_P (type)
>>> >+         && FIXED_POINT_TYPE_OVERFLOW_WRAPS_P (type));
>>> >
>>> >why?
>
>
> Again, copied from the conditions in vect_is_simple_reduction_1.
>
>>> >  Simply return false here instead?
>
> Done.
>
>
> [ Btw, looking at associative_tree_code, I realized that the
>   overflow checking is only necessary for PLUS_EXPR and MULT_EXPR:
> ...
>   switch (code)
>     {
>     case BIT_IOR_EXPR:
>     case BIT_AND_EXPR:
>     case BIT_XOR_EXPR:
>     case PLUS_EXPR:
>     case MULT_EXPR:
>     case MIN_EXPR:
>     case MAX_EXPR:
>       return true;
> ...
>
> The other operators cannot overflow to begin with. My guess is that it's
> better to leave this for a trunk-only follow-up patch.
> ]
>
> Currently bootstrapping and reg-testing on x86_64.
>
> OK for trunk?
>
> OK 5 and 4.9 release branches?

Ok if Sebastian is fine with it.

Richard.

> Thanks,
> - Tom
>
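
[ Editorial sketch, not part of the original message.  The quoted
  suggestion to "rewrite the reduction to a wrapping one (cast to unsigned
  for the reduction op)" relies on unsigned addition being associative and
  commutative modulo 2^N, so partial sums can be combined in any order.
  The function names sum_sequential and sum_chunked are hypothetical; the
  chunked variant only mimics, in a single thread, the reordering that a
  parallelized reduction performs. ]

```c
#include <stddef.h>

/* Sequential reduction: one accumulator, original evaluation order.  */
unsigned int
sum_sequential (const unsigned int *x, size_t n)
{
  unsigned int sum = 0;
  for (size_t i = 0; i < n; i++)
    sum += x[i];
  return sum;
}

/* Reordered reduction: one partial sum per contiguous chunk, combined at
   the end.  Because unsigned arithmetic wraps, the result equals that of
   sum_sequential even when intermediate sums overflow.  */
unsigned int
sum_chunked (const unsigned int *x, size_t n, size_t nchunks)
{
  unsigned int total = 0;
  if (nchunks == 0)
    nchunks = 1;
  size_t chunk = (n + nchunks - 1) / nchunks;
  for (size_t c = 0; c < nchunks; c++)
    {
      size_t begin = c * chunk;
      size_t end = begin + chunk < n ? begin + chunk : n;
      unsigned int partial = 0;
      for (size_t i = begin; i < end; i++)
        partial += x[i];
      total += partial;
    }
  return total;
}
```

For a signed accumulator without -fwrapv the same reordering could
introduce or remove an overflow, which is the case the patch rejects.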

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] Check TYPE_OVERFLOW_WRAPS for parloops reductions
  2015-07-22 16:38             ` [PATCH] Check TYPE_OVERFLOW_WRAPS for parloops reductions Tom de Vries
@ 2015-07-23 10:54               ` Richard Biener
  2015-07-24 10:43               ` [committed] Remove xfail in autopar/uns-outer-4.c Tom de Vries
  1 sibling, 0 replies; 27+ messages in thread
From: Richard Biener @ 2015-07-23 10:54 UTC (permalink / raw)
  To: Tom de Vries, Joseph S. Myers; +Cc: Sebastian Pop, gcc-patches

On Wed, Jul 22, 2015 at 6:13 PM, Tom de Vries <Tom_deVries@mentor.com> wrote:
> [ was: Re: [RFC, PR66873] Use graphite for parloops ]
>
> On 22/07/15 13:02, Richard Biener wrote:
>>
>> On Wed, Jul 22, 2015 at 1:01 PM, Richard Biener
>> <richard.guenther@gmail.com> wrote:
>>>
>>> On Tue, Jul 21, 2015 at 8:42 PM, Sebastian Pop <sebpop@gmail.com> wrote:
>>>>
>>>> Tom de Vries wrote:
>>>>>
>>>>> Fix reduction safety checks
>>>>>
>
>>> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
>>> index 9145dbf..e014be2 100644
>>> --- a/gcc/tree-vect-loop.c
>>> +++ b/gcc/tree-vect-loop.c
>>> @@ -2613,16 +2613,30 @@ vect_is_simple_reduction_1 (loop_vec_info
>>> loop_info, gimple phi,
>>>                          "reduction: unsafe fp math optimization: ");
>>>         return NULL;
>>>       }
>>> -  else if (INTEGRAL_TYPE_P (type) && TYPE_OVERFLOW_TRAPS (type)
>>> -          && check_reduction)
>>> +  else if (INTEGRAL_TYPE_P (type) && check_reduction)
>>>       {
>>> ...
>>>
>>> You didn't need to adjust any testcases?  That's probably because the
>>> checking above is not always executed (see PR66623 for a related
>>> testcase).  The code needs refactoring.
>>> And we need a way-out, that is, we do _not_ want to stop vectorizing
>>> signed reductions.
>>> So you need to fix code generation instead.
>>
>>
>> Btw, for the vectorizer the current "trick" is that nobody takes advantage
>> of overflow undefinedness for vector types.
>>
>
> AFAIU, you're saying here that there's no current bug related to assuming
> wrapping overflow in the vectorizer?

Well - TYPE_OVERFLOW_UNDEFINED will happily return true for
vector integer types but nothing I know will exploit that (bogus) knowledge.

And I'd rather change the reporting of TYPE_OVERFLOW_UNDEFINED here
as the C standard doesn't have vector types and the middle-end cannot
distinguish user-written code (via intrinsics now using the generic
vector GCC language extension) from compiler-generated code.

Similar for _Complex integer types (also a GCC extension?).

> I've updated the patch accordingly, so we only bother about
> TYPE_OVERFLOW_WRAPS for parloops reductions.
>
> Currently bootstrapping and reg-testing on x86_64.
>
> OK for trunk?

Ok.

Thanks,
Richard.

> Thanks,
> - Tom
>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [committed] Remove xfail in autopar/uns-outer-4.c
  2015-07-22 16:38             ` [PATCH] Check TYPE_OVERFLOW_WRAPS for parloops reductions Tom de Vries
  2015-07-23 10:54               ` Richard Biener
@ 2015-07-24 10:43               ` Tom de Vries
  1 sibling, 0 replies; 27+ messages in thread
From: Tom de Vries @ 2015-07-24 10:43 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 2550 bytes --]

[ was: Re: [PATCH] Check TYPE_OVERFLOW_WRAPS for parloops reductions ]

On 22/07/15 18:13, Tom de Vries wrote:
> 0003-Check-TYPE_OVERFLOW_WRAPS-for-parloops-reductions.patch
>
>
> Check TYPE_OVERFLOW_WRAPS for parloops reductions
>
> 2015-07-21  Tom de Vries<tom@codesourcery.com>
>
> 	* tree-parloops.c (gather_scalar_reductions): Add arg to call to
> 	vect_force_simple_reduction.
> 	* tree-vect-loop.c (vect_analyze_scalar_cycles_1): Same.
> 	(vect_is_simple_reduction_1): Add and handle
> 	need_wrapping_integral_overflow parameter.
> 	(vect_is_simple_reduction, vect_force_simple_reduction): Add and pass
> 	need_wrapping_integral_overflow parameter.
> 	(vectorizable_reduction): Add arg to call to vect_is_simple_reduction.
> 	* tree-vectorizer.h (vect_force_simple_reduction): Add parameter to decl.
>
> 	* gcc.dg/autopar/outer-4.c: Add xfail.
> 	* gcc.dg/autopar/outer-5.c: Same.
> 	* gcc.dg/autopar/outer-6.c: Same.
> 	* gcc.dg/autopar/reduc-2.c: Same.
> 	* gcc.dg/autopar/reduc-2char.c: Same.
> 	* gcc.dg/autopar/reduc-2short.c: Same.
> 	* gcc.dg/autopar/reduc-8.c: Same.
> 	* gcc.dg/autopar/uns-outer-4.c: New test.
> 	* gcc.dg/autopar/uns-outer-5.c: New test.
> 	* gcc.dg/autopar/uns-outer-6.c: New test.

> diff --git a/gcc/testsuite/gcc.dg/autopar/uns-outer-4.c b/gcc/testsuite/gcc.dg/autopar/uns-outer-4.c
> new file mode 100644
> index 0000000..ef9fc2a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/autopar/uns-outer-4.c
> @@ -0,0 +1,36 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -ftree-parallelize-loops=4 -fdump-tree-parloops-details -fdump-tree-optimized" } */
> +
> +void abort (void);
> +
> +unsigned int g_sum=0;
> +unsigned int x[500][500];
> +
> +void __attribute__((noinline))
> +parloop (int N)
> +{
> +  int i, j;
> +  unsigned int sum;
> +
> +  /* Double reduction is currently not supported, outer loop is not
> +     parallelized.  Inner reduction is detected, inner loop is
> +     parallelized.  */
> +  sum = 0;
> +  for (i = 0; i < N; i++)
> +    for (j = 0; j < N; j++)
> +      sum += x[i][j];
> +
> +  g_sum = sum;
> +}
> +
> +int
> +main (void)
> +{
> +  parloop (500);
> +
> +  return 0;
> +}
> +
> +
> +/* { dg-final { scan-tree-dump-times "parallelizing outer loop" 1 "parloops" { xfail *-*-* } } } */
> +/* { dg-final { scan-tree-dump-times "loopfn" 4 "optimized" { xfail *-*-* } } } */

We currently get an XPASS for the last xfail.

The inner loop is parallelized, so the split-off function loopfn exists.

Committed this follow-up patch to trunk, to remove the incorrect xfail.

Thanks,
- Tom


[-- Attachment #2: 0001-Remove-xfail-in-autopar-uns-outer-4.c.patch --]
[-- Type: text/x-patch, Size: 773 bytes --]

Remove xfail in autopar/uns-outer-4.c

2015-07-24  Tom de Vries  <tom@codesourcery.com>

	* gcc.dg/autopar/uns-outer-4.c: Remove loopfn xfail.
---
 gcc/testsuite/gcc.dg/autopar/uns-outer-4.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/autopar/uns-outer-4.c b/gcc/testsuite/gcc.dg/autopar/uns-outer-4.c
index ef9fc2a..8365a89 100644
--- a/gcc/testsuite/gcc.dg/autopar/uns-outer-4.c
+++ b/gcc/testsuite/gcc.dg/autopar/uns-outer-4.c
@@ -33,4 +33,4 @@ main (void)
 
 
 /* { dg-final { scan-tree-dump-times "parallelizing outer loop" 1 "parloops" { xfail *-*-* } } } */
-/* { dg-final { scan-tree-dump-times "loopfn" 4 "optimized" { xfail *-*-* } } } */
+/* { dg-final { scan-tree-dump-times "loopfn" 4 "optimized" } } */
-- 
1.9.1


^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH] Add FIXED_POINT_TYPE_OVERFLOW_WRAPS_P
  2015-07-22 11:18           ` Richard Biener
  2015-07-22 16:04             ` [PATCH] Don't allow unsafe reductions in graphite Tom de Vries
  2015-07-22 16:38             ` [PATCH] Check TYPE_OVERFLOW_WRAPS for parloops reductions Tom de Vries
@ 2015-07-24 11:54             ` Tom de Vries
  2 siblings, 0 replies; 27+ messages in thread
From: Tom de Vries @ 2015-07-24 11:54 UTC (permalink / raw)
  Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 1300 bytes --]

[ was: [RFC, PR66873] Use graphite for parloops ]

On 22/07/15 13:02, Richard Biener wrote:
>>+#define FIXED_POINT_TYPE_OVERFLOW_WRAPS_P(TYPE) \
>>+  (NON_SAT_FIXED_POINT_TYPE_P (TYPE))
>>
>>somebody with knowledge about fixed-point types needs to review this.

In vect_is_simple_reduction_1 I noticed:
...
   else if (SAT_FIXED_POINT_TYPE_P (type) && check_reduction)
     {
       /* Changing the order of operations changes the semantics.  */
       if (dump_enabled_p ())
         report_vect_op (MSG_MISSED_OPTIMIZATION, def_stmt,
                         "reduction: unsafe fixed-point math optimization: ");
       return NULL;
     }
...
In other words, you can change evaluation order if !SAT_FIXED_POINT_TYPE_P.

It is true that for saturating fixed point, you don't want to change 
evaluation order.

But I think that we actually want to test whether the fixed-point type 
wraps.

I tried to find proof that non-saturating fixed point wraps, but that 
doesn't seem trivial. IMHO, non-trivial enough to define a macro 
FIXED_POINT_TYPE_OVERFLOW_WRAPS_P, add a lengthy comment and use that 
instead of !SAT_FIXED_POINT_TYPE_P.
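
[ Editorial sketch, not part of the original message.  It illustrates why
  evaluation order matters for saturating arithmetic but not for modular
  wrap-around, using 8-bit integers as a stand-in for the raw value of a
  fixed-point type.  The helpers add_sat and add_wrap are hypothetical and
  are not GCC's fixed-point implementation.  The final conversion in
  add_wrap is implementation-defined in ISO C; GCC's documented modulo
  behaviour is assumed. ]

```c
#include <stdint.h>

/* Saturating 8-bit addition: clamp the exact result to the range of
   int8_t.  */
int8_t
add_sat (int8_t a, int8_t b)
{
  int wide = (int) a + (int) b;
  if (wide > INT8_MAX)
    return INT8_MAX;
  if (wide < INT8_MIN)
    return INT8_MIN;
  return (int8_t) wide;
}

/* Wrapping 8-bit addition: add modulo 256, then reinterpret as signed.
   The conversion to int8_t is implementation-defined for values above
   INT8_MAX; GCC wraps.  */
int8_t
add_wrap (int8_t a, int8_t b)
{
  uint8_t u = (uint8_t) ((uint8_t) a + (uint8_t) b);
  return (int8_t) u;
}
```

With a = 100, b = 100, c = -100, add_sat gives (a + b) + c = 27 but
a + (b + c) = 100, whereas add_wrap gives 100 for both groupings.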

The intention of the patch is that it doesn't change behaviour of the 
compiler. Currently bootstrapping and reg-testing on x86_64.

OK for trunk?

Thanks,
- Tom

[-- Attachment #2: 0003-Add-FIXED_POINT_TYPE_OVERFLOW_WRAPS_P.patch --]
[-- Type: text/x-patch, Size: 3511 bytes --]

Add FIXED_POINT_TYPE_OVERFLOW_WRAPS_P

2015-07-24  Tom de Vries  <tom@codesourcery.com>

	* tree.h (FIXED_POINT_TYPE_OVERFLOW_WRAPS_P): Define.
	* fold-const.c (split_tree): Use FIXED_POINT_TYPE_OVERFLOW_WRAPS_P.
	* tree-ssa-reassoc.c (can_reassociate_p): Same.
	* tree-vect-loop.c (vect_is_simple_reduction_1): Same.
---
 gcc/fold-const.c       |  3 ++-
 gcc/tree-ssa-reassoc.c |  2 +-
 gcc/tree-vect-loop.c   |  4 +++-
 gcc/tree.h             | 18 ++++++++++++++++++
 4 files changed, 24 insertions(+), 3 deletions(-)

diff --git a/gcc/fold-const.c b/gcc/fold-const.c
index 886922f..2de71bb 100644
--- a/gcc/fold-const.c
+++ b/gcc/fold-const.c
@@ -808,7 +808,8 @@ split_tree (tree in, enum tree_code code, tree *conp, tree *litp,
     *litp = in;
   else if (TREE_CODE (in) == code
 	   || ((! FLOAT_TYPE_P (TREE_TYPE (in)) || flag_associative_math)
-	       && ! SAT_FIXED_POINT_TYPE_P (TREE_TYPE (in))
+	       && (!FIXED_POINT_TYPE_P (TREE_TYPE (in))
+		   || FIXED_POINT_TYPE_OVERFLOW_WRAPS_P (TREE_TYPE (in)))
 	       /* We can associate addition and subtraction together (even
 		  though the C standard doesn't say so) for integers because
 		  the value is not affected.  For reals, the value might be
diff --git a/gcc/tree-ssa-reassoc.c b/gcc/tree-ssa-reassoc.c
index efb813c..2851a13 100644
--- a/gcc/tree-ssa-reassoc.c
+++ b/gcc/tree-ssa-reassoc.c
@@ -4229,7 +4229,7 @@ can_reassociate_p (tree op)
 {
   tree type = TREE_TYPE (op);
   if ((INTEGRAL_TYPE_P (type) && TYPE_OVERFLOW_WRAPS (type))
-      || NON_SAT_FIXED_POINT_TYPE_P (type)
+      || FIXED_POINT_TYPE_OVERFLOW_WRAPS_P (type)
       || (flag_associative_math && FLOAT_TYPE_P (type)))
     return true;
   return false;
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 42ba5f8..0e61a02 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -2635,7 +2635,9 @@ vect_is_simple_reduction_1 (loop_vec_info loop_info, gimple phi,
 	  return NULL;
 	}
     }
-  else if (SAT_FIXED_POINT_TYPE_P (type) && check_reduction)
+  else if (FIXED_POINT_TYPE_P (type) &&
+	   !FIXED_POINT_TYPE_OVERFLOW_WRAPS_P (type)
+	   && check_reduction)
     {
       /* Changing the order of operations changes the semantics.  */
       if (dump_enabled_p ())
diff --git a/gcc/tree.h b/gcc/tree.h
index 360d13e..ab0e537 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -497,6 +497,24 @@ extern void omp_clause_range_check_failed (const_tree, const char *, int,
 #define SAT_FIXED_POINT_TYPE_P(TYPE) \
   (TREE_CODE (TYPE) == FIXED_POINT_TYPE && TYPE_SATURATING (TYPE))
 
+/* Nonzero if fixed-point type TYPE wraps at overflow.
+
+   Fixed-point types that are explictly saturating do not wrap at overflow.
+
+   The draft technical report (N1169 draft of ISO/IEC DTR 18037) specifies
+   pragmas to control overflow for a fixed-point type that is not explictly
+   saturating (FX_FRACT_OVERFLOW and FX_ACCUM_OVERFLOW).  The possible states
+   of the pragmas are SAT and DEFAULT.  The default state for the pragmas is
+   DEFAULT, which means overflow has undefined behaviour.  GCC currently does
+   not support these pragmas.
+
+   The de-facto choice of GCC for fixed-point types that are not explictly
+   saturating seems to be modular wrap-around (as specified in Annex E.4 Modwrap
+   overflow).  */
+
+#define FIXED_POINT_TYPE_OVERFLOW_WRAPS_P(TYPE) \
+  (NON_SAT_FIXED_POINT_TYPE_P (TYPE))
+
 /* Nonzero if TYPE represents a fixed-point type.  */
 
 #define FIXED_POINT_TYPE_P(TYPE)	(TREE_CODE (TYPE) == FIXED_POINT_TYPE)
-- 
1.9.1


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] Don't allow unsafe reductions in graphite
  2015-07-23 10:51               ` Richard Biener
@ 2015-07-24 20:37                 ` Sebastian Pop
  2015-07-25 11:41                   ` Tom de Vries
  0 siblings, 1 reply; 27+ messages in thread
From: Sebastian Pop @ 2015-07-24 20:37 UTC (permalink / raw)
  To: Richard Biener; +Cc: Tom de Vries, gcc-patches

Richard Biener wrote:
> On Wed, Jul 22, 2015 at 6:00 PM, Tom de Vries <Tom_deVries@mentor.com> wrote:
> > Currently bootstrapping and reg-testing on x86_64.
> >
> > OK for trunk?
> >
> > OK 5 and 4.9 release branches?
> 
> Ok if Sebastian is fine with it.

Ok to backport as well.
Thanks Tom for the patches.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] Don't allow unsafe reductions in graphite
  2015-07-24 20:37                 ` Sebastian Pop
@ 2015-07-25 11:41                   ` Tom de Vries
  0 siblings, 0 replies; 27+ messages in thread
From: Tom de Vries @ 2015-07-25 11:41 UTC (permalink / raw)
  To: Sebastian Pop, Richard Biener; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 612 bytes --]

On 24/07/15 22:20, Sebastian Pop wrote:
> Richard Biener wrote:
>> On Wed, Jul 22, 2015 at 6:00 PM, Tom de Vries <Tom_deVries@mentor.com> wrote:
>>> Currently bootstrapping and reg-testing on x86_64.
>>>
>>> OK for trunk?
>>>
>>> OK 5 and 4.9 release branches?
>>
>> Ok if Sebastian is fine with it.
>
> Ok to backport as well.
> Thanks Tom for the patches.

And thanks for the review.

This follow-up patch:
- makes sure that the uns-*.c variants are handled the same as the
   original ones in graphite.exp.
- actually makes the uns-*.c variants use unsigned arithmetic.

Committed to trunk.

Thanks,
- Tom




[-- Attachment #2: 0001-Fixup-graphite-uns-.c-testcases.patch --]
[-- Type: text/x-patch, Size: 6017 bytes --]

Fixup graphite/uns-*.c testcases

2015-07-25  Tom de Vries  <tom@codesourcery.com>

	* gcc.dg/graphite/graphite.exp: Include uns-*.c files in
	interchange_files and block_files variables.
	* gcc.dg/graphite/uns-block-1.c (main): Change signed into unsigned
	arithmetic.
	* gcc.dg/graphite/uns-interchange-12.c: Same.
	* gcc.dg/graphite/uns-interchange-14.c: Same.
	* gcc.dg/graphite/uns-interchange-15.c: Same.
	* gcc.dg/graphite/uns-interchange-9.c (foo): Same.
	* gcc.dg/graphite/uns-interchange-mvt.c: Same.
---
 gcc/testsuite/gcc.dg/graphite/graphite.exp          |  6 ++++--
 gcc/testsuite/gcc.dg/graphite/uns-block-1.c         |  6 +++---
 gcc/testsuite/gcc.dg/graphite/uns-interchange-12.c  |  7 ++++---
 gcc/testsuite/gcc.dg/graphite/uns-interchange-14.c  |  5 +++--
 gcc/testsuite/gcc.dg/graphite/uns-interchange-15.c  |  7 ++++---
 gcc/testsuite/gcc.dg/graphite/uns-interchange-9.c   | 11 ++++++-----
 gcc/testsuite/gcc.dg/graphite/uns-interchange-mvt.c |  7 ++++---
 7 files changed, 28 insertions(+), 21 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/graphite/graphite.exp b/gcc/testsuite/gcc.dg/graphite/graphite.exp
index 9dba5d6..9e7ede6 100644
--- a/gcc/testsuite/gcc.dg/graphite/graphite.exp
+++ b/gcc/testsuite/gcc.dg/graphite/graphite.exp
@@ -41,8 +41,10 @@ set wait_to_run_files [lsort [glob -nocomplain $srcdir/$subdir/*.c ] ]
 set scop_files        [lsort [glob -nocomplain $srcdir/$subdir/scop-*.c ] ]
 set id_files          [lsort [glob -nocomplain $srcdir/$subdir/id-*.c ] ]
 set run_id_files      [lsort [glob -nocomplain $srcdir/$subdir/run-id-*.c ] ]
-set interchange_files [lsort [glob -nocomplain $srcdir/$subdir/interchange-*.c ] ]
-set block_files       [lsort [glob -nocomplain $srcdir/$subdir/block-*.c ] ]
+set interchange_files [lsort [glob -nocomplain $srcdir/$subdir/interchange-*.c \
+			      $srcdir/$subdir/uns-interchange-*.c ] ]
+set block_files       [lsort [glob -nocomplain $srcdir/$subdir/block-*.c \
+			      $srcdir/$subdir/uns-block-*.c ] ]
 set vect_files        [lsort [glob -nocomplain $srcdir/$subdir/vect-*.c ] ]
 
 # Tests to be compiled.
diff --git a/gcc/testsuite/gcc.dg/graphite/uns-block-1.c b/gcc/testsuite/gcc.dg/graphite/uns-block-1.c
index 57d522b..c50b770 100644
--- a/gcc/testsuite/gcc.dg/graphite/uns-block-1.c
+++ b/gcc/testsuite/gcc.dg/graphite/uns-block-1.c
@@ -13,9 +13,9 @@ int
 main (void)
 {
   int i, j;
-  int sum = 0;
-  int A[MAX * MAX];
-  int B[MAX * MAX];
+  unsigned int sum = 0;
+  unsigned int A[MAX * MAX];
+  unsigned int B[MAX * MAX];
 
   /* These loops should be loop blocked.  */
   for (i = 0; i < MAX; i++)
diff --git a/gcc/testsuite/gcc.dg/graphite/uns-interchange-12.c b/gcc/testsuite/gcc.dg/graphite/uns-interchange-12.c
index dc26926..bd21ba9 100644
--- a/gcc/testsuite/gcc.dg/graphite/uns-interchange-12.c
+++ b/gcc/testsuite/gcc.dg/graphite/uns-interchange-12.c
@@ -7,9 +7,9 @@
 
 #define N 200
 
-int A[N][N], B[N][N], C[N][N];
+unsigned int A[N][N], B[N][N], C[N][N];
 
-static int __attribute__((noinline))
+static unsigned int __attribute__((noinline))
 matmult (void)
 {
   int i, j, k;
@@ -31,7 +31,8 @@ extern void abort ();
 int
 main (void)
 {
-  int i, j, res;
+  int i, j;
+  unsigned int res;
 
   for (i = 0; i < N; i++)
     for (j = 0; j < N; j++)
diff --git a/gcc/testsuite/gcc.dg/graphite/uns-interchange-14.c b/gcc/testsuite/gcc.dg/graphite/uns-interchange-14.c
index 36990ab..b1abd13 100644
--- a/gcc/testsuite/gcc.dg/graphite/uns-interchange-14.c
+++ b/gcc/testsuite/gcc.dg/graphite/uns-interchange-14.c
@@ -7,7 +7,7 @@
 
 #define N 200
 
-int A[N][N], B[N][N], C[N][N];
+unsigned int A[N][N], B[N][N], C[N][N];
 
 static void __attribute__((noinline))
 matmult (void)
@@ -30,7 +30,8 @@ extern void abort ();
 int
 main (void)
 {
-  int i, j, res = 0;
+  int i, j;
+  unsigned res = 0;
 
   for (i = 0; i < N; i++)
     for (j = 0; j < N; j++)
diff --git a/gcc/testsuite/gcc.dg/graphite/uns-interchange-15.c b/gcc/testsuite/gcc.dg/graphite/uns-interchange-15.c
index 3ddb74f..a5a2e27 100644
--- a/gcc/testsuite/gcc.dg/graphite/uns-interchange-15.c
+++ b/gcc/testsuite/gcc.dg/graphite/uns-interchange-15.c
@@ -7,9 +7,9 @@
 
 #define NMAX 2000
 
-static int x[NMAX], a[NMAX][NMAX];
+static unsigned int x[NMAX], a[NMAX][NMAX];
 
-static int __attribute__((noinline))
+static unsigned int __attribute__((noinline))
 mvt (long N)
 {
   int i,j;
@@ -27,7 +27,8 @@ extern void abort ();
 int
 main (void)
 {
-  int i, j, res;
+  int i, j;
+  unsigned int res;
 
   for (i = 0; i < NMAX; i++)
     for (j = 0; j < NMAX; j++)
diff --git a/gcc/testsuite/gcc.dg/graphite/uns-interchange-9.c b/gcc/testsuite/gcc.dg/graphite/uns-interchange-9.c
index cfec110..6bfd3d6 100644
--- a/gcc/testsuite/gcc.dg/graphite/uns-interchange-9.c
+++ b/gcc/testsuite/gcc.dg/graphite/uns-interchange-9.c
@@ -8,11 +8,11 @@
 #define N 111
 #define M 111
 
-static int __attribute__((noinline))
-foo (int *x)
+static unsigned int __attribute__((noinline))
+foo (unsigned int *x)
 {
   int i, j;
-  int sum = 0;
+  unsigned int sum = 0;
 
   for (j = 0; j < M; ++j)
     for (i = 0;  i < N; ++i)
@@ -26,8 +26,9 @@ extern void abort ();
 int
 main (void)
 {
-  int A[N*M];
-  int i, res;
+  unsigned int A[N*M];
+  int i;
+  unsigned int res;
 
   for (i = 0; i < N*M; i++)
     A[i] = 2;
diff --git a/gcc/testsuite/gcc.dg/graphite/uns-interchange-mvt.c b/gcc/testsuite/gcc.dg/graphite/uns-interchange-mvt.c
index 4b8f264..80f6789 100644
--- a/gcc/testsuite/gcc.dg/graphite/uns-interchange-mvt.c
+++ b/gcc/testsuite/gcc.dg/graphite/uns-interchange-mvt.c
@@ -7,9 +7,9 @@
 
 #define NMAX 2000
 
-static int x1[NMAX], x2[NMAX], a[NMAX][NMAX], y1[NMAX], y2[NMAX];
+static unsigned int x1[NMAX], x2[NMAX], a[NMAX][NMAX], y1[NMAX], y2[NMAX];
 
-static int __attribute__((noinline))
+static unsigned int __attribute__((noinline))
 mvt (long N)
 {
 
@@ -32,7 +32,8 @@ extern void abort ();
 int
 main (void)
 {
-  int i, j, res;
+  int i, j;
+  unsigned int res;
 
   for (i = 0; i < NMAX; i++)
     for (j = 0; j < NMAX; j++)
-- 
1.9.1


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC, PR66873] Use graphite for parloops
  2015-07-16 10:41       ` Richard Biener
@ 2015-07-26 22:54         ` Tom de Vries
  2015-07-27  5:41           ` Sebastian Pop
  0 siblings, 1 reply; 27+ messages in thread
From: Tom de Vries @ 2015-07-26 22:54 UTC (permalink / raw)
  To: Richard Biener, Thomas Schwinge; +Cc: gcc-patches

On 16/07/15 12:28, Richard Biener wrote:
> On Thu, Jul 16, 2015 at 12:23 PM, Richard Biener
> <richard.guenther@gmail.com> wrote:
>> On Thu, Jul 16, 2015 at 12:19 PM, Thomas Schwinge
>> <thomas@codesourcery.com> wrote:
>>> Hi Tom!
>>>
>>> On Thu, 16 Jul 2015 10:46:00 +0200, Richard Biener <richard.guenther@gmail.com> wrote:
>>>> On Wed, Jul 15, 2015 at 10:26 PM, Tom de Vries <Tom_deVries@mentor.com> wrote:
>>>>> I tried to parallelize this fortran test-case (based on autopar/outer-1.c),
>>>>> [...]
>>>
>>>>> So I wondered, why not always use the graphite dependency analysis in
>>>>> parloops. (Of course you could use -floop-parallelize-all, but that also
>>>>> changes the heuristic). So I wrote a patch for parloops to use graphite
>>>>> dependency analysis by default (so without -floop-parallelize-all), but
>>>>> while testing found out that all the reduction test-cases started failing
>>>>> because the modifications graphite makes to the code mess up the parloops
>>>>> reduction analysis.
>>>>>
>>>>> Then I came up with this patch, which:
>>>>> - first runs a parloops pass, restricted to reduction loops only,
>>>>> - then runs graphite dependency analysis
>>>>> - followed by a normal parloops pass run.
>>>>>
>>>>> This way, we get to both:
>>>>> - compile the reduction testcases as before, and
>>>>> - profit from the better graphite dependency analysis otherwise.
>>>
>>>> graphite dependence analysis is too slow to be enabled unconditionally.
>>>> (read: hours in some simple cases - see bugzilla)
>>>
>>> Haha, "cool"!  ;-)
>>>
>>> Maybe it is still reasonable to use graphite to analyze the code inside
>>> OpenACC kernels regions -- maybe such code can reasonably be expected to
>>> not have the properties that make its analysis lengthy?  So, Tom, could
>>> you please identify and check such PRs, to get an understanding of what
>>> these properties are?
>>
>> Like the one in PR62113 or 53852 or 59121.
>
> Btw, it would be nice to handle this case (or at least figure out why we can't)
> in GCC's dependence analysis.
>

I wrote an equivalent test-case in C:
...
$ cat src/gcc/testsuite/gcc.dg/autopar/outer-7.c
/* { dg-do compile } */
/* { dg-options "-O2 -ftree-parallelize-loops=2 -fdump-tree-parloops-details -fdump-tree-optimized" } */

void abort (void);

#define N 500

int
main (void)
{
   int i, j;
   int x[N][N];
   int *y = &x[0][0];

   for (i = 0; i < N; i++)
     for (j = 0; j < N; j++)
       /* y[i * N + j] == x[i][j].  */
       y[i * N + j] = i + j + 3;

   for (i = 0; i < N; i++)
     for (j = 0; j < N; j++)
       if (x[i][j] != i + j + 3)
	abort ();

   return 0;
}

/* Check that outer loop is parallelized.  */
/* { dg-final { scan-tree-dump-times "parallelizing outer loop" 1 "parloops" } } */
/* { dg-final { scan-tree-dump-times "loopfn" 4 "optimized" } } */
...

With -fno-tree-loop-ivcanon (to keep the original iteration order) we get:
...
#(Data Ref:
#  bb: 4
#  stmt: *_15 = _17;
#  ref: *_15;
#  base_object: MEM[(int *)&x];
#  Access function 0: {{0B, +, 2000}_1, +, 4}_4
#)
#(Data Ref:
#  bb: 4
#  stmt: *_15 = _17;
#  ref: *_15;
#  base_object: MEM[(int *)&x];
#  Access function 0: {{0B, +, 2000}_1, +, 4}_4
#)
   access_fn_A: {{0B, +, 2000}_1, +, 4}_4
   access_fn_B: {{0B, +, 2000}_1, +, 4}_4

  (subscript
   iterations_that_access_an_element_twice_in_A: [0]
   last_conflict: scev_not_known
   iterations_that_access_an_element_twice_in_B: [0]
   last_conflict: scev_not_known
   (Subscript distance: 0 ))
   inner loop index: 0
   loop nest: (1 4 )
   distance_vector:   0   0
   distance_vector:   1 -500
   direction_vector:     =    =
   direction_vector:     +    -
)
   FAILED: data dependencies exist across iterations
...

If we replace y[i * N + j] with x[i][j], we get instead:
...
#(Data Ref:
#  bb: 4
#  stmt: x[i_7][j_8] = _12;
#  ref: x[i_7][j_8];
#  base_object: x;
#  Access function 0: {0, +, 1}_4
#  Access function 1: {0, +, 1}_1
#)
#(Data Ref:
#  bb: 4
#  stmt: x[i_7][j_8] = _12;
#  ref: x[i_7][j_8];
#  base_object: x;
#  Access function 0: {0, +, 1}_4
#  Access function 1: {0, +, 1}_1
#)
   access_fn_A: {0, +, 1}_4
   access_fn_B: {0, +, 1}_4

  (subscript
   iterations_that_access_an_element_twice_in_A: [0]
   last_conflict: scev_not_known
   iterations_that_access_an_element_twice_in_B: [0]
   last_conflict: scev_not_known
   (Subscript distance: 0 ))
   access_fn_A: {0, +, 1}_1
   access_fn_B: {0, +, 1}_1

  (subscript
   iterations_that_access_an_element_twice_in_A: [0]
   last_conflict: scev_not_known
   iterations_that_access_an_element_twice_in_B: [0]
   last_conflict: scev_not_known
   (Subscript distance: 0 ))
   inner loop index: 0
   loop nest: (1 4 )
   distance_vector:   0   0
   direction_vector:     =    =
)
   SUCCESS: may be parallelized
parallelizing outer loop 8
...

Thanks,
- Tom


* Re: [RFC, PR66873] Use graphite for parloops
  2015-07-26 22:54         ` Tom de Vries
@ 2015-07-27  5:41           ` Sebastian Pop
  0 siblings, 0 replies; 27+ messages in thread
From: Sebastian Pop @ 2015-07-27  5:41 UTC (permalink / raw)
  To: Tom de Vries; +Cc: Richard Biener, Thomas Schwinge, gcc-patches, Tobias Grosser

On Sun, Jul 26, 2015 at 4:21 PM, Tom de Vries <Tom_deVries@mentor.com> wrote:
> I wrote an equivalent test-case in C:
> ...
> $ cat src/gcc/testsuite/gcc.dg/autopar/outer-7.c
> /* { dg-do compile } */
> /* { dg-options "-O2 -ftree-parallelize-loops=2 -fdump-tree-parloops-details -fdump-tree-optimized" } */
>
> void abort (void);
>
> #define N 500
>
> int
> main (void)
> {
>   int i, j;
>   int x[N][N];
>   int *y = &x[0][0];
>
>   for (i = 0; i < N; i++)
>     for (j = 0; j < N; j++)
>       /* y[i * N + j] == x[i][j].  */
>       y[i * N + j] = i + j + 3;
>
>   for (i = 0; i < N; i++)
>     for (j = 0; j < N; j++)
>       if (x[i][j] != i + j + 3)
>         abort ();
>
>   return 0;
> }
>
> /* Check that outer loop is parallelized.  */
> /* { dg-final { scan-tree-dump-times "parallelizing outer loop" 1 "parloops" } } */
> /* { dg-final { scan-tree-dump-times "loopfn" 4 "optimized" } } */
> ...
>
> With -fno-tree-loop-ivcanon to keep original iteration order we get:
> ...
> #(Data Ref:
> #  bb: 4
> #  stmt: *_15 = _17;
> #  ref: *_15;
> #  base_object: MEM[(int *)&x];
> #  Access function 0: {{0B, +, 2000}_1, +, 4}_4
> #)
> #(Data Ref:
> #  bb: 4
> #  stmt: *_15 = _17;
> #  ref: *_15;
> #  base_object: MEM[(int *)&x];
> #  Access function 0: {{0B, +, 2000}_1, +, 4}_4
> #)
>   access_fn_A: {{0B, +, 2000}_1, +, 4}_4
>   access_fn_B: {{0B, +, 2000}_1, +, 4}_4
>
>  (subscript
>   iterations_that_access_an_element_twice_in_A: [0]
>   last_conflict: scev_not_known
>   iterations_that_access_an_element_twice_in_B: [0]
>   last_conflict: scev_not_known
>   (Subscript distance: 0 ))
>   inner loop index: 0
>   loop nest: (1 4 )
>   distance_vector:   0   0
>   distance_vector:   1 -500
>   direction_vector:     =    =
>   direction_vector:     +    -
> )
>   FAILED: data dependencies exist across iterations
> ...
>
> If we replace the y[i * N + j] with x[i][j] we get instead:
> ...
> #(Data Ref:
> #  bb: 4
> #  stmt: x[i_7][j_8] = _12;
> #  ref: x[i_7][j_8];
> #  base_object: x;
> #  Access function 0: {0, +, 1}_4
> #  Access function 1: {0, +, 1}_1
> #)
> #(Data Ref:
> #  bb: 4
> #  stmt: x[i_7][j_8] = _12;
> #  ref: x[i_7][j_8];
> #  base_object: x;
> #  Access function 0: {0, +, 1}_4
> #  Access function 1: {0, +, 1}_1
> #)
>   access_fn_A: {0, +, 1}_4
>   access_fn_B: {0, +, 1}_4
>
>  (subscript
>   iterations_that_access_an_element_twice_in_A: [0]
>   last_conflict: scev_not_known
>   iterations_that_access_an_element_twice_in_B: [0]
>   last_conflict: scev_not_known
>   (Subscript distance: 0 ))
>   access_fn_A: {0, +, 1}_1
>   access_fn_B: {0, +, 1}_1
>
>  (subscript
>   iterations_that_access_an_element_twice_in_A: [0]
>   last_conflict: scev_not_known
>   iterations_that_access_an_element_twice_in_B: [0]
>   last_conflict: scev_not_known
>   (Subscript distance: 0 ))
>   inner loop index: 0
>   loop nest: (1 4 )
>   distance_vector:   0   0
>   direction_vector:     =    =
> )
>   SUCCESS: may be parallelized
> parallelizing outer loop 8
> ...

It looks like a delinearization pass could help reconstruct a
two-dimensional array reference, and make the Banerjee dependence test
succeed.
Note that Graphite works in this case only because the loop bounds are
statically known: N is 500.  If N were passed in as a function
parameter, Graphite would also fail to compute the dependence, as it
cannot represent "i * N", so we would also need the delinearization
pass for Graphite.
Here is a bug that I recently opened for that:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66981

Sebastian

Thread overview: 27+ messages
2015-07-15 22:18 [RFC, PR66873] Use graphite for parloops Tom de Vries
2015-07-16  8:48 ` Richard Biener
2015-07-16 10:25   ` Thomas Schwinge
2015-07-16 10:28     ` Richard Biener
2015-07-16 10:41       ` Richard Biener
2015-07-26 22:54         ` Tom de Vries
2015-07-27  5:41           ` Sebastian Pop
2015-07-16 11:41       ` Tom de Vries
2015-07-20 18:53         ` Sebastian Pop
2015-07-21  0:22           ` Tom de Vries
2015-07-20 18:54 ` Sebastian Pop
2015-07-21  5:59   ` Tom de Vries
2015-07-21 14:35     ` Tom de Vries
2015-07-21 19:08       ` Sebastian Pop
2015-07-22 11:02         ` Richard Biener
2015-07-22 11:18           ` Richard Biener
2015-07-22 16:04             ` [PATCH] Don't allow unsafe reductions in graphite Tom de Vries
2015-07-23 10:51               ` Richard Biener
2015-07-24 20:37                 ` Sebastian Pop
2015-07-25 11:41                   ` Tom de Vries
2015-07-22 16:38             ` [PATCH] Check TYPE_OVERFLOW_WRAPS for parloops reductions Tom de Vries
2015-07-23 10:54               ` Richard Biener
2015-07-24 10:43               ` [committed] Remove xfail in autopar/uns-outer-4.c Tom de Vries
2015-07-24 11:54             ` [PATCH] Add FIXED_POINT_TYPE_OVERFLOW_WRAPS_P Tom de Vries
2015-07-22 15:33           ` [PATCH] Document ftrapv/fwrapv interaction Tom de Vries
2015-07-23 10:39             ` Richard Biener
2015-07-23 10:42               ` Richard Biener
