public inbox for gcc-patches@gcc.gnu.org
* openacc kernels directive -- initial support
@ 2014-11-15 14:08 Tom de Vries
  2014-11-15 17:21 ` [PATCH, 1/8] Expand oacc kernels after pass_build_ealias Tom de Vries
                   ` (12 more replies)
  0 siblings, 13 replies; 71+ messages in thread
From: Tom de Vries @ 2014-11-15 14:08 UTC (permalink / raw)
  To: GCC Patches; +Cc: Richard Biener, Jakub Jelinek, Thomas Schwinge

Hi,

I'm submitting a patch series with initial support for the oacc kernels directive.

The patch series uses pass_parallelize_loops to implement parallelization of 
loops in the oacc kernels region.

The patch series consists of these 8 patches:
...
     1  Expand oacc kernels after pass_build_ealias
     2  Add pass_oacc_kernels
     3  Add pass_ch_oacc_kernels to pass_oacc_kernels
     4  Add pass_tree_loop_{init,done} to pass_oacc_kernels
     5  Add pass_loop_im to pass_oacc_kernels
     6  Add pass_ccp to pass_oacc_kernels
     7  Add pass_parloops_oacc_kernels to pass_oacc_kernels
     8  Do simple omp lowering for no address taken var
...

The patch series does not yet apply cleanly to trunk, since it's dependent on 
the oacc middle end changes present in the gomp-4_0-branch, already submitted by 
Thomas for trunk.

Furthermore, it's dependent on an assert fix submitted for trunk ('Fix 
gcc_assert in expand_omp_for_static_chunk' @ 
https://gcc.gnu.org/ml/gcc-patches/2014-11/msg01149.html ).

The patch series is intended for trunk, but - given the dependency on the oacc 
middle end changes - has been bootstrapped for x86_64 on top of gomp-4_0-branch.

I'll post the patch series in reply to this email.

Thanks,
- Tom

[ FTR  In order to get clean libgomp and goacc test results in gomp-4_0-branch, 
to have a good basis for testing, I used the following patch set:

  Don't allow flto-partition=balance for fopenacc
    Unsubmitted. This works around a compilation problem for
    libgomp/testsuite/libgomp.oacc-c-c++-common/asyncwait-2.c that I ran into on
    our internal dev branch.  I'll investigate whether I can reproduce with
    gomp-4_0-branch asap.

  Mark fopenacc as LTO option
    @ https://gcc.gnu.org/ml/gcc-patches/2014-11/msg00085.html	

  Only use nvidia accelerator if present
    @ https://gcc.gnu.org/ml/gcc-patches/2014-11/msg00247.html

  Set default LIBGOMP_PLUGIN_PATH
    @ https://gcc.gnu.org/ml/gcc-patches/2014-11/msg00242.html
]


* [PATCH, 1/8] Expand oacc kernels after pass_build_ealias
  2014-11-15 14:08 openacc kernels directive -- initial support Tom de Vries
@ 2014-11-15 17:21 ` Tom de Vries
  2014-11-24 11:29   ` Tom de Vries
  2014-11-15 17:22 ` [PATCH, 2/8] Add pass_oacc_kernels Tom de Vries
                   ` (11 subsequent siblings)
  12 siblings, 1 reply; 71+ messages in thread
From: Tom de Vries @ 2014-11-15 17:21 UTC (permalink / raw)
  To: GCC Patches; +Cc: Richard Biener, Jakub Jelinek, Thomas Schwinge

[-- Attachment #1: Type: text/plain, Size: 2493 bytes --]

On 15-11-14 13:14, Tom de Vries wrote:
> Hi,
>
> I'm submitting a patch series with initial support for the oacc kernels directive.
>
> The patch series uses pass_parallelize_loops to implement parallelization of
> loops in the oacc kernels region.
>
> The patch series consists of these 8 patches:
> ...
>      1  Expand oacc kernels after pass_build_ealias
>      2  Add pass_oacc_kernels
>      3  Add pass_ch_oacc_kernels to pass_oacc_kernels
>      4  Add pass_tree_loop_{init,done} to pass_oacc_kernels
>      5  Add pass_loop_im to pass_oacc_kernels
>      6  Add pass_ccp to pass_oacc_kernels
>      7  Add pass_parloops_oacc_kernels to pass_oacc_kernels
>      8  Do simple omp lowering for no address taken var
> ...

This patch moves omp expansion of the oacc kernels directive to after 
pass_build_ealias.

The rationale is that in order to use pass_parallelize_loops for analysis and
transformation of an oacc kernels region, we postpone omp expansion of that
region until the earliest point in the pass list where enough information is
available to run pass_parallelize_loops, in other words, after pass_build_ealias.

The patch postpones expansion in expand_omp, and ensures expansion by adding 
pass_expand_omp_ssa:
- after pass_build_ealias, and
- after pass_all_early_optimizations for the case we're not optimizing.
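
As an illustration only, the postponement can be modelled outside GCC with a toy sketch. Every name below (ToyFunction, RegionKind, the toy_* functions) is hypothetical and stands in for the real GCC data structures; the sketch only mirrors the control flow described above: the early pass skips kernels regions while the function is not in ssa and leaves the "expanded" property unset, and the later ssa pass is gated on that property.

```cpp
#include <string>
#include <vector>

// Toy stand-ins, NOT GCC types.
enum class RegionKind { OaccKernels, OaccParallel, OmpTarget };

struct ToyFunction {
  bool in_ssa = false;                 // models gimple_in_ssa_p (cfun)
  bool contains_oacc_kernels = false;  // models the new struct function field
  bool eomp_done = false;              // models PROP_gimple_eomp
  std::vector<RegionKind> regions;     // regions still waiting for expansion
  std::vector<std::string> log;        // what was expanded, and when
};

// Models expand_omp: kernels regions are skipped until we are in ssa.
void toy_expand_omp(ToyFunction &fn) {
  std::vector<RegionKind> postponed;
  for (RegionKind kind : fn.regions) {
    if (kind == RegionKind::OaccKernels && !fn.in_ssa) {
      postponed.push_back(kind);  // postpone till the ssa pass
      continue;
    }
    fn.log.push_back(fn.in_ssa ? "expanded in ssa" : "expanded before ssa");
  }
  fn.regions = postponed;
}

// Models pass_expand_omp::execute: only claim the property when nothing
// was postponed.
void toy_pass_expand_omp(ToyFunction &fn) {
  toy_expand_omp(fn);
  if (!fn.contains_oacc_kernels)
    fn.eomp_done = true;
}

// Models pass_expand_omp_ssa: the gate fails if expansion is already done.
// Returns true if the pass actually ran.
bool toy_pass_expand_omp_ssa(ToyFunction &fn) {
  if (fn.eomp_done)
    return false;
  toy_expand_omp(fn);
  fn.eomp_done = true;
  return true;
}
```

In this model, a function without kernels regions is fully expanded by the first pass and the second pass never runs for it, which is the behaviour the gate on PROP_gimple_eomp is meant to give.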

In order to make sure the oacc kernels region arrives at pass_expand_omp_ssa
in the same state in which it left expand_omp, the patch makes pass_ccp and
pass_forwprop aware of lowered omp code, so that they handle it conservatively.
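
A minimal standalone sketch of the detection idea, assuming a toy statement that is just a list of operand names (ToyStmt and toy_stmt_omp_lowering_p are hypothetical; the real check in the patch walks SSA operands of a gimple statement). A statement counts as omp-lowered code if any named operand is one of the four lowering variables:

```cpp
#include <string>
#include <vector>

// Toy statement: operand names only.  An empty string models an
// anonymous SSA_NAME (no identifier).
struct ToyStmt {
  std::vector<std::string> operand_names;
};

// Returns true if any operand refers to one of the omp lowering variables.
bool toy_stmt_omp_lowering_p(const ToyStmt &stmt) {
  static const char *const lowering_names[] = {
    ".omp_data_i", ".omp_data_arr", ".omp_data_sizes", ".omp_data_kinds"
  };
  for (const std::string &name : stmt.operand_names) {
    if (name.empty())
      continue;  // anonymous operand, nothing to compare
    for (const char *candidate : lowering_names)
      if (name == candidate)
        return true;
  }
  return false;
}
```

A pass that wants to be conservative would then simply skip, or treat as varying, any statement for which this predicate holds.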

The patch contains changes in expand_omp_target to deal with ssa-code, similar 
to what is already present in expand_omp_taskreg.

Furthermore, the patch forces .omp_data_sizes and .omp_data_kinds to be
non-static for oacc kernels. It does this to get some references to
.omp_data_sizes and .omp_data_kinds in the ssa code.  Without these references,
the definitions will be removed. The reference to the variables in
GIMPLE_OACC_KERNELS alone is not enough to keep them from being removed.
[ In vries/oacc-kernels, I used a BUILT_IN_USE kludge for this purpose ].

Finally, at the end of pass_expand_omp_ssa we're left with SSA_NAMEs in the
original function whose definition has been removed (that is, moved to the
split-off function). TODO_remove_unused_locals takes care of some of them, but
not the anonymous ones. So the patch iterates over all SSA_NAMEs to find these
dangling SSA_NAMEs and releases them.
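
The scan for dangling names can be sketched with toy data structures (ToyStmt, ToyName and toy_release_dangling are hypothetical and are not the GCC API). The assumption modelled here is that a name is dangling when the statement recorded as its definition no longer lists that name among its definitions:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Toy statement: the versions of the names it defines.
struct ToyStmt {
  std::vector<std::size_t> defs;
};

// Toy SSA name: a liveness flag plus the statement recorded as its definition.
struct ToyName {
  bool live = false;
  const ToyStmt *def_stmt = nullptr;
};

// Walk all names (index 0 is unused, as in the patch's loop) and release
// those whose recorded defining statement does not define them.
// Returns the indices that were released.
std::vector<std::size_t> toy_release_dangling(std::vector<ToyName> &names) {
  std::vector<std::size_t> released;
  for (std::size_t i = 1; i < names.size(); ++i) {
    ToyName &name = names[i];
    if (!name.live)
      continue;  // already released, models ssa_name (i) == NULL_TREE

    bool found = false;
    if (name.def_stmt != nullptr) {
      const std::vector<std::size_t> &defs = name.def_stmt->defs;
      found = std::find(defs.begin(), defs.end(), i) != defs.end();
    }

    if (!found) {
      name.live = false;  // models release_ssa_name
      released.push_back(i);
    }
  }
  return released;
}
```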

OK for trunk?

Thanks,
- Tom

[-- Attachment #2: 0001-Expand-oacc-kernels-after-pass_build_ealias.patch --]
[-- Type: text/x-patch, Size: 15582 bytes --]

2014-11-14  Tom de Vries  <tom@codesourcery.com>

	* function.h (struct function): Add contains_oacc_kernels field.
	* gimplify.c (gimplify_omp_workshare): Set contains_oacc_kernels.
	* omp-low.c: Include gimple-pretty-print.h.
	(release_first_vuse_in_edge_dest): New function.
	(expand_omp_target): Handle ssa-code.
	(expand_omp): Don't expand GIMPLE_OACC_KERNELS when not in ssa.
	(pass_data_expand_omp): Don't set PROP_gimple_eomp unconditionally in
	properties_provided field.
	(pass_expand_omp::execute): Set PROP_gimple_eomp in
	cfun->curr_properties only if cfun does not contain oacc kernels.
	(pass_data_expand_omp_ssa): Add TODO_remove_unused_locals to
	todo_flags_finish field.
	(pass_expand_omp_ssa::execute): Release dangling SSA_NAMEs after calling
	execute_expand_omp.
	(lower_omp_target): Add static_arrays variable, init to 1.  Don't use
	static arrays for kernels directive.  Use static_arrays variable.
	Handle case that .omp_data_kinds is not static.
	(gimple_stmt_omp_lowering_p): New function.
	* omp-low.h (gimple_stmt_omp_lowering_p): Declare.
	* passes.def: Add pass_expand_omp_ssa after pass_build_ealias.
	* tree-ssa-ccp.c: Include omp-low.h.
	(surely_varying_stmt_p): Handle omp lowering code conservatively.
	* tree-ssa-forwprop.c: Include omp-low.h.
	(pass_forwprop::execute): Handle omp lowering code conservatively.
---
 gcc/function.h          |   3 +
 gcc/gimplify.c          |   1 +
 gcc/omp-low.c           | 194 +++++++++++++++++++++++++++++++++++++++++++++---
 gcc/omp-low.h           |   1 +
 gcc/passes.def          |   2 +
 gcc/tree-ssa-ccp.c      |   4 +
 gcc/tree-ssa-forwprop.c |   4 +-
 7 files changed, 196 insertions(+), 13 deletions(-)

diff --git a/gcc/function.h b/gcc/function.h
index 08ab761..a72c154 100644
--- a/gcc/function.h
+++ b/gcc/function.h
@@ -664,6 +664,9 @@ struct GTY(()) function {
 
   /* Set when the tail call has been identified.  */
   unsigned int tail_call_marked : 1;
+
+  /* Set when the function contains oacc kernels directives.  */
+  unsigned int contains_oacc_kernels : 1;
 };
 
 /* Add the decl D to the local_decls list of FUN.  */
diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index 2c8c666..52d7e6d 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -7281,6 +7281,7 @@ gimplify_omp_workshare (tree *expr_p, gimple_seq *pre_p)
       break;
     case OACC_KERNELS:
       stmt = gimple_build_oacc_kernels (body, OACC_KERNELS_CLAUSES (expr));
+      cfun->contains_oacc_kernels = 1;
       break;
     case OACC_PARALLEL:
       stmt = gimple_build_oacc_parallel (body, OACC_PARALLEL_CLAUSES (expr));
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 187167a..6caeae9 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -87,6 +87,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-eh.h"
 #include "cilk.h"
 #include "lto-section-names.h"
+#include "gimple-pretty-print.h"
 
 
 /* Lowering of OpenMP parallel and workshare constructs proceeds in two
@@ -5337,6 +5338,35 @@ expand_omp_build_assign (gimple_stmt_iterator *gsi_p, tree to, tree from)
     }
 }
 
+static void
+release_first_vuse_in_edge_dest (edge e)
+{
+  gimple_stmt_iterator i;
+  basic_block bb = e->dest;
+
+  for (i = gsi_start_phis (bb); !gsi_end_p (i); gsi_next (&i))
+    {
+      gimple phi = gsi_stmt (i);
+      tree arg = PHI_ARG_DEF_FROM_EDGE (phi, e);
+
+      if (!virtual_operand_p (arg))
+	continue;
+
+      mark_virtual_operand_for_renaming (arg);
+      return;
+    }
+
+  for (i = gsi_start_bb (bb); !gsi_end_p (i); gsi_next_nondebug (&i))
+    {
+      gimple stmt = gsi_stmt (i);
+      if (gimple_vuse (stmt) == NULL_TREE)
+	continue;
+
+      mark_virtual_operand_for_renaming (gimple_vuse (stmt));
+      return;
+    }
+}
+
 /* Expand the OpenMP parallel or task directive starting at REGION.  */
 
 static void
@@ -8831,7 +8861,6 @@ expand_omp_target (struct omp_region *region)
   /* Supported by expand_omp_taskreg, but not here.  */
   if (child_cfun != NULL)
     gcc_assert (!child_cfun->cfg);
-  gcc_assert (!gimple_in_ssa_p (cfun));
 
   entry_bb = region->entry;
   exit_bb = region->exit;
@@ -8857,7 +8886,7 @@ expand_omp_target (struct omp_region *region)
 	{
 	  basic_block entry_succ_bb = single_succ (entry_bb);
 	  gimple_stmt_iterator gsi;
-	  tree arg;
+	  tree arg, narg;
 	  gimple tgtcopy_stmt = NULL;
 	  tree sender = TREE_VEC_ELT (gimple_omp_data_arg (entry_stmt), 0);
 
@@ -8887,8 +8916,27 @@ expand_omp_target (struct omp_region *region)
 	  gcc_assert (tgtcopy_stmt != NULL);
 	  arg = DECL_ARGUMENTS (child_fn);
 
-	  gcc_assert (gimple_assign_lhs (tgtcopy_stmt) == arg);
-	  gsi_remove (&gsi, true);
+	  if (!gimple_in_ssa_p (cfun))
+	    {
+	      gcc_assert (gimple_assign_lhs (tgtcopy_stmt) == arg);
+	      gsi_remove (&gsi, true);
+	    }
+	  else
+	    {
+	      gcc_assert (SSA_NAME_VAR (gimple_assign_lhs (tgtcopy_stmt))
+			  == arg);
+
+	      /* If we are in ssa form, we must load the value from the default
+		 definition of the argument.  That should not be defined now,
+		 since the argument is not used uninitialized.  */
+	      gcc_assert (ssa_default_def (cfun, arg) == NULL);
+	      narg = make_ssa_name (arg, gimple_build_nop ());
+	      set_ssa_default_def (cfun, arg, narg);
+	      /* ?? Is setting the subcode really necessary ??  */
+	      gimple_omp_set_subcode (tgtcopy_stmt, TREE_CODE (narg));
+	      gimple_assign_set_rhs1 (tgtcopy_stmt, narg);
+	      update_stmt (tgtcopy_stmt);
+	    }
 	}
 
       /* Declare local variables needed in CHILD_CFUN.  */
@@ -8931,11 +8979,23 @@ expand_omp_target (struct omp_region *region)
 	  stmt = gimple_build_return (NULL);
 	  gsi_insert_after (&gsi, stmt, GSI_SAME_STMT);
 	  gsi_remove (&gsi, true);
+
+	  /* A vuse in single_succ (exit_bb) may use a vdef from the region
+	     which is about to be split off.  Mark the vdef for renaming.  */
+	  release_first_vuse_in_edge_dest (single_succ_edge (exit_bb));
 	}
 
       /* Move the offloading region into CHILD_CFUN.  */
 
-      block = gimple_block (entry_stmt);
+      if (gimple_in_ssa_p (cfun))
+	{
+	  init_tree_ssa (child_cfun);
+	  init_ssa_operands (child_cfun);
+	  child_cfun->gimple_df->in_ssa_p = true;
+	  block = NULL_TREE;
+	}
+      else
+	block = gimple_block (entry_stmt);
 
       new_bb = move_sese_region_to_fn (child_cfun, entry_bb, exit_bb, block);
       if (exit_bb)
@@ -8985,6 +9045,8 @@ expand_omp_target (struct omp_region *region)
 	  if (changed)
 	    cleanup_tree_cfg ();
 	}
+      if (gimple_in_ssa_p (cfun))
+	update_ssa (TODO_update_ssa);
       pop_cfun ();
     }
 
@@ -9261,6 +9323,8 @@ expand_omp_target (struct omp_region *region)
       gcc_assert (g && gimple_code (g) == GIMPLE_OMP_RETURN);
       gsi_remove (&gsi, true);
     }
+  if (gimple_in_ssa_p (cfun))
+    update_ssa (TODO_update_ssa_only_virtuals);
 }
 
 
@@ -9331,6 +9395,15 @@ expand_omp (struct omp_region *region)
 	  break;
 
 	case GIMPLE_OACC_KERNELS:
+	  if (!gimple_in_ssa_p (cfun))
+	    /* We're in pass_expand_omp.  Postpone expanding till
+	       pass_expand_omp_ssa.  */
+	    break;
+
+	  /* We're in pass_expand_omp_ssa.  Expand now.  */
+
+	  /* FALLTHRU.  */
+
 	case GIMPLE_OACC_PARALLEL:
 	case GIMPLE_OMP_TARGET:
 	  expand_omp_target (region);
@@ -9503,7 +9576,7 @@ const pass_data pass_data_expand_omp =
   OPTGROUP_NONE, /* optinfo_flags */
   TV_NONE, /* tv_id */
   PROP_gimple_any, /* properties_required */
-  PROP_gimple_eomp, /* properties_provided */
+  0 /* Possibly PROP_gimple_eomp.  */, /* properties_provided */
   0, /* properties_destroyed */
   0, /* todo_flags_start */
   0, /* todo_flags_finish */
@@ -9517,7 +9590,7 @@ public:
   {}
 
   /* opt_pass methods: */
-  virtual unsigned int execute (function *)
+  virtual unsigned int execute (function *fun)
     {
       bool gate = ((flag_openacc != 0 || flag_openmp != 0
 		    || flag_openmp_simd != 0 || flag_cilkplus != 0)
@@ -9528,7 +9601,12 @@ public:
       if (!gate)
 	return 0;
 
-      return execute_expand_omp ();
+      unsigned int res = execute_expand_omp ();
+
+      if (!cfun->contains_oacc_kernels)
+	fun->curr_properties |= PROP_gimple_eomp;
+
+      return res;
     }
 
 }; // class pass_expand_omp
@@ -9553,7 +9631,8 @@ const pass_data pass_data_expand_omp_ssa =
   PROP_gimple_eomp, /* properties_provided */
   0, /* properties_destroyed */
   0, /* todo_flags_start */
-  TODO_cleanup_cfg | TODO_rebuild_alias, /* todo_flags_finish */
+  TODO_cleanup_cfg | TODO_rebuild_alias
+  | TODO_remove_unused_locals, /* todo_flags_finish */
 };
 
 class pass_expand_omp_ssa : public gimple_opt_pass
@@ -9568,7 +9647,47 @@ public:
     {
       return !(fun->curr_properties & PROP_gimple_eomp);
     }
-  virtual unsigned int execute (function *) { return execute_expand_omp (); }
+  virtual unsigned int execute (function *)
+    {
+      unsigned res = execute_expand_omp ();
+
+      /* After running pass_expand_omp_ssa to expand the oacc kernels
+	 directive, we are left in the original function with anonymous
+	 SSA_NAMEs, with a defining statement that has been deleted.  This
+	 pass finds those SSA_NAMEs and releases them.  */
+      unsigned int i;
+      for (i = 1; i < num_ssa_names; ++i)
+	{
+	  tree name = ssa_name (i);
+	  if (name == NULL_TREE)
+	    continue;
+
+	  gimple stmt = SSA_NAME_DEF_STMT (name);
+	  bool found = false;
+
+	  ssa_op_iter op_iter;
+	  def_operand_p def_p;
+	  FOR_EACH_PHI_OR_STMT_DEF (def_p, stmt, op_iter, SSA_OP_ALL_DEFS)
+	    {
+	      tree def = DEF_FROM_PTR (def_p);
+	      if (def == name)
+		{
+		  found = true;
+		  break;
+		}
+	    }
+
+	  if (!found)
+	    {
+	      if (dump_file)
+		fprintf (dump_file, "Released dangling ssa name %u\n", i);
+	      release_ssa_name (name);
+	    }
+	}
+
+      return res;
+    }
+  opt_pass * clone () { return new pass_expand_omp_ssa (m_ctxt); }
 
 }; // class pass_expand_omp_ssa
 
@@ -11194,6 +11313,7 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx)
   unsigned int map_cnt = 0;
   tree (*gimple_omp_clauses) (const_gimple);
   void (*gimple_omp_set_data_arg) (gimple, tree);
+  unsigned int static_arrays = 1;
 
   offloaded = is_gimple_omp_offloaded (stmt);
   data_region = false;
@@ -11202,6 +11322,7 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx)
     case GIMPLE_OACC_KERNELS:
       gimple_omp_clauses = gimple_oacc_kernels_clauses;
       gimple_omp_set_data_arg = gimple_oacc_kernels_set_data_arg;
+      static_arrays = 0;
       break;
     case GIMPLE_OACC_PARALLEL:
       gimple_omp_clauses = gimple_oacc_parallel_clauses;
@@ -11368,7 +11489,7 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 			  ".omp_data_sizes");
       DECL_NAMELESS (TREE_VEC_ELT (t, 1)) = 1;
       TREE_ADDRESSABLE (TREE_VEC_ELT (t, 1)) = 1;
-      TREE_STATIC (TREE_VEC_ELT (t, 1)) = 1;
+      TREE_STATIC (TREE_VEC_ELT (t, 1)) = static_arrays;
       tree tkind_type;
       int talign_shift;
       if (is_gimple_omp_oacc_specifically (stmt))
@@ -11386,7 +11507,7 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 			  ".omp_data_kinds");
       DECL_NAMELESS (TREE_VEC_ELT (t, 2)) = 1;
       TREE_ADDRESSABLE (TREE_VEC_ELT (t, 2)) = 1;
-      TREE_STATIC (TREE_VEC_ELT (t, 2)) = 1;
+      TREE_STATIC (TREE_VEC_ELT (t, 2)) = static_arrays;
       gimple_omp_set_data_arg (stmt, t);
 
       vec<constructor_elt, va_gc> *vsize;
@@ -11559,6 +11680,22 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 						    clobber));
 	}
 
+      if (!TREE_STATIC (TREE_VEC_ELT (t, 2)))
+	{
+	  gimple_seq initlist = NULL;
+	  force_gimple_operand (build1 (DECL_EXPR, void_type_node,
+					TREE_VEC_ELT (t, 2)),
+				&initlist, true, NULL_TREE);
+	  gimple_seq_add_seq (&ilist, initlist);
+
+	  tree clobber = build_constructor (TREE_TYPE (TREE_VEC_ELT (t, 2)),
+					    NULL);
+	  TREE_THIS_VOLATILE (clobber) = 1;
+	  gimple_seq_add_stmt (&olist,
+			       gimple_build_assign (TREE_VEC_ELT (t, 2),
+						    clobber));
+	}
+
       tree clobber = build_constructor (ctx->record_type, NULL);
       TREE_THIS_VOLATILE (clobber) = 1;
       gimple_seq_add_stmt (&olist, gimple_build_assign (ctx->sender_decl,
@@ -13739,4 +13876,37 @@ omp_finish_file (void)
     }
 }
 
+/* Return true if STMT is omp-lowered code.  */
+
+bool
+gimple_stmt_omp_lowering_p (gimple stmt)
+{
+  tree use;
+  ssa_op_iter iter;
+  const char *s;
+
+  FOR_EACH_SSA_TREE_OPERAND (use, stmt, iter, SSA_OP_USE|SSA_OP_DEF)
+    {
+      if (SSA_NAME_IDENTIFIER (use) == NULL_TREE)
+	continue;
+      s = IDENTIFIER_POINTER (SSA_NAME_IDENTIFIER (use));
+
+      if (!(strcmp (".omp_data_i", s) == 0
+	    || strcmp (".omp_data_arr", s) == 0
+	    || strcmp (".omp_data_sizes", s) == 0
+	    || strcmp (".omp_data_kinds", s) == 0))
+	continue;
+
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	{
+	  fprintf (dump_file, "Detected omp lowering code\n");
+	  print_gimple_stmt (dump_file, stmt, 0, dump_flags);
+	}
+
+      return true;
+    }
+
+  return false;
+}
+
 #include "gt-omp-low.h"
diff --git a/gcc/omp-low.h b/gcc/omp-low.h
index ac587d0..ff8a956 100644
--- a/gcc/omp-low.h
+++ b/gcc/omp-low.h
@@ -28,6 +28,7 @@ extern void free_omp_regions (void);
 extern tree omp_reduction_init (tree, tree);
 extern bool make_gimple_omp_edges (basic_block, struct omp_region **, int *);
 extern void omp_finish_file (void);
+extern bool gimple_stmt_omp_lowering_p (gimple);
 
 extern GTY(()) vec<tree, va_gc> *offload_funcs;
 extern GTY(()) vec<tree, va_gc> *offload_vars;
diff --git a/gcc/passes.def b/gcc/passes.def
index cfca4f1..bce8591 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -72,6 +72,7 @@ along with GCC; see the file COPYING3.  If not see
 	  /* pass_build_ealias is a dummy pass that ensures that we
 	     execute TODO_rebuild_alias at this point.  */
 	  NEXT_PASS (pass_build_ealias);
+	  NEXT_PASS (pass_expand_omp_ssa);
 	  NEXT_PASS (pass_fre);
 	  NEXT_PASS (pass_merge_phi);
 	  NEXT_PASS (pass_cd_dce);
@@ -86,6 +87,7 @@ along with GCC; see the file COPYING3.  If not see
 	     late.  */
           NEXT_PASS (pass_split_functions);
       POP_INSERT_PASSES ()
+      NEXT_PASS (pass_expand_omp_ssa);
       NEXT_PASS (pass_release_ssa_names);
       NEXT_PASS (pass_rebuild_cgraph_edges);
       NEXT_PASS (pass_inline_parameters);
diff --git a/gcc/tree-ssa-ccp.c b/gcc/tree-ssa-ccp.c
index 7fc5220..8d0d1b8 100644
--- a/gcc/tree-ssa-ccp.c
+++ b/gcc/tree-ssa-ccp.c
@@ -164,6 +164,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "params.h"
 #include "wide-int-print.h"
 #include "builtins.h"
+#include "omp-low.h"
 
 
 /* Possible lattice values.  */
@@ -788,6 +789,9 @@ surely_varying_stmt_p (gimple stmt)
       && gimple_code (stmt) != GIMPLE_CALL)
     return true;
 
+  if (gimple_stmt_omp_lowering_p (stmt))
+    return true;
+
   return false;
 }
 
diff --git a/gcc/tree-ssa-forwprop.c b/gcc/tree-ssa-forwprop.c
index a5283a2..a8f0701 100644
--- a/gcc/tree-ssa-forwprop.c
+++ b/gcc/tree-ssa-forwprop.c
@@ -67,6 +67,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-cfgcleanup.h"
 #include "tree-into-ssa.h"
 #include "cfganal.h"
+#include "omp-low.h"
 
 /* This pass propagates the RHS of assignment statements into use
    sites of the LHS of the assignment.  It's basically a specialized
@@ -3675,7 +3676,8 @@ pass_forwprop::execute (function *fun)
 	  tree lhs, rhs;
 	  enum tree_code code;
 
-	  if (!is_gimple_assign (stmt))
+	  if (!is_gimple_assign (stmt)
+	      || gimple_stmt_omp_lowering_p (stmt))
 	    {
 	      gsi_next (&gsi);
 	      continue;
-- 
1.9.1







* [PATCH, 2/8] Add pass_oacc_kernels
  2014-11-15 14:08 openacc kernels directive -- initial support Tom de Vries
  2014-11-15 17:21 ` [PATCH, 1/8] Expand oacc kernels after pass_build_ealias Tom de Vries
@ 2014-11-15 17:22 ` Tom de Vries
  2014-11-25 11:31   ` Tom de Vries
  2014-11-15 17:23 ` [PATCH, 3/8] Add pass_ch_oacc_kernels to pass_oacc_kernels Tom de Vries
                   ` (10 subsequent siblings)
  12 siblings, 1 reply; 71+ messages in thread
From: Tom de Vries @ 2014-11-15 17:22 UTC (permalink / raw)
  To: GCC Patches; +Cc: Richard Biener, Jakub Jelinek, Thomas Schwinge

[-- Attachment #1: Type: text/plain, Size: 913 bytes --]

On 15-11-14 13:14, Tom de Vries wrote:
> Hi,
>
> I'm submitting a patch series with initial support for the oacc kernels directive.
>
> The patch series uses pass_parallelize_loops to implement parallelization of
> loops in the oacc kernels region.
>
> The patch series consists of these 8 patches:
> ...
>      1  Expand oacc kernels after pass_build_ealias
>      2  Add pass_oacc_kernels
>      3  Add pass_ch_oacc_kernels to pass_oacc_kernels
>      4  Add pass_tree_loop_{init,done} to pass_oacc_kernels
>      5  Add pass_loop_im to pass_oacc_kernels
>      6  Add pass_ccp to pass_oacc_kernels
>      7  Add pass_parloops_oacc_kernels to pass_oacc_kernels
>      8  Do simple omp lowering for no address taken var
> ...

This patch adds a pass group pass_oacc_kernels.

The rationale is that we want a pass group in which to run the (optimization)
passes related to oacc kernels regions.
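
A toy model of the gating behaviour, for illustration only (ToyFunction, toy_gate_oacc_kernels and toy_run_group are hypothetical names, not GCC interfaces): the sub-passes of the group run only when OpenACC is enabled and the function was marked as containing a kernels directive.

```cpp
#include <string>
#include <vector>

// Toy function descriptor, NOT struct function from GCC.
struct ToyFunction {
  bool contains_oacc_kernels = false;
};

// Models the gate: both conditions must hold.
bool toy_gate_oacc_kernels(bool flag_openacc, const ToyFunction &fn) {
  if (!flag_openacc)
    return false;
  return fn.contains_oacc_kernels;
}

// Models a pass group: returns the names of the sub-passes that would run.
std::vector<std::string> toy_run_group(bool flag_openacc,
                                       const ToyFunction &fn,
                                       const std::vector<std::string> &subpasses) {
  if (!toy_gate_oacc_kernels(flag_openacc, fn))
    return {};
  return subpasses;
}
```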

OK for trunk?

Thanks,
- Tom


[-- Attachment #2: 0002-Add-pass_oacc_kernels.patch --]
[-- Type: text/x-patch, Size: 3020 bytes --]

2014-11-14  Tom de Vries  <tom@codesourcery.com>

	* passes.def: Add pass group pass_oacc_kernels.
	* tree-pass.h (make_pass_oacc_kernels): Declare.
	* tree-ssa-loop.c (gate_oacc_kernels): New static function.
	(pass_data_oacc_kernels): New pass_data.
	(class pass_oacc_kernels): New pass.
	(make_pass_oacc_kernels): New function.
---
 gcc/passes.def      |  5 +++++
 gcc/tree-pass.h     |  1 +
 gcc/tree-ssa-loop.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 54 insertions(+)

diff --git a/gcc/passes.def b/gcc/passes.def
index bce8591..1fdb70a 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -72,6 +72,11 @@ along with GCC; see the file COPYING3.  If not see
 	  /* pass_build_ealias is a dummy pass that ensures that we
 	     execute TODO_rebuild_alias at this point.  */
 	  NEXT_PASS (pass_build_ealias);
+	  /* Pass group that runs when there are oacc kernels in the
+	     function.  */
+	  NEXT_PASS (pass_oacc_kernels);
+	  PUSH_INSERT_PASSES_WITHIN (pass_oacc_kernels)
+	  POP_INSERT_PASSES ()
 	  NEXT_PASS (pass_expand_omp_ssa);
 	  NEXT_PASS (pass_fre);
 	  NEXT_PASS (pass_merge_phi);
diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
index eaa69b4..0bae847 100644
--- a/gcc/tree-pass.h
+++ b/gcc/tree-pass.h
@@ -445,6 +445,7 @@ extern gimple_opt_pass *make_pass_strength_reduction (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_vtable_verify (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_ubsan (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_sanopt (gcc::context *ctxt);
+extern gimple_opt_pass *make_pass_oacc_kernels (gcc::context *ctxt);
 
 /* IPA Passes */
 extern simple_ipa_opt_pass *make_pass_ipa_lower_emutls (gcc::context *ctxt);
diff --git a/gcc/tree-ssa-loop.c b/gcc/tree-ssa-loop.c
index 758b5fc..c29aa22 100644
--- a/gcc/tree-ssa-loop.c
+++ b/gcc/tree-ssa-loop.c
@@ -157,6 +157,54 @@ make_pass_tree_loop (gcc::context *ctxt)
   return new pass_tree_loop (ctxt);
 }
 
+/* Gate for oacc kernels pass group.  */
+
+static bool
+gate_oacc_kernels (function *fn)
+{
+  if (!flag_openacc)
+    return false;
+
+  return fn->contains_oacc_kernels;
+}
+
+/* The oacc kernels superpass.  */
+
+namespace {
+
+const pass_data pass_data_oacc_kernels =
+{
+  GIMPLE_PASS, /* type */
+  "oacc_kernels", /* name */
+  OPTGROUP_LOOP, /* optinfo_flags */
+  TV_TREE_LOOP, /* tv_id */
+  PROP_cfg, /* properties_required */
+  0, /* properties_provided */
+  0, /* properties_destroyed */
+  0, /* todo_flags_start */
+  0, /* todo_flags_finish */
+};
+
+class pass_oacc_kernels : public gimple_opt_pass
+{
+public:
+  pass_oacc_kernels (gcc::context *ctxt)
+    : gimple_opt_pass (pass_data_oacc_kernels, ctxt)
+  {}
+
+  /* opt_pass methods: */
+  virtual bool gate (function *fn) { return gate_oacc_kernels (fn); }
+
+}; // class pass_oacc_kernels
+
+} // anon namespace
+
+gimple_opt_pass *
+make_pass_oacc_kernels (gcc::context *ctxt)
+{
+  return new pass_oacc_kernels (ctxt);
+}
+
 /* The no-loop superpass.  */
 
 namespace {
-- 
1.9.1







* [PATCH, 4/8] Add pass_tree_loop_{init,done} to pass_oacc_kernels
  2014-11-15 14:08 openacc kernels directive -- initial support Tom de Vries
                   ` (2 preceding siblings ...)
  2014-11-15 17:23 ` [PATCH, 3/8] Add pass_ch_oacc_kernels to pass_oacc_kernels Tom de Vries
@ 2014-11-15 17:23 ` Tom de Vries
  2014-11-25 11:42   ` Tom de Vries
  2014-11-15 17:24 ` [PATCH, 5/8] Add pass_loop_im " Tom de Vries
                   ` (8 subsequent siblings)
  12 siblings, 1 reply; 71+ messages in thread
From: Tom de Vries @ 2014-11-15 17:23 UTC (permalink / raw)
  To: GCC Patches; +Cc: Richard Biener, Jakub Jelinek, Thomas Schwinge

[-- Attachment #1: Type: text/plain, Size: 1010 bytes --]

On 15-11-14 13:14, Tom de Vries wrote:
> Hi,
>
> I'm submitting a patch series with initial support for the oacc kernels directive.
>
> The patch series uses pass_parallelize_loops to implement parallelization of
> loops in the oacc kernels region.
>
> The patch series consists of these 8 patches:
> ...
>      1  Expand oacc kernels after pass_build_ealias
>      2  Add pass_oacc_kernels
>      3  Add pass_ch_oacc_kernels to pass_oacc_kernels
>      4  Add pass_tree_loop_{init,done} to pass_oacc_kernels
>      5  Add pass_loop_im to pass_oacc_kernels
>      6  Add pass_ccp to pass_oacc_kernels
>      7  Add pass_parloops_oacc_kernels to pass_oacc_kernels
>      8  Do simple omp lowering for no address taken var
> ...

This patch adds pass_tree_loop_init and pass_tree_loop_done to
pass_oacc_kernels.

In the pass group pass_tree_loop, pass_parallelize_loops is run between these
two passes, since it requires loop information.  We do the same for
pass_oacc_kernels.

OK for trunk?

Thanks,
- Tom


[-- Attachment #2: 0004-Add-pass_tree_loop_-init-done-to-pass_oacc_kernels.patch --]
[-- Type: text/x-patch, Size: 1419 bytes --]

2014-11-14  Tom de Vries  <tom@codesourcery.com>

	* passes.def: Run pass_tree_loop_init and pass_tree_loop_done in pass
	group pass_oacc_kernels.
	* tree-ssa-loop.c (pass_tree_loop_init::clone)
	(pass_tree_loop_done::clone): New functions.
---
 gcc/passes.def      | 2 ++
 gcc/tree-ssa-loop.c | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/gcc/passes.def b/gcc/passes.def
index 5eefe73..83f437b 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -77,6 +77,8 @@ along with GCC; see the file COPYING3.  If not see
 	  NEXT_PASS (pass_oacc_kernels);
 	  PUSH_INSERT_PASSES_WITHIN (pass_oacc_kernels)
 	      NEXT_PASS (pass_ch_oacc_kernels);
+	      NEXT_PASS (pass_tree_loop_init);
+	      NEXT_PASS (pass_tree_loop_done);
 	  POP_INSERT_PASSES ()
 	  NEXT_PASS (pass_expand_omp_ssa);
 	  NEXT_PASS (pass_fre);
diff --git a/gcc/tree-ssa-loop.c b/gcc/tree-ssa-loop.c
index c29aa22..c78b013 100644
--- a/gcc/tree-ssa-loop.c
+++ b/gcc/tree-ssa-loop.c
@@ -269,6 +269,7 @@ public:
 
   /* opt_pass methods: */
   virtual unsigned int execute (function *);
+  opt_pass * clone () { return new pass_tree_loop_init (m_ctxt); }
 
 }; // class pass_tree_loop_init
 
@@ -563,6 +564,7 @@ public:
 
   /* opt_pass methods: */
   virtual unsigned int execute (function *) { return tree_ssa_loop_done (); }
+  opt_pass * clone () { return new pass_tree_loop_done (m_ctxt); }
 
 }; // class pass_tree_loop_done
 
-- 
1.9.1







* [PATCH, 3/8] Add pass_ch_oacc_kernels to pass_oacc_kernels
  2014-11-15 14:08 openacc kernels directive -- initial support Tom de Vries
  2014-11-15 17:21 ` [PATCH, 1/8] Expand oacc kernels after pass_build_ealias Tom de Vries
  2014-11-15 17:22 ` [PATCH, 2/8] Add pass_oacc_kernels Tom de Vries
@ 2014-11-15 17:23 ` Tom de Vries
  2014-11-25 11:39   ` Tom de Vries
  2014-11-15 17:23 ` [PATCH, 4/8] Add pass_tree_loop_{init,done} " Tom de Vries
                   ` (9 subsequent siblings)
  12 siblings, 1 reply; 71+ messages in thread
From: Tom de Vries @ 2014-11-15 17:23 UTC (permalink / raw)
  To: GCC Patches; +Cc: Richard Biener, Jakub Jelinek, Thomas Schwinge

[-- Attachment #1: Type: text/plain, Size: 1210 bytes --]

On 15-11-14 13:14, Tom de Vries wrote:
> Hi,
>
> I'm submitting a patch series with initial support for the oacc kernels directive.
>
> The patch series uses pass_parallelize_loops to implement parallelization of
> loops in the oacc kernels region.
>
> The patch series consists of these 8 patches:
> ...
>      1  Expand oacc kernels after pass_build_ealias
>      2  Add pass_oacc_kernels
>      3  Add pass_ch_oacc_kernels to pass_oacc_kernels
>      4  Add pass_tree_loop_{init,done} to pass_oacc_kernels
>      5  Add pass_loop_im to pass_oacc_kernels
>      6  Add pass_ccp to pass_oacc_kernels
>      7  Add pass_parloops_oacc_kernels to pass_oacc_kernels
>      8  Do simple omp lowering for no address taken var
> ...

This patch adds a pass_ch_oacc_kernels to the pass group pass_oacc_kernels.

The idea is that pass_parallelize_loops only deals with loops for which the
header has been copied, so the easiest way to meet that requirement when running
pass_parallelize_loops in group pass_oacc_kernels is to run pass_ch as a part
of pass_oacc_kernels.

We define a separate pass pass_ch_oacc_kernels, to leave all loops that aren't
part of a kernels region alone.
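
For readers unfamiliar with loop header copying, here is a source-level sketch of the shape change it produces. The two functions are hypothetical examples written for this note; the real pass works on GIMPLE, not on source code. The exit test of the loop is duplicated as a guard in front of the loop, so the loop itself becomes a do-while with the test at the bottom:

```cpp
// Original shape: the exit test sits in the loop header.
long sum_while(const int *a, int n) {
  long s = 0;
  int i = 0;
  while (i < n) {
    s += a[i];
    ++i;
  }
  return s;
}

// Shape after header copying: the test is copied in front of the loop
// as a guard, and the loop body is entered only when it runs at least once.
long sum_header_copied(const int *a, int n) {
  long s = 0;
  int i = 0;
  if (i < n) {
    do {
      s += a[i];
      ++i;
    } while (i < n);
  }
  return s;
}
```

Both functions compute the same result for every n, including n <= 0, where neither loop body executes.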

OK for trunk?

Thanks,
- Tom


[-- Attachment #2: 0003-Add-pass_ch_oacc_kernels-to-pass_oacc_kernels.patch --]
[-- Type: text/x-patch, Size: 7535 bytes --]

2014-11-14  Tom de Vries  <tom@codesourcery.com>

	* omp-low.c (loop_in_oacc_kernels_region_p): New function.
	* omp-low.h (loop_in_oacc_kernels_region_p): Declare.
	* passes.def: Add pass_ch_oacc_kernels to pass group pass_oacc_kernels.
	* tree-pass.h (make_pass_ch_oacc_kernels): Declare.
	* tree-ssa-loop-ch.c: Include omp-low.h.
	(pass_ch_execute): Declare.
	(pass_ch::execute): Factor out ...
	(pass_ch_execute): ... this new function.  If handling oacc kernels,
	skip loops that are not in oacc kernels region.
	(pass_data_ch_oacc_kernels): New pass_data.
	(class pass_ch_oacc_kernels): New pass.
	(pass_ch_oacc_kernels::execute, make_pass_ch_oacc_kernels): New
	functions.
---
 gcc/omp-low.c          | 83 ++++++++++++++++++++++++++++++++++++++++++++++++++
 gcc/omp-low.h          |  2 ++
 gcc/passes.def         |  1 +
 gcc/tree-pass.h        |  1 +
 gcc/tree-ssa-loop-ch.c | 59 +++++++++++++++++++++++++++++++++--
 5 files changed, 144 insertions(+), 2 deletions(-)

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 6caeae9..e35fa8b 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -13909,4 +13909,87 @@ gimple_stmt_omp_lowering_p (gimple stmt)
   return false;
 }
 
+/* Return true if LOOP is inside a kernels region.  */
+
+bool
+loop_in_oacc_kernels_region_p (struct loop *loop, basic_block *region_entry,
+			       basic_block *region_exit)
+{
+  bitmap excludes_bitmap = BITMAP_GGC_ALLOC ();
+  bitmap region_bitmap = BITMAP_GGC_ALLOC ();
+  bitmap_clear (region_bitmap);
+
+  if (region_entry != NULL)
+    *region_entry = NULL;
+  if (region_exit != NULL)
+    *region_exit = NULL;
+
+  basic_block bb;
+  gimple last;
+  FOR_EACH_BB_FN (bb, cfun)
+    {
+      if (bitmap_bit_p (region_bitmap, bb->index))
+	continue;
+
+      last = last_stmt (bb);
+      if (!last)
+	continue;
+
+      if (gimple_code (last) != GIMPLE_OACC_KERNELS)
+	continue;
+
+      bitmap_clear (excludes_bitmap);
+      bitmap_set_bit (excludes_bitmap, bb->index);
+
+      vec<basic_block> dominated
+	= get_all_dominated_blocks (CDI_DOMINATORS, bb);
+
+      unsigned di;
+      basic_block dom;
+
+      basic_block end_region = NULL;
+      FOR_EACH_VEC_ELT (dominated, di, dom)
+	{
+	  if (dom == bb)
+	    continue;
+
+	  last = last_stmt (dom);
+	  if (!last)
+	    continue;
+
+	  if (gimple_code (last) != GIMPLE_OMP_RETURN)
+	    continue;
+
+	  if (end_region == NULL
+	      || dominated_by_p (CDI_DOMINATORS, end_region, dom))
+	    end_region = dom;
+	}
+
+      vec<basic_block> excludes
+	= get_all_dominated_blocks (CDI_DOMINATORS, end_region);
+
+      unsigned di2;
+      basic_block exclude;
+
+      FOR_EACH_VEC_ELT (excludes, di2, exclude)
+	if (exclude != end_region)
+	  bitmap_set_bit (excludes_bitmap, exclude->index);
+
+      FOR_EACH_VEC_ELT (dominated, di, dom)
+	if (!bitmap_bit_p (excludes_bitmap, dom->index))
+	  bitmap_set_bit (region_bitmap, dom->index);
+
+      if (bitmap_bit_p (region_bitmap, loop->header->index))
+	{
+	  if (region_entry != NULL)
+	    *region_entry = bb;
+	  if (region_exit != NULL)
+	    *region_exit = end_region;
+	  return true;
+	}
+    }
+
+  return false;
+}
+
 #include "gt-omp-low.h"
diff --git a/gcc/omp-low.h b/gcc/omp-low.h
index ff8a956..f1b9d77 100644
--- a/gcc/omp-low.h
+++ b/gcc/omp-low.h
@@ -29,6 +29,8 @@ extern tree omp_reduction_init (tree, tree);
 extern bool make_gimple_omp_edges (basic_block, struct omp_region **, int *);
 extern void omp_finish_file (void);
 extern bool gimple_stmt_omp_lowering_p (gimple);
+extern bool loop_in_oacc_kernels_region_p (struct loop *, basic_block *,
+					   basic_block *);
 
 extern GTY(()) vec<tree, va_gc> *offload_funcs;
 extern GTY(()) vec<tree, va_gc> *offload_vars;
diff --git a/gcc/passes.def b/gcc/passes.def
index 1fdb70a..5eefe73 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -76,6 +76,7 @@ along with GCC; see the file COPYING3.  If not see
 	     function.  */
 	  NEXT_PASS (pass_oacc_kernels);
 	  PUSH_INSERT_PASSES_WITHIN (pass_oacc_kernels)
+	      NEXT_PASS (pass_ch_oacc_kernels);
 	  POP_INSERT_PASSES ()
 	  NEXT_PASS (pass_expand_omp_ssa);
 	  NEXT_PASS (pass_fre);
diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
index 0bae847..1f599fa 100644
--- a/gcc/tree-pass.h
+++ b/gcc/tree-pass.h
@@ -374,6 +374,7 @@ extern gimple_opt_pass *make_pass_loop_prefetch (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_iv_optimize (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_tree_loop_done (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_ch (gcc::context *ctxt);
+extern gimple_opt_pass *make_pass_ch_oacc_kernels (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_ccp (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_phi_only_cprop (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_build_ssa (gcc::context *ctxt);
diff --git a/gcc/tree-ssa-loop-ch.c b/gcc/tree-ssa-loop-ch.c
index 300b2fa..8f91552 100644
--- a/gcc/tree-ssa-loop-ch.c
+++ b/gcc/tree-ssa-loop-ch.c
@@ -48,12 +48,15 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-inline.h"
 #include "flags.h"
 #include "tree-ssa-threadedge.h"
+#include "omp-low.h"
 
 /* Duplicates headers of loops if they are small enough, so that the statements
    in the loop body are always executed when the loop is entered.  This
    increases effectiveness of code motion optimizations, and reduces the need
    for loop preconditioning.  */
 
+static unsigned int pass_ch_execute (function *, bool);
+
 /* Check whether we should duplicate HEADER of LOOP.  At most *LIMIT
    instructions should be duplicated, limit is decreased by the actual
    amount.  */
@@ -172,6 +175,14 @@ public:
 unsigned int
 pass_ch::execute (function *fun)
 {
+  return pass_ch_execute (fun, false);
+}
+
+} // anon namespace
+
+static unsigned int
+pass_ch_execute (function *fun, bool oacc_kernels_p)
+{
   struct loop *loop;
   basic_block header;
   edge exit, entry;
@@ -205,6 +216,10 @@ pass_ch::execute (function *fun)
       if (do_while_loop_p (loop))
 	continue;
 
+      if (oacc_kernels_p
+	  && !loop_in_oacc_kernels_region_p (loop, NULL, NULL))
+	continue;
+
       /* Iterate the header copying up to limit; this takes care of the cases
 	 like while (a && b) {...}, where we want to have both of the conditions
 	 copied.  TODO -- handle while (a || b) - like cases, by not requiring
@@ -295,10 +310,50 @@ pass_ch::execute (function *fun)
   return 0;
 }
 
-} // anon namespace
-
 gimple_opt_pass *
 make_pass_ch (gcc::context *ctxt)
 {
   return new pass_ch (ctxt);
 }
+
+namespace {
+
+const pass_data pass_data_ch_oacc_kernels =
+{
+  GIMPLE_PASS, /* type */
+  "ch_oacc_kernels", /* name */
+  OPTGROUP_LOOP, /* optinfo_flags */
+  TV_TREE_CH, /* tv_id */
+  ( PROP_cfg | PROP_ssa ), /* properties_required */
+  0, /* properties_provided */
+  0, /* properties_destroyed */
+  0, /* todo_flags_start */
+  TODO_cleanup_cfg, /* todo_flags_finish */
+};
+
+class pass_ch_oacc_kernels : public gimple_opt_pass
+{
+public:
+  pass_ch_oacc_kernels (gcc::context *ctxt)
+    : gimple_opt_pass (pass_data_ch_oacc_kernels, ctxt)
+  {}
+
+  /* opt_pass methods: */
+  virtual bool gate (function *) { return true; }
+  virtual unsigned int execute (function *);
+
+}; // class pass_ch_oacc_kernels
+
+unsigned int
+pass_ch_oacc_kernels::execute (function *fun)
+{
+  return pass_ch_execute (fun, true);
+}
+
+} // anon namespace
+
+gimple_opt_pass *
+make_pass_ch_oacc_kernels (gcc::context *ctxt)
+{
+  return new pass_ch_oacc_kernels (ctxt);
+}
-- 
1.9.1






^ permalink raw reply	[flat|nested] 71+ messages in thread

* [PATCH, 5/8] Add pass_loop_im to pass_oacc_kernels
  2014-11-15 14:08 openacc kernels directive -- initial support Tom de Vries
                   ` (3 preceding siblings ...)
  2014-11-15 17:23 ` [PATCH, 4/8] Add pass_tree_loop_{init,done} " Tom de Vries
@ 2014-11-15 17:24 ` Tom de Vries
  2014-11-25 12:00   ` Tom de Vries
  2014-11-15 18:32 ` [PATCH, 6/8] Add pass_ccp " Tom de Vries
                   ` (7 subsequent siblings)
  12 siblings, 1 reply; 71+ messages in thread
From: Tom de Vries @ 2014-11-15 17:24 UTC (permalink / raw)
  To: GCC Patches; +Cc: Richard Biener, Jakub Jelinek, Thomas Schwinge

[-- Attachment #1: Type: text/plain, Size: 941 bytes --]

On 15-11-14 13:14, Tom de Vries wrote:
> Hi,
>
> I'm submitting a patch series with initial support for the oacc kernels directive.
>
> The patch series uses pass_parallelize_loops to implement parallelization of
> loops in the oacc kernels region.
>
> The patch series consists of these 8 patches:
> ...
>      1  Expand oacc kernels after pass_build_ealias
>      2  Add pass_oacc_kernels
>      3  Add pass_ch_oacc_kernels to pass_oacc_kernels
>      4  Add pass_tree_loop_{init,done} to pass_oacc_kernels
>      5  Add pass_loop_im to pass_oacc_kernels
>      6  Add pass_ccp to pass_oacc_kernels
>      7  Add pass_parloops_oacc_kernels to pass_oacc_kernels
>      8  Do simple omp lowering for no address taken var
> ...

This patch adds pass_lim (loop invariant motion) to pass group pass_oacc_kernels.

We need this pass to simplify the loop body, and allow pass_parloops to detect 
that loop iterations are independent.
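
As a rough source-level picture of the kind of simplification meant here, the
following hand-written C sketch shows a loop before and after an invariant
computation is moved out of the body.  It is only an illustration, not output
from pass_lim; the function names add_scaled and add_scaled_hoisted and the
parameters k and m are hypothetical.

```c
#include <stddef.h>

/* Before: k * m does not depend on i, but is written inside the loop
   body.  */
void
add_scaled (int *out, const int *in, size_t n, int k, int m)
{
  for (size_t i = 0; i < n; i++)
    out[i] = in[i] + k * m;
}

/* After (conceptually): the invariant product is computed once, under
   the guard that the loop is entered, and the body only touches
   out[i] and in[i].  */
void
add_scaled_hoisted (int *out, const int *in, size_t n, int k, int m)
{
  if (n > 0)
    {
      int km = k * m;
      size_t i = 0;
      do
	{
	  out[i] = in[i] + km;
	  i++;
	}
      while (i < n);
    }
}
```

With the invariant part moved out, each iteration reads in[i] and writes
out[i] only, which is the kind of body that is easier to analyze for
independence between iterations.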

OK for trunk?

Thanks,
- Tom



[-- Attachment #2: 0005-Add-pass_loop_im-to-pass_oacc_kernels.patch --]
[-- Type: text/x-patch, Size: 23907 bytes --]

2014-11-14  Tom de Vries  <tom@codesourcery.com>

	* passes.def: Add pass_lim in pass group pass_oacc_kernels.

	* c-c++-common/restrict-2.c: Update for new pass_lim.
	* c-c++-common/restrict-4.c: Same.
	* g++.dg/tree-ssa/pr33615.C:  Same.
	* g++.dg/tree-ssa/restrict1.C: Same.
	* gcc.dg/tm/pub-safety-1.c:  Same.
	* gcc.dg/tm/reg-promotion.c:  Same.
	* gcc.dg/tree-ssa/20050314-1.c:  Same.
	* gcc.dg/tree-ssa/loop-32.c: Same.
	* gcc.dg/tree-ssa/loop-33.c: Same.
	* gcc.dg/tree-ssa/loop-34.c: Same.
	* gcc.dg/tree-ssa/loop-35.c: Same.
	* gcc.dg/tree-ssa/loop-7.c: Same.
	* gcc.dg/tree-ssa/pr23109.c: Same.
	* gcc.dg/tree-ssa/restrict-3.c: Same.
	* gcc.dg/tree-ssa/ssa-lim-1.c: Same.
	* gcc.dg/tree-ssa/ssa-lim-10.c: Same.
	* gcc.dg/tree-ssa/ssa-lim-11.c: Same.
	* gcc.dg/tree-ssa/ssa-lim-12.c: Same.
	* gcc.dg/tree-ssa/ssa-lim-2.c: Same.
	* gcc.dg/tree-ssa/ssa-lim-3.c: Same.
	* gcc.dg/tree-ssa/ssa-lim-6.c: Same.
	* gcc.dg/tree-ssa/ssa-lim-7.c: Same.
	* gcc.dg/tree-ssa/ssa-lim-8.c: Same.
	* gcc.dg/tree-ssa/ssa-lim-9.c: Same.
	* gcc.dg/tree-ssa/structopt-1.c: Same.
	* gfortran.dg/pr32921.f: Same.
---
 gcc/passes.def                              | 1 +
 gcc/testsuite/c-c++-common/restrict-2.c     | 6 +++---
 gcc/testsuite/c-c++-common/restrict-4.c     | 6 +++---
 gcc/testsuite/g++.dg/tree-ssa/pr33615.C     | 6 +++---
 gcc/testsuite/g++.dg/tree-ssa/restrict1.C   | 6 +++---
 gcc/testsuite/gcc.dg/tm/pub-safety-1.c      | 6 +++---
 gcc/testsuite/gcc.dg/tm/reg-promotion.c     | 6 +++---
 gcc/testsuite/gcc.dg/tree-ssa/20050314-1.c  | 6 +++---
 gcc/testsuite/gcc.dg/tree-ssa/loop-32.c     | 6 +++---
 gcc/testsuite/gcc.dg/tree-ssa/loop-33.c     | 6 +++---
 gcc/testsuite/gcc.dg/tree-ssa/loop-34.c     | 6 +++---
 gcc/testsuite/gcc.dg/tree-ssa/loop-35.c     | 8 ++++----
 gcc/testsuite/gcc.dg/tree-ssa/loop-7.c      | 6 +++---
 gcc/testsuite/gcc.dg/tree-ssa/pr23109.c     | 6 +++---
 gcc/testsuite/gcc.dg/tree-ssa/restrict-3.c  | 6 +++---
 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-1.c   | 6 +++---
 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-10.c  | 6 +++---
 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-11.c  | 6 +++---
 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-12.c  | 6 +++---
 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-2.c   | 6 +++---
 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-3.c   | 8 ++++----
 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-6.c   | 6 +++---
 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-7.c   | 6 +++---
 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-8.c   | 6 +++---
 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-9.c   | 6 +++---
 gcc/testsuite/gcc.dg/tree-ssa/structopt-1.c | 6 +++---
 gcc/testsuite/gfortran.dg/pr32921.f         | 6 +++---
 27 files changed, 81 insertions(+), 80 deletions(-)

diff --git a/gcc/passes.def b/gcc/passes.def
index 83f437b..f6c16b9 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -78,6 +78,7 @@ along with GCC; see the file COPYING3.  If not see
 	  PUSH_INSERT_PASSES_WITHIN (pass_oacc_kernels)
 	      NEXT_PASS (pass_ch_oacc_kernels);
 	      NEXT_PASS (pass_tree_loop_init);
+	      NEXT_PASS (pass_lim);
 	      NEXT_PASS (pass_tree_loop_done);
 	  POP_INSERT_PASSES ()
 	  NEXT_PASS (pass_expand_omp_ssa);
diff --git a/gcc/testsuite/c-c++-common/restrict-2.c b/gcc/testsuite/c-c++-common/restrict-2.c
index 3f71b77..f0b0e15a 100644
--- a/gcc/testsuite/c-c++-common/restrict-2.c
+++ b/gcc/testsuite/c-c++-common/restrict-2.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O -fno-strict-aliasing -fdump-tree-lim1-details" } */
+/* { dg-options "-O -fno-strict-aliasing -fdump-tree-lim2-details" } */
 
 void foo (float * __restrict__ a, float * __restrict__ b, int n, int j)
 {
@@ -10,5 +10,5 @@ void foo (float * __restrict__ a, float * __restrict__ b, int n, int j)
 
 /* We should move the RHS of the store out of the loop.  */
 
-/* { dg-final { scan-tree-dump-times "Moving statement" 11 "lim1" } } */
-/* { dg-final { cleanup-tree-dump "lim1" } } */
+/* { dg-final { scan-tree-dump-times "Moving statement" 11 "lim2" } } */
+/* { dg-final { cleanup-tree-dump "lim2" } } */
diff --git a/gcc/testsuite/c-c++-common/restrict-4.c b/gcc/testsuite/c-c++-common/restrict-4.c
index 3a36def..f791533 100644
--- a/gcc/testsuite/c-c++-common/restrict-4.c
+++ b/gcc/testsuite/c-c++-common/restrict-4.c
@@ -1,5 +1,5 @@
 /* { dg-do compile }  */
-/* { dg-options "-O2 -fdump-tree-lim1-details" } */
+/* { dg-options "-O2 -fdump-tree-lim2-details" } */
 
 struct Foo
 {
@@ -15,5 +15,5 @@ void bar(struct Foo f, int * __restrict__ q)
     }
 }
 
-/* { dg-final { scan-tree-dump "Executing store motion" "lim1" } } */
-/* { dg-final { cleanup-tree-dump "lim1" } } */
+/* { dg-final { scan-tree-dump "Executing store motion" "lim2" } } */
+/* { dg-final { cleanup-tree-dump "lim2" } } */
diff --git a/gcc/testsuite/g++.dg/tree-ssa/pr33615.C b/gcc/testsuite/g++.dg/tree-ssa/pr33615.C
index 801b334..2591e00 100644
--- a/gcc/testsuite/g++.dg/tree-ssa/pr33615.C
+++ b/gcc/testsuite/g++.dg/tree-ssa/pr33615.C
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O -fnon-call-exceptions -fdump-tree-lim1-details -w" } */
+/* { dg-options "-O -fnon-call-exceptions -fdump-tree-lim2-details -w" } */
 
 extern volatile int y;
 
@@ -16,5 +16,5 @@ foo (double a, int x)
 
 // The expression 1.0 / 0.0 should not be treated as a loop invariant
 // if it may throw an exception.
-// { dg-final { scan-tree-dump-times "invariant up to" 0 "lim1" } }
-// { dg-final { cleanup-tree-dump "lim1" } }
+// { dg-final { scan-tree-dump-times "invariant up to" 0 "lim2" } }
+// { dg-final { cleanup-tree-dump "lim2" } }
diff --git a/gcc/testsuite/g++.dg/tree-ssa/restrict1.C b/gcc/testsuite/g++.dg/tree-ssa/restrict1.C
index 682de7e..761e7e2 100644
--- a/gcc/testsuite/g++.dg/tree-ssa/restrict1.C
+++ b/gcc/testsuite/g++.dg/tree-ssa/restrict1.C
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-lim1-details" } */
+/* { dg-options "-O2 -fdump-tree-lim2-details" } */
 
 struct Foo
 {
@@ -16,5 +16,5 @@ void bar(Foo f, int * __restrict__ q)
     }
 }
 
-/* { dg-final { scan-tree-dump "Executing store motion" "lim1" } } */
-/* { dg-final { cleanup-tree-dump "lim1" } } */
+/* { dg-final { scan-tree-dump "Executing store motion" "lim2" } } */
+/* { dg-final { cleanup-tree-dump "lim2" } } */
diff --git a/gcc/testsuite/gcc.dg/tm/pub-safety-1.c b/gcc/testsuite/gcc.dg/tm/pub-safety-1.c
index 660e9a6..6d99410 100644
--- a/gcc/testsuite/gcc.dg/tm/pub-safety-1.c
+++ b/gcc/testsuite/gcc.dg/tm/pub-safety-1.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-fgnu-tm -O1 -fdump-tree-lim1" } */
+/* { dg-options "-fgnu-tm -O1 -fdump-tree-lim2" } */
 
 /* Test that thread visible loads do not get hoisted out of loops if
    the load would not have occurred on each path out of the loop.  */
@@ -20,5 +20,5 @@ void reader()
     }
 }
 
-/* { dg-final { scan-tree-dump-times "Cannot hoist.*DATA_DATA because it is in a transaction" 1 "lim1" } } */
-/* { dg-final { cleanup-tree-dump "lim1" } } */
+/* { dg-final { scan-tree-dump-times "Cannot hoist.*DATA_DATA because it is in a transaction" 1 "lim2" } } */
+/* { dg-final { cleanup-tree-dump "lim2" } } */
diff --git a/gcc/testsuite/gcc.dg/tm/reg-promotion.c b/gcc/testsuite/gcc.dg/tm/reg-promotion.c
index e48bfb2..f1d2387 100644
--- a/gcc/testsuite/gcc.dg/tm/reg-promotion.c
+++ b/gcc/testsuite/gcc.dg/tm/reg-promotion.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-fgnu-tm -O2 -fdump-tree-lim1" } */
+/* { dg-options "-fgnu-tm -O2 -fdump-tree-lim2" } */
 
 /* Test that `count' is not written to unless p->data>0.  */
 
@@ -20,5 +20,5 @@ void func()
   }
 }
 
-/* { dg-final { scan-tree-dump-times "Cannot hoist conditional load of count because it is in a transaction" 1 "lim1" } } */
-/* { dg-final { cleanup-tree-dump "lim1" } } */
+/* { dg-final { scan-tree-dump-times "Cannot hoist conditional load of count because it is in a transaction" 1 "lim2" } } */
+/* { dg-final { cleanup-tree-dump "lim2" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/20050314-1.c b/gcc/testsuite/gcc.dg/tree-ssa/20050314-1.c
index 8f07781..7f2e477 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/20050314-1.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/20050314-1.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O1 -fdump-tree-lim1-details --param allow-store-data-races=1" } */
+/* { dg-options "-O1 -fdump-tree-lim2-details --param allow-store-data-races=1" } */
 
 float a[100];
 
@@ -17,5 +17,5 @@ void xxx (void)
 /* Store motion may be applied to the assignment to a[k], since sinf
    cannot read nor write the memory.  */
 
-/* { dg-final { scan-tree-dump-times "Moving statement" 1 "lim1" } } */
-/* { dg-final { cleanup-tree-dump "lim1" } } */
+/* { dg-final { scan-tree-dump-times "Moving statement" 1 "lim2" } } */
+/* { dg-final { cleanup-tree-dump "lim2" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/loop-32.c b/gcc/testsuite/gcc.dg/tree-ssa/loop-32.c
index f0c8d30..30b9d72 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/loop-32.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/loop-32.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-lim1-details" } */
+/* { dg-options "-O2 -fdump-tree-lim2-details" } */
 
 int x;
 int a[100];
@@ -42,5 +42,5 @@ void test3(struct a *A)
     }
 }
 
-/* { dg-final { scan-tree-dump-times "Executing store motion of" 3 "lim1" } } */
-/* { dg-final { cleanup-tree-dump "lim1" } } */
+/* { dg-final { scan-tree-dump-times "Executing store motion of" 3 "lim2" } } */
+/* { dg-final { cleanup-tree-dump "lim2" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/loop-33.c b/gcc/testsuite/gcc.dg/tree-ssa/loop-33.c
index bf16b13..281b336 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/loop-33.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/loop-33.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-lim1-details" } */
+/* { dg-options "-O2 -fdump-tree-lim2-details" } */
 
 int x;
 int a[100];
@@ -36,5 +36,5 @@ void test5(struct a *A, unsigned b)
     }
 }
 
-/* { dg-final { scan-tree-dump-times "Executing store motion of" 4 "lim1" { xfail { lp64 || llp64 } } } } */
-/* { dg-final { cleanup-tree-dump "lim1" } } */
+/* { dg-final { scan-tree-dump-times "Executing store motion of" 4 "lim2" { xfail { lp64 || llp64 } } } } */
+/* { dg-final { cleanup-tree-dump "lim2" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/loop-34.c b/gcc/testsuite/gcc.dg/tree-ssa/loop-34.c
index 125a220..e0ec9cf 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/loop-34.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/loop-34.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-lim1-details" } */
+/* { dg-options "-O2 -fdump-tree-lim2-details" } */
 
 int r[6];
 
@@ -17,5 +17,5 @@ void f (int n)
 }
 
 
-/* { dg-final { scan-tree-dump-times "Executing store motion of r" 6 "lim1" } } */
-/* { dg-final { cleanup-tree-dump "lim1" } } */
+/* { dg-final { scan-tree-dump-times "Executing store motion of r" 6 "lim2" } } */
+/* { dg-final { cleanup-tree-dump "lim2" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/loop-35.c b/gcc/testsuite/gcc.dg/tree-ssa/loop-35.c
index 2d2db70..5a1e875 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/loop-35.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/loop-35.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-lim1-details" } */
+/* { dg-options "-O2 -fdump-tree-lim2-details" } */
 
 int x;
 int a[100];
@@ -67,6 +67,6 @@ void test4(struct a *A, unsigned LONG b)
     }
 }
 /* long index not hoisted for avr target PR 36561 */
-/* { dg-final { scan-tree-dump-times "Executing store motion of" 8 "lim1" { xfail { "avr-*-*" } } } } */
-/* { dg-final { scan-tree-dump-times "Executing store motion of" 6 "lim1" { target { "avr-*-*" } } } } */
-/* { dg-final { cleanup-tree-dump "lim1" } } */
+/* { dg-final { scan-tree-dump-times "Executing store motion of" 8 "lim2" { xfail { "avr-*-*" } } } } */
+/* { dg-final { scan-tree-dump-times "Executing store motion of" 6 "lim2" { target { "avr-*-*" } } } } */
+/* { dg-final { cleanup-tree-dump "lim2" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/loop-7.c b/gcc/testsuite/gcc.dg/tree-ssa/loop-7.c
index 38e19e6..4e83170 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/loop-7.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/loop-7.c
@@ -1,6 +1,6 @@
 /* PR tree-optimization/19828 */
 /* { dg-do compile } */
-/* { dg-options "-O1 -fdump-tree-lim1-details" } */
+/* { dg-options "-O1 -fdump-tree-lim2-details" } */
 
 int cst_fun1 (int) __attribute__((__const__));
 int cst_fun2 (int) __attribute__((__const__));
@@ -31,5 +31,5 @@ int xxx (void)
    Calls to cst_fun2 and pure_fun2 should not be, since calling
    with k = 0 may be invalid.  */
 
-/* { dg-final { scan-tree-dump-times "Moving statement" 2 "lim1" } } */
-/* { dg-final { cleanup-tree-dump "lim1" } } */
+/* { dg-final { scan-tree-dump-times "Moving statement" 2 "lim2" } } */
+/* { dg-final { cleanup-tree-dump "lim2" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr23109.c b/gcc/testsuite/gcc.dg/tree-ssa/pr23109.c
index 73fd84d..0f92311 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr23109.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr23109.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -funsafe-math-optimizations -ftrapping-math -fdump-tree-recip -fdump-tree-lim1" } */
+/* { dg-options "-O2 -funsafe-math-optimizations -ftrapping-math -fdump-tree-recip -fdump-tree-lim2" } */
 /* { dg-warning "-fassociative-math disabled" "" { target *-*-* } 1 } */
 
 double F[2] = { 0., 0. }, e = 0.;
@@ -29,8 +29,8 @@ int main()
 /* LIM only performs the transformation in the no-trapping-math case.  In
    the future we will do it for trapping-math as well in recip, check that
    this is not wrongly optimized.  */
-/* { dg-final { scan-tree-dump-not "reciptmp" "lim1" } } */
+/* { dg-final { scan-tree-dump-not "reciptmp" "lim2" } } */
 /* { dg-final { scan-tree-dump-not "reciptmp" "recip" } } */
 /* { dg-final { cleanup-tree-dump "recip" } } */
-/* { dg-final { cleanup-tree-dump "lim1" } } */
+/* { dg-final { cleanup-tree-dump "lim2" } } */
 
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/restrict-3.c b/gcc/testsuite/gcc.dg/tree-ssa/restrict-3.c
index 95cc1a2..c3ca462 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/restrict-3.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/restrict-3.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O -fno-strict-aliasing -fdump-tree-lim1-details" } */
+/* { dg-options "-O -fno-strict-aliasing -fdump-tree-lim2-details" } */
 
 void f(int * __restrict__ r,
        int a[__restrict__ 16][16],
@@ -14,5 +14,5 @@ void f(int * __restrict__ r,
 
 /* We should apply store motion to the store to *r.  */
 
-/* { dg-final { scan-tree-dump "Executing store motion of \\\*r" "lim1" } } */
-/* { dg-final { cleanup-tree-dump "lim1" } } */
+/* { dg-final { scan-tree-dump "Executing store motion of \\\*r" "lim2" } } */
+/* { dg-final { cleanup-tree-dump "lim2" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-1.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-1.c
index 3952a9a..0b22fc3 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-1.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-1.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O -fdump-tree-lim1" } */
+/* { dg-options "-O -fdump-tree-lim2" } */
 
 /* This is a variant that does cause fold to place a cast to
    int before testing bit 1.  */
@@ -18,5 +18,5 @@ quantum_toffoli (int control1, int control2, int target,
     }
 }
 
-/* { dg-final { scan-tree-dump-times "1 <<" 3 "lim1" } } */
-/* { dg-final { cleanup-tree-dump "lim1" } } */
+/* { dg-final { scan-tree-dump-times "1 <<" 3 "lim2" } } */
+/* { dg-final { cleanup-tree-dump "lim2" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-10.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-10.c
index bc14926..4a218e0 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-10.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-10.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-lim1-details" } */
+/* { dg-options "-O2 -fdump-tree-lim2-details" } */
 
 int *l, *r;
 int test_func(void)
@@ -27,5 +27,5 @@ int test_func(void)
   return i;
 }
 
-/* { dg-final { scan-tree-dump "Executing store motion of pos" "lim1" } } */
-/* { dg-final { cleanup-tree-dump "lim1" } } */
+/* { dg-final { scan-tree-dump "Executing store motion of pos" "lim2" } } */
+/* { dg-final { cleanup-tree-dump "lim2" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-11.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-11.c
index ea91a61..7315025 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-11.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-11.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O -fprofile-arcs -fdump-tree-lim1-details" } */
+/* { dg-options "-O -fprofile-arcs -fdump-tree-lim2-details" } */
 
 struct thread_param
 {
@@ -21,5 +21,5 @@ void access_buf(struct thread_param* p)
     }
 }
 
-/* { dg-final { scan-tree-dump-times "Executing store motion of __gcov0.access_buf\\\[\[01\]\\\] from loop 1" 2 "lim1" } } */
-/* { dg-final { cleanup-tree-dump "lim1" } } */
+/* { dg-final { scan-tree-dump-times "Executing store motion of __gcov0.access_buf\\\[\[01\]\\\] from loop 1" 2 "lim2" } } */
+/* { dg-final { cleanup-tree-dump "lim2" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-12.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-12.c
index e0d93a9..07855bb 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-12.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-12.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O -fdump-tree-lim1" } */
+/* { dg-options "-O -fdump-tree-lim2" } */
 
 int a[1024];
 
@@ -23,5 +23,5 @@ void bar (int x, int z)
     }
 }
 
-/* { dg-final { scan-tree-dump-times "!= 0 ? " 2 "lim1" } } */
-/* { dg-final { cleanup-tree-dump "lim1" } } */
+/* { dg-final { scan-tree-dump-times "!= 0 ? " 2 "lim2" } } */
+/* { dg-final { cleanup-tree-dump "lim2" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-2.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-2.c
index 2106b62..652d1ba 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-2.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-2.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O -fdump-tree-lim1" } */
+/* { dg-options "-O -fdump-tree-lim2" } */
 
 /* This is a variant that doesn't cause fold to place a cast to
    int before testing bit 1.  */
@@ -18,5 +18,5 @@ int size)
     }
 }
 
-/* { dg-final { scan-tree-dump-times "1 <<" 3 "lim1" } } */
-/* { dg-final { cleanup-tree-dump "lim1" } } */
+/* { dg-final { scan-tree-dump-times "1 <<" 3 "lim2" } } */
+/* { dg-final { cleanup-tree-dump "lim2" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-3.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-3.c
index a81857c..29539fa 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-3.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-3.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O -fdump-tree-lim1-details" } */
+/* { dg-options "-O -fdump-tree-lim2-details" } */
 
 struct { int x; int y; } global;
 void foo(int n)
@@ -9,6 +9,6 @@ void foo(int n)
     global.y += global.x*global.x;
 }
 
-/* { dg-final { scan-tree-dump "Executing store motion of global.y" "lim1" } } */
-/* { dg-final { scan-tree-dump "Moving statement.*global.x.*out of loop 1" "lim1" } } */
-/* { dg-final { cleanup-tree-dump "lim1" } } */
+/* { dg-final { scan-tree-dump "Executing store motion of global.y" "lim2" } } */
+/* { dg-final { scan-tree-dump "Moving statement.*global.x.*out of loop 1" "lim2" } } */
+/* { dg-final { cleanup-tree-dump "lim2" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-6.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-6.c
index 100a230..a70bb2e 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-6.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-6.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-lim1-details" } */
+/* { dg-options "-O2 -fdump-tree-lim2-details" } */
 
 double a[16][64], y[64], x[16];
 void foo(void)
@@ -10,5 +10,5 @@ void foo(void)
       y[j] = y[j] + a[i][j] * x[i];
 }
 
-/* { dg-final { scan-tree-dump "Executing store motion of y" "lim1" } } */
-/* { dg-final { cleanup-tree-dump "lim1" } } */
+/* { dg-final { scan-tree-dump "Executing store motion of y" "lim2" } } */
+/* { dg-final { cleanup-tree-dump "lim2" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-7.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-7.c
index f8e15f3..6a67234 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-7.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-7.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O -fdump-tree-lim1-details" } */
+/* { dg-options "-O -fdump-tree-lim2-details" } */
 
 extern const int srcshift;
 
@@ -11,5 +11,5 @@ void foo (int *srcdata, int *dstdata)
     dstdata[i] = srcdata[i] << srcshift;
 }
 
-/* { dg-final { scan-tree-dump "Moving statement" "lim1" } } */
-/* { dg-final { cleanup-tree-dump "lim1" } } */
+/* { dg-final { scan-tree-dump "Moving statement" "lim2" } } */
+/* { dg-final { cleanup-tree-dump "lim2" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-8.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-8.c
index 551b68f..c6f56ec 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-8.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-8.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O -fdump-tree-lim1-details" } */
+/* { dg-options "-O -fdump-tree-lim2-details" } */
 
 void bar (int);
 void foo (int n, int m)
@@ -16,5 +16,5 @@ void foo (int n, int m)
     }
 }
 
-/* { dg-final { scan-tree-dump-times "Moving PHI node" 1 "lim1"  } } */
-/* { dg-final { cleanup-tree-dump "lim1" } } */
+/* { dg-final { scan-tree-dump-times "Moving PHI node" 1 "lim2"  } } */
+/* { dg-final { cleanup-tree-dump "lim2" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-9.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-9.c
index c5a6765..2233c90 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-9.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-9.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O -fdump-tree-lim1-details" } */
+/* { dg-options "-O -fdump-tree-lim2-details" } */
 
 void bar (int);
 void foo (int n, int m)
@@ -16,5 +16,5 @@ void foo (int n, int m)
     }
 }
 
-/* { dg-final { scan-tree-dump-times "Moving PHI node" 1 "lim1"  } } */
-/* { dg-final { cleanup-tree-dump "lim1" } } */
+/* { dg-final { scan-tree-dump-times "Moving PHI node" 1 "lim2"  } } */
+/* { dg-final { cleanup-tree-dump "lim2" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/structopt-1.c b/gcc/testsuite/gcc.dg/tree-ssa/structopt-1.c
index e5fe291..54cf44c 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/structopt-1.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/structopt-1.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-lim1-details" } */
+/* { dg-options "-O2 -fdump-tree-lim2-details" } */
 int x; int y;
 struct { int x; int y; } global;
 int foo() {
@@ -10,6 +10,6 @@ int foo() {
 		global.y += global.x*global.x;
 }
 
-/* { dg-final { scan-tree-dump-times "Executing store motion of global.y" 1 "lim1" } } */
+/* { dg-final { scan-tree-dump-times "Executing store motion of global.y" 1 "lim2" } } */
 /* XXX: We should also check for the load motion of global.x, but there is no easy way to do this.  */
-/* { dg-final { cleanup-tree-dump "lim1" } } */
+/* { dg-final { cleanup-tree-dump "lim2" } } */
diff --git a/gcc/testsuite/gfortran.dg/pr32921.f b/gcc/testsuite/gfortran.dg/pr32921.f
index 45ea647..55b5604 100644
--- a/gcc/testsuite/gfortran.dg/pr32921.f
+++ b/gcc/testsuite/gfortran.dg/pr32921.f
@@ -1,5 +1,5 @@
 ! { dg-do compile }
-! { dg-options "-O2 -fdump-tree-lim1" }
+! { dg-options "-O2 -fdump-tree-lim2" }
 ! gfortran -c -m32 -O2 -S junk.f
 !
       MODULE LES3D_DATA
@@ -45,5 +45,5 @@
 
       RETURN
       END
-! { dg-final { scan-tree-dump-times "stride" 4 "lim1" } }
-! { dg-final { cleanup-tree-dump "lim1" } }
+! { dg-final { scan-tree-dump-times "stride" 4 "lim2" } }
+! { dg-final { cleanup-tree-dump "lim2" } }
-- 
1.9.1







* [PATCH, 6/8] Add pass_ccp to pass_oacc_kernels
  2014-11-15 14:08 openacc kernels directive -- initial support Tom de Vries
                   ` (4 preceding siblings ...)
  2014-11-15 17:24 ` [PATCH, 5/8] Add pass_loop_im " Tom de Vries
@ 2014-11-15 18:32 ` Tom de Vries
  2014-11-25 12:03   ` Tom de Vries
  2014-11-15 18:52 ` [PATCH, 7/8] Add pass_parloops_oacc_kernels to pass_oacc_kernels Tom de Vries
                   ` (6 subsequent siblings)
  12 siblings, 1 reply; 71+ messages in thread
From: Tom de Vries @ 2014-11-15 18:32 UTC (permalink / raw)
  To: GCC Patches; +Cc: Richard Biener, Jakub Jelinek, Thomas Schwinge

[-- Attachment #1: Type: text/plain, Size: 941 bytes --]

On 15-11-14 13:14, Tom de Vries wrote:
> Hi,
>
> I'm submitting a patch series with initial support for the oacc kernels directive.
>
> The patch series uses pass_parallelize_loops to implement parallelization of
> loops in the oacc kernels region.
>
> The patch series consists of these 8 patches:
> ...
>      1  Expand oacc kernels after pass_build_ealias
>      2  Add pass_oacc_kernels
>      3  Add pass_ch_oacc_kernels to pass_oacc_kernels
>      4  Add pass_tree_loop_{init,done} to pass_oacc_kernels
>      5  Add pass_loop_im to pass_oacc_kernels
>      6  Add pass_ccp to pass_oacc_kernels
>      7  Add pass_parloops_oacc_kernels to pass_oacc_kernels
>      8  Do simple omp lowering for no address taken var
> ...

This patch adds pass_ccp to pass group pass_oacc_kernels.

We need this pass to simplify the loop body, and allow pass_parloops to detect 
that loop iterations are independent.

OK for trunk?

Thanks,
- Tom


[-- Attachment #2: 0006-Add-pass_ccp-to-pass_oacc_kernels.patch --]
[-- Type: text/x-patch, Size: 4796 bytes --]

2014-11-14  Tom de Vries  <tom@codesourcery.com>

	* passes.def: Add pass_ccp in pass group pass_oacc_kernels.

	* gcc.dg/pr43513.c: Update for new pass_ccp.
	* gcc.dg/tree-ssa/alias-17.c: Same.
	* gcc.dg/tree-ssa/foldconst-4.c: Same.
	* gcc.dg/tree-ssa/ssa-ccp-29.c: Same.
	* gcc.dg/tree-ssa/ssa-ccp-3.c: Same.
---
 gcc/passes.def                              | 1 +
 gcc/testsuite/gcc.dg/pr43513.c              | 6 +++---
 gcc/testsuite/gcc.dg/tree-ssa/alias-17.c    | 6 +++---
 gcc/testsuite/gcc.dg/tree-ssa/foldconst-4.c | 6 +++---
 gcc/testsuite/gcc.dg/tree-ssa/ssa-ccp-29.c  | 6 +++---
 gcc/testsuite/gcc.dg/tree-ssa/ssa-ccp-3.c   | 6 +++---
 6 files changed, 16 insertions(+), 15 deletions(-)

diff --git a/gcc/passes.def b/gcc/passes.def
index f6c16b9..cd9443c 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -79,6 +79,7 @@ along with GCC; see the file COPYING3.  If not see
 	      NEXT_PASS (pass_ch_oacc_kernels);
 	      NEXT_PASS (pass_tree_loop_init);
 	      NEXT_PASS (pass_lim);
+	      NEXT_PASS (pass_ccp);
 	      NEXT_PASS (pass_tree_loop_done);
 	  POP_INSERT_PASSES ()
 	  NEXT_PASS (pass_expand_omp_ssa);
diff --git a/gcc/testsuite/gcc.dg/pr43513.c b/gcc/testsuite/gcc.dg/pr43513.c
index 78a037b..3fb0890 100644
--- a/gcc/testsuite/gcc.dg/pr43513.c
+++ b/gcc/testsuite/gcc.dg/pr43513.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-ccp2" } */
+/* { dg-options "-O2 -fdump-tree-ccp3" } */
 
 void bar (int *);
 void foo (char *, int);
@@ -15,5 +15,5 @@ foo3 ()
     foo ("%d ", results[i]);
 }
 
-/* { dg-final { scan-tree-dump-times "alloca" 0 "ccp2"} } */
-/* { dg-final { cleanup-tree-dump "ccp2" } } */
+/* { dg-final { scan-tree-dump-times "alloca" 0 "ccp3"} } */
+/* { dg-final { cleanup-tree-dump "ccp3" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/alias-17.c b/gcc/testsuite/gcc.dg/tree-ssa/alias-17.c
index 48e72ff..59862f6 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/alias-17.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/alias-17.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O -fno-early-inlining -fdump-tree-ccp2" } */
+/* { dg-options "-O -fno-early-inlining -fdump-tree-ccp3" } */
 
 int *p;
 int inline bar(void) { return 0; }
@@ -14,5 +14,5 @@ int foo(int x)
   return *q + *p;
 }
 
-/* { dg-final { scan-tree-dump-not "NOTE: no flow-sensitive alias info for" "ccp2" } } */
-/* { dg-final { cleanup-tree-dump "ccp2" } } */
+/* { dg-final { scan-tree-dump-not "NOTE: no flow-sensitive alias info for" "ccp3" } } */
+/* { dg-final { cleanup-tree-dump "ccp3" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/foldconst-4.c b/gcc/testsuite/gcc.dg/tree-ssa/foldconst-4.c
index 445d415..916a857 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/foldconst-4.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/foldconst-4.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O -fdump-tree-ccp2" } */
+/* { dg-options "-O -fdump-tree-ccp3" } */
 
 struct a {int a,b;};
 const static struct a a;
@@ -10,5 +10,5 @@ test()
 {
   return a.a+b[c];
 }
-/* { dg-final { scan-tree-dump "return 0;" "ccp2" } } */
-/* { dg-final { cleanup-tree-dump "ccp2" } } */
+/* { dg-final { scan-tree-dump "return 0;" "ccp3" } } */
+/* { dg-final { cleanup-tree-dump "ccp3" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-ccp-29.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-ccp-29.c
index 44d2945..1e3f41b 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-ccp-29.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-ccp-29.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O -fdump-tree-ccp2" } */
+/* { dg-options "-O -fdump-tree-ccp3" } */
 
 static double num;
 int foo (void)
@@ -7,5 +7,5 @@ int foo (void)
   return *(unsigned *)&num;
 }
 
-/* { dg-final { scan-tree-dump "return 0;" "ccp2" } } */
-/* { dg-final { cleanup-tree-dump "ccp2" } } */
+/* { dg-final { scan-tree-dump "return 0;" "ccp3" } } */
+/* { dg-final { cleanup-tree-dump "ccp3" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-ccp-3.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-ccp-3.c
index 86a706b..03717e1 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-ccp-3.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-ccp-3.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O -fdump-tree-ccp2" } */
+/* { dg-options "-O -fdump-tree-ccp3" } */
 
 extern void link_error (void);
 
@@ -133,5 +133,5 @@ int* test666 (int * __restrict__ rp1, int * __restrict__ rp2, int *p1)
    optimization has failed */
 /* ??? While we indeed don't handle some of these, a couple of the
    restrict tests are incorrect.  */
-/* { dg-final { scan-tree-dump-times "link_error" 0 "ccp2" { xfail *-*-* } } } */
-/* { dg-final { cleanup-tree-dump "ccp2" } } */
+/* { dg-final { scan-tree-dump-times "link_error" 0 "ccp3" { xfail *-*-* } } } */
+/* { dg-final { cleanup-tree-dump "ccp3" } } */
-- 
1.9.1







* [PATCH, 7/8] Add pass_parloops_oacc_kernels to pass_oacc_kernels
  2014-11-15 14:08 openacc kernels directive -- initial support Tom de Vries
                   ` (5 preceding siblings ...)
  2014-11-15 18:32 ` [PATCH, 6/8] Add pass_ccp " Tom de Vries
@ 2014-11-15 18:52 ` Tom de Vries
  2014-11-25 12:15   ` Tom de Vries
  2014-11-15 19:04 ` [PATCH, 8/8] Do simple omp lowering for no address taken var Tom de Vries
                   ` (5 subsequent siblings)
  12 siblings, 1 reply; 71+ messages in thread
From: Tom de Vries @ 2014-11-15 18:52 UTC (permalink / raw)
  To: GCC Patches; +Cc: Richard Biener, Jakub Jelinek, Thomas Schwinge

[-- Attachment #1: Type: text/plain, Size: 1552 bytes --]

On 15-11-14 13:14, Tom de Vries wrote:
> Hi,
>
> I'm submitting a patch series with initial support for the oacc kernels directive.
>
> The patch series uses pass_parallelize_loops to implement parallelization of
> loops in the oacc kernels region.
>
> The patch series consists of these 8 patches:
> ...
>      1  Expand oacc kernels after pass_build_ealias
>      2  Add pass_oacc_kernels
>      3  Add pass_ch_oacc_kernels to pass_oacc_kernels
>      4  Add pass_tree_loop_{init,done} to pass_oacc_kernels
>      5  Add pass_loop_im to pass_oacc_kernels
>      6  Add pass_ccp to pass_oacc_kernels
>      7  Add pass_parloops_oacc_kernels to pass_oacc_kernels
>      8  Do simple omp lowering for no address taken var
> ...

This patch adds:
- a specialized version of pass_parallelize_loops called
     pass_parloops_oacc_kernels to pass group pass_oacc_kernels, and
- relevant test-cases.
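
The test-cases compare a wrapped unsigned sum against the constant N_REF. The
following is a minimal sketch, not part of the patch, of where that value comes
from. It assumes a 32-bit unsigned int, and the helper name expected_sum is
made up for illustration.

```c
#include <stdint.h>

#define N (1024 * 512)

/* Hypothetical helper, not in the patch: the sum of c[i] = i * 2 + i * 4
   over 0 <= i < N, accumulated with wraparound modulo 2^32, as happens for
   the 32-bit unsigned int 'sum' in the test-cases.  */
static uint32_t
expected_sum (void)
{
  uint32_t sum = 0;

  for (uint32_t i = 0; i < N; i++)
    sum += i * 2 + i * 4;

  return sum;
}
```

In closed form the sum is 6 * N * (N - 1) / 2 = 824632147968, which reduces to
4293394432 modulo 2^32, the N_REF value used in the tests.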

The pass only handles loops that are in a kernels region, and skips over bits of 
pass_parallelize_loops that are already done for oacc kernels.

The pass reintroduces the use of omp_expand_local; I haven't yet managed to make 
it work using the external pass pass_expand_omp_ssa.

An obvious limitation of the patch is that we copy the clauses from the kernels 
directive over to the generated parallel directive. We'll need to do something 
more intelligent here, for instance setting vector_length based on the 
parallelization factor.

Another limitation is that the pass still needs -ftree-parallelize-loops to trigger.

OK for trunk?

Thanks,
- Tom


[-- Attachment #2: 0007-Add-pass_parloops_oacc_kernels-to-pass_oacc_kernels.patch --]
[-- Type: text/x-patch, Size: 23063 bytes --]

2014-11-14  Tom de Vries  <tom@codesourcery.com>

	* passes.def: Add pass_parallelize_loops_oacc_kernels in pass group
	pass_oacc_kernels.  Move pass_expand_omp_ssa into pass group
	pass_oacc_kernels.
	* tree-parloops.c (create_parallel_loop): Add function parameters
	region_entry and bool oacc_kernels_p.  Handle oacc_kernels_p.
	(gen_parallel_loop): Same.  Use omp_expand_local if oacc_kernels_p.
	Call create_parallel_loop with additional args.
	(parallelize_loops): Add function parameter oacc_kernels_p.  Calculate
	dominance info.  Skip loops that are not in a kernels region. Call
	gen_parallel_loop with additional args.
	(pass_parallelize_loops::execute): Call parallelize_loops with false
	argument.
	(pass_data_parallelize_loops_oacc_kernels): New pass_data.
	(class pass_parallelize_loops_oacc_kernels): New pass.
	(pass_parallelize_loops_oacc_kernels::execute)
	(make_pass_parallelize_loops_oacc_kernels): New function.
	* tree-pass.h (make_pass_parallelize_loops_oacc_kernels): Declare.

	* testsuite/libgomp.oacc-c/oacc-kernels-2-run.c: New test.
	* testsuite/libgomp.oacc-c/oacc-kernels-run.c: New test.

	* gcc.dg/oacc-kernels-2.c: New test.
	* gcc.dg/oacc-kernels.c: New test.
---
 gcc/passes.def                                     |   3 +-
 gcc/testsuite/gcc.dg/oacc-kernels-2.c              |  79 +++++++
 gcc/testsuite/gcc.dg/oacc-kernels.c                |  71 ++++++
 gcc/tree-parloops.c                                | 242 ++++++++++++++++-----
 gcc/tree-pass.h                                    |   2 +
 .../testsuite/libgomp.oacc-c/oacc-kernels-2-run.c  |  65 ++++++
 .../testsuite/libgomp.oacc-c/oacc-kernels-run.c    |  59 +++++
 7 files changed, 465 insertions(+), 56 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/oacc-kernels-2.c
 create mode 100644 gcc/testsuite/gcc.dg/oacc-kernels.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c/oacc-kernels-2-run.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c/oacc-kernels-run.c

diff --git a/gcc/passes.def b/gcc/passes.def
index cd9443c..cc09ba9 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -80,9 +80,10 @@ along with GCC; see the file COPYING3.  If not see
 	      NEXT_PASS (pass_tree_loop_init);
 	      NEXT_PASS (pass_lim);
 	      NEXT_PASS (pass_ccp);
+      	      NEXT_PASS (pass_parallelize_loops_oacc_kernels);
+	      NEXT_PASS (pass_expand_omp_ssa);
 	      NEXT_PASS (pass_tree_loop_done);
 	  POP_INSERT_PASSES ()
-	  NEXT_PASS (pass_expand_omp_ssa);
 	  NEXT_PASS (pass_fre);
 	  NEXT_PASS (pass_merge_phi);
 	  NEXT_PASS (pass_cd_dce);
diff --git a/gcc/testsuite/gcc.dg/oacc-kernels-2.c b/gcc/testsuite/gcc.dg/oacc-kernels-2.c
new file mode 100644
index 0000000..1ff4bad
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/oacc-kernels-2.c
@@ -0,0 +1,79 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target fopenacc } */
+/* { dg-options "-fopenacc -ftree-parallelize-loops=32 -O2 -std=c99 -fdump-tree-parloops_oacc_kernels-all -fdump-tree-copyrename" } */
+
+#include <stdlib.h>
+#include <stdio.h>
+
+#define N (1024 * 512)
+#define N_REF 4293394432
+
+#if 1
+#define COUNTERTYPE unsigned int
+#else
+#define COUNTERTYPE int
+#endif
+
+int
+main (void)
+{
+  unsigned int i;
+
+  unsigned int *__restrict a;
+  unsigned int *__restrict b;
+  unsigned int *__restrict c;
+
+  a = malloc (N * sizeof (unsigned int));
+  b = malloc (N * sizeof (unsigned int));
+  c = malloc (N * sizeof (unsigned int));
+
+
+#pragma acc kernels copyout (a[0:N])
+  {
+    for (COUNTERTYPE i = 0; i < N; i++)
+      a[i] = i * 2;
+  }
+
+#pragma acc kernels copyout (b[0:N])
+  {
+    for (COUNTERTYPE i = 0; i < N; i++)
+      b[i] = i * 4;
+  }
+
+#pragma acc kernels copyin (a[0:N], b[0:N]) copyout (c[0:N])
+  {
+    for (COUNTERTYPE ii = 0; ii < N; ii++)
+      c[ii] = a[ii] + b[ii];
+  }
+
+  {
+    unsigned int sum = 0;
+
+    for (COUNTERTYPE i = 0; i < N; i++)
+      sum += c[i];
+
+    printf ("sum: %u\n", sum);
+
+    if (sum != N_REF)
+      abort ();
+  }
+
+  free (a);
+  free (b);
+  free (c);
+
+  return 0;
+}
+
+/* Check that only three loops are analyzed, and that all can be parallelized.  */
+/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 3 "parloops_oacc_kernels" } } */
+/* { dg-final { scan-tree-dump-not "FAILED:" "parloops_oacc_kernels" } } */
+
+/* Check that the loop has been split off into a function.  It pops up first in
+   all_passes/pass_all_optimizations/pass_rename_ssa_copies.  */
+/* { dg-final { scan-tree-dump-times "Function main._omp_fn.0 " 1 "copyrename2" } } */
+/* { dg-final { scan-tree-dump-times "Function main._omp_fn.1 " 1 "copyrename2" } } */
+/* { dg-final { scan-tree-dump-times "Function main._omp_fn.2 " 1 "copyrename2" } } */
+
+/* { dg-final { cleanup-tree-dump "parloops_oacc_kernels" } } */
+/* { dg-final { cleanup-tree-dump "copyrename*" } } */
diff --git a/gcc/testsuite/gcc.dg/oacc-kernels.c b/gcc/testsuite/gcc.dg/oacc-kernels.c
new file mode 100644
index 0000000..de94aa9
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/oacc-kernels.c
@@ -0,0 +1,71 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target fopenacc } */
+/* { dg-options "-fopenacc -ftree-parallelize-loops=32 -O2 -std=c99 -fdump-tree-parloops_oacc_kernels-all -fdump-tree-copyrename" } */
+
+#include <stdlib.h>
+#include <stdio.h>
+
+#define N (1024 * 512)
+#define N_REF 4293394432
+
+#if 1
+#define COUNTERTYPE unsigned int
+#else
+#define COUNTERTYPE int
+#endif
+
+int
+main (void)
+{
+  unsigned int i;
+
+  unsigned int *__restrict a;
+  unsigned int *__restrict b;
+  unsigned int *__restrict c;
+
+  a = malloc (N * sizeof (unsigned int));
+  b = malloc (N * sizeof (unsigned int));
+  c = malloc (N * sizeof (unsigned int));
+
+
+  for (COUNTERTYPE i = 0; i < N; i++)
+    a[i] = i * 2;
+
+  for (COUNTERTYPE i = 0; i < N; i++)
+    b[i] = i * 4;
+
+#pragma acc kernels copyin (a[0:N], b[0:N]) copyout (c[0:N])
+  {
+    for (COUNTERTYPE ii = 0; ii < N; ii++)
+      c[ii] = a[ii] + b[ii];
+  }
+
+  {
+    unsigned int sum = 0;
+
+    for (COUNTERTYPE i = 0; i < N; i++)
+      sum += c[i];
+
+    printf ("sum: %u\n", sum);
+
+    if (sum != N_REF)
+      abort ();
+  }
+
+  free (a);
+  free (b);
+  free (c);
+
+  return 0;
+}
+
+/* Check that only one loop is analyzed, and that it can be parallelized.  */
+/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 1 "parloops_oacc_kernels" } } */
+/* { dg-final { scan-tree-dump-not "FAILED:" "parloops_oacc_kernels" } } */
+
+/* Check that the loop has been split off into a function.  It pops up first in
+   all_passes/pass_all_optimizations/pass_rename_ssa_copies.  */
+/* { dg-final { scan-tree-dump-times "Function main._omp_fn.0 " 1 "copyrename2" } } */
+
+/* { dg-final { cleanup-tree-dump "parloops_oacc_kernels" } } */
+/* { dg-final { cleanup-tree-dump "copyrename*" } } */
diff --git a/gcc/tree-parloops.c b/gcc/tree-parloops.c
index e5dca78..7bc945b 100644
--- a/gcc/tree-parloops.c
+++ b/gcc/tree-parloops.c
@@ -1611,7 +1611,8 @@ transform_to_exit_first_loop (struct loop *loop,
 
 static basic_block
 create_parallel_loop (struct loop *loop, tree loop_fn, tree data,
-		      tree new_data, unsigned n_threads, location_t loc)
+		      tree new_data, unsigned n_threads, location_t loc,
+		      basic_block region_entry, bool oacc_kernels_p)
 {
   gimple_stmt_iterator gsi;
   basic_block bb, paral_bb, for_bb, ex_bb;
@@ -1623,15 +1624,44 @@ create_parallel_loop (struct loop *loop, tree loop_fn, tree data,
   /* Prepare the GIMPLE_OMP_PARALLEL statement.  */
   bb = loop_preheader_edge (loop)->src;
   paral_bb = single_pred (bb);
-  gsi = gsi_last_bb (paral_bb);
+  if (!oacc_kernels_p)
+    gsi = gsi_last_bb (paral_bb);
+  else
+    /* Make sure the oacc parallel is inserted on top of the oacc kernels
+       region.  */
+    gsi = gsi_last_bb (region_entry);
 
-  t = build_omp_clause (loc, OMP_CLAUSE_NUM_THREADS);
-  OMP_CLAUSE_NUM_THREADS_EXPR (t)
-    = build_int_cst (integer_type_node, n_threads);
-  stmt = gimple_build_omp_parallel (NULL, t, loop_fn, data);
-  gimple_set_location (stmt, loc);
+  if (!oacc_kernels_p)
+    {
+      t = build_omp_clause (loc, OMP_CLAUSE_NUM_THREADS);
+      OMP_CLAUSE_NUM_THREADS_EXPR (t)
+	= build_int_cst (integer_type_node, n_threads);
+      stmt = gimple_build_omp_parallel (NULL, t, loop_fn, data);
+      gimple_set_location (stmt, loc);
 
-  gsi_insert_after (&gsi, stmt, GSI_NEW_STMT);
+      gsi_insert_after (&gsi, stmt, GSI_NEW_STMT);
+    }
+  else
+    {
+      /* Create oacc parallel pragma based on oacc kernels pragma.  */
+      gimple kernels = last_stmt (region_entry);
+      stmt = gimple_build_oacc_parallel (NULL,
+					 gimple_oacc_kernels_clauses (kernels));
+      tree child_fn = gimple_oacc_kernels_child_fn (kernels);
+      gimple_oacc_parallel_set_child_fn (stmt, child_fn);
+      tree data_arg = gimple_oacc_kernels_data_arg (kernels);
+      gimple_oacc_parallel_set_data_arg (stmt, data_arg);
+
+      gimple_set_location (stmt, loc);
+
+      /* Insert oacc parallel pragma after the oacc kernels pragma.  */
+      {
+	gimple_stmt_iterator gsi2;
+	gsi2 = gsi;
+	gsi_insert_after (&gsi, stmt, GSI_NEW_STMT);
+	gsi_remove (&gsi2, true);
+      }
+    }
 
   /* Initialize NEW_DATA.  */
   if (data)
@@ -1647,12 +1677,18 @@ create_parallel_loop (struct loop *loop, tree loop_fn, tree data,
       gsi_insert_before (&gsi, stmt, GSI_SAME_STMT);
     }
 
-  /* Emit GIMPLE_OMP_RETURN for GIMPLE_OMP_PARALLEL.  */
-  bb = split_loop_exit_edge (single_dom_exit (loop));
-  gsi = gsi_last_bb (bb);
-  stmt = gimple_build_omp_return (false);
-  gimple_set_location (stmt, loc);
-  gsi_insert_after (&gsi, stmt, GSI_NEW_STMT);
+  /* Skip insertion of OMP_RETURN for oacc_kernels_p.  We've already generated
+     one when lowering the oacc kernels directive in
+     pass_lower_omp/lower_omp (). */
+  if (!oacc_kernels_p)
+    {
+      /* Emit GIMPLE_OMP_RETURN for GIMPLE_OMP_PARALLEL.  */
+      bb = split_loop_exit_edge (single_dom_exit (loop));
+      gsi = gsi_last_bb (bb);
+      stmt = gimple_build_omp_return (false);
+      gimple_set_location (stmt, loc);
+      gsi_insert_after (&gsi, stmt, GSI_NEW_STMT);
+    }
 
   /* Extract data for GIMPLE_OMP_FOR.  */
   gcc_assert (loop->header == single_dom_exit (loop)->src);
@@ -1705,7 +1741,11 @@ create_parallel_loop (struct loop *loop, tree loop_fn, tree data,
   t = build_omp_clause (loc, OMP_CLAUSE_SCHEDULE);
   OMP_CLAUSE_SCHEDULE_KIND (t) = OMP_CLAUSE_SCHEDULE_STATIC;
 
-  for_stmt = gimple_build_omp_for (NULL, GF_OMP_FOR_KIND_FOR, t, 1, NULL);
+  for_stmt = gimple_build_omp_for (NULL,
+				   (oacc_kernels_p
+				    ? GF_OMP_FOR_KIND_OACC_LOOP
+				    : GF_OMP_FOR_KIND_FOR),
+				   NULL_TREE, 1, NULL);
   gimple_set_location (for_stmt, loc);
   gimple_omp_for_set_index (for_stmt, 0, initvar);
   gimple_omp_for_set_initial (for_stmt, 0, cvar_init);
@@ -1736,7 +1776,7 @@ create_parallel_loop (struct loop *loop, tree loop_fn, tree data,
   free_dominance_info (CDI_DOMINATORS);
   calculate_dominance_info (CDI_DOMINATORS);
 
-  return paral_bb;
+  return oacc_kernels_p ? region_entry : paral_bb;
 }
 
 /* Generates code to execute the iterations of LOOP in N_THREADS
@@ -1748,11 +1788,13 @@ create_parallel_loop (struct loop *loop, tree loop_fn, tree data,
 static void
 gen_parallel_loop (struct loop *loop,
 		   reduction_info_table_type *reduction_list,
-		   unsigned n_threads, struct tree_niter_desc *niter)
+		   unsigned n_threads, struct tree_niter_desc *niter,
+		   basic_block region_entry, bool oacc_kernels_p)
 {
   tree many_iterations_cond, type, nit;
   tree arg_struct, new_arg_struct;
   gimple_seq stmts;
+  basic_block parallel_head;
   edge entry, exit;
   struct clsn_data clsn_data;
   unsigned prob;
@@ -1829,40 +1871,43 @@ gen_parallel_loop (struct loop *loop,
   if (stmts)
     gsi_insert_seq_on_edge_immediate (loop_preheader_edge (loop), stmts);
 
-  if (loop->inner)
-    m_p_thread=2;
-  else
-    m_p_thread=MIN_PER_THREAD;
-
-   many_iterations_cond =
-     fold_build2 (GE_EXPR, boolean_type_node,
-                nit, build_int_cst (type, m_p_thread * n_threads));
-
-  many_iterations_cond
-    = fold_build2 (TRUTH_AND_EXPR, boolean_type_node,
-		   invert_truthvalue (unshare_expr (niter->may_be_zero)),
-		   many_iterations_cond);
-  many_iterations_cond
-    = force_gimple_operand (many_iterations_cond, &stmts, false, NULL_TREE);
-  if (stmts)
-    gsi_insert_seq_on_edge_immediate (loop_preheader_edge (loop), stmts);
-  if (!is_gimple_condexpr (many_iterations_cond))
+  if (!oacc_kernels_p)
     {
+      if (loop->inner)
+	m_p_thread=2;
+      else
+	m_p_thread=MIN_PER_THREAD;
+
+      many_iterations_cond =
+	fold_build2 (GE_EXPR, boolean_type_node,
+		     nit, build_int_cst (type, m_p_thread * n_threads));
+
+      many_iterations_cond
+	= fold_build2 (TRUTH_AND_EXPR, boolean_type_node,
+		       invert_truthvalue (unshare_expr (niter->may_be_zero)),
+		       many_iterations_cond);
       many_iterations_cond
-	= force_gimple_operand (many_iterations_cond, &stmts,
-				true, NULL_TREE);
+	= force_gimple_operand (many_iterations_cond, &stmts, false, NULL_TREE);
       if (stmts)
 	gsi_insert_seq_on_edge_immediate (loop_preheader_edge (loop), stmts);
-    }
+      if (!is_gimple_condexpr (many_iterations_cond))
+	{
+	  many_iterations_cond
+	    = force_gimple_operand (many_iterations_cond, &stmts,
+				    true, NULL_TREE);
+	  if (stmts)
+	    gsi_insert_seq_on_edge_immediate (loop_preheader_edge (loop), stmts);
+	}
 
-  initialize_original_copy_tables ();
+      initialize_original_copy_tables ();
 
-  /* We assume that the loop usually iterates a lot.  */
-  prob = 4 * REG_BR_PROB_BASE / 5;
-  loop_version (loop, many_iterations_cond, NULL,
-		prob, prob, REG_BR_PROB_BASE - prob, true);
-  update_ssa (TODO_update_ssa);
-  free_original_copy_tables ();
+      /* We assume that the loop usually iterates a lot.  */
+      prob = 4 * REG_BR_PROB_BASE / 5;
+      loop_version (loop, many_iterations_cond, NULL,
+		    prob, prob, REG_BR_PROB_BASE - prob, true);
+      update_ssa (TODO_update_ssa);
+      free_original_copy_tables ();
+    }
 
   /* Base all the induction variables in LOOP on a single control one.  */
   canonicalize_loop_ivs (loop, &nit, true);
@@ -1879,19 +1924,31 @@ gen_parallel_loop (struct loop *loop,
   entry = loop_preheader_edge (loop);
   exit = single_dom_exit (loop);
 
-  eliminate_local_variables (entry, exit);
-  /* In the old loop, move all variables non-local to the loop to a structure
-     and back, and create separate decls for the variables used in loop.  */
-  separate_decls_in_region (entry, exit, reduction_list, &arg_struct,
-			    &new_arg_struct, &clsn_data);
+  /* This rewrites the body in terms of new variables.  This has already
+     been done for oacc_kernels_p in pass_lower_omp/lower_omp ().  */
+  if (!oacc_kernels_p)
+    {
+      eliminate_local_variables (entry, exit);
+      /* In the old loop, move all variables non-local to the loop to a
+	 structure and back, and create separate decls for the variables used in
+	 loop.  */
+      separate_decls_in_region (entry, exit, reduction_list, &arg_struct,
+				&new_arg_struct, &clsn_data);
+    }
+  else
+    {
+      arg_struct = NULL_TREE;
+      new_arg_struct = NULL_TREE;
+    }
 
   /* Create the parallel constructs.  */
   loc = UNKNOWN_LOCATION;
   cond_stmt = last_stmt (loop->header);
   if (cond_stmt)
     loc = gimple_location (cond_stmt);
-  create_parallel_loop (loop, create_loop_fn (loc), arg_struct,
-			new_arg_struct, n_threads, loc);
+  parallel_head = create_parallel_loop (loop, create_loop_fn (loc), arg_struct,
+					new_arg_struct, n_threads, loc,
+					region_entry, oacc_kernels_p);
   if (reduction_list->elements () > 0)
     create_call_for_reduction (loop, reduction_list, &clsn_data);
 
@@ -1905,6 +1962,16 @@ gen_parallel_loop (struct loop *loop,
      removed statements.  */
   FOR_EACH_LOOP (loop, 0)
     free_numbers_of_iterations_estimates_loop (loop);
+
+  if (oacc_kernels_p)
+    {
+      /* Expand the parallel constructs.  We do it directly here instead of
+	 running a separate expand_omp pass, since it is more efficient, and
+	 less likely to cause troubles with further analyses not being able to
+	 deal with the OMP trees.  */
+
+      omp_expand_local (parallel_head);
+    }
 }
 
 /* Returns true when LOOP contains vector phi nodes.  */
@@ -2131,7 +2198,7 @@ try_create_reduction_list (loop_p loop,
    otherwise.  */
 
 bool
-parallelize_loops (void)
+parallelize_loops (bool oacc_kernels_p)
 {
   unsigned n_threads = flag_tree_parallelize_loops;
   bool changed = false;
@@ -2140,6 +2207,7 @@ parallelize_loops (void)
   struct obstack parloop_obstack;
   HOST_WIDE_INT estimated;
   source_location loop_loc;
+  basic_block region_entry, region_exit;
 
   /* Do not parallelize loops in the functions created by parallelization.  */
   if (parallelized_function_p (cfun->decl))
@@ -2151,9 +2219,25 @@ parallelize_loops (void)
   reduction_info_table_type reduction_list (10);
   init_stmt_vec_info_vec ();
 
+  calculate_dominance_info (CDI_DOMINATORS);
+
   FOR_EACH_LOOP (loop, 0)
     {
       reduction_list.empty ();
+
+      if (oacc_kernels_p)
+	{
+	  if (!loop_in_oacc_kernels_region_p (loop, &region_entry, &region_exit))
+	    continue;
+	  else
+	    {
+	      if (dump_file && (dump_flags & TDF_DETAILS))
+		fprintf (dump_file,
+			 "Trying loop %d with header bb %d in oacc kernels region\n",
+			 loop->num, loop->header->index);
+	    }
+	}
+
       if (dump_file && (dump_flags & TDF_DETAILS))
       {
         fprintf (dump_file, "Trying loop %d as candidate\n",loop->num);
@@ -2223,8 +2307,9 @@ parallelize_loops (void)
 	  fprintf (dump_file, "\nloop at %s:%d: ",
 		   LOCATION_FILE (loop_loc), LOCATION_LINE (loop_loc));
       }
+
       gen_parallel_loop (loop, &reduction_list,
-			 n_threads, &niter_desc);
+			 n_threads, &niter_desc, region_entry, oacc_kernels_p);
     }
 
   free_stmt_vec_info_vec ();
@@ -2275,7 +2360,7 @@ pass_parallelize_loops::execute (function *fun)
   if (number_of_loops (fun) <= 1)
     return 0;
 
-  if (parallelize_loops ())
+  if (parallelize_loops (false))
     {
       fun->curr_properties &= ~(PROP_gimple_eomp);
       return TODO_update_ssa;
@@ -2293,4 +2378,51 @@ make_pass_parallelize_loops (gcc::context *ctxt)
 }
 
 
+namespace {
+
+const pass_data pass_data_parallelize_loops_oacc_kernels =
+{
+  GIMPLE_PASS, /* type */
+  "parloops_oacc_kernels", /* name */
+  OPTGROUP_LOOP, /* optinfo_flags */
+  TV_TREE_PARALLELIZE_LOOPS, /* tv_id */
+  ( PROP_cfg | PROP_ssa ), /* properties_required */
+  0, /* properties_provided */
+  0, /* properties_destroyed */
+  0, /* todo_flags_start */
+  0, /* todo_flags_finish */
+};
+
+class pass_parallelize_loops_oacc_kernels : public gimple_opt_pass
+{
+public:
+  pass_parallelize_loops_oacc_kernels (gcc::context *ctxt)
+    : gimple_opt_pass (pass_data_parallelize_loops_oacc_kernels, ctxt)
+  {}
+
+  /* opt_pass methods: */
+  virtual bool gate (function *) { return flag_tree_parallelize_loops > 1; }
+  virtual unsigned int execute (function *);
+
+}; // class pass_parallelize_loops_oacc_kernels
+
+unsigned
+pass_parallelize_loops_oacc_kernels::execute (function *fun)
+{
+  if (number_of_loops (fun) <= 1)
+    return 0;
+
+  if (parallelize_loops (true))
+    return TODO_cleanup_cfg | TODO_rebuild_alias;
+  return 0;
+}
+
+} // anon namespace
+
+gimple_opt_pass *
+make_pass_parallelize_loops_oacc_kernels (gcc::context *ctxt)
+{
+  return new pass_parallelize_loops_oacc_kernels (ctxt);
+}
+
 #include "gt-tree-parloops.h"
diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
index 1f599fa..e769e4f 100644
--- a/gcc/tree-pass.h
+++ b/gcc/tree-pass.h
@@ -370,6 +370,8 @@ extern gimple_opt_pass *make_pass_slp_vectorize (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_complete_unroll (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_complete_unrolli (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_parallelize_loops (gcc::context *ctxt);
+extern gimple_opt_pass *
+  make_pass_parallelize_loops_oacc_kernels (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_loop_prefetch (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_iv_optimize (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_tree_loop_done (gcc::context *ctxt);
diff --git a/libgomp/testsuite/libgomp.oacc-c/oacc-kernels-2-run.c b/libgomp/testsuite/libgomp.oacc-c/oacc-kernels-2-run.c
new file mode 100644
index 0000000..5cdae0b
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c/oacc-kernels-2-run.c
@@ -0,0 +1,65 @@
+/* { dg-do run } */
+/* { dg-options "-ftree-parallelize-loops=32 -O2 -std=c99" } */
+
+#include <stdlib.h>
+#include <stdio.h>
+
+#define N (1024 * 512)
+#define N_REF 4293394432
+
+#if 1
+#define COUNTERTYPE unsigned int
+#else
+#define COUNTERTYPE int
+#endif
+
+int
+main (void)
+{
+  unsigned int i;
+
+  unsigned int *__restrict a;
+  unsigned int *__restrict b;
+  unsigned int *__restrict c;
+
+  a = malloc (N * sizeof (unsigned int));
+  b = malloc (N * sizeof (unsigned int));
+  c = malloc (N * sizeof (unsigned int));
+
+
+#pragma acc kernels copyout (a[0:N])
+  {
+    for (COUNTERTYPE i = 0; i < N; i++)
+      a[i] = i * 2;
+  }
+
+#pragma acc kernels copyout (b[0:N])
+  {
+    for (COUNTERTYPE i = 0; i < N; i++)
+      b[i] = i * 4;
+  }
+
+#pragma acc kernels copyin (a[0:N], b[0:N]) copyout (c[0:N])
+  {
+    for (COUNTERTYPE ii = 0; ii < N; ii++)
+      c[ii] = a[ii] + b[ii];
+  }
+
+  {
+    unsigned int sum = 0;
+
+    for (COUNTERTYPE i = 0; i < N; i++)
+      sum += c[i];
+
+    printf ("sum: %u\n", sum);
+
+    if (sum != N_REF)
+      abort ();
+  }
+
+  free (a);
+  free (b);
+  free (c);
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c/oacc-kernels-run.c b/libgomp/testsuite/libgomp.oacc-c/oacc-kernels-run.c
new file mode 100644
index 0000000..b9e62a0
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c/oacc-kernels-run.c
@@ -0,0 +1,59 @@
+/* { dg-do run } */
+/* { dg-options "-ftree-parallelize-loops=32 -O2 -std=c99" } */
+
+#include <stdlib.h>
+#include <stdio.h>
+
+#define N (1024 * 512)
+#define N_REF 4293394432
+
+#if 1
+#define COUNTERTYPE unsigned int
+#else
+#define COUNTERTYPE int
+#endif
+
+int
+main (void)
+{
+  unsigned int i;
+
+  unsigned int *__restrict a;
+  unsigned int *__restrict b;
+  unsigned int *__restrict c;
+
+  a = malloc (N * sizeof (unsigned int));
+  b = malloc (N * sizeof (unsigned int));
+  c = malloc (N * sizeof (unsigned int));
+
+
+  for (COUNTERTYPE i = 0; i < N; i++)
+    a[i] = i * 2;
+
+  for (COUNTERTYPE i = 0; i < N; i++)
+    b[i] = i * 4;
+
+#pragma acc kernels copyin (a[0:N], b[0:N]) copyout (c[0:N])
+  {
+    for (COUNTERTYPE ii = 0; ii < N; ii++)
+      c[ii] = a[ii] + b[ii];
+  }
+
+  {
+    unsigned int sum = 0;
+
+    for (COUNTERTYPE i = 0; i < N; i++)
+      sum += c[i];
+
+    printf ("sum: %u\n", sum);
+
+    if (sum != N_REF)
+      abort ();
+  }
+
+  free (a);
+  free (b);
+  free (c);
+
+  return 0;
+}
-- 
1.9.1
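
As a sanity check on N_REF in the two test cases above: each c[i] equals
2 * i + 4 * i, so the expected sum is 3 * N * (N - 1) reduced modulo 2^32.
A minimal sketch, assuming a 32-bit unsigned int (modeled here with
uint32_t); the function names are made up for illustration and are not part
of the patch:

```c
#include <assert.h>
#include <stdint.h>

#define N (1024 * 512)

/* Closed form: the sum over i in [0, N) of (2 * i + 4 * i) is
   3 * N * (N - 1).  The cast truncates it to 32 bits, which mimics the
   wraparound of the unsigned int accumulator in the test.  */
static uint32_t
expected_sum_closed_form (void)
{
  uint64_t n = N;
  return (uint32_t) (3 * n * (n - 1));
}

/* The same value computed element by element, the way the test does.  */
static uint32_t
expected_sum_loop (void)
{
  uint32_t sum = 0;
  for (uint32_t i = 0; i < N; i++)
    sum += i * 2 + i * 4;
  return sum;
}
```

Both functions should agree with the N_REF value used in the tests.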







* [PATCH, 8/8] Do simple omp lowering for no address taken var
  2014-11-15 14:08 openacc kernels directive -- initial support Tom de Vries
                   ` (6 preceding siblings ...)
  2014-11-15 18:52 ` [PATCH, 7/8] Add pass_parloops_oacc_kernels to pass_oacc_kernels Tom de Vries
@ 2014-11-15 19:04 ` Tom de Vries
  2014-11-17 10:29   ` Richard Biener
  2014-11-19 20:34 ` openacc kernels directive -- initial support Tom de Vries
                   ` (4 subsequent siblings)
  12 siblings, 1 reply; 71+ messages in thread
From: Tom de Vries @ 2014-11-15 19:04 UTC (permalink / raw)
  To: GCC Patches; +Cc: Richard Biener, Jakub Jelinek, Thomas Schwinge

[-- Attachment #1: Type: text/plain, Size: 3583 bytes --]

On 15-11-14 13:14, Tom de Vries wrote:
> Hi,
>
> I'm submitting a patch series with initial support for the oacc kernels directive.
>
> The patch series uses pass_parallelize_loops to implement parallelization of
> loops in the oacc kernels region.
>
> The patch series consists of these 8 patches:
> ...
>      1  Expand oacc kernels after pass_build_ealias
>      2  Add pass_oacc_kernels
>      3  Add pass_ch_oacc_kernels to pass_oacc_kernels
>      4  Add pass_tree_loop_{init,done} to pass_oacc_kernels
>      5  Add pass_loop_im to pass_oacc_kernels
>      6  Add pass_ccp to pass_oacc_kernels
>      7  Add pass_parloops_oacc_kernels to pass_oacc_kernels
>      8  Do simple omp lowering for no address taken var
> ...

This patch lowers integer variables that do not have their address taken as
local variables.  A copy at region entry and a copy at region exit move the
value in and out.

In the context of reduction handling in a kernels region, this allows the 
parloops reduction analysis to recognize the reduction, even after oacc lowering 
has been done in pass_lower_omp.

In more detail, without this patch, the omp_data_i loads and stores are
generated in place (in this case, in the loop):
...
                 {
                   .omp_data_iD.2201 = &.omp_data_arr.15D.2220;
                   {
                     unsigned intD.9 iD.2146;

                     iD.2146 = 0;
                     goto <D.2207>;
                     <D.2208>:
                     D.2216 = .omp_data_iD.2201->cD.2203;
                     c.9D.2176 = *D.2216;
                     D.2177 = (long unsigned intD.10) iD.2146;
                     D.2178 = D.2177 * 4;
                     D.2179 = c.9D.2176 + D.2178;
                     D.2180 = *D.2179;
                     D.2217 = .omp_data_iD.2201->sumD.2205;
                     D.2218 = *D.2217;
                     D.2217 = .omp_data_iD.2201->sumD.2205;
                     D.2219 = D.2180 + D.2218;
                     *D.2217 = D.2219;
                     iD.2146 = iD.2146 + 1;
                     <D.2207>:
                     if (iD.2146 <= 524287) goto <D.2208>; else goto <D.2209>;
                     <D.2209>:
                   }
...

With this patch, the omp_data_i load and store for sum are generated at entry
and exit:
...
                 {
                   .omp_data_iD.2201 = &.omp_data_arr.15D.2218;
                   D.2216 = .omp_data_iD.2201->sumD.2205;
                   sumD.2206 = *D.2216;
                   {
                     unsigned intD.9 iD.2146;

                     iD.2146 = 0;
                     goto <D.2207>;
                     <D.2208>:
                     D.2217 = .omp_data_iD.2201->cD.2203;
                     c.9D.2176 = *D.2217;
                     D.2177 = (long unsigned intD.10) iD.2146;
                     D.2178 = D.2177 * 4;
                     D.2179 = c.9D.2176 + D.2178;
                     D.2180 = *D.2179;
                     sumD.2206 = D.2180 + sumD.2206;
                     iD.2146 = iD.2146 + 1;
                     <D.2207>:
                     if (iD.2146 <= 524287) goto <D.2208>; else goto <D.2209>;
                     <D.2209>:
                   }
                   *D.2216 = sumD.2206;
                   #pragma omp return
                 }
...


So, without the patch the reduction operation looks like this:
...
     *(.omp_data_iD.2201->sumD.2205) = *(.omp_data_iD.2201->sumD.2205) + x
...

And with this patch the reduction operation is simply:
...
     sumD.2206 = sumD.2206 + x
...
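
At the source level, the difference between the two forms can be sketched
roughly as follows.  This is an illustration only: struct omp_data and both
function names are hypothetical stand-ins for the generated .omp_data_i
record and the outlined region body, and the extra indirection on c seen in
the dumps is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the .omp_data_i record: it holds the
   addresses of the mapped variables.  */
struct omp_data
{
  const unsigned int *c;
  unsigned int *sum;
};

/* Without the patch: every iteration reloads the sum pointer from the
   record and goes through memory for both the load and the store.  */
static void
reduce_in_place (struct omp_data *data, size_t n)
{
  for (size_t i = 0; i < n; i++)
    *data->sum = *data->sum + data->c[i];
}

/* With the patch: copy in at entry, reduce on a local variable, copy
   out at exit.  */
static void
reduce_via_local (struct omp_data *data, size_t n)
{
  unsigned int sum = *data->sum;
  for (size_t i = 0; i < n; i++)
    sum = sum + data->c[i];
  *data->sum = sum;
}
```

Both functions compute the same result; only the second one exposes the
reduction as an update of a plain local scalar.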

OK for trunk?

Thanks,
- Tom


[-- Attachment #2: 0008-Do-simple-omp-lowering-for-no-address-taken-var.patch --]
[-- Type: text/x-patch, Size: 5116 bytes --]

2014-11-03  Tom de Vries  <tom@codesourcery.com>

	* gimple.c (gimple_seq_ior_addresses_taken_op)
	(gimple_seq_ior_addresses_taken): New functions.
	* gimple.h (gimple_seq_ior_addresses_taken): Declare.
	* omp-low.c (addresses_taken): Declare local variable.
	(lower_omp_target): Lower variables that do not have their address
	taken as local variables.  Use a copy at region entry and exit to copy
	the value in and out.
	(execute_lower_omp): Calculate addresses_taken.
---
 gcc/gimple.c  | 35 +++++++++++++++++++++++++++++++++++
 gcc/gimple.h  |  1 +
 gcc/omp-low.c | 25 ++++++++++++++++++++++---
 3 files changed, 58 insertions(+), 3 deletions(-)

diff --git a/gcc/gimple.c b/gcc/gimple.c
index a9174e6..107eb26 100644
--- a/gcc/gimple.c
+++ b/gcc/gimple.c
@@ -2428,6 +2428,41 @@ gimple_ior_addresses_taken (bitmap addresses_taken, gimple stmt)
 					gimple_ior_addresses_taken_1);
 }
 
+/* Helper function for gimple_seq_ior_addresses_taken.  */
+
+static tree
+gimple_seq_ior_addresses_taken_op (tree *tp,
+				   int *walk_subtrees ATTRIBUTE_UNUSED,
+				   void *data)
+{
+  struct walk_stmt_info *wi = (struct walk_stmt_info *)data;
+  bitmap addresses_taken = (bitmap)wi->info;
+
+  tree t = *tp;
+  if (TREE_CODE (t) != ADDR_EXPR)
+    return NULL_TREE;
+
+  tree var = TREE_OPERAND (t, 0);
+  if (!DECL_P (var))
+    return NULL_TREE;
+
+  bitmap_set_bit (addresses_taken, DECL_UID (var));
+
+  return NULL_TREE;
+}
+
+/* Find the decls in SEQ that have their address taken, and set the
+   corresponding decl_uid in ADDRESSES_TAKEN.  */
+
+void
+gimple_seq_ior_addresses_taken (gimple_seq seq, bitmap addresses_taken)
+{
+  struct walk_stmt_info wi;
+  memset (&wi, 0, sizeof (wi));
+  wi.info = addresses_taken;
+
+  walk_gimple_seq (seq, NULL, gimple_seq_ior_addresses_taken_op, &wi);
+}
 
 /* Return true if TYPE1 and TYPE2 are compatible enough for builtin
    processing.  */
diff --git a/gcc/gimple.h b/gcc/gimple.h
index 4faeaaa..528a9df 100644
--- a/gcc/gimple.h
+++ b/gcc/gimple.h
@@ -1316,6 +1316,7 @@ extern tree gimple_unsigned_type (tree);
 extern tree gimple_signed_type (tree);
 extern alias_set_type gimple_get_alias_set (tree);
 extern bool gimple_ior_addresses_taken (bitmap, gimple);
+extern void gimple_seq_ior_addresses_taken (gimple_seq, bitmap);
 extern bool gimple_builtin_call_types_compatible_p (const_gimple, tree);
 extern bool gimple_call_builtin_p (const_gimple);
 extern bool gimple_call_builtin_p (const_gimple, enum built_in_class);
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index e35fa8b..ff78b04 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -229,6 +229,7 @@ static int target_nesting_level;
 static struct omp_region *root_omp_region;
 static bitmap task_shared_vars;
 static vec<omp_context *> taskreg_contexts;
+static bitmap addresses_taken;
 
 static void scan_omp (gimple_seq *, omp_context *);
 static tree scan_omp_1_op (tree *, int *, void *);
@@ -11307,7 +11308,7 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx)
   tree child_fn, t, c;
   gimple stmt = gsi_stmt (*gsi_p);
   gimple tgt_bind, bind;
-  gimple_seq tgt_body, olist, ilist, orlist, irlist, new_body;
+  gimple_seq tgt_body, olist, ilist, orlist, irlist, olist2, ilist2, new_body;
   location_t loc = gimple_location (stmt);
   bool offloaded, data_region;
   unsigned int map_cnt = 0;
@@ -11368,6 +11369,8 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 
   irlist = NULL;
   orlist = NULL;
+  ilist2 = NULL;
+  olist2 = NULL;
   switch (gimple_code (stmt))
     {
     case GIMPLE_OACC_KERNELS:
@@ -11451,8 +11454,18 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 		&& !OMP_CLAUSE_MAP_ZERO_BIAS_ARRAY_SECTION (c)
 		&& TREE_CODE (TREE_TYPE (var)) == ARRAY_TYPE)
 	      x = build_simple_mem_ref (x);
-	    SET_DECL_VALUE_EXPR (new_var, x);
-	    DECL_HAS_VALUE_EXPR_P (new_var) = 1;
+	    if (gimple_code (stmt) == GIMPLE_OACC_KERNELS
+		&& !bitmap_bit_p (addresses_taken, DECL_UID (var))
+		&& INTEGRAL_TYPE_P (TREE_TYPE (var)))
+	      {
+		gimplify_assign (new_var, x, &ilist2);
+		gimplify_assign (unshare_expr (x), new_var, &olist2);
+	      }
+	    else
+	      {
+		SET_DECL_VALUE_EXPR (new_var, x);
+		DECL_HAS_VALUE_EXPR_P (new_var) = 1;
+	      }
 	  }
 	map_cnt++;
       }
@@ -11719,7 +11732,9 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 
   if (offloaded)
     {
+      gimple_seq_add_seq (&new_body, ilist2);
       gimple_seq_add_seq (&new_body, tgt_body);
+      gimple_seq_add_seq (&new_body, olist2);
       new_body = maybe_catch_exception (new_body);
     }
   else if (data_region)
@@ -12054,6 +12069,9 @@ execute_lower_omp (void)
       && flag_cilkplus == 0)
     return 0;
 
+  addresses_taken = BITMAP_ALLOC (NULL);
+  gimple_seq_ior_addresses_taken (gimple_body (cfun->decl), addresses_taken);
+
   all_contexts = splay_tree_new (splay_tree_compare_pointers, 0,
 				 delete_omp_context);
 
@@ -12079,6 +12097,7 @@ execute_lower_omp (void)
       all_contexts = NULL;
     }
   BITMAP_FREE (task_shared_vars);
+  BITMAP_FREE (addresses_taken);
   return 0;
 }
 
-- 
1.9.1







* Re: [PATCH, 8/8] Do simple omp lowering for no address taken var
  2014-11-15 19:04 ` [PATCH, 8/8] Do simple omp lowering for no address taken var Tom de Vries
@ 2014-11-17 10:29   ` Richard Biener
  2014-11-18  9:13     ` Eric Botcazou
  2014-11-24 11:53     ` Tom de Vries
  0 siblings, 2 replies; 71+ messages in thread
From: Richard Biener @ 2014-11-17 10:29 UTC (permalink / raw)
  To: Tom de Vries; +Cc: GCC Patches, Jakub Jelinek, Thomas Schwinge, ebotcazou

On Sat, 15 Nov 2014, Tom de Vries wrote:

> On 15-11-14 13:14, Tom de Vries wrote:
> > Hi,
> > 
> > I'm submitting a patch series with initial support for the oacc kernels
> > directive.
> > 
> > The patch series uses pass_parallelize_loops to implement parallelization of
> > loops in the oacc kernels region.
> > 
> > The patch series consists of these 8 patches:
> > ...
> >      1  Expand oacc kernels after pass_build_ealias
> >      2  Add pass_oacc_kernels
> >      3  Add pass_ch_oacc_kernels to pass_oacc_kernels
> >      4  Add pass_tree_loop_{init,done} to pass_oacc_kernels
> >      5  Add pass_loop_im to pass_oacc_kernels
> >      6  Add pass_ccp to pass_oacc_kernels
> >      7  Add pass_parloops_oacc_kernels to pass_oacc_kernels
> >      8  Do simple omp lowering for no address taken var
> > ...
> 
> This patch lowers integer variables that do not have their address taken as
> local variables.  We use a copy at region entry and exit to copy the value in
> and out.
> 
> In the context of reduction handling in a kernels region, this allows the
> parloops reduction analysis to recognize the reduction, even after oacc
> lowering has been done in pass_lower_omp.
> 
> In more detail, without this patch, the omp_data_i loads and stores are
> generated in place (in this case, in the loop):
> ...
>                 {
>                   .omp_data_iD.2201 = &.omp_data_arr.15D.2220;
>                   {
>                     unsigned intD.9 iD.2146;
> 
>                     iD.2146 = 0;
>                     goto <D.2207>;
>                     <D.2208>:
>                     D.2216 = .omp_data_iD.2201->cD.2203;
>                     c.9D.2176 = *D.2216;
>                     D.2177 = (long unsigned intD.10) iD.2146;
>                     D.2178 = D.2177 * 4;
>                     D.2179 = c.9D.2176 + D.2178;
>                     D.2180 = *D.2179;
>                     D.2217 = .omp_data_iD.2201->sumD.2205;
>                     D.2218 = *D.2217;
>                     D.2217 = .omp_data_iD.2201->sumD.2205;
>                     D.2219 = D.2180 + D.2218;
>                     *D.2217 = D.2219;
>                     iD.2146 = iD.2146 + 1;
>                     <D.2207>:
>                     if (iD.2146 <= 524287) goto <D.2208>; else goto <D.2209>;
>                     <D.2209>:
>                   }
> ...
> 
> With this patch, the omp_data_i load and store for sum are generated at entry
> and exit:
> ...
>                 {
>                   .omp_data_iD.2201 = &.omp_data_arr.15D.2218;
>                   D.2216 = .omp_data_iD.2201->sumD.2205;
>                   sumD.2206 = *D.2216;
>                   {
>                     unsigned intD.9 iD.2146;
> 
>                     iD.2146 = 0;
>                     goto <D.2207>;
>                     <D.2208>:
>                     D.2217 = .omp_data_iD.2201->cD.2203;
>                     c.9D.2176 = *D.2217;
>                     D.2177 = (long unsigned intD.10) iD.2146;
>                     D.2178 = D.2177 * 4;
>                     D.2179 = c.9D.2176 + D.2178;
>                     D.2180 = *D.2179;
>                     sumD.2206 = D.2180 + sumD.2206;
>                     iD.2146 = iD.2146 + 1;
>                     <D.2207>:
>                     if (iD.2146 <= 524287) goto <D.2208>; else goto <D.2209>;
>                     <D.2209>:
>                   }
>                   *D.2216 = sumD.2206;
>                   #pragma omp return
>                 }
> ...
> 
> 
> So, without the patch the reduction operation looks like this:
> ...
>     *(.omp_data_iD.2201->sumD.2205) = *(.omp_data_iD.2201->sumD.2205) + x
> ...
> 
> And with this patch the reduction operation is simply:
> ...
>     sumD.2206 = sumD.2206 + x
> ...
> 
> OK for trunk?

I presume the reason you are trying to do that here is that otherwise
it happens too late?  What you do is what loop store motion would
do.

Now - I can see how that is easily confused by the static chain
being address-taken.  But I also remember that Eric did some
preparatory work to fix that, for nested functions, that is,
possibly setting DECL_NONADDRESSABLE_P?  Don't remember exactly.

That said - the gimple_seq_ior_addresses_taken_op callback looks
completely broken.  Consider &a.x which you'd fail to mark as
address-taken.  It looks like the body is not yet in CFG form
when you apply all this?
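
To make the &a.x point concrete, here is a toy model, not GCC code: the
toy_expr type, its three codes and both predicates are invented for
illustration.  The first predicate mirrors the shape of the posted callback
(only an address-of applied directly to a declaration counts); the second
strips component references first, which is roughly the job
get_base_address does in GCC.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Invented miniature expression codes, loosely named after tree codes.  */
enum toy_code { TOY_VAR, TOY_COMPONENT_REF, TOY_ADDR_EXPR };

struct toy_expr
{
  enum toy_code code;
  int uid;                      /* Only meaningful for TOY_VAR.  */
  const struct toy_expr *op0;   /* Operand for the other two codes.  */
};

/* Same shape as the posted callback: only &VAR marks VAR.  */
static bool
takes_address_naive (const struct toy_expr *e, int uid)
{
  if (e->code != TOY_ADDR_EXPR)
    return false;
  const struct toy_expr *op = e->op0;
  return op->code == TOY_VAR && op->uid == uid;
}

/* Walk down to the base object first, so that &a.x also marks a.  */
static bool
takes_address_of_base (const struct toy_expr *e, int uid)
{
  if (e->code != TOY_ADDR_EXPR)
    return false;
  const struct toy_expr *op = e->op0;
  while (op->code == TOY_COMPONENT_REF)
    op = op->op0;
  return op->code == TOY_VAR && op->uid == uid;
}
```

With a variable a and the expressions &a and &a.x, the naive predicate only
reports the first one, while the base-walking one reports both.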

That said - the functions do not belong to gimple.[ch] at least
as they are not going to work in general.  I also question
why they are necessary - you do

+           if (gimple_code (stmt) == GIMPLE_OACC_KERNELS
+               && !bitmap_bit_p (addresses_taken, DECL_UID (var))
+               && INTEGRAL_TYPE_P (TREE_TYPE (var)))

but why don't you simply check TREE_ADDRESSABLE (var)?  TREE_ADDRESSABLE
is conservatively correct here.

And the above won't help for float reductions.  So if you do this at all,
you should probably test is_gimple_reg_type (TREE_TYPE (var)) instead
of INTEGRAL_TYPE_P and you definitely should limit the number of
vars treated this way.

Oh - and the optimization should be somewhere more general - after
all it applies to all nested functions (thus move it to tree-nested.c?)
and to autopar loops as well.  Not sure how much code the omp
lowering shares with unnesting - but hopefully enough.

Richard.

-- 
Richard Biener <rguenther@suse.de>
SUSE LINUX GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendoerffer, HRB 21284
(AG Nuernberg)
Maxfeldstrasse 5, 90409 Nuernberg, Germany


* Re: [PATCH, 8/8] Do simple omp lowering for no address taken var
  2014-11-17 10:29   ` Richard Biener
@ 2014-11-18  9:13     ` Eric Botcazou
  2014-11-18  9:53       ` Richard Biener
  2014-11-24 11:53     ` Tom de Vries
  1 sibling, 1 reply; 71+ messages in thread
From: Eric Botcazou @ 2014-11-18  9:13 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, Tom de Vries, Jakub Jelinek, Thomas Schwinge

> Now - I can see how that is easily confused by the static chain
> being address-taken.  But I also remember that Eric did some
> preparatory work to fix that, for nested functions, that is,
> possibly setting DECL_NONADDRESSABLE_P?  Don't remember exactly.

The preparatory work is DECL_NONLOCAL_FRAME.  The complete patch which does 
something along these lines is attached to PR tree-optimization/54779 (latest 
version, for a 4.9-based compiler).

-- 
Eric Botcazou


* Re: [PATCH, 8/8] Do simple omp lowering for no address taken var
  2014-11-18  9:13     ` Eric Botcazou
@ 2014-11-18  9:53       ` Richard Biener
  2014-11-18 12:20         ` Richard Biener
  0 siblings, 1 reply; 71+ messages in thread
From: Richard Biener @ 2014-11-18  9:53 UTC (permalink / raw)
  To: Eric Botcazou; +Cc: gcc-patches, Tom de Vries, Jakub Jelinek, Thomas Schwinge

On Tue, 18 Nov 2014, Eric Botcazou wrote:

> > Now - I can see how that is easily confused by the static chain
> > being address-taken.  But I also remember that Eric did some
> > preparatory work to fix that, for nested functions, that is,
> > possibly setting DECL_NONADDRESSABLE_P?  Don't remember exactly.
> 
> The preparatory work is DECL_NONLOCAL_FRAME.  The complete patch which does 
> something along these lines is attached to PR tree-optimization/54779 (latest 
> version, for a 4.9-based compiler).

Ah, now I remember - this was to be able to optimize away the frame
variable in case the nested function was inlined.

Tom's case is somewhat different as I understand it, as somehow LIM store
motion doesn't handle indirect frame accesses well enough(?)  So
he intends to load register vars in the frame into registers at the
beginning of the nested function and restore them to the frame on
function exit (this will probably break for recursive calls, but
OMP offloading might be special enough that this is a non-issue there).

So marking the frame decl won't help him here (I thought we might
mark the FIELD_DECLs corresponding to individual vars).  OTOH inside
the nested function accesses to the static chain should be easy to
identify.
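
The recursion concern above can be illustrated with a toy example in plain
C, with no offloading involved.  The function names are made up, and the
copy-in/copy-out shape only approximates what the lowering would emit:

```c
#include <assert.h>

/* Every level updates the shared counter directly through the pointer.  */
static void
bump_direct (unsigned int *shared, int depth)
{
  if (depth == 0)
    return;
  *shared += 1;
  bump_direct (shared, depth - 1);
}

/* Every level works on a private copy, loaded at entry and stored back
   at exit.  The store at exit overwrites whatever the recursive call
   wrote to *shared in the meantime.  */
static void
bump_cached (unsigned int *shared, int depth)
{
  if (depth == 0)
    return;
  unsigned int local = *shared;      /* Copy in at entry.  */
  local += 1;
  bump_cached (shared, depth - 1);   /* Callee updates *shared ...  */
  *shared = local;                   /* ... and this store discards them.  */
}
```

Starting from zero with a depth of three, the direct version counts three
increments while the cached version ends up with one.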

Richard.


* Re: [PATCH, 8/8] Do simple omp lowering for no address taken var
  2014-11-18  9:53       ` Richard Biener
@ 2014-11-18 12:20         ` Richard Biener
  0 siblings, 0 replies; 71+ messages in thread
From: Richard Biener @ 2014-11-18 12:20 UTC (permalink / raw)
  To: Eric Botcazou; +Cc: gcc-patches, Tom de Vries, Jakub Jelinek, Thomas Schwinge

On Tue, 18 Nov 2014, Richard Biener wrote:

> On Tue, 18 Nov 2014, Eric Botcazou wrote:
> 
> > > Now - I can see how that is easily confused by the static chain
> > > being address-taken.  But I also remember that Eric did some
> > > preparatory work to fix that, for nested functions, that is,
> > > possibly setting DECL_NONADDRESSABLE_P?  Don't remember exactly.
> > 
> > The preparatory work is DECL_NONLOCAL_FRAME.  The complete patch which does 
> > something along these lines is attached to PR tree-optimization/54779 (latest 
> > version, for a 4.9-based compiler).
> 
> Ah, now I remember - this was to be able to optimize away the frame
> variable in case the nested function was inlined.
> 
> Tom's case is somewhat different as I understand it, as somehow LIM store
> motion doesn't handle indirect frame accesses well enough(?)  So
> he intends to load register vars in the frame into registers at the
> beginning of the nested function and restore them to the frame on
> function exit (this will probably break for recursive calls, but
> OMP offloading might be special enough that this is a non-issue there).
> 
> So marking the frame decl won't help him here (I thought we might
> mark the FIELD_DECLs corresponding to individual vars).  OTOH inside
> the nested function accesses to the static chain should be easy to
> identify.

Tom - does the following patch help?

Thanks,
Richard.

Index: gcc/omp-low.c
===================================================================
--- gcc/omp-low.c	(revision 217692)
+++ gcc/omp-low.c	(working copy)
@@ -1517,7 +1517,8 @@ fixup_child_record_type (omp_context *ct
       layout_type (type);
     }
 
-  TREE_TYPE (ctx->receiver_decl) = build_pointer_type (type);
+  TREE_TYPE (ctx->receiver_decl)
+    = build_qualified_type (build_reference_type (type), TYPE_QUAL_RESTRICT);
 }
 
 /* Instantiate decls as necessary in CTX to satisfy the data sharing
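
A rough source-level analogue of what the restrict-qualified receiver type
promises, as a sketch under stated assumptions: struct omp_data and
reduce_restrict are hypothetical names, and since C has no reference types a
restrict-qualified pointer stands in for the restrict-qualified reference
built in the patch above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the record the receiver decl points to.  */
struct omp_data
{
  const unsigned int *c;
  unsigned int *sum;
};

/* With restrict on DATA, the compiler may assume that the store through
   data->sum does not modify the record itself, so the loads of data->sum
   and data->c can be treated as loop invariant.  The caller must uphold
   that promise: sum must not point into *DATA.  */
static void
reduce_restrict (struct omp_data *restrict data, size_t n)
{
  for (size_t i = 0; i < n; i++)
    *data->sum = *data->sum + data->c[i];
}
```

Whether that extra aliasing information is enough for store motion and the
parloops reduction analysis is the question the patch is meant to answer.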


* Re: openacc kernels directive -- initial support
  2014-11-15 14:08 openacc kernels directive -- initial support Tom de Vries
                   ` (7 preceding siblings ...)
  2014-11-15 19:04 ` [PATCH, 8/8] Do simple omp lowering for no address taken var Tom de Vries
@ 2014-11-19 20:34 ` Tom de Vries
  2015-04-21 19:27 ` Add BUILT_IN_GOACC_KERNELS_INTERNAL (was: openacc kernels directive -- initial support) Thomas Schwinge
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 71+ messages in thread
From: Tom de Vries @ 2014-11-19 20:34 UTC (permalink / raw)
  To: GCC Patches; +Cc: Richard Biener, Jakub Jelinek, Thomas Schwinge, Bernd Schmidt

On 15-11-14 13:14, Tom de Vries wrote:
>   Don't allow flto-partition=balance for fopenacc
>     Unsubmitted. This works around a compilation problem for
>     libgomp/testsuite/libgomp.oacc-c-c++-common/asyncwait-2.c that I ran into on
>     our internal dev branch.  I'll investigate whether I can reproduce with
>     gomp-4_0-branch asap.

I managed to reproduce this problem with the gomp-4_0-branch. Filed as: 
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63979 .

Thanks,
- Tom


* Re: [PATCH, 1/8] Expand oacc kernels after pass_build_ealias
  2014-11-15 17:21 ` [PATCH, 1/8] Expand oacc kernels after pass_build_ealias Tom de Vries
@ 2014-11-24 11:29   ` Tom de Vries
  2014-11-25 11:30     ` Tom de Vries
  0 siblings, 1 reply; 71+ messages in thread
From: Tom de Vries @ 2014-11-24 11:29 UTC (permalink / raw)
  To: GCC Patches; +Cc: Richard Biener, Jakub Jelinek, Thomas Schwinge

[-- Attachment #1: Type: text/plain, Size: 2820 bytes --]

On 15-11-14 18:19, Tom de Vries wrote:
> On 15-11-14 13:14, Tom de Vries wrote:
>> Hi,
>>
>> I'm submitting a patch series with initial support for the oacc kernels
>> directive.
>>
>> The patch series uses pass_parallelize_loops to implement parallelization of
>> loops in the oacc kernels region.
>>
>> The patch series consists of these 8 patches:
>> ...
>>      1  Expand oacc kernels after pass_build_ealias
>>      2  Add pass_oacc_kernels
>>      3  Add pass_ch_oacc_kernels to pass_oacc_kernels
>>      4  Add pass_tree_loop_{init,done} to pass_oacc_kernels
>>      5  Add pass_loop_im to pass_oacc_kernels
>>      6  Add pass_ccp to pass_oacc_kernels
>>      7  Add pass_parloops_oacc_kernels to pass_oacc_kernels
>>      8  Do simple omp lowering for no address taken var
>> ...
>
> This patch moves omp expansion of the oacc kernels directive to after
> pass_build_ealias.
>
> The rationale is that in order to use pass_parallelize_loops for analysis and
> transformation of an oacc kernels region, we postpone omp expansion of that
> region until the earliest point in the pass list where enough information is
> available to run pass_parallelize_loops, in other words, after pass_build_ealias.
>
> The patch postpones expansion in expand_omp, and ensures expansion by adding
> pass_expand_omp_ssa:
> - after pass_build_ealias, and
> - after pass_all_early_optimizations for the case we're not optimizing.
>
> In order to make sure the oacc kernels region arrives at pass_expand_omp_ssa,
> the way it left expand_omp, the patch makes pass_ccp and pass_forwprop aware of
> lowered omp code, to handle it conservatively.
>
> The patch contains changes in expand_omp_target to deal with ssa-code, similar
> to what is already present in expand_omp_taskreg.
>
> Furthermore, the patch forces the .omp_data_sizes and .omp_data_kinds to not be
> static for oacc kernels. It does this to get some references to .omp_data_sizes
> and .omp_data_kinds in the ssa code.  Without these references, the definitions
> will be removed. The reference of the variables in GIMPLE_OACC_KERNELS is not
> enough to have them not removed. [ In vries/oacc-kernels, I used a BUILT_IN_USE
> kludge for this purpose ].
>
> Finally, at the end of pass_expand_omp_ssa we're left with SSA_NAMEs in the
> original function of which the definition has been removed (as in moved to the
> split off function). TODO_remove_unused_locals takes care of some of them, but
> not the anonymous ones. So the patch iterates over all SSA_NAMEs to find these
> dangling SSA_NAMEs and releases them.
>

Reposting with small update: I've replaced the use of the rather generic 
gimple_stmt_omp_lowering_p with the more specific gimple_stmt_omp_data_i_init_p.

Bootstrapped and reg-tested in the same way as before.

> OK for trunk?
>
> Thanks,
> - Tom



[-- Attachment #2: 0001-Expand-oacc-kernels-after-pass_build_ealias.patch --]
[-- Type: text/x-patch, Size: 16147 bytes --]

2014-11-14  Tom de Vries  <tom@codesourcery.com>

	* function.h (struct function): Add contains_oacc_kernels field.
	* gimplify.c (gimplify_omp_workshare): Set contains_oacc_kernels.
	* omp-low.c: Include gimple-pretty-print.h.
	(release_first_vuse_in_edge_dest): New function.
	(expand_omp_target): Handle ssa-code.
	(expand_omp): Don't expand GIMPLE_OACC_KERNELS when not in ssa.
	(pass_data_expand_omp): Don't set PROP_gimple_eomp unconditionally in
	properties_provided field.
	(pass_expand_omp::execute): Set PROP_gimple_eomp in
	cfun->curr_properties only if cfun does not contain oacc kernels.
	(pass_data_expand_omp_ssa): Add TODO_remove_unused_locals to
	todo_flags_finish field.
	(pass_expand_omp_ssa::execute): Release dangling SSA_NAMEs after calling
	execute_expand_omp.
	(lower_omp_target): Add static_arrays variable, init to 1.  Don't use
	static arrays for kernels directive.  Use static_arrays variable.
	Handle case that .omp_data_kinds is not static.
	(gimple_stmt_ssa_operand_references_var_p)
	(gimple_stmt_omp_data_i_init_p): New functions.
	* omp-low.h (gimple_stmt_omp_data_i_init_p): Declare.
	* passes.def: Add pass_expand_omp_ssa after pass_build_ealias.
	* tree-ssa-ccp.c: Include omp-low.h.
	(surely_varying_stmt_p, ccp_visit_stmt): Handle omp lowering code
	conservatively.
	* tree-ssa-forwprop.c: Include omp-low.h.
	(pass_forwprop::execute): Handle omp lowering code conservatively.
---
 gcc/function.h          |   3 +
 gcc/gimplify.c          |   1 +
 gcc/omp-low.c           | 196 +++++++++++++++++++++++++++++++++++++++++++++---
 gcc/omp-low.h           |   1 +
 gcc/passes.def          |   2 +
 gcc/tree-ssa-ccp.c      |   6 ++
 gcc/tree-ssa-forwprop.c |   4 +-
 7 files changed, 200 insertions(+), 13 deletions(-)

diff --git a/gcc/function.h b/gcc/function.h
index 3a6305c..bb48775 100644
--- a/gcc/function.h
+++ b/gcc/function.h
@@ -667,6 +667,9 @@ struct GTY(()) function {
 
   /* Set when the tail call has been identified.  */
   unsigned int tail_call_marked : 1;
+
+  /* Set when the function contains oacc kernels directives.  */
+  unsigned int contains_oacc_kernels : 1;
 };
 
 /* Add the decl D to the local_decls list of FUN.  */
diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index ad48d51..c40f20f 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -7316,6 +7316,7 @@ gimplify_omp_workshare (tree *expr_p, gimple_seq *pre_p)
       break;
     case OACC_KERNELS:
       stmt = gimple_build_oacc_kernels (body, OACC_KERNELS_CLAUSES (expr));
+      cfun->contains_oacc_kernels = 1;
       break;
     case OACC_PARALLEL:
       stmt = gimple_build_oacc_parallel (body, OACC_PARALLEL_CLAUSES (expr));
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index c503cc1..767fa87 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -88,6 +88,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-eh.h"
 #include "cilk.h"
 #include "lto-section-names.h"
+#include "gimple-pretty-print.h"
 
 
 /* Lowering of OpenMP parallel and workshare constructs proceeds in two
@@ -5338,6 +5339,35 @@ expand_omp_build_assign (gimple_stmt_iterator *gsi_p, tree to, tree from)
     }
 }
 
+static void
+release_first_vuse_in_edge_dest (edge e)
+{
+  gimple_stmt_iterator i;
+  basic_block bb = e->dest;
+
+  for (i = gsi_start_phis (bb); !gsi_end_p (i); gsi_next (&i))
+    {
+      gimple phi = gsi_stmt (i);
+      tree arg = PHI_ARG_DEF_FROM_EDGE (phi, e);
+
+      if (!virtual_operand_p (arg))
+	continue;
+
+      mark_virtual_operand_for_renaming (arg);
+      return;
+    }
+
+  for (i = gsi_start_bb (bb); !gsi_end_p (i); gsi_next_nondebug (&i))
+    {
+      gimple stmt = gsi_stmt (i);
+      if (gimple_vuse (stmt) == NULL_TREE)
+	continue;
+
+      mark_virtual_operand_for_renaming (gimple_vuse (stmt));
+      return;
+    }
+}
+
 /* Expand the OpenMP parallel or task directive starting at REGION.  */
 
 static void
@@ -8832,7 +8862,6 @@ expand_omp_target (struct omp_region *region)
   /* Supported by expand_omp_taskreg, but not here.  */
   if (child_cfun != NULL)
     gcc_assert (!child_cfun->cfg);
-  gcc_assert (!gimple_in_ssa_p (cfun));
 
   entry_bb = region->entry;
   exit_bb = region->exit;
@@ -8858,7 +8887,7 @@ expand_omp_target (struct omp_region *region)
 	{
 	  basic_block entry_succ_bb = single_succ (entry_bb);
 	  gimple_stmt_iterator gsi;
-	  tree arg;
+	  tree arg, narg;
 	  gimple tgtcopy_stmt = NULL;
 	  tree sender = TREE_VEC_ELT (gimple_omp_data_arg (entry_stmt), 0);
 
@@ -8888,8 +8917,27 @@ expand_omp_target (struct omp_region *region)
 	  gcc_assert (tgtcopy_stmt != NULL);
 	  arg = DECL_ARGUMENTS (child_fn);
 
-	  gcc_assert (gimple_assign_lhs (tgtcopy_stmt) == arg);
-	  gsi_remove (&gsi, true);
+	  if (!gimple_in_ssa_p (cfun))
+	    {
+	      gcc_assert (gimple_assign_lhs (tgtcopy_stmt) == arg);
+	      gsi_remove (&gsi, true);
+	    }
+	  else
+	    {
+	      gcc_assert (SSA_NAME_VAR (gimple_assign_lhs (tgtcopy_stmt))
+			  == arg);
+
+	      /* If we are in ssa form, we must load the value from the default
+		 definition of the argument.  That should not be defined now,
+		 since the argument is not used uninitialized.  */
+	      gcc_assert (ssa_default_def (cfun, arg) == NULL);
+	      narg = make_ssa_name (arg, gimple_build_nop ());
+	      set_ssa_default_def (cfun, arg, narg);
+	      /* ?? Is setting the subcode really necessary ??  */
+	      gimple_omp_set_subcode (tgtcopy_stmt, TREE_CODE (narg));
+	      gimple_assign_set_rhs1 (tgtcopy_stmt, narg);
+	      update_stmt (tgtcopy_stmt);
+	    }
 	}
 
       /* Declare local variables needed in CHILD_CFUN.  */
@@ -8932,11 +8980,23 @@ expand_omp_target (struct omp_region *region)
 	  stmt = gimple_build_return (NULL);
 	  gsi_insert_after (&gsi, stmt, GSI_SAME_STMT);
 	  gsi_remove (&gsi, true);
+
+	  /* A vuse in single_succ (exit_bb) may use a vdef from the region
+	     which is about to be split off.  Mark the vdef for renaming.  */
+	  release_first_vuse_in_edge_dest (single_succ_edge (exit_bb));
 	}
 
       /* Move the offloading region into CHILD_CFUN.  */
 
-      block = gimple_block (entry_stmt);
+      if (gimple_in_ssa_p (cfun))
+	{
+	  init_tree_ssa (child_cfun);
+	  init_ssa_operands (child_cfun);
+	  child_cfun->gimple_df->in_ssa_p = true;
+	  block = NULL_TREE;
+	}
+      else
+	block = gimple_block (entry_stmt);
 
       new_bb = move_sese_region_to_fn (child_cfun, entry_bb, exit_bb, block);
       if (exit_bb)
@@ -8986,6 +9046,8 @@ expand_omp_target (struct omp_region *region)
 	  if (changed)
 	    cleanup_tree_cfg ();
 	}
+      if (gimple_in_ssa_p (cfun))
+	update_ssa (TODO_update_ssa);
       pop_cfun ();
     }
 
@@ -9262,6 +9324,8 @@ expand_omp_target (struct omp_region *region)
       gcc_assert (g && gimple_code (g) == GIMPLE_OMP_RETURN);
       gsi_remove (&gsi, true);
     }
+  if (gimple_in_ssa_p (cfun))
+    update_ssa (TODO_update_ssa_only_virtuals);
 }
 
 
@@ -9332,6 +9396,15 @@ expand_omp (struct omp_region *region)
 	  break;
 
 	case GIMPLE_OACC_KERNELS:
+	  if (!gimple_in_ssa_p (cfun))
+	    /* We're in pass_expand_omp.  Postpone expanding till
+	       pass_expand_omp_ssa.  */
+	    break;
+
+	  /* We're in pass_expand_omp_ssa.  Expand now.  */
+
+	  /* FALLTHRU.  */
+
 	case GIMPLE_OACC_PARALLEL:
 	case GIMPLE_OMP_TARGET:
 	  expand_omp_target (region);
@@ -9504,7 +9577,7 @@ const pass_data pass_data_expand_omp =
   OPTGROUP_NONE, /* optinfo_flags */
   TV_NONE, /* tv_id */
   PROP_gimple_any, /* properties_required */
-  PROP_gimple_eomp, /* properties_provided */
+  0 /* Possibly PROP_gimple_eomp.  */, /* properties_provided */
   0, /* properties_destroyed */
   0, /* todo_flags_start */
   0, /* todo_flags_finish */
@@ -9518,7 +9591,7 @@ public:
   {}
 
   /* opt_pass methods: */
-  virtual unsigned int execute (function *)
+  virtual unsigned int execute (function *fun)
     {
       bool gate = ((flag_openacc != 0 || flag_openmp != 0
 		    || flag_openmp_simd != 0 || flag_cilkplus != 0)
@@ -9529,7 +9602,12 @@ public:
       if (!gate)
 	return 0;
 
-      return execute_expand_omp ();
+      unsigned int res = execute_expand_omp ();
+
+      if (!cfun->contains_oacc_kernels)
+	fun->curr_properties |= PROP_gimple_eomp;
+
+      return res;
     }
 
 }; // class pass_expand_omp
@@ -9554,7 +9632,8 @@ const pass_data pass_data_expand_omp_ssa =
   PROP_gimple_eomp, /* properties_provided */
   0, /* properties_destroyed */
   0, /* todo_flags_start */
-  TODO_cleanup_cfg | TODO_rebuild_alias, /* todo_flags_finish */
+  TODO_cleanup_cfg | TODO_rebuild_alias
+  | TODO_remove_unused_locals, /* todo_flags_finish */
 };
 
 class pass_expand_omp_ssa : public gimple_opt_pass
@@ -9569,7 +9648,47 @@ public:
     {
       return !(fun->curr_properties & PROP_gimple_eomp);
     }
-  virtual unsigned int execute (function *) { return execute_expand_omp (); }
+  virtual unsigned int execute (function *)
+    {
+      unsigned res = execute_expand_omp ();
+
+      /* After running pass_expand_omp_ssa to expand the oacc kernels
+	 directive, we are left in the original function with anonymous
+	 SSA_NAMEs, with a defining statement that has been deleted.  This
+	 pass finds those SSA_NAMEs and releases them.  */
+      unsigned int i;
+      for (i = 1; i < num_ssa_names; ++i)
+	{
+	  tree name = ssa_name (i);
+	  if (name == NULL_TREE)
+	    continue;
+
+	  gimple stmt = SSA_NAME_DEF_STMT (name);
+	  bool found = false;
+
+	  ssa_op_iter op_iter;
+	  def_operand_p def_p;
+	  FOR_EACH_PHI_OR_STMT_DEF (def_p, stmt, op_iter, SSA_OP_ALL_DEFS)
+	    {
+	      tree def = DEF_FROM_PTR (def_p);
+	      if (def == name)
+		{
+		  found = true;
+		  break;
+		}
+	    }
+
+	  if (!found)
+	    {
+	      if (dump_file)
+		fprintf (dump_file, "Released dangling ssa name %u\n", i);
+	      release_ssa_name (name);
+	    }
+	}
+
+      return res;
+    }
+  opt_pass * clone () { return new pass_expand_omp_ssa (m_ctxt); }
 
 }; // class pass_expand_omp_ssa
 
@@ -11195,6 +11314,7 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx)
   unsigned int map_cnt = 0;
   tree (*gimple_omp_clauses) (const_gimple);
   void (*gimple_omp_set_data_arg) (gimple, tree);
+  unsigned int static_arrays = 1;
 
   offloaded = is_gimple_omp_offloaded (stmt);
   data_region = false;
@@ -11203,6 +11323,7 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx)
     case GIMPLE_OACC_KERNELS:
       gimple_omp_clauses = gimple_oacc_kernels_clauses;
       gimple_omp_set_data_arg = gimple_oacc_kernels_set_data_arg;
+      static_arrays = 0;
       break;
     case GIMPLE_OACC_PARALLEL:
       gimple_omp_clauses = gimple_oacc_parallel_clauses;
@@ -11369,7 +11490,7 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 			  ".omp_data_sizes");
       DECL_NAMELESS (TREE_VEC_ELT (t, 1)) = 1;
       TREE_ADDRESSABLE (TREE_VEC_ELT (t, 1)) = 1;
-      TREE_STATIC (TREE_VEC_ELT (t, 1)) = 1;
+      TREE_STATIC (TREE_VEC_ELT (t, 1)) = static_arrays;
       tree tkind_type;
       int talign_shift;
       if (is_gimple_omp_oacc_specifically (stmt))
@@ -11387,7 +11508,7 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 			  ".omp_data_kinds");
       DECL_NAMELESS (TREE_VEC_ELT (t, 2)) = 1;
       TREE_ADDRESSABLE (TREE_VEC_ELT (t, 2)) = 1;
-      TREE_STATIC (TREE_VEC_ELT (t, 2)) = 1;
+      TREE_STATIC (TREE_VEC_ELT (t, 2)) = static_arrays;
       gimple_omp_set_data_arg (stmt, t);
 
       vec<constructor_elt, va_gc> *vsize;
@@ -11560,6 +11681,22 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 						    clobber));
 	}
 
+      if (!TREE_STATIC (TREE_VEC_ELT (t, 2)))
+	{
+	  gimple_seq initlist = NULL;
+	  force_gimple_operand (build1 (DECL_EXPR, void_type_node,
+					TREE_VEC_ELT (t, 2)),
+				&initlist, true, NULL_TREE);
+	  gimple_seq_add_seq (&ilist, initlist);
+
+	  tree clobber = build_constructor (TREE_TYPE (TREE_VEC_ELT (t, 2)),
+					    NULL);
+	  TREE_THIS_VOLATILE (clobber) = 1;
+	  gimple_seq_add_stmt (&olist,
+			       gimple_build_assign (TREE_VEC_ELT (t, 2),
+						    clobber));
+	}
+
       tree clobber = build_constructor (ctx->record_type, NULL);
       TREE_THIS_VOLATILE (clobber) = 1;
       gimple_seq_add_stmt (&olist, gimple_build_assign (ctx->sender_decl,
@@ -13740,4 +13877,39 @@ omp_finish_file (void)
     }
 }
 
+static bool
+gimple_stmt_ssa_operand_references_var_p (gimple stmt, const char **varnames,
+					  unsigned int nr_varnames,
+					  unsigned int flags)
+{
+  tree use;
+  ssa_op_iter iter;
+  const char *s;
+
+  FOR_EACH_SSA_TREE_OPERAND (use, stmt, iter, flags)
+    {
+      if (SSA_NAME_IDENTIFIER (use) == NULL_TREE)
+	continue;
+      s = IDENTIFIER_POINTER (SSA_NAME_IDENTIFIER (use));
+
+      unsigned int i;
+      for (i = 0; i < nr_varnames; ++i)
+	if (strcmp (varnames[i], s) == 0)
+	  return true;
+    }
+
+  return false;
+}
+
+/* Return true if STMT is .omp_data_i init.  */
+
+bool
+gimple_stmt_omp_data_i_init_p (gimple stmt)
+{
+  const char *varnames[] = { ".omp_data_i" };
+  unsigned int nr_varnames = sizeof (varnames) / sizeof (varnames[0]);
+  return gimple_stmt_ssa_operand_references_var_p (stmt, varnames, nr_varnames,
+						   SSA_OP_DEF);
+}
+
 #include "gt-omp-low.h"
diff --git a/gcc/omp-low.h b/gcc/omp-low.h
index ac587d0..32076e4 100644
--- a/gcc/omp-low.h
+++ b/gcc/omp-low.h
@@ -28,6 +28,7 @@ extern void free_omp_regions (void);
 extern tree omp_reduction_init (tree, tree);
 extern bool make_gimple_omp_edges (basic_block, struct omp_region **, int *);
 extern void omp_finish_file (void);
+extern bool gimple_stmt_omp_data_i_init_p (gimple);
 
 extern GTY(()) vec<tree, va_gc> *offload_funcs;
 extern GTY(()) vec<tree, va_gc> *offload_vars;
diff --git a/gcc/passes.def b/gcc/passes.def
index ebd2b95..dc45e3f 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -85,6 +85,7 @@ along with GCC; see the file COPYING3.  If not see
 	  /* pass_build_ealias is a dummy pass that ensures that we
 	     execute TODO_rebuild_alias at this point.  */
 	  NEXT_PASS (pass_build_ealias);
+	  NEXT_PASS (pass_expand_omp_ssa);
 	  NEXT_PASS (pass_fre);
 	  NEXT_PASS (pass_merge_phi);
 	  NEXT_PASS (pass_cd_dce);
@@ -99,6 +100,7 @@ along with GCC; see the file COPYING3.  If not see
 	      late.  */
 	  NEXT_PASS (pass_split_functions);
       POP_INSERT_PASSES ()
+      NEXT_PASS (pass_expand_omp_ssa);
       NEXT_PASS (pass_release_ssa_names);
       NEXT_PASS (pass_rebuild_cgraph_edges);
       NEXT_PASS (pass_inline_parameters);
diff --git a/gcc/tree-ssa-ccp.c b/gcc/tree-ssa-ccp.c
index 52d8503..23185e6 100644
--- a/gcc/tree-ssa-ccp.c
+++ b/gcc/tree-ssa-ccp.c
@@ -165,6 +165,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "wide-int-print.h"
 #include "builtins.h"
 #include "tree-chkp.h"
+#include "omp-low.h"
 
 
 /* Possible lattice values.  */
@@ -789,6 +790,9 @@ surely_varying_stmt_p (gimple stmt)
       && gimple_code (stmt) != GIMPLE_CALL)
     return true;
 
+  if (gimple_stmt_omp_data_i_init_p (stmt))
+    return true;
+
   return false;
 }
 
@@ -2297,6 +2301,8 @@ ccp_visit_stmt (gimple stmt, edge *taken_edge_p, tree *output_p)
   switch (gimple_code (stmt))
     {
       case GIMPLE_ASSIGN:
+	if (gimple_stmt_omp_data_i_init_p (stmt))
+	  break;
         /* If the statement is an assignment that produces a single
            output value, evaluate its RHS to see if the lattice value of
            its output has changed.  */
diff --git a/gcc/tree-ssa-forwprop.c b/gcc/tree-ssa-forwprop.c
index feb8253..860c53e 100644
--- a/gcc/tree-ssa-forwprop.c
+++ b/gcc/tree-ssa-forwprop.c
@@ -68,6 +68,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-cfgcleanup.h"
 #include "tree-into-ssa.h"
 #include "cfganal.h"
+#include "omp-low.h"
 
 /* This pass propagates the RHS of assignment statements into use
    sites of the LHS of the assignment.  It's basically a specialized
@@ -2244,7 +2245,8 @@ pass_forwprop::execute (function *fun)
 	  tree lhs, rhs;
 	  enum tree_code code;
 
-	  if (!is_gimple_assign (stmt))
+	  if (!is_gimple_assign (stmt)
+	      || gimple_stmt_omp_data_i_init_p (stmt))
 	    {
 	      gsi_next (&gsi);
 	      continue;
-- 
1.9.1



* Re: [PATCH, 8/8] Do simple omp lowering for no address taken var
  2014-11-17 10:29   ` Richard Biener
  2014-11-18  9:13     ` Eric Botcazou
@ 2014-11-24 11:53     ` Tom de Vries
  2014-11-24 11:55       ` Tom de Vries
  2014-11-24 12:40       ` Richard Biener
  1 sibling, 2 replies; 71+ messages in thread
From: Tom de Vries @ 2014-11-24 11:53 UTC (permalink / raw)
  To: Richard Biener; +Cc: GCC Patches, Jakub Jelinek, Thomas Schwinge, ebotcazou

[-- Attachment #1: Type: text/plain, Size: 5018 bytes --]

On 17-11-14 11:13, Richard Biener wrote:
> On Sat, 15 Nov 2014, Tom de Vries wrote:
>
>> >On 15-11-14 13:14, Tom de Vries wrote:
>>> > >Hi,
>>> > >
>>> > >I'm submitting a patch series with initial support for the oacc kernels
>>> > >directive.
>>> > >
>>> > >The patch series uses pass_parallelize_loops to implement parallelization of
>>> > >loops in the oacc kernels region.
>>> > >
>>> > >The patch series consists of these 8 patches:
>>> > >...
>>> > >      1  Expand oacc kernels after pass_build_ealias
>>> > >      2  Add pass_oacc_kernels
>>> > >      3  Add pass_ch_oacc_kernels to pass_oacc_kernels
>>> > >      4  Add pass_tree_loop_{init,done} to pass_oacc_kernels
>>> > >      5  Add pass_loop_im to pass_oacc_kernels
>>> > >      6  Add pass_ccp to pass_oacc_kernels
>>> > >      7  Add pass_parloops_oacc_kernels to pass_oacc_kernels
>>> > >      8  Do simple omp lowering for no address taken var
>>> > >...
>> >
>> >This patch lowers integer variables that do not have their address taken as
>> >local variable.  We use a copy at region entry and exit to copy the value in
>> >and out.
>> >
>> >In the context of reduction handling in a kernels region, this allows the
>> >parloops reduction analysis to recognize the reduction, even after oacc
>> >lowering has been done in pass_lower_omp.
>> >
>> >In more detail, without this patch, the omp_data_i load and stores are
>> >generated in place (in this case, in the loop):
>> >...
>> >                 {
>> >                   .omp_data_iD.2201 = &.omp_data_arr.15D.2220;
>> >                   {
>> >                     unsigned intD.9 iD.2146;
>> >
>> >                     iD.2146 = 0;
>> >                     goto <D.2207>;
>> >                     <D.2208>:
>> >                     D.2216 = .omp_data_iD.2201->cD.2203;
>> >                     c.9D.2176 = *D.2216;
>> >                     D.2177 = (long unsigned intD.10) iD.2146;
>> >                     D.2178 = D.2177 * 4;
>> >                     D.2179 = c.9D.2176 + D.2178;
>> >                     D.2180 = *D.2179;
>> >                     D.2217 = .omp_data_iD.2201->sumD.2205;
>> >                     D.2218 = *D.2217;
>> >                     D.2217 = .omp_data_iD.2201->sumD.2205;
>> >                     D.2219 = D.2180 + D.2218;
>> >                     *D.2217 = D.2219;
>> >                     iD.2146 = iD.2146 + 1;
>> >                     <D.2207>:
>> >                     if (iD.2146 <= 524287) goto <D.2208>; else goto <D.2209>;
>> >                     <D.2209>:
>> >                   }
>> >...
>> >
>> >With this patch, the omp_data_i load and stores for sum are generated at entry
>> >and exit:
>> >...
>> >                 {
>> >                   .omp_data_iD.2201 = &.omp_data_arr.15D.2218;
>> >                   D.2216 = .omp_data_iD.2201->sumD.2205;
>> >                   sumD.2206 = *D.2216;
>> >                   {
>> >                     unsigned intD.9 iD.2146;
>> >
>> >                     iD.2146 = 0;
>> >                     goto <D.2207>;
>> >                     <D.2208>:
>> >                     D.2217 = .omp_data_iD.2201->cD.2203;
>> >                     c.9D.2176 = *D.2217;
>> >                     D.2177 = (long unsigned intD.10) iD.2146;
>> >                     D.2178 = D.2177 * 4;
>> >                     D.2179 = c.9D.2176 + D.2178;
>> >                     D.2180 = *D.2179;
>> >                     sumD.2206 = D.2180 + sumD.2206;
>> >                     iD.2146 = iD.2146 + 1;
>> >                     <D.2207>:
>> >                     if (iD.2146 <= 524287) goto <D.2208>; else goto <D.2209>;
>> >                     <D.2209>:
>> >                   }
>> >                   *D.2216 = sumD.2206;
>> >                   #pragma omp return
>> >                 }
>> >...
>> >
>> >
>> >So, without the patch the reduction operation looks like this:
>> >...
>> >     *(.omp_data_iD.2201->sumD.2205) = *(.omp_data_iD.2201->sumD.2205) + x
>> >...
>> >
>> >And with this patch the reduction operation is simply:
>> >...
>> >     sumD.2206 = sumD.2206 + x:
>> >...
>> >
>> >OK for trunk?
> I presume the reason you are trying to do that here is that otherwise
> it happens too late?  What you do is what loop store motion would
> do.

Richard,

Thanks for the hint. I've built a reduction example:
...
void __attribute__((noinline))
f (unsigned int *__restrict__ a, unsigned int *__restrict__ sum, unsigned int n)
{
   unsigned int i;
   for (i = 0; i < n; ++i)
     *sum += a[i];
}
...
and observed that store motion of the *sum store is done by pass_loop_im, 
provided the *sum load is taken out of the loop by pass_pre first.

So alternatively, we could use pass_pre and pass_loop_im to achieve the same effect.

When I tried adding pass_pre to the pass group pass_oacc_kernels, I found that 
pass_copyprop was also required to get parloops to recognize the reduction.
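
As a rough illustration of the effect described above, here is a hand-written
C sketch (not compiler output) of the example loop before and after the *sum
load is moved in front of the loop and the *sum store is moved behind it. The
names f_original, f_transformed and sum_lsm are made up for this sketch, and
the unconditional store after the loop is a simplification: the real passes
work on a loop that is guarded by the n != 0 check.

```c
/* Hand-written sketch, not compiler output.  */

/* The reduction as written in the example: every iteration loads and
   stores through the sum pointer.  */
void
f_original (unsigned int *__restrict__ a, unsigned int *__restrict__ sum,
            unsigned int n)
{
  unsigned int i;
  for (i = 0; i < n; ++i)
    *sum += a[i];
}

/* The shape hoped for after the load is hoisted and the store is sunk:
   the loop body only updates a local scalar.  */
void
f_transformed (unsigned int *__restrict__ a, unsigned int *__restrict__ sum,
               unsigned int n)
{
  unsigned int i;
  unsigned int sum_lsm = *sum;	/* Load moved in front of the loop.  */
  for (i = 0; i < n; ++i)
    sum_lsm += a[i];		/* Reduction on a local scalar.  */
  *sum = sum_lsm;		/* Single store behind the loop.  */
}
```

In the second form the reduction is a plain scalar update, which is the same
shape as the 'sumD.2206 = sumD.2206 + x' statement from the earlier patch
description.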

The attached patch adds pass_pre to the pass group pass_oacc_kernels.

Bootstrapped and reg-tested in the same way as before.

OK for trunk?

[-- Attachment #2: 0004-Add-pass_pre-in-pass_oacc_kernels.patch --]
[-- Type: text/x-patch, Size: 68418 bytes --]

2014-11-23  Tom de Vries  <tom@codesourcery.com>

	* passes.def: Add pass_split_crit_edges and pass_pre to pass group
	pass_oacc_kernels.
	* tree-ssa-pre.c (pass_pre::clone): New function.
	* tree-ssa-sccvn.c (visit_use):  Handle .omp_data_i init conservatively.
	* tree-ssa-tail-merge.c (tail_merge_optimize): Don't run if omp not
	expanded yet.

	* g++.dg/init/new19.C: Replace pre with pre2.
	* g++.dg/tree-ssa/pr33615-2.C: Same.
	* gcc.dg/pr31847.c: Same.
	* gcc.dg/pr41783.c: Same.
	* gcc.dg/pr43864-2.c: Same.
	* gcc.dg/pr43864-3.c: Same.
	* gcc.dg/pr43864-4.c: Same.
	* gcc.dg/pr43864.c: Same.
	* gcc.dg/pr50763.c: Same.
	* gcc.dg/pr51879-12.c: Same.
	* gcc.dg/pr51879-16.c: Same.
	* gcc.dg/pr51879-17.c: Same.
	* gcc.dg/pr51879-18.c: Same.
	* gcc.dg/pr51879-2.c: Same.
	* gcc.dg/pr51879-3.c: Same.
	* gcc.dg/pr51879-4.c: Same.
	* gcc.dg/pr51879-6.c: Same.
	* gcc.dg/pr51879-7.c: Same.
	* gcc.dg/pr51879.c: Same.
	* gcc.dg/pr58805.c: Same.
	* gcc.dg/tail-merge-store.c: Same.
	* gcc.dg/tree-ssa/loadpre1.c: Same.
	* gcc.dg/tree-ssa/loadpre10.c: Same.
	* gcc.dg/tree-ssa/loadpre11.c: Same.
	* gcc.dg/tree-ssa/loadpre12.c: Same.
	* gcc.dg/tree-ssa/loadpre13.c: Same.
	* gcc.dg/tree-ssa/loadpre14.c: Same.
	* gcc.dg/tree-ssa/loadpre15.c: Same.
	* gcc.dg/tree-ssa/loadpre16.c: Same.
	* gcc.dg/tree-ssa/loadpre17.c: Same.
	* gcc.dg/tree-ssa/loadpre18.c: Same.
	* gcc.dg/tree-ssa/loadpre19.c: Same.
	* gcc.dg/tree-ssa/loadpre2.c: Same.
	* gcc.dg/tree-ssa/loadpre20.c: Same.
	* gcc.dg/tree-ssa/loadpre21.c: Same.
	* gcc.dg/tree-ssa/loadpre22.c: Same.
	* gcc.dg/tree-ssa/loadpre23.c: Same.
	* gcc.dg/tree-ssa/loadpre24.c: Same.
	* gcc.dg/tree-ssa/loadpre25.c: Same.
	* gcc.dg/tree-ssa/loadpre3.c: Same.
	* gcc.dg/tree-ssa/loadpre4.c: Same.
	* gcc.dg/tree-ssa/loadpre5.c: Same.
	* gcc.dg/tree-ssa/loadpre6.c: Same.
	* gcc.dg/tree-ssa/loadpre7.c: Same.
	* gcc.dg/tree-ssa/loadpre8.c: Same.
	* gcc.dg/tree-ssa/pr23455.c: Same.
	* gcc.dg/tree-ssa/pr35286.c: Same.
	* gcc.dg/tree-ssa/pr35287.c: Same.
	* gcc.dg/tree-ssa/pr43491.c: Same.
	* gcc.dg/tree-ssa/pr47392.c: Same.
	* gcc.dg/tree-ssa/ssa-fre-10.c: Same.
	* gcc.dg/tree-ssa/ssa-pre-1.c: Same.
	* gcc.dg/tree-ssa/ssa-pre-11.c: Same.
	* gcc.dg/tree-ssa/ssa-pre-12.c: Same.
	* gcc.dg/tree-ssa/ssa-pre-13.c: Same.
	* gcc.dg/tree-ssa/ssa-pre-16.c: Same.
	* gcc.dg/tree-ssa/ssa-pre-17.c: Same.
	* gcc.dg/tree-ssa/ssa-pre-18.c: Same.
	* gcc.dg/tree-ssa/ssa-pre-19.c: Same.
	* gcc.dg/tree-ssa/ssa-pre-2.c: Same.
	* gcc.dg/tree-ssa/ssa-pre-20.c: Same.
	* gcc.dg/tree-ssa/ssa-pre-21.c: Same.
	* gcc.dg/tree-ssa/ssa-pre-22.c: Same.
	* gcc.dg/tree-ssa/ssa-pre-23.c: Same.
	* gcc.dg/tree-ssa/ssa-pre-24.c: Same.
	* gcc.dg/tree-ssa/ssa-pre-25.c: Same.
	* gcc.dg/tree-ssa/ssa-pre-27.c: Same.
	* gcc.dg/tree-ssa/ssa-pre-28.c: Same.
	* gcc.dg/tree-ssa/ssa-pre-29.c: Same.
	* gcc.dg/tree-ssa/ssa-pre-3.c: Same.
	* gcc.dg/tree-ssa/ssa-pre-30.c: Same.
	* gcc.dg/tree-ssa/ssa-pre-31.c: Same.
	* gcc.dg/tree-ssa/ssa-pre-4.c: Same.
	* gcc.dg/tree-ssa/ssa-pre-5.c: Same.
	* gcc.dg/tree-ssa/ssa-pre-6.c: Same.
	* gfortran.dg/pr43984.f90: Same.
---
 gcc/passes.def                             |  2 ++
 gcc/testsuite/g++.dg/init/new19.C          |  6 +++---
 gcc/testsuite/g++.dg/tree-ssa/pr33615-2.C  |  6 +++---
 gcc/testsuite/gcc.dg/pr31847.c             |  6 +++---
 gcc/testsuite/gcc.dg/pr41783.c             |  8 ++++----
 gcc/testsuite/gcc.dg/pr43864-2.c           | 10 +++++-----
 gcc/testsuite/gcc.dg/pr43864-3.c           | 10 +++++-----
 gcc/testsuite/gcc.dg/pr43864-4.c           | 12 ++++++------
 gcc/testsuite/gcc.dg/pr43864.c             |  8 ++++----
 gcc/testsuite/gcc.dg/pr50763.c             |  6 +++---
 gcc/testsuite/gcc.dg/pr51879-12.c          |  8 ++++----
 gcc/testsuite/gcc.dg/pr51879-16.c          |  8 ++++----
 gcc/testsuite/gcc.dg/pr51879-17.c          |  8 ++++----
 gcc/testsuite/gcc.dg/pr51879-18.c          |  6 +++---
 gcc/testsuite/gcc.dg/pr51879-2.c           |  8 ++++----
 gcc/testsuite/gcc.dg/pr51879-3.c           |  6 +++---
 gcc/testsuite/gcc.dg/pr51879-4.c           |  6 +++---
 gcc/testsuite/gcc.dg/pr51879-6.c           |  6 +++---
 gcc/testsuite/gcc.dg/pr51879-7.c           |  6 +++---
 gcc/testsuite/gcc.dg/pr51879.c             |  6 +++---
 gcc/testsuite/gcc.dg/pr58805.c             |  6 +++---
 gcc/testsuite/gcc.dg/tail-merge-store.c    |  8 ++++----
 gcc/testsuite/gcc.dg/tree-ssa/loadpre1.c   |  6 +++---
 gcc/testsuite/gcc.dg/tree-ssa/loadpre10.c  |  6 +++---
 gcc/testsuite/gcc.dg/tree-ssa/loadpre11.c  |  6 +++---
 gcc/testsuite/gcc.dg/tree-ssa/loadpre12.c  |  6 +++---
 gcc/testsuite/gcc.dg/tree-ssa/loadpre13.c  |  6 +++---
 gcc/testsuite/gcc.dg/tree-ssa/loadpre14.c  |  6 +++---
 gcc/testsuite/gcc.dg/tree-ssa/loadpre15.c  |  6 +++---
 gcc/testsuite/gcc.dg/tree-ssa/loadpre16.c  |  6 +++---
 gcc/testsuite/gcc.dg/tree-ssa/loadpre17.c  |  6 +++---
 gcc/testsuite/gcc.dg/tree-ssa/loadpre18.c  |  6 +++---
 gcc/testsuite/gcc.dg/tree-ssa/loadpre19.c  |  6 +++---
 gcc/testsuite/gcc.dg/tree-ssa/loadpre2.c   |  6 +++---
 gcc/testsuite/gcc.dg/tree-ssa/loadpre20.c  |  6 +++---
 gcc/testsuite/gcc.dg/tree-ssa/loadpre21.c  |  6 +++---
 gcc/testsuite/gcc.dg/tree-ssa/loadpre22.c  |  6 +++---
 gcc/testsuite/gcc.dg/tree-ssa/loadpre23.c  |  6 +++---
 gcc/testsuite/gcc.dg/tree-ssa/loadpre24.c  |  6 +++---
 gcc/testsuite/gcc.dg/tree-ssa/loadpre25.c  |  6 +++---
 gcc/testsuite/gcc.dg/tree-ssa/loadpre3.c   |  6 +++---
 gcc/testsuite/gcc.dg/tree-ssa/loadpre4.c   |  6 +++---
 gcc/testsuite/gcc.dg/tree-ssa/loadpre5.c   |  6 +++---
 gcc/testsuite/gcc.dg/tree-ssa/loadpre6.c   |  8 ++++----
 gcc/testsuite/gcc.dg/tree-ssa/loadpre7.c   |  6 +++---
 gcc/testsuite/gcc.dg/tree-ssa/loadpre8.c   |  6 +++---
 gcc/testsuite/gcc.dg/tree-ssa/pr23455.c    |  8 ++++----
 gcc/testsuite/gcc.dg/tree-ssa/pr35286.c    |  6 +++---
 gcc/testsuite/gcc.dg/tree-ssa/pr35287.c    |  6 +++---
 gcc/testsuite/gcc.dg/tree-ssa/pr43491.c    |  8 ++++----
 gcc/testsuite/gcc.dg/tree-ssa/pr47392.c    |  6 +++---
 gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-10.c |  6 +++---
 gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-1.c  |  6 +++---
 gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-11.c |  6 +++---
 gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-12.c |  6 +++---
 gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-13.c |  6 +++---
 gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-16.c |  6 +++---
 gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-17.c |  6 +++---
 gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-18.c |  6 +++---
 gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-19.c |  6 +++---
 gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-2.c  |  8 ++++----
 gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-20.c |  8 ++++----
 gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-21.c |  6 +++---
 gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-22.c |  6 +++---
 gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-23.c |  6 +++---
 gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-24.c |  6 +++---
 gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-25.c |  6 +++---
 gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-27.c |  6 +++---
 gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-28.c |  6 +++---
 gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-29.c |  6 +++---
 gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-3.c  |  6 +++---
 gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-30.c |  6 +++---
 gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-31.c |  6 +++---
 gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-4.c  |  6 +++---
 gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-5.c  |  8 ++++----
 gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-6.c  |  6 +++---
 gcc/testsuite/gfortran.dg/pr43984.f90      |  6 +++---
 gcc/tree-ssa-pre.c                         |  1 +
 gcc/tree-ssa-sccvn.c                       |  4 +++-
 gcc/tree-ssa-tail-merge.c                  |  3 ++-
 80 files changed, 256 insertions(+), 250 deletions(-)

diff --git a/gcc/passes.def b/gcc/passes.def
index e5edc1d..fdff85e 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -90,6 +90,8 @@ along with GCC; see the file COPYING3.  If not see
 	  NEXT_PASS (pass_oacc_kernels);
 	  PUSH_INSERT_PASSES_WITHIN (pass_oacc_kernels)
 	      NEXT_PASS (pass_ch_oacc_kernels);
+	      NEXT_PASS (pass_split_crit_edges);
+	      NEXT_PASS (pass_pre);
 	  POP_INSERT_PASSES ()
 	  NEXT_PASS (pass_expand_omp_ssa);
 	  NEXT_PASS (pass_fre);
diff --git a/gcc/testsuite/g++.dg/init/new19.C b/gcc/testsuite/g++.dg/init/new19.C
index a25be7d..3b3c913 100644
--- a/gcc/testsuite/g++.dg/init/new19.C
+++ b/gcc/testsuite/g++.dg/init/new19.C
@@ -1,5 +1,5 @@
 // { dg-do compile }
-// { dg-options "-O2 -fstrict-aliasing -fdump-tree-pre-details" }
+// { dg-options "-O2 -fstrict-aliasing -fdump-tree-pre2-details" }
 
 // Make sure we hoist invariants out of the loop even in the presence
 // of placement new.  This is similar to code in tramp3d.
@@ -69,5 +69,5 @@ int c::foo(int f1, int f2, int f3)
   return sum;
 }
 
-// { dg-final { scan-tree-dump "Replaced.*->ai\\\[0\\\]" "pre" } }
-// { dg-final { cleanup-tree-dump "pre" } }
+// { dg-final { scan-tree-dump "Replaced.*->ai\\\[0\\\]" "pre2" } }
+// { dg-final { cleanup-tree-dump "pre2" } }
diff --git a/gcc/testsuite/g++.dg/tree-ssa/pr33615-2.C b/gcc/testsuite/g++.dg/tree-ssa/pr33615-2.C
index 542731a..7189733 100644
--- a/gcc/testsuite/g++.dg/tree-ssa/pr33615-2.C
+++ b/gcc/testsuite/g++.dg/tree-ssa/pr33615-2.C
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fnon-call-exceptions -fdump-tree-pre-details -w" } */
+/* { dg-options "-O2 -fnon-call-exceptions -fdump-tree-pre2-details -w" } */
 
 extern volatile int y;
 
@@ -16,5 +16,5 @@ foo (double a, int x)
 
 // The expression 1.0 / 0.0 should not be treated as a loop invariant
 // if it may throw an exception.
-// { dg-final { scan-tree-dump-times "Replaced 1\\\.0e\\\+0 / 0\\\.0" 0 "pre" } }
-// { dg-final { cleanup-tree-dump "pre" } }
+// { dg-final { scan-tree-dump-times "Replaced 1\\\.0e\\\+0 / 0\\\.0" 0 "pre2" } }
+// { dg-final { cleanup-tree-dump "pre2" } }
diff --git a/gcc/testsuite/gcc.dg/pr31847.c b/gcc/testsuite/gcc.dg/pr31847.c
index 4b945a9..484f2b3 100644
--- a/gcc/testsuite/gcc.dg/pr31847.c
+++ b/gcc/testsuite/gcc.dg/pr31847.c
@@ -1,7 +1,7 @@
 /* PR 31847 */
 
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-pre-all" } */
+/* { dg-options "-O2 -fdump-tree-pre2-all" } */
 
 extern int bar(int);
 
@@ -11,5 +11,5 @@ int foo()
   return bar(a);
 }
 
-/* { dg-final { scan-tree-dump-not "Created value  for " "pre"} } */
-/* { dg-final { cleanup-tree-dump "pre" } } */
+/* { dg-final { scan-tree-dump-not "Created value  for " "pre2" } } */
+/* { dg-final { cleanup-tree-dump "pre2" } } */
diff --git a/gcc/testsuite/gcc.dg/pr41783.c b/gcc/testsuite/gcc.dg/pr41783.c
index cae066b..02c6af4 100644
--- a/gcc/testsuite/gcc.dg/pr41783.c
+++ b/gcc/testsuite/gcc.dg/pr41783.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O3 -fdump-tree-pre" } */
+/* { dg-options "-O3 -fdump-tree-pre2" } */
 int db[100];
 int a_global_var, fact;
 int main()
@@ -15,6 +15,6 @@ int main()
 }
 /* We want to have exactly one load (not two) from a_global_var,
    and we want that load to be into a PRE temporary.  */
-/* { dg-final { scan-tree-dump-times "= a_global_var;" 1 "pre" } } */
-/* { dg-final { scan-tree-dump "pretmp\[^\\n\]* = a_global_var;" "pre" } } */
-/* { dg-final { cleanup-tree-dump "pre" } } */
+/* { dg-final { scan-tree-dump-times "= a_global_var;" 1 "pre2" } } */
+/* { dg-final { scan-tree-dump "pretmp\[^\\n\]* = a_global_var;" "pre2" } } */
+/* { dg-final { cleanup-tree-dump "pre2" } } */
diff --git a/gcc/testsuite/gcc.dg/pr43864-2.c b/gcc/testsuite/gcc.dg/pr43864-2.c
index f00fff9..a3393e6 100644
--- a/gcc/testsuite/gcc.dg/pr43864-2.c
+++ b/gcc/testsuite/gcc.dg/pr43864-2.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-pre" } */
+/* { dg-options "-O2 -fdump-tree-pre2" } */
 
 int
 f (int c, int b, int d)
@@ -17,7 +17,7 @@ f (int c, int b, int d)
   return r;
 }
 
-/* { dg-final { scan-tree-dump-times "if " 0 "pre"} } */
-/* { dg-final { scan-tree-dump-times "_.*\\\+.*_" 1 "pre"} } */
-/* { dg-final { scan-tree-dump-not "Invalid sum" "pre"} } */
-/* { dg-final { cleanup-tree-dump "pre" } } */
+/* { dg-final { scan-tree-dump-times "if " 0 "pre2" } } */
+/* { dg-final { scan-tree-dump-times "_.*\\\+.*_" 1 "pre2" } } */
+/* { dg-final { scan-tree-dump-not "Invalid sum" "pre2" } } */
+/* { dg-final { cleanup-tree-dump "pre2" } } */
diff --git a/gcc/testsuite/gcc.dg/pr43864-3.c b/gcc/testsuite/gcc.dg/pr43864-3.c
index c4954e1..721395c 100644
--- a/gcc/testsuite/gcc.dg/pr43864-3.c
+++ b/gcc/testsuite/gcc.dg/pr43864-3.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-pre" } */
+/* { dg-options "-O2 -fdump-tree-pre2" } */
 
 /* Commutative case.  */
 
@@ -18,7 +18,7 @@ int f(int c, int b, int d)
   return r;
 }
 
-/* { dg-final { scan-tree-dump-times "if " 0 "pre"} } */
-/* { dg-final { scan-tree-dump-times "_.*\\\+.*_" 1 "pre"} } */
-/* { dg-final { scan-tree-dump-not "Invalid sum" "pre"} } */
-/* { dg-final { cleanup-tree-dump "pre" } } */
+/* { dg-final { scan-tree-dump-times "if " 0 "pre2" } } */
+/* { dg-final { scan-tree-dump-times "_.*\\\+.*_" 1 "pre2" } } */
+/* { dg-final { scan-tree-dump-not "Invalid sum" "pre2" } } */
+/* { dg-final { cleanup-tree-dump "pre2" } } */
diff --git a/gcc/testsuite/gcc.dg/pr43864-4.c b/gcc/testsuite/gcc.dg/pr43864-4.c
index 42adfee..2531ab1 100644
--- a/gcc/testsuite/gcc.dg/pr43864-4.c
+++ b/gcc/testsuite/gcc.dg/pr43864-4.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-pre" } */
+/* { dg-options "-O2 -fdump-tree-pre2" } */
 
 /* Different stmt order.  */
 
@@ -22,8 +22,8 @@ int f(int c, int b, int d)
   return r - r2;
 }
 
-/* { dg-final { scan-tree-dump-times "if " 0 "pre"} } */
-/* { dg-final { scan-tree-dump-times "_.*\\\+.*_" 1 "pre"} } */
-/* { dg-final { scan-tree-dump-times " - " 2 "pre"} } */
-/* { dg-final { scan-tree-dump-not "Invalid sum" "pre"} } */
-/* { dg-final { cleanup-tree-dump "pre" } } */
+/* { dg-final { scan-tree-dump-times "if " 0 "pre2" } } */
+/* { dg-final { scan-tree-dump-times "_.*\\\+.*_" 1 "pre2" } } */
+/* { dg-final { scan-tree-dump-times " - " 2 "pre2" } } */
+/* { dg-final { scan-tree-dump-not "Invalid sum" "pre2" } } */
+/* { dg-final { cleanup-tree-dump "pre2" } } */
diff --git a/gcc/testsuite/gcc.dg/pr43864.c b/gcc/testsuite/gcc.dg/pr43864.c
index 8d1e989..8800195 100644
--- a/gcc/testsuite/gcc.dg/pr43864.c
+++ b/gcc/testsuite/gcc.dg/pr43864.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-pre" } */
+/* { dg-options "-O2 -fdump-tree-pre2" } */
 
 extern void foo (char*, int);
 extern void mysprintf (char *, char *);
@@ -31,6 +31,6 @@ hprofStartupp (char *outputFileName, char *ctx)
   return ctx;
 }
 
-/* { dg-final { scan-tree-dump-times "myfree \\(" 1 "pre"} } */
-/* { dg-final { scan-tree-dump-not "Invalid sum" "pre"} } */
-/* { dg-final { cleanup-tree-dump "pre" } } */
+/* { dg-final { scan-tree-dump-times "myfree \\(" 1 "pre2" } } */
+/* { dg-final { scan-tree-dump-not "Invalid sum" "pre2" } } */
+/* { dg-final { cleanup-tree-dump "pre2" } } */
diff --git a/gcc/testsuite/gcc.dg/pr50763.c b/gcc/testsuite/gcc.dg/pr50763.c
index 695b61c..268b0eb 100644
--- a/gcc/testsuite/gcc.dg/pr50763.c
+++ b/gcc/testsuite/gcc.dg/pr50763.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fno-tree-dominator-opts -fdump-tree-pre" } */
+/* { dg-options "-O2 -fno-tree-dominator-opts -fdump-tree-pre2" } */
 
 int bar (int i);
 
@@ -12,5 +12,5 @@ foo (int c, int d)
   while (c == d);
 }
 
-/* { dg-final { scan-tree-dump-times "== 33" 2 "pre"} } */
-/* { dg-final { cleanup-tree-dump "pre" } } */
+/* { dg-final { scan-tree-dump-times "== 33" 2 "pre2" } } */
+/* { dg-final { cleanup-tree-dump "pre2" } } */
diff --git a/gcc/testsuite/gcc.dg/pr51879-12.c b/gcc/testsuite/gcc.dg/pr51879-12.c
index 1b25e29..249ad0f 100644
--- a/gcc/testsuite/gcc.dg/pr51879-12.c
+++ b/gcc/testsuite/gcc.dg/pr51879-12.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-pre" } */
+/* { dg-options "-O2 -fdump-tree-pre2" } */
 
 __attribute__((pure)) int bar (int);
 __attribute__((pure)) int bar2 (int);
@@ -24,6 +24,6 @@ foo (int y)
   baz (a);
 }
 
-/* { dg-final { scan-tree-dump-times "bar \\(" 1 "pre"} } */
-/* { dg-final { scan-tree-dump-times "bar2 \\(" 1 "pre"} } */
-/* { dg-final { cleanup-tree-dump "pre" } } */
+/* { dg-final { scan-tree-dump-times "bar \\(" 1 "pre2" } } */
+/* { dg-final { scan-tree-dump-times "bar2 \\(" 1 "pre2" } } */
+/* { dg-final { cleanup-tree-dump "pre2" } } */
diff --git a/gcc/testsuite/gcc.dg/pr51879-16.c b/gcc/testsuite/gcc.dg/pr51879-16.c
index 3a84e97..04ab419 100644
--- a/gcc/testsuite/gcc.dg/pr51879-16.c
+++ b/gcc/testsuite/gcc.dg/pr51879-16.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-pre" } */
+/* { dg-options "-O2 -fdump-tree-pre2" } */
 
 struct S {
   int i;
@@ -27,6 +27,6 @@ int bar (int c) {
   return r;
 }
 
-/* { dg-final { scan-tree-dump-times "foo \\(" 1 "pre"} } */
-/* { dg-final { scan-tree-dump-times "foo2 \\(" 1 "pre"} } */
-/* { dg-final { cleanup-tree-dump "pre" } } */
+/* { dg-final { scan-tree-dump-times "foo \\(" 1 "pre2" } } */
+/* { dg-final { scan-tree-dump-times "foo2 \\(" 1 "pre2" } } */
+/* { dg-final { cleanup-tree-dump "pre2" } } */
diff --git a/gcc/testsuite/gcc.dg/pr51879-17.c b/gcc/testsuite/gcc.dg/pr51879-17.c
index 806fe7b..5480349 100644
--- a/gcc/testsuite/gcc.dg/pr51879-17.c
+++ b/gcc/testsuite/gcc.dg/pr51879-17.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-pre" } */
+/* { dg-options "-O2 -fdump-tree-pre2" } */
 
 struct S {
   int i;
@@ -27,6 +27,6 @@ int bar (int c) {
   return r;
 }
 
-/* { dg-final { scan-tree-dump-times "foo \\(" 2 "pre"} } */
-/* { dg-final { scan-tree-dump-times "foo2 \\(" 2 "pre"} } */
-/* { dg-final { cleanup-tree-dump "pre" } } */
+/* { dg-final { scan-tree-dump-times "foo \\(" 2 "pre2" } } */
+/* { dg-final { scan-tree-dump-times "foo2 \\(" 2 "pre2" } } */
+/* { dg-final { cleanup-tree-dump "pre2" } } */
diff --git a/gcc/testsuite/gcc.dg/pr51879-18.c b/gcc/testsuite/gcc.dg/pr51879-18.c
index 95629f1..01f4995 100644
--- a/gcc/testsuite/gcc.dg/pr51879-18.c
+++ b/gcc/testsuite/gcc.dg/pr51879-18.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-pre -fno-tree-copy-prop -fno-tree-dominator-opts -fno-tree-copyrename" } */
+/* { dg-options "-O2 -fdump-tree-pre2 -fno-tree-copy-prop -fno-tree-dominator-opts -fno-tree-copyrename" } */
 
 extern int foo (void);
 
@@ -13,5 +13,5 @@ void bar (int c, int *p)
     *q = foo ();
 }
 
-/* { dg-final { scan-tree-dump-times "foo \\(" 1 "pre"} } */
-/* { dg-final { cleanup-tree-dump "pre" } } */
+/* { dg-final { scan-tree-dump-times "foo \\(" 1 "pre2" } } */
+/* { dg-final { cleanup-tree-dump "pre2" } } */
diff --git a/gcc/testsuite/gcc.dg/pr51879-2.c b/gcc/testsuite/gcc.dg/pr51879-2.c
index db385cb..b086caa 100644
--- a/gcc/testsuite/gcc.dg/pr51879-2.c
+++ b/gcc/testsuite/gcc.dg/pr51879-2.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-pre" } */
+/* { dg-options "-O2 -fdump-tree-pre2" } */
 
 int bar (int);
 void baz (int);
@@ -14,6 +14,6 @@ foo (int y)
     baz (bar (7) + 6);
 }
 
-/* { dg-final { scan-tree-dump-times "bar \\(" 1 "pre"} } */
-/* { dg-final { scan-tree-dump-times "baz \\(" 1 "pre"} } */
-/* { dg-final { cleanup-tree-dump "pre" } } */
+/* { dg-final { scan-tree-dump-times "bar \\(" 1 "pre2" } } */
+/* { dg-final { scan-tree-dump-times "baz \\(" 1 "pre2" } } */
+/* { dg-final { cleanup-tree-dump "pre2" } } */
diff --git a/gcc/testsuite/gcc.dg/pr51879-3.c b/gcc/testsuite/gcc.dg/pr51879-3.c
index be4b374..c3b4e31 100644
--- a/gcc/testsuite/gcc.dg/pr51879-3.c
+++ b/gcc/testsuite/gcc.dg/pr51879-3.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-pre" } */
+/* { dg-options "-O2 -fdump-tree-pre2" } */
 
 int bar (int);
 void baz (int);
@@ -15,5 +15,5 @@ foo (int y)
   baz (a);
 }
 
-/* { dg-final { scan-tree-dump-times "bar \\(" 1 "pre"} } */
-/* { dg-final { cleanup-tree-dump "pre" } } */
+/* { dg-final { scan-tree-dump-times "bar \\(" 1 "pre2" } } */
+/* { dg-final { cleanup-tree-dump "pre2" } } */
diff --git a/gcc/testsuite/gcc.dg/pr51879-4.c b/gcc/testsuite/gcc.dg/pr51879-4.c
index 5cb47af..51995a7 100644
--- a/gcc/testsuite/gcc.dg/pr51879-4.c
+++ b/gcc/testsuite/gcc.dg/pr51879-4.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-pre" } */
+/* { dg-options "-O2 -fdump-tree-pre2" } */
 
 int bar (int);
 void baz (int);
@@ -12,5 +12,5 @@ int foo (int y)
   return a + b;
 }
 
-/* { dg-final { scan-tree-dump-times "bar \\(" 2 "pre"} } */
-/* { dg-final { cleanup-tree-dump "pre" } } */
+/* { dg-final { scan-tree-dump-times "bar \\(" 2 "pre2" } } */
+/* { dg-final { cleanup-tree-dump "pre2" } } */
diff --git a/gcc/testsuite/gcc.dg/pr51879-6.c b/gcc/testsuite/gcc.dg/pr51879-6.c
index 8362a17..5daffd0 100644
--- a/gcc/testsuite/gcc.dg/pr51879-6.c
+++ b/gcc/testsuite/gcc.dg/pr51879-6.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-pre" } */
+/* { dg-options "-O2 -fdump-tree-pre2" } */
 
 
 int bar (int);
@@ -23,5 +23,5 @@ foo (int y)
   baz (a);
 }
 
-/* { dg-final { scan-tree-dump-times "bar \\(" 1 "pre"} } */
-/* { dg-final { cleanup-tree-dump "pre" } } */
+/* { dg-final { scan-tree-dump-times "bar \\(" 1 "pre2" } } */
+/* { dg-final { cleanup-tree-dump "pre2" } } */
diff --git a/gcc/testsuite/gcc.dg/pr51879-7.c b/gcc/testsuite/gcc.dg/pr51879-7.c
index 8a699a1..e8872a2 100644
--- a/gcc/testsuite/gcc.dg/pr51879-7.c
+++ b/gcc/testsuite/gcc.dg/pr51879-7.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-pre" } */
+/* { dg-options "-O2 -fdump-tree-pre2" } */
 
 int z;
 
@@ -12,5 +12,5 @@ foo (int y)
     z = 5;
 }
 
-/* { dg-final { scan-tree-dump-times "z = 5" 1 "pre"} } */
-/* { dg-final { cleanup-tree-dump "pre" } } */
+/* { dg-final { scan-tree-dump-times "z = 5" 1 "pre2" } } */
+/* { dg-final { cleanup-tree-dump "pre2" } } */
diff --git a/gcc/testsuite/gcc.dg/pr51879.c b/gcc/testsuite/gcc.dg/pr51879.c
index 060624f..f806e8e 100644
--- a/gcc/testsuite/gcc.dg/pr51879.c
+++ b/gcc/testsuite/gcc.dg/pr51879.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-pre" } */
+/* { dg-options "-O2 -fdump-tree-pre2" } */
 
 int bar (int);
 void baz (int);
@@ -15,5 +15,5 @@ foo (int y)
   baz (a);
 }
 
-/* { dg-final { scan-tree-dump-times "bar \\(" 1 "pre"} } */
-/* { dg-final { cleanup-tree-dump "pre" } } */
+/* { dg-final { scan-tree-dump-times "bar \\(" 1 "pre2" } } */
+/* { dg-final { cleanup-tree-dump "pre2" } } */
diff --git a/gcc/testsuite/gcc.dg/pr58805.c b/gcc/testsuite/gcc.dg/pr58805.c
index dda0e4b..cfa0219 100644
--- a/gcc/testsuite/gcc.dg/pr58805.c
+++ b/gcc/testsuite/gcc.dg/pr58805.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -ftree-tail-merge -fdump-tree-pre" } */
+/* { dg-options "-O2 -ftree-tail-merge -fdump-tree-pre2" } */
 
 /* Type that matches the 'p' constraint.  */
 #define TYPE void *
@@ -20,5 +20,5 @@ foo (int n, TYPE *x, TYPE *y)
     bar (y);
 }
 
-/* { dg-final { scan-tree-dump-times "__asm__" 2 "pre"} } */
-/* { dg-final { cleanup-tree-dump "pre" } } */
+/* { dg-final { scan-tree-dump-times "__asm__" 2 "pre2" } } */
+/* { dg-final { cleanup-tree-dump "pre2" } } */
diff --git a/gcc/testsuite/gcc.dg/tail-merge-store.c b/gcc/testsuite/gcc.dg/tail-merge-store.c
index 1aefbdc..ec7aa8c 100644
--- a/gcc/testsuite/gcc.dg/tail-merge-store.c
+++ b/gcc/testsuite/gcc.dg/tail-merge-store.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -ftree-tail-merge -fdump-tree-pre" } */
+/* { dg-options "-O2 -ftree-tail-merge -fdump-tree-pre2" } */
 
 int z;
 int x;
@@ -17,6 +17,6 @@ f (int c, int d)
     }
 }
 
-/* { dg-final { scan-tree-dump-times "duplicate of" 1 "pre"} } */
-/* { dg-final { scan-tree-dump-times "z = 5" 1 "pre"} } */
-/* { dg-final { cleanup-tree-dump "pre" } } */
+/* { dg-final { scan-tree-dump-times "duplicate of" 1 "pre2" } } */
+/* { dg-final { scan-tree-dump-times "z = 5" 1 "pre2" } } */
+/* { dg-final { cleanup-tree-dump "pre2" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/loadpre1.c b/gcc/testsuite/gcc.dg/tree-ssa/loadpre1.c
index ce78f02..41254bb 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/loadpre1.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/loadpre1.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-pre-stats" } */
+/* { dg-options "-O2 -fdump-tree-pre2-stats" } */
 int foo(int *a, int argc)
 {
   int c;
@@ -14,5 +14,5 @@ int foo(int *a, int argc)
   e = *a;
   return d + e;
 }
-/* { dg-final { scan-tree-dump-times "Eliminated: 1" 1 "pre" } } */
-/* { dg-final { cleanup-tree-dump "pre" } } */
+/* { dg-final { scan-tree-dump-times "Eliminated: 1" 1 "pre2" } } */
+/* { dg-final { cleanup-tree-dump "pre2" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/loadpre10.c b/gcc/testsuite/gcc.dg/tree-ssa/loadpre10.c
index 4147a70..e55273b 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/loadpre10.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/loadpre10.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */ 
-/* { dg-options "-O2 -fdump-tree-pre-stats" } */
+/* { dg-options "-O2 -fdump-tree-pre2-stats" } */
 struct tree_common 
 { 
   int code; 
@@ -44,6 +44,6 @@ L19:
 L23: 
   return expr; 
 } 
-/* { dg-final { scan-tree-dump-times "Eliminated: 1" 1 "pre"} } */
-/* { dg-final { cleanup-tree-dump "pre" } } */
+/* { dg-final { scan-tree-dump-times "Eliminated: 1" 1 "pre2" } } */
+/* { dg-final { cleanup-tree-dump "pre2" } } */
 
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/loadpre11.c b/gcc/testsuite/gcc.dg/tree-ssa/loadpre11.c
index eb6089c..4416aa0 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/loadpre11.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/loadpre11.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */ 
-/* { dg-options "-O2 -fdump-tree-pre-stats -fno-tree-cselim" } */
+/* { dg-options "-O2 -fdump-tree-pre2-stats -fno-tree-cselim" } */
 int *t;
 int g(int);
 int f(int tt)
@@ -9,6 +9,6 @@ int f(int tt)
       *t1 = 2;
     return g(*t1);
 } 
-/* { dg-final { scan-tree-dump-times "Eliminated: 1" 1 "pre"} } */
-/* { dg-final { cleanup-tree-dump "pre" } } */
+/* { dg-final { scan-tree-dump-times "Eliminated: 1" 1 "pre2" } } */
+/* { dg-final { cleanup-tree-dump "pre2" } } */
 
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/loadpre12.c b/gcc/testsuite/gcc.dg/tree-ssa/loadpre12.c
index 94a3d00..837585a 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/loadpre12.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/loadpre12.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */ 
-/* { dg-options "-O2 -fdump-tree-pre-stats" } */
+/* { dg-options "-O2 -fdump-tree-pre2-stats" } */
 typedef int type[2];
 type *t;
 int g(int);
@@ -11,5 +11,5 @@ int f(int tt)
     return g((*t1)[0]);
 }
 
-/* { dg-final { scan-tree-dump-times "Eliminated: 1" 1 "pre" } } */
-/* { dg-final { cleanup-tree-dump "pre" } } */
+/* { dg-final { scan-tree-dump-times "Eliminated: 1" 1 "pre2" } } */
+/* { dg-final { cleanup-tree-dump "pre2" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/loadpre13.c b/gcc/testsuite/gcc.dg/tree-ssa/loadpre13.c
index 420ad71..efd43f8 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/loadpre13.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/loadpre13.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */ 
-/* { dg-options "-O2 -fdump-tree-pre-stats" } */
+/* { dg-options "-O2 -fdump-tree-pre2-stats" } */
 int t[2];
 int g(int);
 int f(int tt)
@@ -9,5 +9,5 @@ int f(int tt)
     return g(t[0]);
 }
 
-/* { dg-final { scan-tree-dump-times "Eliminated: 1" 1 "pre"} } */
-/* { dg-final { cleanup-tree-dump "pre" } } */
+/* { dg-final { scan-tree-dump-times "Eliminated: 1" 1 "pre2" } } */
+/* { dg-final { cleanup-tree-dump "pre2" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/loadpre14.c b/gcc/testsuite/gcc.dg/tree-ssa/loadpre14.c
index 11bfd00..79bb36a 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/loadpre14.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/loadpre14.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-pre-stats" } */
+/* { dg-options "-O2 -fdump-tree-pre2-stats" } */
 typedef int type[2];
 int foo(type *a, int argc)
 {
@@ -15,5 +15,5 @@ int foo(type *a, int argc)
   e = (*a)[0];
   return d + e;
 }
-/* { dg-final { scan-tree-dump-times "Eliminated: 1" 1 "pre"} } */
-/* { dg-final { cleanup-tree-dump "pre" } } */
+/* { dg-final { scan-tree-dump-times "Eliminated: 1" 1 "pre2" } } */
+/* { dg-final { cleanup-tree-dump "pre2" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/loadpre15.c b/gcc/testsuite/gcc.dg/tree-ssa/loadpre15.c
index b04c762..652b0f3 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/loadpre15.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/loadpre15.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-pre-stats" } */
+/* { dg-options "-O2 -fdump-tree-pre2-stats" } */
 typedef int type[2];
 int foo(type *a, int argc, int t)
 {
@@ -15,5 +15,5 @@ int foo(type *a, int argc, int t)
   e = (*a)[t];
   return d + e;
 }
-/* { dg-final { scan-tree-dump-times "Eliminated: 1" 1 "pre"} } */
-/* { dg-final { cleanup-tree-dump "pre" } } */
+/* { dg-final { scan-tree-dump-times "Eliminated: 1" 1 "pre2" } } */
+/* { dg-final { cleanup-tree-dump "pre2" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/loadpre16.c b/gcc/testsuite/gcc.dg/tree-ssa/loadpre16.c
index 193ae52..f7fe24a 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/loadpre16.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/loadpre16.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-pre-stats" } */
+/* { dg-options "-O2 -fdump-tree-pre2-stats" } */
 typedef int type[2];
 int foo(type *a, int argc)
 {
@@ -12,5 +12,5 @@ int foo(type *a, int argc)
   e = (*a)[0];
   return d + e;
 }
-/* { dg-final { scan-tree-dump-times "Eliminated: 1" 1 "pre"} } */
-/* { dg-final { cleanup-tree-dump "pre" } } */
+/* { dg-final { scan-tree-dump-times "Eliminated: 1" 1 "pre2" } } */
+/* { dg-final { cleanup-tree-dump "pre2" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/loadpre17.c b/gcc/testsuite/gcc.dg/tree-ssa/loadpre17.c
index ec0f6ec..38b72af 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/loadpre17.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/loadpre17.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-pre-stats" } */
+/* { dg-options "-O2 -fdump-tree-pre2-stats" } */
 typedef int type[2];
 int foo(type *a, int argc)
 {
@@ -12,5 +12,5 @@ int foo(type *a, int argc)
   e = (*a)[argc];
   return d + e;
 }
-/* { dg-final { scan-tree-dump-times "Eliminated: 1" 1 "pre"} } */
-/* { dg-final { cleanup-tree-dump "pre" } } */
+/* { dg-final { scan-tree-dump-times "Eliminated: 1" 1 "pre2" } } */
+/* { dg-final { cleanup-tree-dump "pre2" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/loadpre18.c b/gcc/testsuite/gcc.dg/tree-ssa/loadpre18.c
index 21a1d06..07133e4 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/loadpre18.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/loadpre18.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */ 
-/* { dg-options "-O2 -fdump-tree-pre-stats" } */
+/* { dg-options "-O2 -fdump-tree-pre2-stats" } */
 typedef int type[2];
 int main(type *a, int argc)
 {
@@ -12,5 +12,5 @@ int main(type *a, int argc)
   e = (*a)[argc];
   return d + e;
 }
-/* { dg-final { scan-tree-dump-times "Eliminated: 1" 1 "pre" { xfail *-*-* } } } */
-/* { dg-final { cleanup-tree-dump "pre" } } */
+/* { dg-final { scan-tree-dump-times "Eliminated: 1" 1 "pre2" { xfail *-*-* } } } */
+/* { dg-final { cleanup-tree-dump "pre2" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/loadpre19.c b/gcc/testsuite/gcc.dg/tree-ssa/loadpre19.c
index 0ad8988..65294c0 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/loadpre19.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/loadpre19.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-pre-stats" } */
+/* { dg-options "-O2 -fdump-tree-pre2-stats" } */
 typedef int type[2];
 int foo(type *a, int argc)
 {
@@ -12,5 +12,5 @@ int foo(type *a, int argc)
   e = (*a)[argc];
   return e;
 }
-/* { dg-final { scan-tree-dump-times "Eliminated: 1" 1 "pre"  } } */
-/* { dg-final { cleanup-tree-dump "pre" } } */
+/* { dg-final { scan-tree-dump-times "Eliminated: 1" 1 "pre2" } } */
+/* { dg-final { cleanup-tree-dump "pre2" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/loadpre2.c b/gcc/testsuite/gcc.dg/tree-ssa/loadpre2.c
index 8d6557a..6a2fde4 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/loadpre2.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/loadpre2.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */ 
-/* { dg-options "-O2 -fdump-tree-pre-stats" } */
+/* { dg-options "-O2 -fdump-tree-pre2-stats" } */
 int main(int *a, int argc)
 {
   int b;
@@ -14,5 +14,5 @@ int main(int *a, int argc)
   return d + e;
 }
 
-/* { dg-final { scan-tree-dump-times "Eliminated: 1" 1 "pre"} } */
-/* { dg-final { cleanup-tree-dump "pre" } } */
+/* { dg-final { scan-tree-dump-times "Eliminated: 1" 1 "pre2" } } */
+/* { dg-final { cleanup-tree-dump "pre2" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/loadpre20.c b/gcc/testsuite/gcc.dg/tree-ssa/loadpre20.c
index 92a2353..3a657c7 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/loadpre20.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/loadpre20.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-pre-stats" } */
+/* { dg-options "-O2 -fdump-tree-pre2-stats" } */
 typedef int type[2];
 int foo(type *a, int argc)
 {
@@ -12,5 +12,5 @@ int foo(type *a, int argc)
   e = (*a)[argc];
   return e;
 }
-/* { dg-final { scan-tree-dump-times "Eliminated: 1" 1 "pre"  } } */
-/* { dg-final { cleanup-tree-dump "pre" } } */
+/* { dg-final { scan-tree-dump-times "Eliminated: 1" 1 "pre2" } } */
+/* { dg-final { cleanup-tree-dump "pre2" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/loadpre21.c b/gcc/testsuite/gcc.dg/tree-ssa/loadpre21.c
index 77caef6..2f53186 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/loadpre21.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/loadpre21.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */ 
-/* { dg-options "-O2 -fdump-tree-pre-stats" } */
+/* { dg-options "-O2 -fdump-tree-pre2-stats" } */
 typedef int type[2];
 int main(type *a, int argc)
 {
@@ -15,5 +15,5 @@ int main(type *a, int argc)
   return d + e;
 }
 
-/* { dg-final { scan-tree-dump-times "Eliminated: 1" 1 "pre"} } */
-/* { dg-final { cleanup-tree-dump "pre" } } */
+/* { dg-final { scan-tree-dump-times "Eliminated: 1" 1 "pre2" } } */
+/* { dg-final { cleanup-tree-dump "pre2" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/loadpre22.c b/gcc/testsuite/gcc.dg/tree-ssa/loadpre22.c
index 3c03c9b..2b982df 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/loadpre22.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/loadpre22.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */ 
-/* { dg-options "-O2 -fdump-tree-pre-stats" } */
+/* { dg-options "-O2 -fdump-tree-pre2-stats" } */
 typedef int type[2];
 int main(type *a, int argc)
 {
@@ -15,5 +15,5 @@ int main(type *a, int argc)
   return d + e;
 }
 
-/* { dg-final { scan-tree-dump-times "Eliminated: 1" 1 "pre"} } */
-/* { dg-final { cleanup-tree-dump "pre" } } */
+/* { dg-final { scan-tree-dump-times "Eliminated: 1" 1 "pre2" } } */
+/* { dg-final { cleanup-tree-dump "pre2" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/loadpre23.c b/gcc/testsuite/gcc.dg/tree-ssa/loadpre23.c
index 2273acc..f95f63c 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/loadpre23.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/loadpre23.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */ 
-/* { dg-options "-O2 -fdump-tree-pre-stats" } */
+/* { dg-options "-O2 -fdump-tree-pre2-stats" } */
 
 struct {
   int a;
@@ -21,5 +21,5 @@ int foo(int argc)
   return d + e;
 }
 
-/* { dg-final { scan-tree-dump-times "Eliminated: 1" 1 "pre"  } } */
-/* { dg-final { cleanup-tree-dump "pre" } } */
+/* { dg-final { scan-tree-dump-times "Eliminated: 1" 1 "pre2" } } */
+/* { dg-final { cleanup-tree-dump "pre2" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/loadpre24.c b/gcc/testsuite/gcc.dg/tree-ssa/loadpre24.c
index 31fcc9f..b28eb15 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/loadpre24.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/loadpre24.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-pre-stats" } */
+/* { dg-options "-O2 -fdump-tree-pre2-stats" } */
 
 int a;
 
@@ -20,5 +20,5 @@ int foo(int argc)
 
 /* We will move the load of a out of the loop.  */
 
-/* { dg-final { scan-tree-dump-times "Eliminated: 1" 1 "pre" } } */
-/* { dg-final { cleanup-tree-dump "pre" } } */
+/* { dg-final { scan-tree-dump-times "Eliminated: 1" 1 "pre2" } } */
+/* { dg-final { cleanup-tree-dump "pre2" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/loadpre25.c b/gcc/testsuite/gcc.dg/tree-ssa/loadpre25.c
index aaf0931..8c542e8 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/loadpre25.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/loadpre25.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-pre-stats" } */
+/* { dg-options "-O2 -fdump-tree-pre2-stats" } */
 struct X { int i; };
 int foo(struct X *a, int argc)
 {
@@ -16,5 +16,5 @@ int foo(struct X *a, int argc)
   return d + e;
 }
 
-/* { dg-final { scan-tree-dump-times "Eliminated: 1" 1 "pre"  } } */
-/* { dg-final { cleanup-tree-dump "pre" } } */
+/* { dg-final { scan-tree-dump-times "Eliminated: 1" 1 "pre2" } } */
+/* { dg-final { cleanup-tree-dump "pre2" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/loadpre3.c b/gcc/testsuite/gcc.dg/tree-ssa/loadpre3.c
index 4bda8f6..2815656 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/loadpre3.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/loadpre3.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-pre-stats" } */
+/* { dg-options "-O2 -fdump-tree-pre2-stats" } */
 int foo(int **a,int argc)
 {
   int b;
@@ -20,5 +20,5 @@ int foo(int **a,int argc)
   return d + e;
 }
 
-/* { dg-final { scan-tree-dump-times "Eliminated: 2" 1 "pre"} } */
-/* { dg-final { cleanup-tree-dump "pre" } } */
+/* { dg-final { scan-tree-dump-times "Eliminated: 2" 1 "pre2" } } */
+/* { dg-final { cleanup-tree-dump "pre2" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/loadpre4.c b/gcc/testsuite/gcc.dg/tree-ssa/loadpre4.c
index 1e26603..1e0f40d 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/loadpre4.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/loadpre4.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */ 
-/* { dg-options "-O2 -fdump-tree-pre-stats" } */
+/* { dg-options "-O2 -fdump-tree-pre2-stats" } */
 int main(int *a, int argc)
 {
   int b;
@@ -15,5 +15,5 @@ int main(int *a, int argc)
   return d + e;
 }
 
-/* { dg-final { scan-tree-dump-times "Eliminated: 1" 1 "pre"  } } */
-/* { dg-final { cleanup-tree-dump "pre" } } */
+/* { dg-final { scan-tree-dump-times "Eliminated: 1" 1 "pre2" } } */
+/* { dg-final { cleanup-tree-dump "pre2" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/loadpre5.c b/gcc/testsuite/gcc.dg/tree-ssa/loadpre5.c
index 475050a..ba3b199 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/loadpre5.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/loadpre5.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-pre-stats" } */
+/* { dg-options "-O2 -fdump-tree-pre2-stats" } */
 int p;
 int r;
 
@@ -20,5 +20,5 @@ int foo(int argc)
     r = 9;
   return q + a();
 }
-/* { dg-final { scan-tree-dump-times "Eliminated: 1" 1 "pre"} } */
-/* { dg-final { cleanup-tree-dump "pre" } } */
+/* { dg-final { scan-tree-dump-times "Eliminated: 1" 1 "pre2" } } */
+/* { dg-final { cleanup-tree-dump "pre2" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/loadpre6.c b/gcc/testsuite/gcc.dg/tree-ssa/loadpre6.c
index bcd72c5..abcafcf 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/loadpre6.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/loadpre6.c
@@ -1,5 +1,5 @@
 /* { dg-do compile { target nonpic } } */
-/* { dg-options "-O2 -fdump-tree-pre-stats -fdump-tree-fre1" } */
+/* { dg-options "-O2 -fdump-tree-pre2-stats -fdump-tree-fre1" } */
 #include <stddef.h>
 
 union tree_node;
@@ -74,7 +74,7 @@ main (void)
 }
 
 /* { dg-final { scan-tree-dump-not "= unexpanded_var_list;" "fre1" } } */
-/* { dg-final { scan-tree-dump-times "Eliminated: 1" 1 "pre" } } */
-/* { dg-final { scan-tree-dump-times "Insertions: 2" 1 "pre" } } */
+/* { dg-final { scan-tree-dump-times "Eliminated: 1" 1 "pre2" } } */
+/* { dg-final { scan-tree-dump-times "Insertions: 2" 1 "pre2" } } */
 /* { dg-final { cleanup-tree-dump "fre1" } } */
-/* { dg-final { cleanup-tree-dump "pre" } } */
+/* { dg-final { cleanup-tree-dump "pre2" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/loadpre7.c b/gcc/testsuite/gcc.dg/tree-ssa/loadpre7.c
index 7e67c9d..a630dfc 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/loadpre7.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/loadpre7.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */ 
-/* { dg-options "-O2 -fdump-tree-pre-stats" } */
+/* { dg-options "-O2 -fdump-tree-pre2-stats" } */
 /* We can't eliminate the *p load here in any sane way, as eshup8 may 
    change it.  */
 
@@ -16,5 +16,5 @@ enormlz (x)
       eshup8 (x);
     }
 }
-/* { dg-final { scan-tree-dump-not "Eliminated:" "pre"} } */
-/* { dg-final { cleanup-tree-dump "pre" } } */
+/* { dg-final { scan-tree-dump-not "Eliminated:" "pre2" } } */
+/* { dg-final { cleanup-tree-dump "pre2" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/loadpre8.c b/gcc/testsuite/gcc.dg/tree-ssa/loadpre8.c
index 0dfc2b0..b6e3a09 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/loadpre8.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/loadpre8.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */ 
-/* { dg-options "-O2 -fdump-tree-pre-stats -std=gnu89" } */
+/* { dg-options "-O2 -fdump-tree-pre2-stats -std=gnu89" } */
 typedef union tree_node *tree;
 struct tree_common
 {
@@ -93,5 +93,5 @@ rewrite_add_phi_arguments (basic_block bb)
 	  get_reaching_def ((get_def_from_ptr (get_phi_result_ptr (phi)))->ssa_name.var);
     }
 }
-/* { dg-final { scan-tree-dump-times "Eliminated: 1" 1 "pre" } } */
-/* { dg-final { cleanup-tree-dump "pre" } } */
+/* { dg-final { scan-tree-dump-times "Eliminated: 1" 1 "pre2" } } */
+/* { dg-final { cleanup-tree-dump "pre2" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr23455.c b/gcc/testsuite/gcc.dg/tree-ssa/pr23455.c
index 6522f99..3eee625 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr23455.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr23455.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */ 
-/* { dg-options "-O2 -fdump-tree-pre-stats" } */
+/* { dg-options "-O2 -fdump-tree-pre2-stats" } */
 #ifdef _WIN64
 #define LONG long long
 #else
@@ -25,6 +25,6 @@ bi_windup(unsigned int *outbuf, unsigned int bi_buf)
 /* We should eliminate one load of outcnt, which will in turn let us eliminate
    one multiply of outcnt which will in turn let us eliminate
    one add involving outcnt and outbuf.  */
-/* { dg-final { scan-tree-dump-times "Eliminated: 3" 1 "pre" {target { ! avr-*-* } } } } */
-/* { dg-final { scan-tree-dump-times "Eliminated: 4" 1 "pre" {target {   avr-*-* } } } } */
-/* { dg-final { cleanup-tree-dump "pre" } } */
+/* { dg-final { scan-tree-dump-times "Eliminated: 3" 1 "pre2" {target { ! avr-*-* } } } } */
+/* { dg-final { scan-tree-dump-times "Eliminated: 4" 1 "pre2" {target {   avr-*-* } } } } */
+/* { dg-final { cleanup-tree-dump "pre2" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr35286.c b/gcc/testsuite/gcc.dg/tree-ssa/pr35286.c
index 8601cab..e88477c 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr35286.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr35286.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */ 
-/* { dg-options "-O2 -fdump-tree-pre-stats" } */
+/* { dg-options "-O2 -fdump-tree-pre2-stats" } */
 int g2;
 struct A {
     int a; int b;
@@ -19,5 +19,5 @@ int foo(int a, int b)
 }
 /* We will eliminate the g1.a from the return statement as fully redundant,
    and remove one calculation of a + b. */
-/* { dg-final { scan-tree-dump-times "Eliminated: 2" 1 "pre"} } */
-/* { dg-final { cleanup-tree-dump "pre" } } */
+/* { dg-final { scan-tree-dump-times "Eliminated: 2" 1 "pre2" } } */
+/* { dg-final { cleanup-tree-dump "pre2" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr35287.c b/gcc/testsuite/gcc.dg/tree-ssa/pr35287.c
index 1e97662..a7d266b 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr35287.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr35287.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */ 
-/* { dg-options "-O2 -fdump-tree-pre-stats" } */
+/* { dg-options "-O2 -fdump-tree-pre2-stats" } */
 int *gp;
 int foo(int p)
 {
@@ -11,5 +11,5 @@ int foo(int p)
 }
 
 /* We will eliminate one load of gp and one indirect load of *gp. */
-/* { dg-final { scan-tree-dump-times "Eliminated: 2" 1 "pre"} } */
-/* { dg-final { cleanup-tree-dump "pre" } } */
+/* { dg-final { scan-tree-dump-times "Eliminated: 2" 1 "pre2" } } */
+/* { dg-final { cleanup-tree-dump "pre2" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr43491.c b/gcc/testsuite/gcc.dg/tree-ssa/pr43491.c
index 44dc5f2..fea54e5 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr43491.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr43491.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-pre-stats" } */
+/* { dg-options "-O2 -fdump-tree-pre2-stats" } */
 
 #define REGISTER register
 
@@ -37,6 +37,6 @@ long foo(long data, long v)
 }
 /* We should not eliminate global register variable when it is the RHS of
    a single assignment.  */
-/* { dg-final { scan-tree-dump-times "Eliminated: 2" 1 "pre" { target { arm*-*-* i?86-*-* mips*-*-* x86_64-*-* } } } } */
-/* { dg-final { scan-tree-dump-times "Eliminated: 3" 1 "pre" { target { ! { arm*-*-* i?86-*-* mips*-*-* x86_64-*-* } } } } } */
-/* { dg-final { cleanup-tree-dump "pre" } } */
+/* { dg-final { scan-tree-dump-times "Eliminated: 2" 1 "pre2" { target { arm*-*-* i?86-*-* mips*-*-* x86_64-*-* } } } } */
+/* { dg-final { scan-tree-dump-times "Eliminated: 3" 1 "pre2" { target { ! { arm*-*-* i?86-*-* mips*-*-* x86_64-*-* } } } } } */
+/* { dg-final { cleanup-tree-dump "pre2" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr47392.c b/gcc/testsuite/gcc.dg/tree-ssa/pr47392.c
index 2016136..bf056b6 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr47392.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr47392.c
@@ -1,5 +1,5 @@
 /* { dg-do run } */
-/* { dg-options "-O2 -fdump-tree-pre-stats" } */
+/* { dg-options "-O2 -fdump-tree-pre2-stats" } */
 
 struct A
 {
@@ -38,5 +38,5 @@ main ()
   return 0;
 }
 
-/* { dg-final { scan-tree-dump "Eliminated: 1" "pre" } } */
-/* { dg-final { cleanup-tree-dump "pre" } } */
+/* { dg-final { scan-tree-dump "Eliminated: 1" "pre2" } } */
+/* { dg-final { cleanup-tree-dump "pre2" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-10.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-10.c
index 34217a0..fe3539f 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-10.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-10.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-pre-stats" } */
+/* { dg-options "-O2 -fdump-tree-pre2-stats" } */
 
 union loc {  unsigned reg; signed offset; };
 void __frame_state_for (volatile char *state_in, int x)
@@ -22,5 +22,5 @@ void __frame_state_for (volatile char *state_in, int x)
    invariants and the volatileness of state_in prevents DSE of the
    first store.  Thus, this is XFAILed.  */
 
-/* { dg-final { scan-tree-dump "Insertions: 2" "pre" { xfail *-*-* } } } */
-/* { dg-final { cleanup-tree-dump "pre" } } */
+/* { dg-final { scan-tree-dump "Insertions: 2" "pre2" { xfail *-*-* } } } */
+/* { dg-final { cleanup-tree-dump "pre2" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-1.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-1.c
index 3bc0f5e..0734230 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-1.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-1.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-pre-stats" } */
+/* { dg-options "-O2 -fdump-tree-pre2-stats" } */
 extern int printf (const char *, ...);
 int foo(int argc, char **argv)
 {
@@ -17,5 +17,5 @@ int foo(int argc, char **argv)
 }
 /* We should eliminate one evaluation of b + c along the main path,
    causing one reload. */
-/* { dg-final { scan-tree-dump-times "Eliminated: 1" 1 "pre"} } */
-/* { dg-final { cleanup-tree-dump "pre" } } */
+/* { dg-final { scan-tree-dump-times "Eliminated: 1" 1 "pre2" } } */
+/* { dg-final { cleanup-tree-dump "pre2" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-11.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-11.c
index 26c47b1..f92e2d8 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-11.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-11.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */ 
-/* { dg-options "-O2 -fdump-tree-pre-stats" } */
+/* { dg-options "-O2 -fdump-tree-pre2-stats" } */
 double cos (double);
 double f(double a)
 {
@@ -17,5 +17,5 @@ double f(double a)
  return d + c;
 }
 
-/* { dg-final { scan-tree-dump-times "Eliminated: 1" 1 "pre"} } */
-/* { dg-final { cleanup-tree-dump "pre" } } */
+/* { dg-final { scan-tree-dump-times "Eliminated: 1" 1 "pre2" } } */
+/* { dg-final { cleanup-tree-dump "pre2" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-12.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-12.c
index fd80e3d..f191ab2 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-12.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-12.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */ 
-/* { dg-options "-O2 -fdump-tree-pre-stats" } */
+/* { dg-options "-O2 -fdump-tree-pre2-stats" } */
 double cos (double) __attribute__ ((const));
 double sin (double) __attribute__ ((const));
 double f(double a)
@@ -23,5 +23,5 @@ double f(double a)
   return d + c;
 }
 
-/* { dg-final { scan-tree-dump-times "Eliminated: 1" 1 "pre"} } */
-/* { dg-final { cleanup-tree-dump "pre" } } */
+/* { dg-final { scan-tree-dump-times "Eliminated: 1" 1 "pre2" } } */
+/* { dg-final { cleanup-tree-dump "pre2" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-13.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-13.c
index dfce46b..69f7615 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-13.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-13.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */ 
-/* { dg-options "-O2 -fdump-tree-pre-stats" } */
+/* { dg-options "-O2 -fdump-tree-pre2-stats" } */
 double cos (double) __attribute__ ((const));
 double sin (double) __attribute__ ((const));
 double f(double a)
@@ -22,5 +22,5 @@ double f(double a)
   return d + c;
 }
 
-/* { dg-final { scan-tree-dump-times "Eliminated: 1" 1 "pre"} } */
-/* { dg-final { cleanup-tree-dump "pre" } } */
+/* { dg-final { scan-tree-dump-times "Eliminated: 1" 1 "pre2" } } */
+/* { dg-final { cleanup-tree-dump "pre2" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-16.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-16.c
index b087dc1..78a0e76 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-16.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-16.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */ 
-/* { dg-options "-O2 -fdump-tree-pre-stats -std=c99" } */
+/* { dg-options "-O2 -fdump-tree-pre2-stats -std=c99" } */
 int foo(int k, int *x)
 {
   int j=0;
@@ -11,5 +11,5 @@ int foo(int k, int *x)
   }  while (++j<k);
   return res;
 }
-/* { dg-final { scan-tree-dump-times "Eliminated: 1" 1 "pre"} } */
-/* { dg-final { cleanup-tree-dump "pre" } } */
+/* { dg-final { scan-tree-dump-times "Eliminated: 1" 1 "pre2" } } */
+/* { dg-final { cleanup-tree-dump "pre2" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-17.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-17.c
index d4274db..6920254 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-17.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-17.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-pre-stats" } */
+/* { dg-options "-O2 -fdump-tree-pre2-stats" } */
 
 typedef union {
   int i;
@@ -14,5 +14,5 @@ int foo(U *u, int b, int i)
   return u->i;
 }
 
-/* { dg-final { scan-tree-dump "Eliminated: 1" "pre" } } */
-/* { dg-final { cleanup-tree-dump "pre" } } */
+/* { dg-final { scan-tree-dump "Eliminated: 1" "pre2" } } */
+/* { dg-final { cleanup-tree-dump "pre2" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-18.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-18.c
index 5e92934..4de28b4 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-18.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-18.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-pre-details" } */
+/* { dg-options "-O2 -fdump-tree-pre2-details" } */
 
 struct Bar { int a; int b; };
 struct Foo { int x; struct Bar y; };
@@ -17,5 +17,5 @@ int bar (int b)
   return c;
 }
 
-/* { dg-final { scan-tree-dump "Replaced foo \\(f.y\\)" "pre" } } */
-/* { dg-final { cleanup-tree-dump "pre" } } */
+/* { dg-final { scan-tree-dump "Replaced foo \\(f.y\\)" "pre2" } } */
+/* { dg-final { cleanup-tree-dump "pre2" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-19.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-19.c
index 0fd0dc5..cf53f2e 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-19.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-19.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-pre-stats" } */
+/* { dg-options "-O2 -fdump-tree-pre2-stats" } */
 
 struct Loc {
     int x[3];
@@ -35,5 +35,5 @@ int foo (int i, int j, int k, int b)
 }
 
 /* All three loads should be eliminated.  */
-/* { dg-final { scan-tree-dump "Eliminated: 3" "pre" } } */
-/* { dg-final { cleanup-tree-dump "pre" } } */
+/* { dg-final { scan-tree-dump "Eliminated: 3" "pre2" } } */
+/* { dg-final { cleanup-tree-dump "pre2" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-2.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-2.c
index 311f127..9dcb621 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-2.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-2.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */ 
-/* { dg-options "-O2 -fdump-tree-pre-stats" } */
+/* { dg-options "-O2 -fdump-tree-pre2-stats" } */
 int motion_test1(int data, int data_0, int data_3, int v)
 {
 	int i;
@@ -19,6 +19,6 @@ int motion_test1(int data, int data_0, int data_3, int v)
    main path.  We cannot re-associate v * t * u due to undefined
    signed overflow so we do not eliminate one computation of v * i along
    the main path. */
-/* { dg-final { scan-tree-dump-times "Eliminated: 2" 1 "pre" { xfail *-*-* } } } */
-/* { dg-final { scan-tree-dump-times "Eliminated: 1" 1 "pre" } } */
-/* { dg-final { cleanup-tree-dump "pre" } } */
+/* { dg-final { scan-tree-dump-times "Eliminated: 2" 1 "pre2" { xfail *-*-* } } } */
+/* { dg-final { scan-tree-dump-times "Eliminated: 1" 1 "pre2" } } */
+/* { dg-final { cleanup-tree-dump "pre2" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-20.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-20.c
index 6361b67..720e050 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-20.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-20.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-pre-stats" } */
+/* { dg-options "-O2 -fdump-tree-pre2-stats" } */
 
 double pcheck;
 
@@ -30,6 +30,6 @@ bb18:
 /* We should have inserted two PHI nodes and the one in the i-loop
    should have 0.0 in the argument coming from the bb18 block.  */
 
-/* { dg-final { scan-tree-dump "New PHIs: 2" "pre" } } */
-/* { dg-final { scan-tree-dump "PHI <.*0\\\.0" "pre" } } */
-/* { dg-final { cleanup-tree-dump "pre" } } */
+/* { dg-final { scan-tree-dump "New PHIs: 2" "pre2" } } */
+/* { dg-final { scan-tree-dump "PHI <.*0\\\.0" "pre2" } } */
+/* { dg-final { cleanup-tree-dump "pre2" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-21.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-21.c
index 40bb421..abcbd64 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-21.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-21.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-pre" } */
+/* { dg-options "-O2 -fdump-tree-pre2" } */
 
 long
 NumSift (long *array, unsigned long k)
@@ -11,5 +11,5 @@ NumSift (long *array, unsigned long k)
 
 /* There should be only two loads left.  */
 
-/* { dg-final { scan-tree-dump-times "= \\\*\[^\n;\]*;" 2 "pre" { xfail { ! size32plus } } } } */ /* xfail: PR tree-optimization/58169 */
-/* { dg-final { cleanup-tree-dump "pre" } } */
+/* { dg-final { scan-tree-dump-times "= \\\*\[^\n;\]*;" 2 "pre2" { xfail { ! size32plus } } } } */ /* xfail: PR tree-optimization/58169 */
+/* { dg-final { cleanup-tree-dump "pre2" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-22.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-22.c
index 3a1697e..190a23f 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-22.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-22.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-pre-stats" } */
+/* { dg-options "-O2 -fdump-tree-pre2-stats" } */
 
 int foo (int i, int b)
 {
@@ -9,5 +9,5 @@ int foo (int i, int b)
   return j - i;
 }
 
-/* { dg-final { scan-tree-dump "Eliminated: 1" "pre" } } */
-/* { dg-final { cleanup-tree-dump "pre" } } */
+/* { dg-final { scan-tree-dump "Eliminated: 1" "pre2" } } */
+/* { dg-final { cleanup-tree-dump "pre2" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-23.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-23.c
index 6aeb06a..4b840f8 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-23.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-23.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-pre-stats" } */
+/* { dg-options "-O2 -fdump-tree-pre2-stats" } */
 
 struct { int x; int y; } global;
 void foo(int n)
@@ -9,5 +9,5 @@ void foo(int n)
     global.y += global.x*global.x;
 }
 
-/* { dg-final { scan-tree-dump "Eliminated: 3" "pre" } } */
-/* { dg-final { cleanup-tree-dump "pre" } } */
+/* { dg-final { scan-tree-dump "Eliminated: 3" "pre2" } } */
+/* { dg-final { cleanup-tree-dump "pre2" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-24.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-24.c
index f91f4af..b967b60 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-24.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-24.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-pre" } */
+/* { dg-options "-O2 -fdump-tree-pre2" } */
 
 void foo(int *p, double *x, int n)
 {
@@ -11,5 +11,5 @@ void foo(int *p, double *x, int n)
 /* We should remove the unnecessary insertion of a phi-node and
    _not_ end up using the phi result for replacement *p.  */
 
-/* { dg-final { scan-tree-dump-not "= prephitmp" "pre" } } */
-/* { dg-final { cleanup-tree-dump "pre" } } */
+/* { dg-final { scan-tree-dump-not "= prephitmp" "pre2" } } */
+/* { dg-final { cleanup-tree-dump "pre2" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-25.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-25.c
index 32b0682..8d8e200 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-25.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-25.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-pre-stats" } */
+/* { dg-options "-O2 -fdump-tree-pre2-stats" } */
 
 struct X { int i; };
 
@@ -19,5 +19,5 @@ int foo (int x)
 
 /* We should eliminate the load from p for a PHI node with values 1 and 2.  */
 
-/* { dg-final { scan-tree-dump "Eliminated: 1" "pre" } } */
-/* { dg-final { cleanup-tree-dump "pre" } } */
+/* { dg-final { scan-tree-dump "Eliminated: 1" "pre2" } } */
+/* { dg-final { cleanup-tree-dump "pre2" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-27.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-27.c
index 4149bbe..d57b735 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-27.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-27.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-pre" } */
+/* { dg-options "-O2 -fdump-tree-pre2" } */
 
 int foo (int i, int j, int b)
 {
@@ -24,5 +24,5 @@ int foo2 (int i, int j, int b)
   return res;
 }
 
-/* { dg-final { scan-tree-dump-times "# prephitmp" 2 "pre" } } */
-/* { dg-final { cleanup-tree-dump "pre" } } */
+/* { dg-final { scan-tree-dump-times "# prephitmp" 2 "pre2" } } */
+/* { dg-final { cleanup-tree-dump "pre2" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-28.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-28.c
index 55887a6..a323164 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-28.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-28.c
@@ -1,6 +1,6 @@
 /* PR37997 */
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-pre-details" } */
+/* { dg-options "-O2 -fdump-tree-pre2-details" } */
 
 int foo (int i, int b, int result)
 {
@@ -17,5 +17,5 @@ int foo (int i, int b, int result)
 /* We should insert i + 1 into the if (b) path as well as the simplified
    i + 1 & -2 expression.  And do replacement with two PHI temps.  */
 
-/* { dg-final { scan-tree-dump-times "with prephitmp" 2 "pre" } } */
-/* { dg-final { cleanup-tree-dump "pre" } } */
+/* { dg-final { scan-tree-dump-times "with prephitmp" 2 "pre2" } } */
+/* { dg-final { cleanup-tree-dump "pre2" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-29.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-29.c
index b70fa58..f13ad8a 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-29.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-29.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-pre-details" } */
+/* { dg-options "-O2 -fdump-tree-pre2-details" } */
 
 void bark (void);
 int flag, hoist, y, z;
@@ -18,5 +18,5 @@ foo (void)
 /* We should see the partial redundancy of hoist + 4, not being confused
    about bark () possibly clobbering hoist.  */
 
-/* { dg-final { scan-tree-dump "Replaced hoist" "pre" } } */
-/* { dg-final { cleanup-tree-dump "pre" } } */
+/* { dg-final { scan-tree-dump "Replaced hoist" "pre2" } } */
+/* { dg-final { cleanup-tree-dump "pre2" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-3.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-3.c
index 3925f75..a741ced 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-3.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-3.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */ 
-/* { dg-options "-O2 -fdump-tree-pre-stats" } */
+/* { dg-options "-O2 -fdump-tree-pre2-stats" } */
 unsigned foo1 (unsigned a, unsigned b)
 {
   unsigned i, j, k;
@@ -11,5 +11,5 @@ unsigned foo1 (unsigned a, unsigned b)
   return j + k;
 }
 /* We should eliminate both 4*b and 4*a from the main body of the loop */
-/* { dg-final { scan-tree-dump-times "Eliminated: 2" 1 "pre"} } */
-/* { dg-final { cleanup-tree-dump "pre" } } */
+/* { dg-final { scan-tree-dump-times "Eliminated: 2" 1 "pre2" } } */
+/* { dg-final { cleanup-tree-dump "pre2" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-30.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-30.c
index 91e0e89..57d5ce3 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-30.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-30.c
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
 /* { dg-require-effective-target int32 } */
-/* { dg-options "-O2 -fdump-tree-pre-details" } */
+/* { dg-options "-O2 -fdump-tree-pre2-details" } */
 
 int f;
 int g;
@@ -24,5 +24,5 @@ bar (int b, int x)
 /* We should see the partial redundant loads of f even though they
    are using different types (of the same size).  */
 
-/* { dg-final { scan-tree-dump-times "Replaced MEM" 2 "pre" } } */
-/* { dg-final { cleanup-tree-dump "pre" } } */
+/* { dg-final { scan-tree-dump-times "Replaced MEM" 2 "pre2" } } */
+/* { dg-final { cleanup-tree-dump "pre2" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-31.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-31.c
index 2094de4..6bf8ccd 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-31.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-31.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-pre" } */
+/* { dg-options "-O2 -fdump-tree-pre2" } */
 
 typedef struct {
     unsigned int key;
@@ -43,5 +43,5 @@ int foo (S1 *root, int N)
   return 0;
 } 
 
-/* { dg-final { scan-tree-dump-times "key" 4 "pre" } } */
-/* { dg-final { cleanup-tree-dump "pre" } } */
+/* { dg-final { scan-tree-dump-times "key" 4 "pre2" } } */
+/* { dg-final { cleanup-tree-dump "pre2" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-4.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-4.c
index 274737a..4e5ff12 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-4.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-4.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-pre-stats" } */
+/* { dg-options "-O2 -fdump-tree-pre2-stats" } */
 int foo(void)
 {
 	int x, c, y;
@@ -11,5 +11,5 @@ int foo(void)
 }
 /* We should eliminate the x+1 computation from this routine, replacing
    it with a phi of 3, 4 */
-/* { dg-final { scan-tree-dump-times "Eliminated: 1" 1 "pre"} } */
-/* { dg-final { cleanup-tree-dump "pre" } } */
+/* { dg-final { scan-tree-dump-times "Eliminated: 1" 1 "pre2" } } */
+/* { dg-final { cleanup-tree-dump "pre2" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-5.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-5.c
index d0e985f..a373857 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-5.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-5.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */ 
-/* { dg-options "-O2 -fdump-tree-pre-stats" } */
+/* { dg-options "-O2 -fdump-tree-pre2-stats" } */
 int 
 foo (int i)
 {
@@ -12,6 +12,6 @@ foo (int i)
 }
 /* We should detect that a+b is the same along both edges, and replace it with
    5  */
-/* { dg-final { scan-tree-dump-times "Eliminated: 1" 1 "pre"} } */
-/* { dg-final { scan-tree-dump-times "Insertions" 0 "pre"} } */
-/* { dg-final { cleanup-tree-dump "pre" } } */
+/* { dg-final { scan-tree-dump-times "Eliminated: 1" 1 "pre2" } } */
+/* { dg-final { scan-tree-dump-times "Insertions" 0 "pre2" } } */
+/* { dg-final { cleanup-tree-dump "pre2" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-6.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-6.c
index 2811f43..4437b14 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-6.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-6.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-pre-stats" } */
+/* { dg-options "-O2 -fdump-tree-pre2-stats" } */
 int foo(int x)
 {
 	int c, y;
@@ -10,5 +10,5 @@ int foo(int x)
 }
 /* We should eliminate one evaluation of x + 1 along the x = 2 path,
    causing one elimination.  */
-/* { dg-final { scan-tree-dump-times "Eliminated: 1" 1 "pre"} } */
-/* { dg-final { cleanup-tree-dump "pre" } } */
+/* { dg-final { scan-tree-dump-times "Eliminated: 1" 1 "pre2" } } */
+/* { dg-final { cleanup-tree-dump "pre2" } } */
diff --git a/gcc/testsuite/gfortran.dg/pr43984.f90 b/gcc/testsuite/gfortran.dg/pr43984.f90
index 40c81b8..8efb7ca 100644
--- a/gcc/testsuite/gfortran.dg/pr43984.f90
+++ b/gcc/testsuite/gfortran.dg/pr43984.f90
@@ -1,5 +1,5 @@
 ! { dg-do compile }
-! { dg-options "-O2 -fno-tree-dominator-opts -fdump-tree-pre" }
+! { dg-options "-O2 -fno-tree-dominator-opts -fdump-tree-pre2" }
 module test
 
    type shell1quartet_type
@@ -52,5 +52,5 @@ end
 
 ! There should be three loads from iyz.data, not four.
 
-! { dg-final { scan-tree-dump-times "= iyz.data" 3 "pre" } }
-! { dg-final { cleanup-tree-dump "pre" } }
+! { dg-final { scan-tree-dump-times "= iyz.data" 3 "pre2" } }
+! { dg-final { cleanup-tree-dump "pre2" } }
diff --git a/gcc/tree-ssa-pre.c b/gcc/tree-ssa-pre.c
index ea99198..d9dc512 100644
--- a/gcc/tree-ssa-pre.c
+++ b/gcc/tree-ssa-pre.c
@@ -4767,6 +4767,7 @@ public:
   /* opt_pass methods: */
   virtual bool gate (function *) { return flag_tree_pre != 0; }
   virtual unsigned int execute (function *);
+  opt_pass * clone () { return new pass_pre (m_ctxt); }
 
 }; // class pass_pre
 
diff --git a/gcc/tree-ssa-sccvn.c b/gcc/tree-ssa-sccvn.c
index 6968df6..4cb1e37 100644
--- a/gcc/tree-ssa-sccvn.c
+++ b/gcc/tree-ssa-sccvn.c
@@ -65,6 +65,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-ssa-sccvn.h"
 #include "tree-cfg.h"
 #include "domwalk.h"
+#include "omp-low.h"
 
 /* This algorithm is based on the SCC algorithm presented by Keith
    Cooper and L. Taylor Simpson in "SCC-Based Value numbering"
@@ -3446,7 +3447,8 @@ visit_use (tree use)
     {
       if (gimple_code (stmt) == GIMPLE_PHI)
 	changed = visit_phi (stmt);
-      else if (gimple_has_volatile_ops (stmt))
+      else if (gimple_has_volatile_ops (stmt)
+	       || gimple_stmt_omp_data_i_init_p (stmt))
 	changed = defs_to_varying (stmt);
       else if (is_gimple_assign (stmt))
 	{
diff --git a/gcc/tree-ssa-tail-merge.c b/gcc/tree-ssa-tail-merge.c
index 303bd5e..36aa0a5 100644
--- a/gcc/tree-ssa-tail-merge.c
+++ b/gcc/tree-ssa-tail-merge.c
@@ -1668,7 +1668,8 @@ tail_merge_optimize (unsigned int todo)
   int max_iterations = PARAM_VALUE (PARAM_MAX_TAIL_MERGE_ITERATIONS);
 
   if (!flag_tree_tail_merge
-      || max_iterations == 0)
+      || max_iterations == 0
+      || (cfun->curr_properties & PROP_gimple_eomp) == 0)
     return 0;
 
   timevar_push (TV_TREE_TAIL_MERGE);
-- 
1.9.1


* Re: [PATCH, 8/8] Do simple omp lowering for no address taken var
  2014-11-24 11:53     ` Tom de Vries
@ 2014-11-24 11:55       ` Tom de Vries
  2014-11-24 12:42         ` Richard Biener
  2014-11-24 12:40       ` Richard Biener
  1 sibling, 1 reply; 71+ messages in thread
From: Tom de Vries @ 2014-11-24 11:55 UTC (permalink / raw)
  To: Richard Biener; +Cc: GCC Patches, Jakub Jelinek, Thomas Schwinge, ebotcazou

[-- Attachment #1: Type: text/plain, Size: 5217 bytes --]

On 24-11-14 12:28, Tom de Vries wrote:
> On 17-11-14 11:13, Richard Biener wrote:
>> On Sat, 15 Nov 2014, Tom de Vries wrote:
>>
>>> >On 15-11-14 13:14, Tom de Vries wrote:
>>>> > >Hi,
>>>> > >
>>>> > >I'm submitting a patch series with initial support for the oacc kernels
>>>> > >directive.
>>>> > >
>>>> > >The patch series uses pass_parallelize_loops to implement parallelization of
>>>> > >loops in the oacc kernels region.
>>>> > >
>>>> > >The patch series consists of these 8 patches:
>>>> > >...
>>>> > >      1  Expand oacc kernels after pass_build_ealias
>>>> > >      2  Add pass_oacc_kernels
>>>> > >      3  Add pass_ch_oacc_kernels to pass_oacc_kernels
>>>> > >      4  Add pass_tree_loop_{init,done} to pass_oacc_kernels
>>>> > >      5  Add pass_loop_im to pass_oacc_kernels
>>>> > >      6  Add pass_ccp to pass_oacc_kernels
>>>> > >      7  Add pass_parloops_oacc_kernels to pass_oacc_kernels
>>>> > >      8  Do simple omp lowering for no address taken var
>>>> > >...
>>> >
>>> >This patch lowers integer variables that do not have their address taken as
>>> >local variables.  We use a copy at region entry and exit to copy the value in
>>> >and out.
>>> >
>>> >In the context of reduction handling in a kernels region, this allows the
>>> >parloops reduction analysis to recognize the reduction, even after oacc
>>> >lowering has been done in pass_lower_omp.
>>> >
>>> >In more detail, without this patch, the omp_data_i load and stores are
>>> >generated in place (in this case, in the loop):
>>> >...
>>> >                 {
>>> >                   .omp_data_iD.2201 = &.omp_data_arr.15D.2220;
>>> >                   {
>>> >                     unsigned intD.9 iD.2146;
>>> >
>>> >                     iD.2146 = 0;
>>> >                     goto <D.2207>;
>>> >                     <D.2208>:
>>> >                     D.2216 = .omp_data_iD.2201->cD.2203;
>>> >                     c.9D.2176 = *D.2216;
>>> >                     D.2177 = (long unsigned intD.10) iD.2146;
>>> >                     D.2178 = D.2177 * 4;
>>> >                     D.2179 = c.9D.2176 + D.2178;
>>> >                     D.2180 = *D.2179;
>>> >                     D.2217 = .omp_data_iD.2201->sumD.2205;
>>> >                     D.2218 = *D.2217;
>>> >                     D.2217 = .omp_data_iD.2201->sumD.2205;
>>> >                     D.2219 = D.2180 + D.2218;
>>> >                     *D.2217 = D.2219;
>>> >                     iD.2146 = iD.2146 + 1;
>>> >                     <D.2207>:
>>> >                     if (iD.2146 <= 524287) goto <D.2208>; else goto <D.2209>;
>>> >                     <D.2209>:
>>> >                   }
>>> >...
>>> >
>>> >With this patch, the omp_data_i load and stores for sum are generated at entry
>>> >and exit:
>>> >...
>>> >                 {
>>> >                   .omp_data_iD.2201 = &.omp_data_arr.15D.2218;
>>> >                   D.2216 = .omp_data_iD.2201->sumD.2205;
>>> >                   sumD.2206 = *D.2216;
>>> >                   {
>>> >                     unsigned intD.9 iD.2146;
>>> >
>>> >                     iD.2146 = 0;
>>> >                     goto <D.2207>;
>>> >                     <D.2208>:
>>> >                     D.2217 = .omp_data_iD.2201->cD.2203;
>>> >                     c.9D.2176 = *D.2217;
>>> >                     D.2177 = (long unsigned intD.10) iD.2146;
>>> >                     D.2178 = D.2177 * 4;
>>> >                     D.2179 = c.9D.2176 + D.2178;
>>> >                     D.2180 = *D.2179;
>>> >                     sumD.2206 = D.2180 + sumD.2206;
>>> >                     iD.2146 = iD.2146 + 1;
>>> >                     <D.2207>:
>>> >                     if (iD.2146 <= 524287) goto <D.2208>; else goto <D.2209>;
>>> >                     <D.2209>:
>>> >                   }
>>> >                   *D.2216 = sumD.2206;
>>> >                   #pragma omp return
>>> >                 }
>>> >...
>>> >
>>> >
>>> >So, without the patch the reduction operation looks like this:
>>> >...
>>> >     *(.omp_data_iD.2201->sumD.2205) = *(.omp_data_iD.2201->sumD.2205) + x
>>> >...
>>> >
>>> >And with this patch the reduction operation is simply:
>>> >...
>>> >     sumD.2206 = sumD.2206 + x;
>>> >...
>>> >
>>> >OK for trunk?
>> I presume the reason you are trying to do that here is that otherwise
>> it happens too late?  What you do is what loop store motion would
>> do.
>
> Richard,
>
> Thanks for the hint. I've built a reduction example:
> ...
> void __attribute__((noinline))
> f (unsigned int *__restrict__ a, unsigned int *__restrict__ sum, unsigned int n)
> {
>    unsigned int i;
>    for (i = 0; i < n; ++i)
>      *sum += a[i];
> }
> ...
> and observed that store motion of the *sum store is done by pass_loop_im,
> provided the *sum load is taken out of the loop by pass_pre first.
>
> So alternatively, we could use pass_pre and pass_loop_im to achieve the same
> effect.
>
> When trying out adding pass_pre as a part of the pass group pass_oacc_kernels, I
> found that also pass_copyprop was required to get parloops to recognize the
> reduction.
>

Attached patch adds pass_copy_prop to pass group pass_oacc_kernels.
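
As a rough illustration of why a leftover copy can get in the way of
reduction recognition, here is a hand-written C analogy.  It is only a
sketch: the actual GIMPLE inside the kernels region is not shown in this
thread, and the function names sum_via_copy and sum_direct are
hypothetical.  The first function updates the accumulator through a plain
copy; the second is the same loop after every use of the copy has been
replaced by its source, which is the effect copy propagation has.

```c
/* Hypothetical helper names, for illustration only; not part of the
   patch.  */

/* Reduction whose update cycle goes through a plain copy.  */
unsigned int
sum_via_copy (const unsigned int *a, unsigned int n, unsigned int init)
{
  unsigned int sum = init;
  unsigned int i;
  for (i = 0; i < n; ++i)
    {
      unsigned int tmp = sum;	/* plain copy of sum */
      sum = tmp + a[i];		/* update reads the copy, not sum */
    }
  return sum;
}

/* Same loop after replacing each use of the copy by its source: the
   accumulator is now read and written by a single statement.  */
unsigned int
sum_direct (const unsigned int *a, unsigned int n, unsigned int init)
{
  unsigned int sum = init;
  unsigned int i;
  for (i = 0; i < n; ++i)
    sum = sum + a[i];
  return sum;
}
```

Both functions compute the same value; only the second has the direct
"sum = sum + x" shape quoted earlier in this thread.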

Bootstrapped and reg-tested in the same way as before.

OK for trunk?

Thanks,
- Tom

[-- Attachment #2: 0008-Add-pass_copy_prop-in-pass_oacc_kernels.patch --]
[-- Type: text/x-patch, Size: 1441 bytes --]

2014-11-23  Tom de Vries  <tom@codesourcery.com>

	* passes.def: Add pass_copy_prop to pass group pass_oacc_kernels.
	* tree-ssa-copy.c (stmt_may_generate_copy): Handle .omp_data_i init
	conservatively.
---
 gcc/passes.def      | 1 +
 gcc/tree-ssa-copy.c | 4 ++++
 2 files changed, 5 insertions(+)

diff --git a/gcc/passes.def b/gcc/passes.def
index 3a7b096..8c663b0 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -95,6 +95,7 @@ along with GCC; see the file COPYING3.  If not see
 	      NEXT_PASS (pass_tree_loop_init);
 	      NEXT_PASS (pass_lim);
 	      NEXT_PASS (pass_ccp);
+	      NEXT_PASS (pass_copy_prop);
 	      NEXT_PASS (pass_tree_loop_done);
 	  POP_INSERT_PASSES ()
 	  NEXT_PASS (pass_expand_omp_ssa);
diff --git a/gcc/tree-ssa-copy.c b/gcc/tree-ssa-copy.c
index 7c22c5e..d6eb7a7 100644
--- a/gcc/tree-ssa-copy.c
+++ b/gcc/tree-ssa-copy.c
@@ -55,6 +55,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-scalar-evolution.h"
 #include "tree-ssa-dom.h"
 #include "tree-ssa-loop-niter.h"
+#include "omp-low.h"
 
 
 /* This file implements the copy propagation pass and provides a
@@ -110,6 +111,9 @@ stmt_may_generate_copy (gimple stmt)
   if (gimple_has_volatile_ops (stmt))
     return false;
 
+  if (gimple_stmt_omp_data_i_init_p (stmt))
+    return false;
+
   /* Statements with loads and/or stores will never generate a useful copy.  */
   if (gimple_vuse (stmt))
     return false;
-- 
1.9.1


* Re: [PATCH, 8/8] Do simple omp lowering for no address taken var
  2014-11-24 11:53     ` Tom de Vries
  2014-11-24 11:55       ` Tom de Vries
@ 2014-11-24 12:40       ` Richard Biener
  1 sibling, 0 replies; 71+ messages in thread
From: Richard Biener @ 2014-11-24 12:40 UTC (permalink / raw)
  To: Tom de Vries; +Cc: GCC Patches, Jakub Jelinek, Thomas Schwinge, ebotcazou

On Mon, 24 Nov 2014, Tom de Vries wrote:

> On 17-11-14 11:13, Richard Biener wrote:
> > On Sat, 15 Nov 2014, Tom de Vries wrote:
> > 
> > > >On 15-11-14 13:14, Tom de Vries wrote:
> > > > > >Hi,
> > > > > >
> > > > > >I'm submitting a patch series with initial support for the oacc
> > > > kernels
> > > > > >directive.
> > > > > >
> > > > > >The patch series uses pass_parallelize_loops to implement
> > > > parallelization of
> > > > > >loops in the oacc kernels region.
> > > > > >
> > > > > >The patch series consists of these 8 patches:
> > > > > >...
> > > > > >      1  Expand oacc kernels after pass_build_ealias
> > > > > >      2  Add pass_oacc_kernels
> > > > > >      3  Add pass_ch_oacc_kernels to pass_oacc_kernels
> > > > > >      4  Add pass_tree_loop_{init,done} to pass_oacc_kernels
> > > > > >      5  Add pass_loop_im to pass_oacc_kernels
> > > > > >      6  Add pass_ccp to pass_oacc_kernels
> > > > > >      7  Add pass_parloops_oacc_kernels to pass_oacc_kernels
> > > > > >      8  Do simple omp lowering for no address taken var
> > > > > >...
> > > >
> > > >This patch lowers integer variables that do not have their address taken
> > > as
> > > >local variables.  We use a copy at region entry and exit to copy the value
> > > in
> > > >and out.
> > > >
> > > >In the context of reduction handling in a kernels region, this allows the
> > > >parloops reduction analysis to recognize the reduction, even after oacc
> > > >lowering has been done in pass_lower_omp.
> > > >
> > > >In more detail, without this patch, the omp_data_i load and stores are
> > > >generated in place (in this case, in the loop):
> > > >...
> > > >                 {
> > > >                   .omp_data_iD.2201 = &.omp_data_arr.15D.2220;
> > > >                   {
> > > >                     unsigned intD.9 iD.2146;
> > > >
> > > >                     iD.2146 = 0;
> > > >                     goto <D.2207>;
> > > >                     <D.2208>:
> > > >                     D.2216 = .omp_data_iD.2201->cD.2203;
> > > >                     c.9D.2176 = *D.2216;
> > > >                     D.2177 = (long unsigned intD.10) iD.2146;
> > > >                     D.2178 = D.2177 * 4;
> > > >                     D.2179 = c.9D.2176 + D.2178;
> > > >                     D.2180 = *D.2179;
> > > >                     D.2217 = .omp_data_iD.2201->sumD.2205;
> > > >                     D.2218 = *D.2217;
> > > >                     D.2217 = .omp_data_iD.2201->sumD.2205;
> > > >                     D.2219 = D.2180 + D.2218;
> > > >                     *D.2217 = D.2219;
> > > >                     iD.2146 = iD.2146 + 1;
> > > >                     <D.2207>:
> > > >                     if (iD.2146 <= 524287) goto <D.2208>; else goto
> > > <D.2209>;
> > > >                     <D.2209>:
> > > >                   }
> > > >...
> > > >
> > > >With this patch, the omp_data_i load and stores for sum are generated at
> > > entry
> > > >and exit:
> > > >...
> > > >                 {
> > > >                   .omp_data_iD.2201 = &.omp_data_arr.15D.2218;
> > > >                   D.2216 = .omp_data_iD.2201->sumD.2205;
> > > >                   sumD.2206 = *D.2216;
> > > >                   {
> > > >                     unsigned intD.9 iD.2146;
> > > >
> > > >                     iD.2146 = 0;
> > > >                     goto <D.2207>;
> > > >                     <D.2208>:
> > > >                     D.2217 = .omp_data_iD.2201->cD.2203;
> > > >                     c.9D.2176 = *D.2217;
> > > >                     D.2177 = (long unsigned intD.10) iD.2146;
> > > >                     D.2178 = D.2177 * 4;
> > > >                     D.2179 = c.9D.2176 + D.2178;
> > > >                     D.2180 = *D.2179;
> > > >                     sumD.2206 = D.2180 + sumD.2206;
> > > >                     iD.2146 = iD.2146 + 1;
> > > >                     <D.2207>:
> > > >                     if (iD.2146 <= 524287) goto <D.2208>; else goto
> > > <D.2209>;
> > > >                     <D.2209>:
> > > >                   }
> > > >                   *D.2216 = sumD.2206;
> > > >                   #pragma omp return
> > > >                 }
> > > >...
> > > >
> > > >
> > > >So, without the patch the reduction operation looks like this:
> > > >...
> > > >     *(.omp_data_iD.2201->sumD.2205) = *(.omp_data_iD.2201->sumD.2205) + x
> > > >...
> > > >
> > > >And with this patch the reduction operation is simply:
> > > >...
> > > >     sumD.2206 = sumD.2206 + x
> > > >...
> > > >
> > > >OK for trunk?
> > I presume the reason you are trying to do that here is that otherwise
> > it happens too late?  What you do is what loop store motion would
> > do.
> 
> Richard,
> 
> Thanks for the hint. I've built a reduction example:
> ...
> void __attribute__((noinline))
> f (unsigned int *__restrict__ a, unsigned int *__restrict__ sum,
>    unsigned int n)
> {
>   unsigned int i;
>   for (i = 0; i < n; ++i)
>     *sum += a[i];
> }
> ...
> and observed that store motion of the *sum store is done by pass_loop_im,
> provided the *sum load is taken out of the loop by pass_pre first.

That doesn't make much sense.  Why is LIM not moving the *sum load?
Ah - if n == 0 the body may not be executed and thus a hoisted load
may trap?  I suppose you rather need a loop header copying pass.
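
[ For illustration only: a hand-written C sketch of the point above, not taken
from the patch series or from actual GCC output.  The function names sum_plain
and sum_rotated are hypothetical.  In sum_plain, *sum is never accessed when
n == 0, so a load hoisted in front of the loop could introduce a trap.  After
the loop header is copied in front of the loop, the hoisted load and the sunk
store sit behind the guard and only execute when the body runs at least once. ]

```c
/* Original form: the exit test runs before the first iteration, so
   *sum is not touched at all when n == 0.  */
void
sum_plain (const unsigned int *__restrict__ a, unsigned int *__restrict__ sum,
	   unsigned int n)
{
  unsigned int i;
  for (i = 0; i < n; ++i)
    *sum += a[i];
}

/* Hand-written equivalent of the loop after header copying, with the
   *sum load hoisted and the *sum store sunk.  This is an assumption
   about the shape such a transformation could produce, not a dump.  */
void
sum_rotated (const unsigned int *__restrict__ a,
	     unsigned int *__restrict__ sum, unsigned int n)
{
  unsigned int i = 0;

  if (i < n)			/* Copied loop header guards the loop.  */
    {
      unsigned int tmp = *sum;	/* Hoisted load: body runs at least once.  */
      do
	{
	  tmp += a[i];		/* The reduction now works on a local.  */
	  ++i;
	}
      while (i < n);
      *sum = tmp;		/* Store sunk to the loop exit.  */
    }
}
```

[ With n == 0, sum_rotated dereferences neither pointer, matching the
behaviour of sum_plain. ]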

> So alternatively, we could use pass_pre and pass_loop_im to achieve the same
> effect.
> 
> When trying out adding pass_pre as a part of the pass group pass_oacc_kernels,
> I found that also pass_copyprop was required to get parloops to recognize the
> reduction.
> 
> Attached patch adds the pre pass to pass group pass_oacc_kernels.
> 
> Bootstrapped and reg-tested in the same way as before.
> 
> OK for trunk?

No, I don't think you want this.

Richard.

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH, 8/8] Do simple omp lowering for no address taken var
  2014-11-24 11:55       ` Tom de Vries
@ 2014-11-24 12:42         ` Richard Biener
  2014-11-24 18:49           ` Tom de Vries
  0 siblings, 1 reply; 71+ messages in thread
From: Richard Biener @ 2014-11-24 12:42 UTC (permalink / raw)
  To: Tom de Vries; +Cc: GCC Patches, Jakub Jelinek, Thomas Schwinge, ebotcazou

On Mon, 24 Nov 2014, Tom de Vries wrote:

> On 24-11-14 12:28, Tom de Vries wrote:
> > On 17-11-14 11:13, Richard Biener wrote:
> > > On Sat, 15 Nov 2014, Tom de Vries wrote:
> > > 
> > > > >On 15-11-14 13:14, Tom de Vries wrote:
> > > > > > >Hi,
> > > > > > >
> > > > > > > >I'm submitting a patch series with initial support for the oacc kernels
> > > > > > > >directive.
> > > > > > >
> > > > > > > >The patch series uses pass_parallelize_loops to implement parallelization of
> > > > > > > >loops in the oacc kernels region.
> > > > > > >
> > > > > > >The patch series consists of these 8 patches:
> > > > > > >...
> > > > > > >      1  Expand oacc kernels after pass_build_ealias
> > > > > > >      2  Add pass_oacc_kernels
> > > > > > >      3  Add pass_ch_oacc_kernels to pass_oacc_kernels
> > > > > > >      4  Add pass_tree_loop_{init,done} to pass_oacc_kernels
> > > > > > >      5  Add pass_loop_im to pass_oacc_kernels
> > > > > > >      6  Add pass_ccp to pass_oacc_kernels
> > > > > > >      7  Add pass_parloops_oacc_kernels to pass_oacc_kernels
> > > > > > >      8  Do simple omp lowering for no address taken var
> > > > > > >...
> > > > >
> > > > > >This patch lowers integer variables that do not have their address taken as
> > > > > >local variable.  We use a copy at region entry and exit to copy the value in
> > > > > >and out.
> > > > >
> > > > > >In the context of reduction handling in a kernels region, this allows the
> > > > >parloops reduction analysis to recognize the reduction, even after oacc
> > > > >lowering has been done in pass_lower_omp.
> > > > >
> > > > >In more detail, without this patch, the omp_data_i load and stores are
> > > > >generated in place (in this case, in the loop):
> > > > >...
> > > > >                 {
> > > > >                   .omp_data_iD.2201 = &.omp_data_arr.15D.2220;
> > > > >                   {
> > > > >                     unsigned intD.9 iD.2146;
> > > > >
> > > > >                     iD.2146 = 0;
> > > > >                     goto <D.2207>;
> > > > >                     <D.2208>:
> > > > >                     D.2216 = .omp_data_iD.2201->cD.2203;
> > > > >                     c.9D.2176 = *D.2216;
> > > > >                     D.2177 = (long unsigned intD.10) iD.2146;
> > > > >                     D.2178 = D.2177 * 4;
> > > > >                     D.2179 = c.9D.2176 + D.2178;
> > > > >                     D.2180 = *D.2179;
> > > > >                     D.2217 = .omp_data_iD.2201->sumD.2205;
> > > > >                     D.2218 = *D.2217;
> > > > >                     D.2217 = .omp_data_iD.2201->sumD.2205;
> > > > >                     D.2219 = D.2180 + D.2218;
> > > > >                     *D.2217 = D.2219;
> > > > >                     iD.2146 = iD.2146 + 1;
> > > > >                     <D.2207>:
> > > > >                     if (iD.2146 <= 524287) goto <D.2208>; else goto
> > > > <D.2209>;
> > > > >                     <D.2209>:
> > > > >                   }
> > > > >...
> > > > >
> > > > > >With this patch, the omp_data_i load and stores for sum are generated at entry
> > > > >and exit:
> > > > >...
> > > > >                 {
> > > > >                   .omp_data_iD.2201 = &.omp_data_arr.15D.2218;
> > > > >                   D.2216 = .omp_data_iD.2201->sumD.2205;
> > > > >                   sumD.2206 = *D.2216;
> > > > >                   {
> > > > >                     unsigned intD.9 iD.2146;
> > > > >
> > > > >                     iD.2146 = 0;
> > > > >                     goto <D.2207>;
> > > > >                     <D.2208>:
> > > > >                     D.2217 = .omp_data_iD.2201->cD.2203;
> > > > >                     c.9D.2176 = *D.2217;
> > > > >                     D.2177 = (long unsigned intD.10) iD.2146;
> > > > >                     D.2178 = D.2177 * 4;
> > > > >                     D.2179 = c.9D.2176 + D.2178;
> > > > >                     D.2180 = *D.2179;
> > > > >                     sumD.2206 = D.2180 + sumD.2206;
> > > > >                     iD.2146 = iD.2146 + 1;
> > > > >                     <D.2207>:
> > > > >                     if (iD.2146 <= 524287) goto <D.2208>; else goto
> > > > <D.2209>;
> > > > >                     <D.2209>:
> > > > >                   }
> > > > >                   *D.2216 = sumD.2206;
> > > > >                   #pragma omp return
> > > > >                 }
> > > > >...
> > > > >
> > > > >
> > > > >So, without the patch the reduction operation looks like this:
> > > > >...
> > > > > >     *(.omp_data_iD.2201->sumD.2205) = *(.omp_data_iD.2201->sumD.2205) + x
> > > > >...
> > > > >
> > > > >And with this patch the reduction operation is simply:
> > > > >...
> > > > > >     sumD.2206 = sumD.2206 + x
> > > > >...
> > > > >
> > > > >OK for trunk?
> > > I presume the reason you are trying to do that here is that otherwise
> > > it happens too late?  What you do is what loop store motion would
> > > do.
> > 
> > Richard,
> > 
> > Thanks for the hint. I've built a reduction example:
> > ...
> > void __attribute__((noinline))
> > > f (unsigned int *__restrict__ a, unsigned int *__restrict__ sum,
> > >    unsigned int n)
> > {
> >    unsigned int i;
> >    for (i = 0; i < n; ++i)
> >      *sum += a[i];
> > }...
> > and observed that store motion of the *sum store is done by pass_loop_im,
> > provided the *sum load is taken out of the the loop by pass_pre first.
> > 
> > So alternatively, we could use pass_pre and pass_loop_im to achieve the same
> > effect.
> > 
> > When trying out adding pass_pre as a part of the pass group
> > pass_oacc_kernels, I
> > found that also pass_copyprop was required to get parloops to recognize the
> > reduction.
> > 
> 
> Attached patch adds pass_copyprop to pass group pass_oacc_kernels.

Hum, you are gobbling up very many passes here.  In this case copyprop
will also perform trivial constant propagation so maybe it's enough
to replace ccp by copyprop.  Or go the full way and add a FRE pass.

Richard.

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH, 8/8] Do simple omp lowering for no address taken var
  2014-11-24 12:42         ` Richard Biener
@ 2014-11-24 18:49           ` Tom de Vries
  0 siblings, 0 replies; 71+ messages in thread
From: Tom de Vries @ 2014-11-24 18:49 UTC (permalink / raw)
  To: Richard Biener; +Cc: GCC Patches, Jakub Jelinek, Thomas Schwinge, ebotcazou

On 24-11-14 13:12, Richard Biener wrote:
> On Mon, 24 Nov 2014, Tom de Vries wrote:
>
>> On 24-11-14 12:28, Tom de Vries wrote:
>>> On 17-11-14 11:13, Richard Biener wrote:
>>>> On Sat, 15 Nov 2014, Tom de Vries wrote:
>>>>
>>>>>> On 15-11-14 13:14, Tom de Vries wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I'm submitting a patch series with initial support for the oacc kernels
>>>>>>>> directive.
>>>>>>>>
>>>>>>>> The patch series uses pass_parallelize_loops to implement parallelization of
>>>>>>>> loops in the oacc kernels region.
>>>>>>>>
>>>>>>>> The patch series consists of these 8 patches:
>>>>>>>> ...
>>>>>>>>       1  Expand oacc kernels after pass_build_ealias
>>>>>>>>       2  Add pass_oacc_kernels
>>>>>>>>       3  Add pass_ch_oacc_kernels to pass_oacc_kernels
>>>>>>>>       4  Add pass_tree_loop_{init,done} to pass_oacc_kernels
>>>>>>>>       5  Add pass_loop_im to pass_oacc_kernels
>>>>>>>>       6  Add pass_ccp to pass_oacc_kernels
>>>>>>>>       7  Add pass_parloops_oacc_kernels to pass_oacc_kernels
>>>>>>>>       8  Do simple omp lowering for no address taken var
>>>>>>>> ...
>>>>>>
>>>>>> This patch lowers integer variables that do not have their address taken as
>>>>>> local variable.  We use a copy at region entry and exit to copy the value in
>>>>>> and out.
>>>>>>
>>>>>> In the context of reduction handling in a kernels region, this allows the
>>>>>> parloops reduction analysis to recognize the reduction, even after oacc
>>>>>> lowering has been done in pass_lower_omp.
>>>>>>
>>>>>> In more detail, without this patch, the omp_data_i load and stores are
>>>>>> generated in place (in this case, in the loop):
>>>>>> ...
>>>>>>                  {
>>>>>>                    .omp_data_iD.2201 = &.omp_data_arr.15D.2220;
>>>>>>                    {
>>>>>>                      unsigned intD.9 iD.2146;
>>>>>>
>>>>>>                      iD.2146 = 0;
>>>>>>                      goto <D.2207>;
>>>>>>                      <D.2208>:
>>>>>>                      D.2216 = .omp_data_iD.2201->cD.2203;
>>>>>>                      c.9D.2176 = *D.2216;
>>>>>>                      D.2177 = (long unsigned intD.10) iD.2146;
>>>>>>                      D.2178 = D.2177 * 4;
>>>>>>                      D.2179 = c.9D.2176 + D.2178;
>>>>>>                      D.2180 = *D.2179;
>>>>>>                      D.2217 = .omp_data_iD.2201->sumD.2205;
>>>>>>                      D.2218 = *D.2217;
>>>>>>                      D.2217 = .omp_data_iD.2201->sumD.2205;
>>>>>>                      D.2219 = D.2180 + D.2218;
>>>>>>                      *D.2217 = D.2219;
>>>>>>                      iD.2146 = iD.2146 + 1;
>>>>>>                      <D.2207>:
>>>>>>                      if (iD.2146 <= 524287) goto <D.2208>; else goto <D.2209>;
>>>>>>                      <D.2209>:
>>>>>>                    }
>>>>>> ...
>>>>>>
>>>>>> With this patch, the omp_data_i load and stores for sum are generated at entry
>>>>>> and exit:
>>>>>> ...
>>>>>>                  {
>>>>>>                    .omp_data_iD.2201 = &.omp_data_arr.15D.2218;
>>>>>>                    D.2216 = .omp_data_iD.2201->sumD.2205;
>>>>>>                    sumD.2206 = *D.2216;
>>>>>>                    {
>>>>>>                      unsigned intD.9 iD.2146;
>>>>>>
>>>>>>                      iD.2146 = 0;
>>>>>>                      goto <D.2207>;
>>>>>>                      <D.2208>:
>>>>>>                      D.2217 = .omp_data_iD.2201->cD.2203;
>>>>>>                      c.9D.2176 = *D.2217;
>>>>>>                      D.2177 = (long unsigned intD.10) iD.2146;
>>>>>>                      D.2178 = D.2177 * 4;
>>>>>>                      D.2179 = c.9D.2176 + D.2178;
>>>>>>                      D.2180 = *D.2179;
>>>>>>                      sumD.2206 = D.2180 + sumD.2206;
>>>>>>                      iD.2146 = iD.2146 + 1;
>>>>>>                      <D.2207>:
>>>>>>                      if (iD.2146 <= 524287) goto <D.2208>; else goto <D.2209>;
>>>>>>                      <D.2209>:
>>>>>>                    }
>>>>>>                    *D.2216 = sumD.2206;
>>>>>>                    #pragma omp return
>>>>>>                  }
>>>>>> ...
>>>>>>
>>>>>>
>>>>>> So, without the patch the reduction operation looks like this:
>>>>>> ...
>>>>>>      *(.omp_data_iD.2201->sumD.2205) = *(.omp_data_iD.2201->sumD.2205) + x
>>>>>> ...
>>>>>>
>>>>>> And with this patch the reduction operation is simply:
>>>>>> ...
>>>>>>      sumD.2206 = sumD.2206 + x
>>>>>> ...
>>>>>>
>>>>>> OK for trunk?
>>>> I presume the reason you are trying to do that here is that otherwise
>>>> it happens too late?  What you do is what loop store motion would
>>>> do.
>>>
>>> Richard,
>>>
>>> Thanks for the hint. I've built a reduction example:
>>> ...
>>> void __attribute__((noinline))
>>> f (unsigned int *__restrict__ a, unsigned int *__restrict__ sum,
>>>    unsigned int n)
>>> {
>>>     unsigned int i;
>>>     for (i = 0; i < n; ++i)
>>>       *sum += a[i];
>>> }
>>> ...
>>> and observed that store motion of the *sum store is done by pass_loop_im,
>>> provided the *sum load is taken out of the loop by pass_pre first.
>>>
>>> So alternatively, we could use pass_pre and pass_loop_im to achieve the same
>>> effect.
>>>
>>> When trying out adding pass_pre as a part of the pass group
>>> pass_oacc_kernels, I
>>> found that also pass_copyprop was required to get parloops to recognize the
>>> reduction.
>>>
>>
>> Attached patch adds pass_copyprop to pass group pass_oacc_kernels.
>
> Hum, you are gobbling up very many passes here.  In this case copyprop
> will also perform trivial constant propagation so maybe it's enough
> to replace ccp by copyprop.  Or go the full way and add a FRE pass.
>

Yep, replacing ccp by copyprop seems to work well enough.

I'll repost once bootstrap and reg-test are done.

Thanks,
- Tom

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH, 1/8] Expand oacc kernels after pass_build_ealias
  2014-11-24 11:29   ` Tom de Vries
@ 2014-11-25 11:30     ` Tom de Vries
  2015-04-21 19:40       ` Expand oacc kernels after pass_fre (was: [PATCH, 1/8] Expand oacc kernels after pass_build_ealias) Thomas Schwinge
  0 siblings, 1 reply; 71+ messages in thread
From: Tom de Vries @ 2014-11-25 11:30 UTC (permalink / raw)
  To: Richard Biener; +Cc: GCC Patches, Jakub Jelinek, Thomas Schwinge

[-- Attachment #1: Type: text/plain, Size: 3838 bytes --]

On 24-11-14 11:56, Tom de Vries wrote:
> On 15-11-14 18:19, Tom de Vries wrote:
>> On 15-11-14 13:14, Tom de Vries wrote:
>>> Hi,
>>>
>>> I'm submitting a patch series with initial support for the oacc kernels
>>> directive.
>>>
>>> The patch series uses pass_parallelize_loops to implement parallelization of
>>> loops in the oacc kernels region.
>>>
>>> The patch series consists of these 8 patches:
>>> ...
>>>      1  Expand oacc kernels after pass_build_ealias
>>>      2  Add pass_oacc_kernels
>>>      3  Add pass_ch_oacc_kernels to pass_oacc_kernels
>>>      4  Add pass_tree_loop_{init,done} to pass_oacc_kernels
>>>      5  Add pass_loop_im to pass_oacc_kernels
>>>      6  Add pass_ccp to pass_oacc_kernels
>>>      7  Add pass_parloops_oacc_kernels to pass_oacc_kernels
>>>      8  Do simple omp lowering for no address taken var
>>> ...
>>
>> This patch moves omp expansion of the oacc kernels directive to after
>> pass_build_ealias.
>>
>> The rationale is that in order to use pass_parallelize_loops for analysis and
>> transformation of an oacc kernels region, we postpone omp expansion of that
>> region until the earliest point in the pass list where enough information is
>> available to run pass_parallelize_loops, in other words, after pass_build_ealias.
>>
>> The patch postpones expansion in expand_omp, and ensures expansion by adding
>> pass_expand_omp_ssa:
>> - after pass_build_ealias, and
>> - after pass_all_early_optimizations for the case we're not optimizing.
>>
>> In order to make sure the oacc kernels region arrives at pass_expand_omp_ssa,
>> the way it left expand_omp, the patch makes pass_ccp and pass_forwprop aware of
>> lowered omp code, to handle it conservatively.
>>
>> The patch contains changes in expand_omp_target to deal with ssa-code, similar
>> to what is already present in expand_omp_taskreg.
>>
>> Furthermore, the patch forces the .omp_data_sizes and .omp_data_kinds to not be
>> static for oacc kernels. It does this to get some references to .omp_data_sizes
>> and .omp_data_kinds in the ssa code.  Without these references, the definitions
>> will be removed. The reference of the variables in GIMPLE_OACC_KERNELS is not
>> enough to have them not removed. [ In vries/oacc-kernels, I used a BUILT_IN_USE
>> kludge for this purpose ].
>>
>> Finally, at the end of pass_expand_omp_ssa we're left with SSA_NAMEs in the
>> original function of which the definition has been removed (as in moved to the
>> split off function). TODO_remove_unused_locals takes care of some of them, but
>> not the anonymous ones. So the patch iterates over all SSA_NAMEs to find these
>> dangling SSA_NAMEs and releases them.
>>
>
> Reposting with small update: I've replaced the use of the rather generic
> gimple_stmt_omp_lowering_p with the more specific gimple_stmt_omp_data_i_init_p.
>
> Bootstrapped and reg-tested in the same way as before.
>

I've moved pass_expand_omp_ssa one down in the pass list, past pass_fre.

This allows fre to unify references to the same omp variable before entering 
pass_oacc_kernels, which helps pass_lim in pass_oacc_kernels.

For instance, this reduction fragment:
...
   # VUSE <.MEM_8>
   # PT = { D.2282 }
   _67 = .omp_data_i_59->sumD.2270;
   # VUSE <.MEM_8>
   _68 = *_67;

   _70 = _66 + _68;

   # VUSE <.MEM_8>
   # PT = { D.2282 }
   _69 = .omp_data_i_59->sumD.2270;
   # .MEM_71 = VDEF <.MEM_8>
   *_69 = _70;
...

is transformed by fre into:
...
   # VUSE <.MEM_8>
   # PT = { D.2282 }
   _67 = .omp_data_i_59->sumD.2270;
   # VUSE <.MEM_8>
   _68 = *_67;

   _70 = _66 + _68;

   # .MEM_71 = VDEF <.MEM_8>
   *_67 = _70;
...
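
[ For illustration only: a source-level C analogue of the two GIMPLE fragments
above, written by hand rather than generated by the compiler.  The struct
omp_data and the functions add_two_loads and add_one_load are hypothetical
names.  The first function loads the address of sum twice, like _67 and _69.
The second reuses the first load for the store, which is the shape left once
the redundant second load has been eliminated.  No store intervenes between
the two address loads, so both functions compute the same result. ]

```c
/* Hypothetical stand-in for the .omp_data_i record: it holds the
   address of the reduction variable.  */
struct omp_data
{
  unsigned int *sum;
};

/* Shape before redundancy elimination: the address is loaded twice.  */
void
add_two_loads (struct omp_data *omp_data_i, unsigned int x)
{
  unsigned int *p67 = omp_data_i->sum;	/* First address load.  */
  unsigned int v68 = *p67;
  unsigned int v70 = x + v68;
  unsigned int *p69 = omp_data_i->sum;	/* Redundant second load.  */
  *p69 = v70;
}

/* Shape after redundancy elimination: the store reuses p67, so the
   load and the store visibly refer to the same location.  */
void
add_one_load (struct omp_data *omp_data_i, unsigned int x)
{
  unsigned int *p67 = omp_data_i->sum;
  unsigned int v68 = *p67;
  unsigned int v70 = x + v68;
  *p67 = v70;
}
```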

In order for pass_fre to respect the kernels region boundaries, I've added a 
change in tree-ssa-sccvn.c:visit_use to handle the .omp_data_i init conservatively.

Bootstrapped and reg-tested as before.

OK for trunk?

Thanks,
- Tom


[-- Attachment #2: 0001-Expand-oacc-kernels-after-pass_fre.patch --]
[-- Type: text/x-patch, Size: 17139 bytes --]

[PATCH 1/7] Expand oacc kernels after pass_fre

2014-11-25  Tom de Vries  <tom@codesourcery.com>

	* function.h (struct function): Add contains_oacc_kernels field.
	* gimplify.c (gimplify_omp_workshare): Set contains_oacc_kernels.
	* omp-low.c: Include gimple-pretty-print.h.
	(release_first_vuse_in_edge_dest): New function.
	(expand_omp_target): Handle ssa-code.
	(expand_omp): Don't expand GIMPLE_OACC_KERNELS when not in ssa.
	(pass_data_expand_omp): Don't set PROP_gimple_eomp unconditionally in
	properties_provided field.
	(pass_expand_omp::execute): Set PROP_gimple_eomp in
	cfun->curr_properties only if cfun does not contain oacc kernels.
	(pass_data_expand_omp_ssa): Add TODO_remove_unused_locals to
	todo_flags_finish field.
	(pass_expand_omp_ssa::execute): Release dangling SSA_NAMEs after calling
	execute_expand_omp.
	(lower_omp_target): Add static_arrays variable, init to 1.  Don't use
	static arrays for kernels directive.  Use static_arrays variable.
	Handle case that .omp_data_kinds is not static.
	(gimple_stmt_ssa_operand_references_var_p)
	(gimple_stmt_omp_data_i_init_p): New function.
	* omp-low.h (gimple_stmt_omp_data_i_init_p): Declare.
	* passes.def: Add pass_expand_omp_ssa after pass_fre.  Add
	pass_expand_omp_ssa after pass_all_early_optimizations.
	* tree-ssa-ccp.c: Include omp-low.h.
	(surely_varying_stmt_p, ccp_visit_stmt): Handle .omp_data_i init
	conservatively.
	* tree-ssa-forwprop.c: Include omp-low.h.
	(pass_forwprop::execute): Handle .omp_data_i init conservatively.
	* tree-ssa-sccvn.c: Include omp-low.h.
	(visit_use):  Handle .omp_data_i init conservatively.
---
 gcc/function.h          |   3 +
 gcc/gimplify.c          |   1 +
 gcc/omp-low.c           | 196 +++++++++++++++++++++++++++++++++++++++++++++---
 gcc/omp-low.h           |   1 +
 gcc/passes.def          |   2 +
 gcc/tree-ssa-ccp.c      |   6 ++
 gcc/tree-ssa-forwprop.c |   4 +-
 gcc/tree-ssa-sccvn.c    |   4 +-
 8 files changed, 203 insertions(+), 14 deletions(-)

diff --git a/gcc/function.h b/gcc/function.h
index 3a6305c..bb48775 100644
--- a/gcc/function.h
+++ b/gcc/function.h
@@ -667,6 +667,9 @@ struct GTY(()) function {
 
   /* Set when the tail call has been identified.  */
   unsigned int tail_call_marked : 1;
+
+  /* Set when the function contains oacc kernels directives.  */
+  unsigned int contains_oacc_kernels : 1;
 };
 
 /* Add the decl D to the local_decls list of FUN.  */
diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index ad48d51..c40f20f 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -7316,6 +7316,7 @@ gimplify_omp_workshare (tree *expr_p, gimple_seq *pre_p)
       break;
     case OACC_KERNELS:
       stmt = gimple_build_oacc_kernels (body, OACC_KERNELS_CLAUSES (expr));
+      cfun->contains_oacc_kernels = 1;
       break;
     case OACC_PARALLEL:
       stmt = gimple_build_oacc_parallel (body, OACC_PARALLEL_CLAUSES (expr));
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index c503cc1..3ac546c 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -88,6 +88,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-eh.h"
 #include "cilk.h"
 #include "lto-section-names.h"
+#include "gimple-pretty-print.h"
 
 
 /* Lowering of OpenMP parallel and workshare constructs proceeds in two
@@ -5338,6 +5339,35 @@ expand_omp_build_assign (gimple_stmt_iterator *gsi_p, tree to, tree from)
     }
 }
 
+static void
+release_first_vuse_in_edge_dest (edge e)
+{
+  gimple_stmt_iterator i;
+  basic_block bb = e->dest;
+
+  for (i = gsi_start_phis (bb); !gsi_end_p (i); gsi_next (&i))
+    {
+      gimple phi = gsi_stmt (i);
+      tree arg = PHI_ARG_DEF_FROM_EDGE (phi, e);
+
+      if (!virtual_operand_p (arg))
+	continue;
+
+      mark_virtual_operand_for_renaming (arg);
+      return;
+    }
+
+  for (i = gsi_start_bb (bb); !gsi_end_p (i); gsi_next_nondebug (&i))
+    {
+      gimple stmt = gsi_stmt (i);
+      if (gimple_vuse (stmt) == NULL_TREE)
+	continue;
+
+      mark_virtual_operand_for_renaming (gimple_vuse (stmt));
+      return;
+    }
+}
+
 /* Expand the OpenMP parallel or task directive starting at REGION.  */
 
 static void
@@ -8832,7 +8862,6 @@ expand_omp_target (struct omp_region *region)
   /* Supported by expand_omp_taskreg, but not here.  */
   if (child_cfun != NULL)
     gcc_assert (!child_cfun->cfg);
-  gcc_assert (!gimple_in_ssa_p (cfun));
 
   entry_bb = region->entry;
   exit_bb = region->exit;
@@ -8858,7 +8887,7 @@ expand_omp_target (struct omp_region *region)
 	{
 	  basic_block entry_succ_bb = single_succ (entry_bb);
 	  gimple_stmt_iterator gsi;
-	  tree arg;
+	  tree arg, narg;
 	  gimple tgtcopy_stmt = NULL;
 	  tree sender = TREE_VEC_ELT (gimple_omp_data_arg (entry_stmt), 0);
 
@@ -8888,8 +8917,27 @@ expand_omp_target (struct omp_region *region)
 	  gcc_assert (tgtcopy_stmt != NULL);
 	  arg = DECL_ARGUMENTS (child_fn);
 
-	  gcc_assert (gimple_assign_lhs (tgtcopy_stmt) == arg);
-	  gsi_remove (&gsi, true);
+	  if (!gimple_in_ssa_p (cfun))
+	    {
+	      gcc_assert (gimple_assign_lhs (tgtcopy_stmt) == arg);
+	      gsi_remove (&gsi, true);
+	    }
+	  else
+	    {
+	      gcc_assert (SSA_NAME_VAR (gimple_assign_lhs (tgtcopy_stmt))
+			  == arg);
+
+	      /* If we are in ssa form, we must load the value from the default
+		 definition of the argument.  That should not be defined now,
+		 since the argument is not used uninitialized.  */
+	      gcc_assert (ssa_default_def (cfun, arg) == NULL);
+	      narg = make_ssa_name (arg, gimple_build_nop ());
+	      set_ssa_default_def (cfun, arg, narg);
+	      /* ?? Is setting the subcode really necessary ??  */
+	      gimple_omp_set_subcode (tgtcopy_stmt, TREE_CODE (narg));
+	      gimple_assign_set_rhs1 (tgtcopy_stmt, narg);
+	      update_stmt (tgtcopy_stmt);
+	    }
 	}
 
       /* Declare local variables needed in CHILD_CFUN.  */
@@ -8932,11 +8980,23 @@ expand_omp_target (struct omp_region *region)
 	  stmt = gimple_build_return (NULL);
 	  gsi_insert_after (&gsi, stmt, GSI_SAME_STMT);
 	  gsi_remove (&gsi, true);
+
+	  /* A vuse in single_succ (exit_bb) may use a vdef from the region
+	     which is about to be split off.  Mark the vdef for renaming.  */
+	  release_first_vuse_in_edge_dest (single_succ_edge (exit_bb));
 	}
 
       /* Move the offloading region into CHILD_CFUN.  */
 
-      block = gimple_block (entry_stmt);
+      if (gimple_in_ssa_p (cfun))
+	{
+	  init_tree_ssa (child_cfun);
+	  init_ssa_operands (child_cfun);
+	  child_cfun->gimple_df->in_ssa_p = true;
+	  block = NULL_TREE;
+	}
+      else
+	block = gimple_block (entry_stmt);
 
       new_bb = move_sese_region_to_fn (child_cfun, entry_bb, exit_bb, block);
       if (exit_bb)
@@ -8986,6 +9046,8 @@ expand_omp_target (struct omp_region *region)
 	  if (changed)
 	    cleanup_tree_cfg ();
 	}
+      if (gimple_in_ssa_p (cfun))
+	update_ssa (TODO_update_ssa);
       pop_cfun ();
     }
 
@@ -9262,6 +9324,8 @@ expand_omp_target (struct omp_region *region)
       gcc_assert (g && gimple_code (g) == GIMPLE_OMP_RETURN);
       gsi_remove (&gsi, true);
     }
+  if (gimple_in_ssa_p (cfun))
+    update_ssa (TODO_update_ssa_only_virtuals);
 }
 
 
@@ -9332,6 +9396,15 @@ expand_omp (struct omp_region *region)
 	  break;
 
 	case GIMPLE_OACC_KERNELS:
+	  if (!gimple_in_ssa_p (cfun))
+	    /* We're in pass_expand_omp.  Postpone expanding till
+	       pass_expand_omp_ssa.  */
+	    break;
+
+	  /* We're in pass_expand_omp_ssa.  Expand now.  */
+
+	  /* FALLTHRU.  */
+
 	case GIMPLE_OACC_PARALLEL:
 	case GIMPLE_OMP_TARGET:
 	  expand_omp_target (region);
@@ -9504,7 +9577,7 @@ const pass_data pass_data_expand_omp =
   OPTGROUP_NONE, /* optinfo_flags */
   TV_NONE, /* tv_id */
   PROP_gimple_any, /* properties_required */
-  PROP_gimple_eomp, /* properties_provided */
+  0 /* Possibly PROP_gimple_eomp.  */, /* properties_provided */
   0, /* properties_destroyed */
   0, /* todo_flags_start */
   0, /* todo_flags_finish */
@@ -9518,7 +9591,7 @@ public:
   {}
 
   /* opt_pass methods: */
-  virtual unsigned int execute (function *)
+  virtual unsigned int execute (function *fun)
     {
       bool gate = ((flag_openacc != 0 || flag_openmp != 0
 		    || flag_openmp_simd != 0 || flag_cilkplus != 0)
@@ -9529,7 +9602,12 @@ public:
       if (!gate)
 	return 0;
 
-      return execute_expand_omp ();
+      unsigned int res = execute_expand_omp ();
+
+      if (!fun->contains_oacc_kernels)
+	fun->curr_properties |= PROP_gimple_eomp;
+
+      return res;
     }
 
 }; // class pass_expand_omp
@@ -9554,7 +9632,8 @@ const pass_data pass_data_expand_omp_ssa =
   PROP_gimple_eomp, /* properties_provided */
   0, /* properties_destroyed */
   0, /* todo_flags_start */
-  TODO_cleanup_cfg | TODO_rebuild_alias, /* todo_flags_finish */
+  TODO_cleanup_cfg | TODO_rebuild_alias
+  | TODO_remove_unused_locals, /* todo_flags_finish */
 };
 
 class pass_expand_omp_ssa : public gimple_opt_pass
@@ -9569,7 +9648,47 @@ public:
     {
       return !(fun->curr_properties & PROP_gimple_eomp);
     }
-  virtual unsigned int execute (function *) { return execute_expand_omp (); }
+  virtual unsigned int execute (function *)
+    {
+      unsigned res = execute_expand_omp ();
+
+      /* After running pass_expand_omp_ssa to expand the oacc kernels
+	 directive, we are left in the original function with anonymous
+	 SSA_NAMEs, with a defining statement that has been deleted.  This
+	 pass finds those SSA_NAMEs and releases them.  */
+      unsigned int i;
+      for (i = 1; i < num_ssa_names; ++i)
+	{
+	  tree name = ssa_name (i);
+	  if (name == NULL_TREE)
+	    continue;
+
+	  gimple stmt = SSA_NAME_DEF_STMT (name);
+	  bool found = false;
+
+	  ssa_op_iter op_iter;
+	  def_operand_p def_p;
+	  FOR_EACH_PHI_OR_STMT_DEF (def_p, stmt, op_iter, SSA_OP_ALL_DEFS)
+	    {
+	      tree def = DEF_FROM_PTR (def_p);
+	      if (def == name)
+		{
+		  found = true;
+		  break;
+		}
+	    }
+
+	  if (!found)
+	    {
+	      if (dump_file)
+		fprintf (dump_file, "Released dangling ssa name %u\n", i);
+	      release_ssa_name (name);
+	    }
+	}
+
+      return res;
+    }
+  opt_pass * clone () { return new pass_expand_omp_ssa (m_ctxt); }
 
 }; // class pass_expand_omp_ssa
 
@@ -11195,6 +11314,7 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx)
   unsigned int map_cnt = 0;
   tree (*gimple_omp_clauses) (const_gimple);
   void (*gimple_omp_set_data_arg) (gimple, tree);
+  unsigned int static_arrays = 1;
 
   offloaded = is_gimple_omp_offloaded (stmt);
   data_region = false;
@@ -11203,6 +11323,7 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx)
     case GIMPLE_OACC_KERNELS:
       gimple_omp_clauses = gimple_oacc_kernels_clauses;
       gimple_omp_set_data_arg = gimple_oacc_kernels_set_data_arg;
+      static_arrays = 0;
       break;
     case GIMPLE_OACC_PARALLEL:
       gimple_omp_clauses = gimple_oacc_parallel_clauses;
@@ -11369,7 +11490,7 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 			  ".omp_data_sizes");
       DECL_NAMELESS (TREE_VEC_ELT (t, 1)) = 1;
       TREE_ADDRESSABLE (TREE_VEC_ELT (t, 1)) = 1;
-      TREE_STATIC (TREE_VEC_ELT (t, 1)) = 1;
+      TREE_STATIC (TREE_VEC_ELT (t, 1)) = static_arrays;
       tree tkind_type;
       int talign_shift;
       if (is_gimple_omp_oacc_specifically (stmt))
@@ -11387,7 +11508,7 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 			  ".omp_data_kinds");
       DECL_NAMELESS (TREE_VEC_ELT (t, 2)) = 1;
       TREE_ADDRESSABLE (TREE_VEC_ELT (t, 2)) = 1;
-      TREE_STATIC (TREE_VEC_ELT (t, 2)) = 1;
+      TREE_STATIC (TREE_VEC_ELT (t, 2)) = static_arrays;
       gimple_omp_set_data_arg (stmt, t);
 
       vec<constructor_elt, va_gc> *vsize;
@@ -11560,6 +11681,22 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 						    clobber));
 	}
 
+      if (!TREE_STATIC (TREE_VEC_ELT (t, 2)))
+	{
+	  gimple_seq initlist = NULL;
+	  force_gimple_operand (build1 (DECL_EXPR, void_type_node,
+					TREE_VEC_ELT (t, 2)),
+				&initlist, true, NULL_TREE);
+	  gimple_seq_add_seq (&ilist, initlist);
+
+	  tree clobber = build_constructor (TREE_TYPE (TREE_VEC_ELT (t, 2)),
+					    NULL);
+	  TREE_THIS_VOLATILE (clobber) = 1;
+	  gimple_seq_add_stmt (&olist,
+			       gimple_build_assign (TREE_VEC_ELT (t, 2),
+						    clobber));
+	}
+
       tree clobber = build_constructor (ctx->record_type, NULL);
       TREE_THIS_VOLATILE (clobber) = 1;
       gimple_seq_add_stmt (&olist, gimple_build_assign (ctx->sender_decl,
@@ -13740,4 +13877,39 @@ omp_finish_file (void)
     }
 }
 
+static bool
+gimple_stmt_ssa_operand_references_var_p (gimple stmt, const char **varnames,
+					  unsigned int nr_varnames,
+					  unsigned int flags)
+{
+  tree use;
+  ssa_op_iter iter;
+  const char *s;
+
+  FOR_EACH_SSA_TREE_OPERAND (use, stmt, iter, flags)
+    {
+      if (SSA_NAME_IDENTIFIER (use) == NULL_TREE)
+	continue;
+      s = IDENTIFIER_POINTER (SSA_NAME_IDENTIFIER (use));
+
+      unsigned int i;
+      for (i = 0; i < nr_varnames; ++i)
+	if (strcmp (varnames[i], s) == 0)
+	  return true;
+    }
+
+  return false;
+}
+
+/* Return true if STMT is .omp_data_i init.  */
+
+bool
+gimple_stmt_omp_data_i_init_p (gimple stmt)
+{
+  const char *varnames[] = { ".omp_data_i" };
+  unsigned int nr_varnames = sizeof (varnames) / sizeof (varnames[0]);
+  return gimple_stmt_ssa_operand_references_var_p (stmt, varnames, nr_varnames,
+						   SSA_OP_DEF);
+}
+
 #include "gt-omp-low.h"
diff --git a/gcc/omp-low.h b/gcc/omp-low.h
index ac587d0..32076e4 100644
--- a/gcc/omp-low.h
+++ b/gcc/omp-low.h
@@ -28,6 +28,7 @@ extern void free_omp_regions (void);
 extern tree omp_reduction_init (tree, tree);
 extern bool make_gimple_omp_edges (basic_block, struct omp_region **, int *);
 extern void omp_finish_file (void);
+extern bool gimple_stmt_omp_data_i_init_p (gimple);
 
 extern GTY(()) vec<tree, va_gc> *offload_funcs;
 extern GTY(()) vec<tree, va_gc> *offload_vars;
diff --git a/gcc/passes.def b/gcc/passes.def
index ebd2b95..bf1cd34 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -86,6 +86,7 @@ along with GCC; see the file COPYING3.  If not see
 	     execute TODO_rebuild_alias at this point.  */
 	  NEXT_PASS (pass_build_ealias);
 	  NEXT_PASS (pass_fre);
+	  NEXT_PASS (pass_expand_omp_ssa);
 	  NEXT_PASS (pass_merge_phi);
 	  NEXT_PASS (pass_cd_dce);
 	  NEXT_PASS (pass_early_ipa_sra);
@@ -99,6 +100,7 @@ along with GCC; see the file COPYING3.  If not see
 	      late.  */
 	  NEXT_PASS (pass_split_functions);
       POP_INSERT_PASSES ()
+      NEXT_PASS (pass_expand_omp_ssa);
       NEXT_PASS (pass_release_ssa_names);
       NEXT_PASS (pass_rebuild_cgraph_edges);
       NEXT_PASS (pass_inline_parameters);
diff --git a/gcc/tree-ssa-ccp.c b/gcc/tree-ssa-ccp.c
index 52d8503..23185e6 100644
--- a/gcc/tree-ssa-ccp.c
+++ b/gcc/tree-ssa-ccp.c
@@ -165,6 +165,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "wide-int-print.h"
 #include "builtins.h"
 #include "tree-chkp.h"
+#include "omp-low.h"
 
 
 /* Possible lattice values.  */
@@ -789,6 +790,9 @@ surely_varying_stmt_p (gimple stmt)
       && gimple_code (stmt) != GIMPLE_CALL)
     return true;
 
+  if (gimple_stmt_omp_data_i_init_p (stmt))
+    return true;
+
   return false;
 }
 
@@ -2297,6 +2301,8 @@ ccp_visit_stmt (gimple stmt, edge *taken_edge_p, tree *output_p)
   switch (gimple_code (stmt))
     {
       case GIMPLE_ASSIGN:
+	if (gimple_stmt_omp_data_i_init_p (stmt))
+	  break;
         /* If the statement is an assignment that produces a single
            output value, evaluate its RHS to see if the lattice value of
            its output has changed.  */
diff --git a/gcc/tree-ssa-forwprop.c b/gcc/tree-ssa-forwprop.c
index feb8253..860c53e 100644
--- a/gcc/tree-ssa-forwprop.c
+++ b/gcc/tree-ssa-forwprop.c
@@ -68,6 +68,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-cfgcleanup.h"
 #include "tree-into-ssa.h"
 #include "cfganal.h"
+#include "omp-low.h"
 
 /* This pass propagates the RHS of assignment statements into use
    sites of the LHS of the assignment.  It's basically a specialized
@@ -2244,7 +2245,8 @@ pass_forwprop::execute (function *fun)
 	  tree lhs, rhs;
 	  enum tree_code code;
 
-	  if (!is_gimple_assign (stmt))
+	  if (!is_gimple_assign (stmt)
+	      || gimple_stmt_omp_data_i_init_p (stmt))
 	    {
 	      gsi_next (&gsi);
 	      continue;
diff --git a/gcc/tree-ssa-sccvn.c b/gcc/tree-ssa-sccvn.c
index 6968df6..4cb1e37 100644
--- a/gcc/tree-ssa-sccvn.c
+++ b/gcc/tree-ssa-sccvn.c
@@ -65,6 +65,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-ssa-sccvn.h"
 #include "tree-cfg.h"
 #include "domwalk.h"
+#include "omp-low.h"
 
 /* This algorithm is based on the SCC algorithm presented by Keith
    Cooper and L. Taylor Simpson in "SCC-Based Value numbering"
@@ -3446,7 +3447,8 @@ visit_use (tree use)
     {
       if (gimple_code (stmt) == GIMPLE_PHI)
 	changed = visit_phi (stmt);
-      else if (gimple_has_volatile_ops (stmt))
+      else if (gimple_has_volatile_ops (stmt)
+	       || gimple_stmt_omp_data_i_init_p (stmt))
 	changed = defs_to_varying (stmt);
       else if (is_gimple_assign (stmt))
 	{
-- 
1.9.1



* Re: [PATCH, 2/8] Add pass_oacc_kernels
  2014-11-15 17:22 ` [PATCH, 2/8] Add pass_oacc_kernels Tom de Vries
@ 2014-11-25 11:31   ` Tom de Vries
  2015-04-21 19:46     ` Thomas Schwinge
  0 siblings, 1 reply; 71+ messages in thread
From: Tom de Vries @ 2014-11-25 11:31 UTC (permalink / raw)
  To: GCC Patches; +Cc: Richard Biener, Jakub Jelinek, Thomas Schwinge

[-- Attachment #1: Type: text/plain, Size: 1102 bytes --]

On 15-11-14 18:20, Tom de Vries wrote:
> On 15-11-14 13:14, Tom de Vries wrote:
>> Hi,
>>
>> I'm submitting a patch series with initial support for the oacc kernels
>> directive.
>>
>> The patch series uses pass_parallelize_loops to implement parallelization of
>> loops in the oacc kernels region.
>>
>> The patch series consists of these 8 patches:
>> ...
>>      1  Expand oacc kernels after pass_build_ealias
>>      2  Add pass_oacc_kernels
>>      3  Add pass_ch_oacc_kernels to pass_oacc_kernels
>>      4  Add pass_tree_loop_{init,done} to pass_oacc_kernels
>>      5  Add pass_loop_im to pass_oacc_kernels
>>      6  Add pass_ccp to pass_oacc_kernels
>>      7  Add pass_parloops_oacc_kernels to pass_oacc_kernels
>>      8  Do simple omp lowering for no address taken var
>> ...
>
> This patch adds a pass group pass_oacc_kernels.
>
> The rationale is that we want a dedicated pass group in which to run the
> (optimization) passes related to oacc kernels regions.
>
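
For illustration only (a standalone sketch, not part of the patch): the
sub-passes of the group run only when the gate of the enclosing pass returns
true.  The Function struct and run_oacc_kernels_group below are hypothetical
stand-ins for GCC's function and pass manager; only the gate logic mirrors
gate_oacc_kernels from the attached patch.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for GCC's `function`: only the flag the gate reads.
struct Function {
  bool contains_oacc_kernels;
};

// Same decision as gate_oacc_kernels in the patch, with flag_openacc passed
// in as a parameter instead of being read from a global.
bool gate_oacc_kernels(bool flag_openacc, const Function &fn) {
  if (!flag_openacc)
    return false;
  return fn.contains_oacc_kernels;
}

// Returns the names of the sub-passes that would run for FN.
std::vector<std::string> run_oacc_kernels_group(bool flag_openacc,
                                                const Function &fn) {
  std::vector<std::string> executed;
  if (!gate_oacc_kernels(flag_openacc, fn))
    return executed;  // Gate closed: the whole group is skipped.
  executed.push_back("expand_omp_ssa");
  return executed;
}
```

With -fopenacc off, or for a function without a kernels region, the returned
list is empty, which models the group being skipped as a whole.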

Updated for moving pass_oacc_kernels down past pass_fre in the pass list.

Bootstrapped and reg-tested as before.

OK for trunk?

Thanks,
   - Tom


[-- Attachment #2: 0002-Add-pass_oacc_kernels.patch --]
[-- Type: text/x-patch, Size: 3112 bytes --]

[PATCH 2/7] Add pass_oacc_kernels

2014-11-25  Tom de Vries  <tom@codesourcery.com>

	* passes.def: Add pass group pass_oacc_kernels.
	* tree-pass.h (make_pass_oacc_kernels): Declare.
	* tree-ssa-loop.c (gate_oacc_kernels): New static function.
	(pass_data_oacc_kernels): New pass_data.
	(class pass_oacc_kernels): New pass.
	(make_pass_oacc_kernels): New function.
---
 gcc/passes.def      |  7 ++++++-
 gcc/tree-pass.h     |  1 +
 gcc/tree-ssa-loop.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 55 insertions(+), 1 deletion(-)

diff --git a/gcc/passes.def b/gcc/passes.def
index bf1cd34..efb3d8c 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -86,7 +86,12 @@ along with GCC; see the file COPYING3.  If not see
 	     execute TODO_rebuild_alias at this point.  */
 	  NEXT_PASS (pass_build_ealias);
 	  NEXT_PASS (pass_fre);
-	  NEXT_PASS (pass_expand_omp_ssa);
+	  /* Pass group that runs when there are oacc kernels in the
+	     function.  */
+	  NEXT_PASS (pass_oacc_kernels);
+	  PUSH_INSERT_PASSES_WITHIN (pass_oacc_kernels)
+	      NEXT_PASS (pass_expand_omp_ssa);
+	  POP_INSERT_PASSES ()
 	  NEXT_PASS (pass_merge_phi);
 	  NEXT_PASS (pass_cd_dce);
 	  NEXT_PASS (pass_early_ipa_sra);
diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
index 75f8aa5..d63ab2b 100644
--- a/gcc/tree-pass.h
+++ b/gcc/tree-pass.h
@@ -449,6 +449,7 @@ extern gimple_opt_pass *make_pass_strength_reduction (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_vtable_verify (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_ubsan (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_sanopt (gcc::context *ctxt);
+extern gimple_opt_pass *make_pass_oacc_kernels (gcc::context *ctxt);
 
 /* IPA Passes */
 extern simple_ipa_opt_pass *make_pass_ipa_lower_emutls (gcc::context *ctxt);
diff --git a/gcc/tree-ssa-loop.c b/gcc/tree-ssa-loop.c
index 758b5fc..c29aa22 100644
--- a/gcc/tree-ssa-loop.c
+++ b/gcc/tree-ssa-loop.c
@@ -157,6 +157,54 @@ make_pass_tree_loop (gcc::context *ctxt)
   return new pass_tree_loop (ctxt);
 }
 
+/* Gate for oacc kernels pass group.  */
+
+static bool
+gate_oacc_kernels (function *fn)
+{
+  if (!flag_openacc)
+    return false;
+
+  return fn->contains_oacc_kernels;
+}
+
+/* The oacc kernels superpass.  */
+
+namespace {
+
+const pass_data pass_data_oacc_kernels =
+{
+  GIMPLE_PASS, /* type */
+  "oacc_kernels", /* name */
+  OPTGROUP_LOOP, /* optinfo_flags */
+  TV_TREE_LOOP, /* tv_id */
+  PROP_cfg, /* properties_required */
+  0, /* properties_provided */
+  0, /* properties_destroyed */
+  0, /* todo_flags_start */
+  0, /* todo_flags_finish */
+};
+
+class pass_oacc_kernels : public gimple_opt_pass
+{
+public:
+  pass_oacc_kernels (gcc::context *ctxt)
+    : gimple_opt_pass (pass_data_oacc_kernels, ctxt)
+  {}
+
+  /* opt_pass methods: */
+  virtual bool gate (function *fn) { return gate_oacc_kernels (fn); }
+
+}; // class pass_oacc_kernels
+
+} // anon namespace
+
+gimple_opt_pass *
+make_pass_oacc_kernels (gcc::context *ctxt)
+{
+  return new pass_oacc_kernels (ctxt);
+}
+
 /* The no-loop superpass.  */
 
 namespace {
-- 
1.9.1



* Re: [PATCH, 3/8] Add pass_ch_oacc_kernels to pass_oacc_kernels
  2014-11-15 17:23 ` [PATCH, 3/8] Add pass_ch_oacc_kernels to pass_oacc_kernels Tom de Vries
@ 2014-11-25 11:39   ` Tom de Vries
  2015-04-21 19:49     ` Thomas Schwinge
  0 siblings, 1 reply; 71+ messages in thread
From: Tom de Vries @ 2014-11-25 11:39 UTC (permalink / raw)
  To: GCC Patches; +Cc: Richard Biener, Jakub Jelinek, Thomas Schwinge

[-- Attachment #1: Type: text/plain, Size: 1404 bytes --]

On 15-11-14 18:21, Tom de Vries wrote:
> On 15-11-14 13:14, Tom de Vries wrote:
>> Hi,
>>
>> I'm submitting a patch series with initial support for the oacc kernels
>> directive.
>>
>> The patch series uses pass_parallelize_loops to implement parallelization of
>> loops in the oacc kernels region.
>>
>> The patch series consists of these 8 patches:
>> ...
>>      1  Expand oacc kernels after pass_build_ealias
>>      2  Add pass_oacc_kernels
>>      3  Add pass_ch_oacc_kernels to pass_oacc_kernels
>>      4  Add pass_tree_loop_{init,done} to pass_oacc_kernels
>>      5  Add pass_loop_im to pass_oacc_kernels
>>      6  Add pass_ccp to pass_oacc_kernels
>>      7  Add pass_parloops_oacc_kernels to pass_oacc_kernels
>>      8  Do simple omp lowering for no address taken var
>> ...
>
> This patch adds a pass_ch_oacc_kernels to the pass group pass_oacc_kernels.
>
> The idea is that pass_parallelize_loops only deals with loops for which the
> header has been copied, so the easiest way to meet that requirement when
> running pass_parallelize_loops in group pass_oacc_kernels is to run pass_ch as
> part of pass_oacc_kernels.
>
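
For readers unfamiliar with header copying, a source-level sketch of the
effect (an illustration under the assumption that the transformation can be
shown in plain C++; pass_ch itself works on GIMPLE, and both function names
are made up for this example).  The exit test is duplicated in front of the
loop, so the loop itself becomes a do-while whose body runs at least once
when entered:

```cpp
#include <cassert>

// Original shape: the exit test sits in the loop header.
int sum_while(const int *a, int n) {
  int sum = 0;
  int i = 0;
  while (i < n) {
    sum += a[i];
    ++i;
  }
  return sum;
}

// Shape after header copying: the test is copied before the loop, and the
// loop proper tests its condition at the bottom.
int sum_header_copied(const int *a, int n) {
  int sum = 0;
  int i = 0;
  if (i < n) {
    do {
      sum += a[i];
      ++i;
    } while (i < n);
  }
  return sum;
}
```

Both functions compute the same result for every n, including n == 0, where
the copied test keeps the do-while from being entered.
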
> We define a separate pass pass_ch_oacc_kernels, to leave all loops that aren't
> part of a kernels region alone.
>
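
To make "part of a kernels region" concrete: reading
loop_in_oacc_kernels_region_p in the attached patch, a block counts as inside
the region if it is dominated by the block ending in the kernels statement,
is not that block itself, and is not strictly dominated by the region's exit
block.  A rough standalone model follows, assuming the dominator tree is given
as an immediate-dominator array.  The idom representation and both function
names are hypothetical; GCC uses its own dominance API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// idom[b] is the immediate dominator of block b; the root has idom[b] == b.
// Returns true if DOM dominates BLOCK (every block dominates itself).
bool dominated_by(const std::vector<int> &idom, int block, int dom) {
  assert(block >= 0 && static_cast<std::size_t>(block) < idom.size());
  while (block != dom) {
    if (idom[block] == block)
      return false;  // Reached the root without meeting DOM.
    block = idom[block];
  }
  return true;
}

// Models the region test: dominated by ENTRY, not ENTRY itself, and not
// strictly dominated by EXIT (EXIT itself still counts as inside).
bool block_in_region(const std::vector<int> &idom, int entry, int exit,
                     int block) {
  if (block == entry)
    return false;
  if (!dominated_by(idom, block, entry))
    return false;
  if (block != exit && dominated_by(idom, block, exit))
    return false;
  return true;
}
```

For a straight-line chain of blocks 0..5 with the region entry in block 1 and
the exit in block 4, blocks 2, 3 and 4 are inside and blocks 0, 1 and 5 are
not.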

Updated for moving pass_oacc_kernels down past pass_fre in the pass list.

Bootstrapped and reg-tested as before.

OK for trunk?

Thanks,
   - Tom

[-- Attachment #2: 0003-Add-pass_ch_oacc_kernels-to-pass_oacc_kernels.patch --]
[-- Type: text/x-patch, Size: 7612 bytes --]

[PATCH 3/7] Add pass_ch_oacc_kernels to pass_oacc_kernels

2014-11-25  Tom de Vries  <tom@codesourcery.com>

	* omp-low.c (loop_in_oacc_kernels_region_p): New function.
	* omp-low.h (loop_in_oacc_kernels_region_p): Declare.
	* passes.def: Add pass_ch_oacc_kernels to pass group pass_oacc_kernels.
	* tree-pass.h (make_pass_ch_oacc_kernels): Declare.
	* tree-ssa-loop-ch.c: Include omp-low.h.
	(pass_ch_execute): Declare.
	(pass_ch::execute): Factor out ...
	(pass_ch_execute): ... this new function.  If handling oacc kernels,
	skip loops that are not in oacc kernels region.
	(pass_data_ch_oacc_kernels): New pass_data.
	(class pass_ch_oacc_kernels): New pass.
	(pass_ch_oacc_kernels::execute, make_pass_ch_oacc_kernels): New
	functions.
---
 gcc/omp-low.c          | 83 ++++++++++++++++++++++++++++++++++++++++++++++++++
 gcc/omp-low.h          |  2 ++
 gcc/passes.def         |  1 +
 gcc/tree-pass.h        |  1 +
 gcc/tree-ssa-loop-ch.c | 59 +++++++++++++++++++++++++++++++++--
 5 files changed, 144 insertions(+), 2 deletions(-)

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 3ac546c..543dd48 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -13912,4 +13912,87 @@ gimple_stmt_omp_data_i_init_p (gimple stmt)
 						   SSA_OP_DEF);
 }
 
+/* Return true if LOOP is inside a kernels region.  */
+
+bool
+loop_in_oacc_kernels_region_p (struct loop *loop, basic_block *region_entry,
+			       basic_block *region_exit)
+{
+  bitmap excludes_bitmap = BITMAP_GGC_ALLOC ();
+  bitmap region_bitmap = BITMAP_GGC_ALLOC ();
+  bitmap_clear (region_bitmap);
+
+  if (region_entry != NULL)
+    *region_entry = NULL;
+  if (region_exit != NULL)
+    *region_exit = NULL;
+
+  basic_block bb;
+  gimple last;
+  FOR_EACH_BB_FN (bb, cfun)
+    {
+      if (bitmap_bit_p (region_bitmap, bb->index))
+	continue;
+
+      last = last_stmt (bb);
+      if (!last)
+	continue;
+
+      if (gimple_code (last) != GIMPLE_OACC_KERNELS)
+	continue;
+
+      bitmap_clear (excludes_bitmap);
+      bitmap_set_bit (excludes_bitmap, bb->index);
+
+      vec<basic_block> dominated
+	= get_all_dominated_blocks (CDI_DOMINATORS, bb);
+
+      unsigned di;
+      basic_block dom;
+
+      basic_block end_region = NULL;
+      FOR_EACH_VEC_ELT (dominated, di, dom)
+	{
+	  if (dom == bb)
+	    continue;
+
+	  last = last_stmt (dom);
+	  if (!last)
+	    continue;
+
+	  if (gimple_code (last) != GIMPLE_OMP_RETURN)
+	    continue;
+
+	  if (end_region == NULL
+	      || dominated_by_p (CDI_DOMINATORS, end_region, dom))
+	    end_region = dom;
+	}
+
+      vec<basic_block> excludes
+	= get_all_dominated_blocks (CDI_DOMINATORS, end_region);
+
+      unsigned di2;
+      basic_block exclude;
+
+      FOR_EACH_VEC_ELT (excludes, di2, exclude)
+	if (exclude != end_region)
+	  bitmap_set_bit (excludes_bitmap, exclude->index);
+
+      FOR_EACH_VEC_ELT (dominated, di, dom)
+	if (!bitmap_bit_p (excludes_bitmap, dom->index))
+	  bitmap_set_bit (region_bitmap, dom->index);
+
+      if (bitmap_bit_p (region_bitmap, loop->header->index))
+	{
+	  if (region_entry != NULL)
+	    *region_entry = bb;
+	  if (region_exit != NULL)
+	    *region_exit = end_region;
+	  return true;
+	}
+    }
+
+  return false;
+}
+
 #include "gt-omp-low.h"
diff --git a/gcc/omp-low.h b/gcc/omp-low.h
index 32076e4..30df867 100644
--- a/gcc/omp-low.h
+++ b/gcc/omp-low.h
@@ -29,6 +29,8 @@ extern tree omp_reduction_init (tree, tree);
 extern bool make_gimple_omp_edges (basic_block, struct omp_region **, int *);
 extern void omp_finish_file (void);
 extern bool gimple_stmt_omp_data_i_init_p (gimple);
+extern bool loop_in_oacc_kernels_region_p (struct loop *, basic_block *,
+					   basic_block *);
 
 extern GTY(()) vec<tree, va_gc> *offload_funcs;
 extern GTY(()) vec<tree, va_gc> *offload_vars;
diff --git a/gcc/passes.def b/gcc/passes.def
index efb3d8c..01368bb 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -90,6 +90,7 @@ along with GCC; see the file COPYING3.  If not see
 	     function.  */
 	  NEXT_PASS (pass_oacc_kernels);
 	  PUSH_INSERT_PASSES_WITHIN (pass_oacc_kernels)
+	      NEXT_PASS (pass_ch_oacc_kernels);
 	      NEXT_PASS (pass_expand_omp_ssa);
 	  POP_INSERT_PASSES ()
 	  NEXT_PASS (pass_merge_phi);
diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
index d63ab2b..dd1f308 100644
--- a/gcc/tree-pass.h
+++ b/gcc/tree-pass.h
@@ -378,6 +378,7 @@ extern gimple_opt_pass *make_pass_loop_prefetch (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_iv_optimize (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_tree_loop_done (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_ch (gcc::context *ctxt);
+extern gimple_opt_pass *make_pass_ch_oacc_kernels (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_ccp (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_phi_only_cprop (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_build_ssa (gcc::context *ctxt);
diff --git a/gcc/tree-ssa-loop-ch.c b/gcc/tree-ssa-loop-ch.c
index 300b2fa..8f91552 100644
--- a/gcc/tree-ssa-loop-ch.c
+++ b/gcc/tree-ssa-loop-ch.c
@@ -48,12 +48,15 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-inline.h"
 #include "flags.h"
 #include "tree-ssa-threadedge.h"
+#include "omp-low.h"
 
 /* Duplicates headers of loops if they are small enough, so that the statements
    in the loop body are always executed when the loop is entered.  This
    increases effectiveness of code motion optimizations, and reduces the need
    for loop preconditioning.  */
 
+static unsigned int pass_ch_execute (function *, bool);
+
 /* Check whether we should duplicate HEADER of LOOP.  At most *LIMIT
    instructions should be duplicated, limit is decreased by the actual
    amount.  */
@@ -172,6 +175,14 @@ public:
 unsigned int
 pass_ch::execute (function *fun)
 {
+  return pass_ch_execute (fun, false);
+}
+
+} // anon namespace
+
+static unsigned int
+pass_ch_execute (function *fun, bool oacc_kernels_p)
+{
   struct loop *loop;
   basic_block header;
   edge exit, entry;
@@ -205,6 +216,10 @@ pass_ch::execute (function *fun)
       if (do_while_loop_p (loop))
 	continue;
 
+      if (oacc_kernels_p
+	  && !loop_in_oacc_kernels_region_p (loop, NULL, NULL))
+	continue;
+
       /* Iterate the header copying up to limit; this takes care of the cases
 	 like while (a && b) {...}, where we want to have both of the conditions
 	 copied.  TODO -- handle while (a || b) - like cases, by not requiring
@@ -295,10 +310,50 @@ pass_ch::execute (function *fun)
   return 0;
 }
 
-} // anon namespace
-
 gimple_opt_pass *
 make_pass_ch (gcc::context *ctxt)
 {
   return new pass_ch (ctxt);
 }
+
+namespace {
+
+const pass_data pass_data_ch_oacc_kernels =
+{
+  GIMPLE_PASS, /* type */
+  "ch_oacc_kernels", /* name */
+  OPTGROUP_LOOP, /* optinfo_flags */
+  TV_TREE_CH, /* tv_id */
+  ( PROP_cfg | PROP_ssa ), /* properties_required */
+  0, /* properties_provided */
+  0, /* properties_destroyed */
+  0, /* todo_flags_start */
+  TODO_cleanup_cfg, /* todo_flags_finish */
+};
+
+class pass_ch_oacc_kernels : public gimple_opt_pass
+{
+public:
+  pass_ch_oacc_kernels (gcc::context *ctxt)
+    : gimple_opt_pass (pass_data_ch_oacc_kernels, ctxt)
+  {}
+
+  /* opt_pass methods: */
+  virtual bool gate (function *) { return true; }
+  virtual unsigned int execute (function *);
+
+}; // class pass_ch_oacc_kernels
+
+unsigned int
+pass_ch_oacc_kernels::execute (function *fun)
+{
+  return pass_ch_execute (fun, true);
+}
+
+} // anon namespace
+
+gimple_opt_pass *
+make_pass_ch_oacc_kernels (gcc::context *ctxt)
+{
+  return new pass_ch_oacc_kernels (ctxt);
+}
-- 
1.9.1



* Re: [PATCH, 4/8] Add pass_tree_loop_{init,done} to pass_oacc_kernels
  2014-11-15 17:23 ` [PATCH, 4/8] Add pass_tree_loop_{init,done} " Tom de Vries
@ 2014-11-25 11:42   ` Tom de Vries
  2015-04-21 19:52     ` Thomas Schwinge
  0 siblings, 1 reply; 71+ messages in thread
From: Tom de Vries @ 2014-11-25 11:42 UTC (permalink / raw)
  To: GCC Patches; +Cc: Richard Biener, Jakub Jelinek, Thomas Schwinge

[-- Attachment #1: Type: text/plain, Size: 1200 bytes --]

On 15-11-14 18:21, Tom de Vries wrote:
> On 15-11-14 13:14, Tom de Vries wrote:
>> Hi,
>>
>> I'm submitting a patch series with initial support for the oacc kernels
>> directive.
>>
>> The patch series uses pass_parallelize_loops to implement parallelization of
>> loops in the oacc kernels region.
>>
>> The patch series consists of these 8 patches:
>> ...
>>      1  Expand oacc kernels after pass_build_ealias
>>      2  Add pass_oacc_kernels
>>      3  Add pass_ch_oacc_kernels to pass_oacc_kernels
>>      4  Add pass_tree_loop_{init,done} to pass_oacc_kernels
>>      5  Add pass_loop_im to pass_oacc_kernels
>>      6  Add pass_ccp to pass_oacc_kernels
>>      7  Add pass_parloops_oacc_kernels to pass_oacc_kernels
>>      8  Do simple omp lowering for no address taken var
>> ...
>
> This patch adds pass_tree_loop_init and pass_tree_loop_done to
> pass_oacc_kernels.
>
> Pass_parallelize_loops is run between these passes in the pass group
> pass_tree_loop, since it requires loop information.  We do the same for
> pass_oacc_kernels.
>
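
Scheduling pass_tree_loop_init and pass_tree_loop_done a second time in
passes.def is presumably why the patch adds clone () to both: a pass that
appears more than once needs a way to create further instances.  A minimal
sketch of that pattern, assuming a base class whose default clone refuses.
All class and function names here are hypothetical; this is not GCC's
opt_pass.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical pass base class: cloning is refused unless overridden.
struct OptPass {
  virtual ~OptPass() = default;
  virtual std::string name() const = 0;
  virtual std::unique_ptr<OptPass> clone() const {
    throw std::logic_error("pass " + name() + " does not support cloning");
  }
};

// A pass that can be scheduled more than once because it overrides clone.
struct LoopInitModel : OptPass {
  std::string name() const override { return "loop_init_model"; }
  std::unique_ptr<OptPass> clone() const override {
    return std::unique_ptr<OptPass>(new LoopInitModel());
  }
};

// A pass that keeps the refusing default.
struct SingleUseModel : OptPass {
  std::string name() const override { return "single_use_model"; }
};

// Returns true if a second instance of PASS can be created.
bool can_schedule_twice(const OptPass &pass) {
  try {
    return pass.clone() != nullptr;
  } catch (const std::logic_error &) {
    return false;
  }
}
```

In this model, only the pass that overrides clone can be instantiated a
second time, which is the situation the two one-line clone additions in the
attached patch address.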

Updated for moving pass_oacc_kernels down past pass_fre in the pass list.

Bootstrapped and reg-tested as before.

OK for trunk?

Thanks,
   - Tom

[-- Attachment #2: 0004-Add-pass_tree_loop_-init-done-to-pass_oacc_kernels.patch --]
[-- Type: text/x-patch, Size: 1519 bytes --]

[PATCH 4/7] Add pass_tree_loop_{init,done} to pass_oacc_kernels

2014-11-25  Tom de Vries  <tom@codesourcery.com>

	* passes.def: Run pass_tree_loop_init and pass_tree_loop_done in pass
	group pass_oacc_kernels.
	* tree-ssa-loop.c (pass_tree_loop_init::clone)
	(pass_tree_loop_done::clone): New function.
---
 gcc/passes.def      | 2 ++
 gcc/tree-ssa-loop.c | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/gcc/passes.def b/gcc/passes.def
index 01368bb..37e08a8 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -91,7 +91,9 @@ along with GCC; see the file COPYING3.  If not see
 	  NEXT_PASS (pass_oacc_kernels);
 	  PUSH_INSERT_PASSES_WITHIN (pass_oacc_kernels)
 	      NEXT_PASS (pass_ch_oacc_kernels);
+	      NEXT_PASS (pass_tree_loop_init);
 	      NEXT_PASS (pass_expand_omp_ssa);
+	      NEXT_PASS (pass_tree_loop_done);
 	  POP_INSERT_PASSES ()
 	  NEXT_PASS (pass_merge_phi);
 	  NEXT_PASS (pass_cd_dce);
diff --git a/gcc/tree-ssa-loop.c b/gcc/tree-ssa-loop.c
index c29aa22..c78b013 100644
--- a/gcc/tree-ssa-loop.c
+++ b/gcc/tree-ssa-loop.c
@@ -269,6 +269,7 @@ public:
 
   /* opt_pass methods: */
   virtual unsigned int execute (function *);
+  opt_pass * clone () { return new pass_tree_loop_init (m_ctxt); }
 
 }; // class pass_tree_loop_init
 
@@ -563,6 +564,7 @@ public:
 
   /* opt_pass methods: */
   virtual unsigned int execute (function *) { return tree_ssa_loop_done (); }
+  opt_pass * clone () { return new pass_tree_loop_done (m_ctxt); }
 
 }; // class pass_tree_loop_done
 
-- 
1.9.1



* Re: [PATCH, 5/8] Add pass_loop_im to pass_oacc_kernels
  2014-11-15 17:24 ` [PATCH, 5/8] Add pass_loop_im " Tom de Vries
@ 2014-11-25 12:00   ` Tom de Vries
  2015-04-21 19:57     ` [PATCH, 5/8] Add pass_lim " Thomas Schwinge
  0 siblings, 1 reply; 71+ messages in thread
From: Tom de Vries @ 2014-11-25 12:00 UTC (permalink / raw)
  To: GCC Patches; +Cc: Richard Biener, Jakub Jelinek, Thomas Schwinge

[-- Attachment #1: Type: text/plain, Size: 1128 bytes --]

On 15-11-14 18:22, Tom de Vries wrote:
> On 15-11-14 13:14, Tom de Vries wrote:
>> Hi,
>>
>> I'm submitting a patch series with initial support for the oacc kernels
>> directive.
>>
>> The patch series uses pass_parallelize_loops to implement parallelization of
>> loops in the oacc kernels region.
>>
>> The patch series consists of these 8 patches:
>> ...
>>      1  Expand oacc kernels after pass_build_ealias
>>      2  Add pass_oacc_kernels
>>      3  Add pass_ch_oacc_kernels to pass_oacc_kernels
>>      4  Add pass_tree_loop_{init,done} to pass_oacc_kernels
>>      5  Add pass_loop_im to pass_oacc_kernels
>>      6  Add pass_ccp to pass_oacc_kernels
>>      7  Add pass_parloops_oacc_kernels to pass_oacc_kernels
>>      8  Do simple omp lowering for no address taken var
>> ...
>
> This patch adds pass_loop_im (pass_lim in passes.def) to pass group
> pass_oacc_kernels.
>
> We need this pass to simplify the loop body, and allow pass_parloops to detect
> that loop iterations are independent.
>
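
A source-level illustration of the kind of simplification meant here (a
sketch only; the pass works on GIMPLE, and the function names are made up for
this example).  Before invariant motion, every iteration reloads *factor, so
the loop body contains a load that has to be checked against the stores to
a[i].  After hoisting, the body only touches a[i]:

```cpp
#include <cassert>

// Before: the invariant load of *factor is repeated in every iteration.
void scale_before(int *a, const int *factor, int n) {
  for (int i = 0; i < n; ++i)
    a[i] = a[i] * *factor;
}

// After: the load is hoisted out of the loop.  This is only valid when
// FACTOR does not point into A, which is assumed here.
void scale_after(int *a, const int *factor, int n) {
  const int f = *factor;
  for (int i = 0; i < n; ++i)
    a[i] = a[i] * f;
}
```

With the load moved out, each iteration reads and writes only its own
element, which is the shape a dependence check can most easily prove
independent.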

Updated for moving pass_oacc_kernels down past pass_fre in the pass list.

Bootstrapped and reg-tested as before.

OK for trunk?

Thanks,
   - Tom

[-- Attachment #2: 0005-Add-pass_loop_im-to-pass_oacc_kernels.patch --]
[-- Type: text/x-patch, Size: 23958 bytes --]

[PATCH 5/7] Add pass_loop_im to pass_oacc_kernels

2014-11-25  Tom de Vries  <tom@codesourcery.com>

	* passes.def: Add pass_lim in pass group pass_oacc_kernels.

	* c-c++-common/restrict-2.c: Update for new pass_lim.
	* c-c++-common/restrict-4.c: Same.
	* g++.dg/tree-ssa/pr33615.C: Same.
	* g++.dg/tree-ssa/restrict1.C: Same.
	* gcc.dg/tm/pub-safety-1.c: Same.
	* gcc.dg/tm/reg-promotion.c: Same.
	* gcc.dg/tree-ssa/20050314-1.c: Same.
	* gcc.dg/tree-ssa/loop-32.c: Same.
	* gcc.dg/tree-ssa/loop-33.c: Same.
	* gcc.dg/tree-ssa/loop-34.c: Same.
	* gcc.dg/tree-ssa/loop-35.c: Same.
	* gcc.dg/tree-ssa/loop-7.c: Same.
	* gcc.dg/tree-ssa/pr23109.c: Same.
	* gcc.dg/tree-ssa/restrict-3.c: Same.
	* gcc.dg/tree-ssa/ssa-lim-1.c: Same.
	* gcc.dg/tree-ssa/ssa-lim-10.c: Same.
	* gcc.dg/tree-ssa/ssa-lim-11.c: Same.
	* gcc.dg/tree-ssa/ssa-lim-12.c: Same.
	* gcc.dg/tree-ssa/ssa-lim-2.c: Same.
	* gcc.dg/tree-ssa/ssa-lim-3.c: Same.
	* gcc.dg/tree-ssa/ssa-lim-6.c: Same.
	* gcc.dg/tree-ssa/ssa-lim-7.c: Same.
	* gcc.dg/tree-ssa/ssa-lim-8.c: Same.
	* gcc.dg/tree-ssa/ssa-lim-9.c: Same.
	* gcc.dg/tree-ssa/structopt-1.c: Same.
	* gfortran.dg/pr32921.f: Same.
---
 gcc/passes.def                              | 1 +
 gcc/testsuite/c-c++-common/restrict-2.c     | 6 +++---
 gcc/testsuite/c-c++-common/restrict-4.c     | 6 +++---
 gcc/testsuite/g++.dg/tree-ssa/pr33615.C     | 6 +++---
 gcc/testsuite/g++.dg/tree-ssa/restrict1.C   | 6 +++---
 gcc/testsuite/gcc.dg/tm/pub-safety-1.c      | 6 +++---
 gcc/testsuite/gcc.dg/tm/reg-promotion.c     | 6 +++---
 gcc/testsuite/gcc.dg/tree-ssa/20050314-1.c  | 6 +++---
 gcc/testsuite/gcc.dg/tree-ssa/loop-32.c     | 6 +++---
 gcc/testsuite/gcc.dg/tree-ssa/loop-33.c     | 6 +++---
 gcc/testsuite/gcc.dg/tree-ssa/loop-34.c     | 6 +++---
 gcc/testsuite/gcc.dg/tree-ssa/loop-35.c     | 8 ++++----
 gcc/testsuite/gcc.dg/tree-ssa/loop-7.c      | 6 +++---
 gcc/testsuite/gcc.dg/tree-ssa/pr23109.c     | 6 +++---
 gcc/testsuite/gcc.dg/tree-ssa/restrict-3.c  | 6 +++---
 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-1.c   | 6 +++---
 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-10.c  | 6 +++---
 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-11.c  | 6 +++---
 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-12.c  | 6 +++---
 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-2.c   | 6 +++---
 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-3.c   | 8 ++++----
 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-6.c   | 6 +++---
 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-7.c   | 6 +++---
 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-8.c   | 6 +++---
 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-9.c   | 6 +++---
 gcc/testsuite/gcc.dg/tree-ssa/structopt-1.c | 6 +++---
 gcc/testsuite/gfortran.dg/pr32921.f         | 6 +++---
 27 files changed, 81 insertions(+), 80 deletions(-)

diff --git a/gcc/passes.def b/gcc/passes.def
index 37e08a8..438d292 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -92,6 +92,7 @@ along with GCC; see the file COPYING3.  If not see
 	  PUSH_INSERT_PASSES_WITHIN (pass_oacc_kernels)
 	      NEXT_PASS (pass_ch_oacc_kernels);
 	      NEXT_PASS (pass_tree_loop_init);
+	      NEXT_PASS (pass_lim);
 	      NEXT_PASS (pass_expand_omp_ssa);
 	      NEXT_PASS (pass_tree_loop_done);
 	  POP_INSERT_PASSES ()
diff --git a/gcc/testsuite/c-c++-common/restrict-2.c b/gcc/testsuite/c-c++-common/restrict-2.c
index 3f71b77..f0b0e15a 100644
--- a/gcc/testsuite/c-c++-common/restrict-2.c
+++ b/gcc/testsuite/c-c++-common/restrict-2.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O -fno-strict-aliasing -fdump-tree-lim1-details" } */
+/* { dg-options "-O -fno-strict-aliasing -fdump-tree-lim2-details" } */
 
 void foo (float * __restrict__ a, float * __restrict__ b, int n, int j)
 {
@@ -10,5 +10,5 @@ void foo (float * __restrict__ a, float * __restrict__ b, int n, int j)
 
 /* We should move the RHS of the store out of the loop.  */
 
-/* { dg-final { scan-tree-dump-times "Moving statement" 11 "lim1" } } */
-/* { dg-final { cleanup-tree-dump "lim1" } } */
+/* { dg-final { scan-tree-dump-times "Moving statement" 11 "lim2" } } */
+/* { dg-final { cleanup-tree-dump "lim2" } } */
diff --git a/gcc/testsuite/c-c++-common/restrict-4.c b/gcc/testsuite/c-c++-common/restrict-4.c
index 3a36def..f791533 100644
--- a/gcc/testsuite/c-c++-common/restrict-4.c
+++ b/gcc/testsuite/c-c++-common/restrict-4.c
@@ -1,5 +1,5 @@
 /* { dg-do compile }  */
-/* { dg-options "-O2 -fdump-tree-lim1-details" } */
+/* { dg-options "-O2 -fdump-tree-lim2-details" } */
 
 struct Foo
 {
@@ -15,5 +15,5 @@ void bar(struct Foo f, int * __restrict__ q)
     }
 }
 
-/* { dg-final { scan-tree-dump "Executing store motion" "lim1" } } */
-/* { dg-final { cleanup-tree-dump "lim1" } } */
+/* { dg-final { scan-tree-dump "Executing store motion" "lim2" } } */
+/* { dg-final { cleanup-tree-dump "lim2" } } */
diff --git a/gcc/testsuite/g++.dg/tree-ssa/pr33615.C b/gcc/testsuite/g++.dg/tree-ssa/pr33615.C
index 801b334..2591e00 100644
--- a/gcc/testsuite/g++.dg/tree-ssa/pr33615.C
+++ b/gcc/testsuite/g++.dg/tree-ssa/pr33615.C
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O -fnon-call-exceptions -fdump-tree-lim1-details -w" } */
+/* { dg-options "-O -fnon-call-exceptions -fdump-tree-lim2-details -w" } */
 
 extern volatile int y;
 
@@ -16,5 +16,5 @@ foo (double a, int x)
 
 // The expression 1.0 / 0.0 should not be treated as a loop invariant
 // if it may throw an exception.
-// { dg-final { scan-tree-dump-times "invariant up to" 0 "lim1" } }
-// { dg-final { cleanup-tree-dump "lim1" } }
+// { dg-final { scan-tree-dump-times "invariant up to" 0 "lim2" } }
+// { dg-final { cleanup-tree-dump "lim2" } }
diff --git a/gcc/testsuite/g++.dg/tree-ssa/restrict1.C b/gcc/testsuite/g++.dg/tree-ssa/restrict1.C
index 682de7e..761e7e2 100644
--- a/gcc/testsuite/g++.dg/tree-ssa/restrict1.C
+++ b/gcc/testsuite/g++.dg/tree-ssa/restrict1.C
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-lim1-details" } */
+/* { dg-options "-O2 -fdump-tree-lim2-details" } */
 
 struct Foo
 {
@@ -16,5 +16,5 @@ void bar(Foo f, int * __restrict__ q)
     }
 }
 
-/* { dg-final { scan-tree-dump "Executing store motion" "lim1" } } */
-/* { dg-final { cleanup-tree-dump "lim1" } } */
+/* { dg-final { scan-tree-dump "Executing store motion" "lim2" } } */
+/* { dg-final { cleanup-tree-dump "lim2" } } */
diff --git a/gcc/testsuite/gcc.dg/tm/pub-safety-1.c b/gcc/testsuite/gcc.dg/tm/pub-safety-1.c
index 660e9a6..6d99410 100644
--- a/gcc/testsuite/gcc.dg/tm/pub-safety-1.c
+++ b/gcc/testsuite/gcc.dg/tm/pub-safety-1.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-fgnu-tm -O1 -fdump-tree-lim1" } */
+/* { dg-options "-fgnu-tm -O1 -fdump-tree-lim2" } */
 
 /* Test that thread visible loads do not get hoisted out of loops if
    the load would not have occurred on each path out of the loop.  */
@@ -20,5 +20,5 @@ void reader()
     }
 }
 
-/* { dg-final { scan-tree-dump-times "Cannot hoist.*DATA_DATA because it is in a transaction" 1 "lim1" } } */
-/* { dg-final { cleanup-tree-dump "lim1" } } */
+/* { dg-final { scan-tree-dump-times "Cannot hoist.*DATA_DATA because it is in a transaction" 1 "lim2" } } */
+/* { dg-final { cleanup-tree-dump "lim2" } } */
diff --git a/gcc/testsuite/gcc.dg/tm/reg-promotion.c b/gcc/testsuite/gcc.dg/tm/reg-promotion.c
index e48bfb2..f1d2387 100644
--- a/gcc/testsuite/gcc.dg/tm/reg-promotion.c
+++ b/gcc/testsuite/gcc.dg/tm/reg-promotion.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-fgnu-tm -O2 -fdump-tree-lim1" } */
+/* { dg-options "-fgnu-tm -O2 -fdump-tree-lim2" } */
 
 /* Test that `count' is not written to unless p->data>0.  */
 
@@ -20,5 +20,5 @@ void func()
   }
 }
 
-/* { dg-final { scan-tree-dump-times "Cannot hoist conditional load of count because it is in a transaction" 1 "lim1" } } */
-/* { dg-final { cleanup-tree-dump "lim1" } } */
+/* { dg-final { scan-tree-dump-times "Cannot hoist conditional load of count because it is in a transaction" 1 "lim2" } } */
+/* { dg-final { cleanup-tree-dump "lim2" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/20050314-1.c b/gcc/testsuite/gcc.dg/tree-ssa/20050314-1.c
index 8f07781..7f2e477 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/20050314-1.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/20050314-1.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O1 -fdump-tree-lim1-details --param allow-store-data-races=1" } */
+/* { dg-options "-O1 -fdump-tree-lim2-details --param allow-store-data-races=1" } */
 
 float a[100];
 
@@ -17,5 +17,5 @@ void xxx (void)
 /* Store motion may be applied to the assignment to a[k], since sinf
    cannot read nor write the memory.  */
 
-/* { dg-final { scan-tree-dump-times "Moving statement" 1 "lim1" } } */
-/* { dg-final { cleanup-tree-dump "lim1" } } */
+/* { dg-final { scan-tree-dump-times "Moving statement" 1 "lim2" } } */
+/* { dg-final { cleanup-tree-dump "lim2" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/loop-32.c b/gcc/testsuite/gcc.dg/tree-ssa/loop-32.c
index f0c8d30..30b9d72 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/loop-32.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/loop-32.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-lim1-details" } */
+/* { dg-options "-O2 -fdump-tree-lim2-details" } */
 
 int x;
 int a[100];
@@ -42,5 +42,5 @@ void test3(struct a *A)
     }
 }
 
-/* { dg-final { scan-tree-dump-times "Executing store motion of" 3 "lim1" } } */
-/* { dg-final { cleanup-tree-dump "lim1" } } */
+/* { dg-final { scan-tree-dump-times "Executing store motion of" 3 "lim2" } } */
+/* { dg-final { cleanup-tree-dump "lim2" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/loop-33.c b/gcc/testsuite/gcc.dg/tree-ssa/loop-33.c
index bf16b13..281b336 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/loop-33.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/loop-33.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-lim1-details" } */
+/* { dg-options "-O2 -fdump-tree-lim2-details" } */
 
 int x;
 int a[100];
@@ -36,5 +36,5 @@ void test5(struct a *A, unsigned b)
     }
 }
 
-/* { dg-final { scan-tree-dump-times "Executing store motion of" 4 "lim1" { xfail { lp64 || llp64 } } } } */
-/* { dg-final { cleanup-tree-dump "lim1" } } */
+/* { dg-final { scan-tree-dump-times "Executing store motion of" 4 "lim2" { xfail { lp64 || llp64 } } } } */
+/* { dg-final { cleanup-tree-dump "lim2" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/loop-34.c b/gcc/testsuite/gcc.dg/tree-ssa/loop-34.c
index 125a220..e0ec9cf 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/loop-34.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/loop-34.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-lim1-details" } */
+/* { dg-options "-O2 -fdump-tree-lim2-details" } */
 
 int r[6];
 
@@ -17,5 +17,5 @@ void f (int n)
 }
 
 
-/* { dg-final { scan-tree-dump-times "Executing store motion of r" 6 "lim1" } } */
-/* { dg-final { cleanup-tree-dump "lim1" } } */
+/* { dg-final { scan-tree-dump-times "Executing store motion of r" 6 "lim2" } } */
+/* { dg-final { cleanup-tree-dump "lim2" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/loop-35.c b/gcc/testsuite/gcc.dg/tree-ssa/loop-35.c
index 2d2db70..5a1e875 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/loop-35.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/loop-35.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-lim1-details" } */
+/* { dg-options "-O2 -fdump-tree-lim2-details" } */
 
 int x;
 int a[100];
@@ -67,6 +67,6 @@ void test4(struct a *A, unsigned LONG b)
     }
 }
 /* long index not hoisted for avr target PR 36561 */
-/* { dg-final { scan-tree-dump-times "Executing store motion of" 8 "lim1" { xfail { "avr-*-*" } } } } */
-/* { dg-final { scan-tree-dump-times "Executing store motion of" 6 "lim1" { target { "avr-*-*" } } } } */
-/* { dg-final { cleanup-tree-dump "lim1" } } */
+/* { dg-final { scan-tree-dump-times "Executing store motion of" 8 "lim2" { xfail { "avr-*-*" } } } } */
+/* { dg-final { scan-tree-dump-times "Executing store motion of" 6 "lim2" { target { "avr-*-*" } } } } */
+/* { dg-final { cleanup-tree-dump "lim2" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/loop-7.c b/gcc/testsuite/gcc.dg/tree-ssa/loop-7.c
index 38e19e6..4e83170 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/loop-7.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/loop-7.c
@@ -1,6 +1,6 @@
 /* PR tree-optimization/19828 */
 /* { dg-do compile } */
-/* { dg-options "-O1 -fdump-tree-lim1-details" } */
+/* { dg-options "-O1 -fdump-tree-lim2-details" } */
 
 int cst_fun1 (int) __attribute__((__const__));
 int cst_fun2 (int) __attribute__((__const__));
@@ -31,5 +31,5 @@ int xxx (void)
    Calls to cst_fun2 and pure_fun2 should not be, since calling
    with k = 0 may be invalid.  */
 
-/* { dg-final { scan-tree-dump-times "Moving statement" 2 "lim1" } } */
-/* { dg-final { cleanup-tree-dump "lim1" } } */
+/* { dg-final { scan-tree-dump-times "Moving statement" 2 "lim2" } } */
+/* { dg-final { cleanup-tree-dump "lim2" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr23109.c b/gcc/testsuite/gcc.dg/tree-ssa/pr23109.c
index 73fd84d..0f92311 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr23109.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr23109.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -funsafe-math-optimizations -ftrapping-math -fdump-tree-recip -fdump-tree-lim1" } */
+/* { dg-options "-O2 -funsafe-math-optimizations -ftrapping-math -fdump-tree-recip -fdump-tree-lim2" } */
 /* { dg-warning "-fassociative-math disabled" "" { target *-*-* } 1 } */
 
 double F[2] = { 0., 0. }, e = 0.;
@@ -29,8 +29,8 @@ int main()
 /* LIM only performs the transformation in the no-trapping-math case.  In
    the future we will do it for trapping-math as well in recip, check that
    this is not wrongly optimized.  */
-/* { dg-final { scan-tree-dump-not "reciptmp" "lim1" } } */
+/* { dg-final { scan-tree-dump-not "reciptmp" "lim2" } } */
 /* { dg-final { scan-tree-dump-not "reciptmp" "recip" } } */
 /* { dg-final { cleanup-tree-dump "recip" } } */
-/* { dg-final { cleanup-tree-dump "lim1" } } */
+/* { dg-final { cleanup-tree-dump "lim2" } } */
 
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/restrict-3.c b/gcc/testsuite/gcc.dg/tree-ssa/restrict-3.c
index 95cc1a2..c3ca462 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/restrict-3.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/restrict-3.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O -fno-strict-aliasing -fdump-tree-lim1-details" } */
+/* { dg-options "-O -fno-strict-aliasing -fdump-tree-lim2-details" } */
 
 void f(int * __restrict__ r,
        int a[__restrict__ 16][16],
@@ -14,5 +14,5 @@ void f(int * __restrict__ r,
 
 /* We should apply store motion to the store to *r.  */
 
-/* { dg-final { scan-tree-dump "Executing store motion of \\\*r" "lim1" } } */
-/* { dg-final { cleanup-tree-dump "lim1" } } */
+/* { dg-final { scan-tree-dump "Executing store motion of \\\*r" "lim2" } } */
+/* { dg-final { cleanup-tree-dump "lim2" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-1.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-1.c
index 3952a9a..0b22fc3 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-1.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-1.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O -fdump-tree-lim1" } */
+/* { dg-options "-O -fdump-tree-lim2" } */
 
 /* This is a variant that does cause fold to place a cast to
    int before testing bit 1.  */
@@ -18,5 +18,5 @@ quantum_toffoli (int control1, int control2, int target,
     }
 }
 
-/* { dg-final { scan-tree-dump-times "1 <<" 3 "lim1" } } */
-/* { dg-final { cleanup-tree-dump "lim1" } } */
+/* { dg-final { scan-tree-dump-times "1 <<" 3 "lim2" } } */
+/* { dg-final { cleanup-tree-dump "lim2" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-10.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-10.c
index bc14926..4a218e0 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-10.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-10.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-lim1-details" } */
+/* { dg-options "-O2 -fdump-tree-lim2-details" } */
 
 int *l, *r;
 int test_func(void)
@@ -27,5 +27,5 @@ int test_func(void)
   return i;
 }
 
-/* { dg-final { scan-tree-dump "Executing store motion of pos" "lim1" } } */
-/* { dg-final { cleanup-tree-dump "lim1" } } */
+/* { dg-final { scan-tree-dump "Executing store motion of pos" "lim2" } } */
+/* { dg-final { cleanup-tree-dump "lim2" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-11.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-11.c
index ea91a61..7315025 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-11.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-11.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O -fprofile-arcs -fdump-tree-lim1-details" } */
+/* { dg-options "-O -fprofile-arcs -fdump-tree-lim2-details" } */
 
 struct thread_param
 {
@@ -21,5 +21,5 @@ void access_buf(struct thread_param* p)
     }
 }
 
-/* { dg-final { scan-tree-dump-times "Executing store motion of __gcov0.access_buf\\\[\[01\]\\\] from loop 1" 2 "lim1" } } */
-/* { dg-final { cleanup-tree-dump "lim1" } } */
+/* { dg-final { scan-tree-dump-times "Executing store motion of __gcov0.access_buf\\\[\[01\]\\\] from loop 1" 2 "lim2" } } */
+/* { dg-final { cleanup-tree-dump "lim2" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-12.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-12.c
index e0d93a9..07855bb 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-12.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-12.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O -fdump-tree-lim1" } */
+/* { dg-options "-O -fdump-tree-lim2" } */
 
 int a[1024];
 
@@ -23,5 +23,5 @@ void bar (int x, int z)
     }
 }
 
-/* { dg-final { scan-tree-dump-times "!= 0 ? " 2 "lim1" } } */
-/* { dg-final { cleanup-tree-dump "lim1" } } */
+/* { dg-final { scan-tree-dump-times "!= 0 ? " 2 "lim2" } } */
+/* { dg-final { cleanup-tree-dump "lim2" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-2.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-2.c
index 2106b62..652d1ba 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-2.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-2.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O -fdump-tree-lim1" } */
+/* { dg-options "-O -fdump-tree-lim2" } */
 
 /* This is a variant that doesn't cause fold to place a cast to
    int before testing bit 1.  */
@@ -18,5 +18,5 @@ int size)
     }
 }
 
-/* { dg-final { scan-tree-dump-times "1 <<" 3 "lim1" } } */
-/* { dg-final { cleanup-tree-dump "lim1" } } */
+/* { dg-final { scan-tree-dump-times "1 <<" 3 "lim2" } } */
+/* { dg-final { cleanup-tree-dump "lim2" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-3.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-3.c
index a81857c..29539fa 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-3.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-3.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O -fdump-tree-lim1-details" } */
+/* { dg-options "-O -fdump-tree-lim2-details" } */
 
 struct { int x; int y; } global;
 void foo(int n)
@@ -9,6 +9,6 @@ void foo(int n)
     global.y += global.x*global.x;
 }
 
-/* { dg-final { scan-tree-dump "Executing store motion of global.y" "lim1" } } */
-/* { dg-final { scan-tree-dump "Moving statement.*global.x.*out of loop 1" "lim1" } } */
-/* { dg-final { cleanup-tree-dump "lim1" } } */
+/* { dg-final { scan-tree-dump "Executing store motion of global.y" "lim2" } } */
+/* { dg-final { scan-tree-dump "Moving statement.*global.x.*out of loop 1" "lim2" } } */
+/* { dg-final { cleanup-tree-dump "lim2" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-6.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-6.c
index 100a230..a70bb2e 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-6.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-6.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-lim1-details" } */
+/* { dg-options "-O2 -fdump-tree-lim2-details" } */
 
 double a[16][64], y[64], x[16];
 void foo(void)
@@ -10,5 +10,5 @@ void foo(void)
       y[j] = y[j] + a[i][j] * x[i];
 }
 
-/* { dg-final { scan-tree-dump "Executing store motion of y" "lim1" } } */
-/* { dg-final { cleanup-tree-dump "lim1" } } */
+/* { dg-final { scan-tree-dump "Executing store motion of y" "lim2" } } */
+/* { dg-final { cleanup-tree-dump "lim2" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-7.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-7.c
index f8e15f3..6a67234 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-7.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-7.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O -fdump-tree-lim1-details" } */
+/* { dg-options "-O -fdump-tree-lim2-details" } */
 
 extern const int srcshift;
 
@@ -11,5 +11,5 @@ void foo (int *srcdata, int *dstdata)
     dstdata[i] = srcdata[i] << srcshift;
 }
 
-/* { dg-final { scan-tree-dump "Moving statement" "lim1" } } */
-/* { dg-final { cleanup-tree-dump "lim1" } } */
+/* { dg-final { scan-tree-dump "Moving statement" "lim2" } } */
+/* { dg-final { cleanup-tree-dump "lim2" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-8.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-8.c
index 551b68f..c6f56ec 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-8.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-8.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O -fdump-tree-lim1-details" } */
+/* { dg-options "-O -fdump-tree-lim2-details" } */
 
 void bar (int);
 void foo (int n, int m)
@@ -16,5 +16,5 @@ void foo (int n, int m)
     }
 }
 
-/* { dg-final { scan-tree-dump-times "Moving PHI node" 1 "lim1"  } } */
-/* { dg-final { cleanup-tree-dump "lim1" } } */
+/* { dg-final { scan-tree-dump-times "Moving PHI node" 1 "lim2"  } } */
+/* { dg-final { cleanup-tree-dump "lim2" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-9.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-9.c
index c5a6765..2233c90 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-9.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-9.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O -fdump-tree-lim1-details" } */
+/* { dg-options "-O -fdump-tree-lim2-details" } */
 
 void bar (int);
 void foo (int n, int m)
@@ -16,5 +16,5 @@ void foo (int n, int m)
     }
 }
 
-/* { dg-final { scan-tree-dump-times "Moving PHI node" 1 "lim1"  } } */
-/* { dg-final { cleanup-tree-dump "lim1" } } */
+/* { dg-final { scan-tree-dump-times "Moving PHI node" 1 "lim2"  } } */
+/* { dg-final { cleanup-tree-dump "lim2" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/structopt-1.c b/gcc/testsuite/gcc.dg/tree-ssa/structopt-1.c
index e5fe291..54cf44c 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/structopt-1.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/structopt-1.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-lim1-details" } */
+/* { dg-options "-O2 -fdump-tree-lim2-details" } */
 int x; int y;
 struct { int x; int y; } global;
 int foo() {
@@ -10,6 +10,6 @@ int foo() {
 		global.y += global.x*global.x;
 }
 
-/* { dg-final { scan-tree-dump-times "Executing store motion of global.y" 1 "lim1" } } */
+/* { dg-final { scan-tree-dump-times "Executing store motion of global.y" 1 "lim2" } } */
 /* XXX: We should also check for the load motion of global.x, but there is no easy way to do this.  */
-/* { dg-final { cleanup-tree-dump "lim1" } } */
+/* { dg-final { cleanup-tree-dump "lim2" } } */
diff --git a/gcc/testsuite/gfortran.dg/pr32921.f b/gcc/testsuite/gfortran.dg/pr32921.f
index 45ea647..55b5604 100644
--- a/gcc/testsuite/gfortran.dg/pr32921.f
+++ b/gcc/testsuite/gfortran.dg/pr32921.f
@@ -1,5 +1,5 @@
 ! { dg-do compile }
-! { dg-options "-O2 -fdump-tree-lim1" }
+! { dg-options "-O2 -fdump-tree-lim2" }
 ! gfortran -c -m32 -O2 -S junk.f
 !
       MODULE LES3D_DATA
@@ -45,5 +45,5 @@
 
       RETURN
       END
-! { dg-final { scan-tree-dump-times "stride" 4 "lim1" } }
-! { dg-final { cleanup-tree-dump "lim1" } }
+! { dg-final { scan-tree-dump-times "stride" 4 "lim2" } }
+! { dg-final { cleanup-tree-dump "lim2" } }
-- 
1.9.1



* Re: [PATCH, 6/8] Add pass_ccp to pass_oacc_kernels
  2014-11-15 18:32 ` [PATCH, 6/8] Add pass_ccp " Tom de Vries
@ 2014-11-25 12:03   ` Tom de Vries
  2015-04-21 20:01     ` [PATCH, 6/8] Add pass_copy_prop in pass_oacc_kernels Thomas Schwinge
  0 siblings, 1 reply; 71+ messages in thread
From: Tom de Vries @ 2014-11-25 12:03 UTC (permalink / raw)
  To: GCC Patches; +Cc: Richard Biener, Jakub Jelinek, Thomas Schwinge

[-- Attachment #1: Type: text/plain, Size: 1258 bytes --]

On 15-11-14 18:22, Tom de Vries wrote:
> On 15-11-14 13:14, Tom de Vries wrote:
>> Hi,
>>
>> I'm submitting a patch series with initial support for the oacc kernels
>> directive.
>>
>> The patch series uses pass_parallelize_loops to implement parallelization of
>> loops in the oacc kernels region.
>>
>> The patch series consists of these 8 patches:
>> ...
>>      1  Expand oacc kernels after pass_build_ealias
>>      2  Add pass_oacc_kernels
>>      3  Add pass_ch_oacc_kernels to pass_oacc_kernels
>>      4  Add pass_tree_loop_{init,done} to pass_oacc_kernels
>>      5  Add pass_loop_im to pass_oacc_kernels
>>      6  Add pass_ccp to pass_oacc_kernels
>>      7  Add pass_parloops_oacc_kernels to pass_oacc_kernels
>>      8  Do simple omp lowering for no address taken var
>> ...
>
> This patch adds pass_loop_ccp to pass group pass_oacc_kernels.
>
> We need this pass to simplify the loop body, and allow pass_parloops to detect
> that loop iterations are independent.
>

As suggested here ( https://gcc.gnu.org/ml/gcc-patches/2014-11/msg02993.html ),
I've replaced pass_ccp with pass_copy_prop, which performs trivial constant
propagation in addition to copy propagation.
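
To illustrate the kind of simplification meant here, below is a minimal C
sketch, not taken from the patch. It shows at source level what copy
propagation with trivial constant propagation does to a loop body. The names
fill_before, fill_after and bodies_agree are hypothetical, and the real pass
works on GIMPLE SSA names rather than on C source.

```c
#include <assert.h>
#include <stddef.h>

#define N 8

/* Before propagation: the store goes through a copy of the induction
   variable (j) and a constant held in a temporary (scale).  */
static void
fill_before (unsigned int *a)
{
  for (unsigned int i = 0; i < N; i++)
    {
      unsigned int j = i;
      unsigned int scale = 2;
      a[j] = j * scale;
    }
}

/* After propagation: uses of j are replaced by i and scale by 2, so the
   index and the stored value depend directly on i.  */
static void
fill_after (unsigned int *a)
{
  for (unsigned int i = 0; i < N; i++)
    a[i] = i * 2;
}

/* Hypothetical checker: return 1 if both forms fill the array
   identically, 0 otherwise.  */
static int
bodies_agree (void)
{
  unsigned int x[N], y[N];

  fill_before (x);
  fill_after (y);
  for (size_t k = 0; k < N; k++)
    if (x[k] != y[k])
      return 0;
  return 1;
}
```

Calling bodies_agree () from a small main is enough to confirm that the two
forms compute the same array. The second form is the shape in which it is
easiest to see that each iteration writes a distinct element.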

Bootstrapped and reg-tested as before.

OK for trunk?

Thanks,
- Tom


[-- Attachment #2: 0006-Add-pass_copy_prop-in-pass_oacc_kernels.patch --]
[-- Type: text/x-patch, Size: 1510 bytes --]

[PATCH 6/7] Add pass_copy_prop in pass_oacc_kernels

2014-11-25  Tom de Vries  <tom@codesourcery.com>

	* passes.def: Add pass_copy_prop to pass group pass_oacc_kernels.
	* tree-ssa-copy.c (stmt_may_generate_copy): Handle .omp_data_i init
	conservatively.
---
 gcc/passes.def      | 1 +
 gcc/tree-ssa-copy.c | 4 ++++
 2 files changed, 5 insertions(+)

diff --git a/gcc/passes.def b/gcc/passes.def
index 438d292..fb0d331 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -93,6 +93,7 @@ along with GCC; see the file COPYING3.  If not see
 	      NEXT_PASS (pass_ch_oacc_kernels);
 	      NEXT_PASS (pass_tree_loop_init);
 	      NEXT_PASS (pass_lim);
+	      NEXT_PASS (pass_copy_prop);
 	      NEXT_PASS (pass_expand_omp_ssa);
 	      NEXT_PASS (pass_tree_loop_done);
 	  POP_INSERT_PASSES ()
diff --git a/gcc/tree-ssa-copy.c b/gcc/tree-ssa-copy.c
index 7c22c5e..d6eb7a7 100644
--- a/gcc/tree-ssa-copy.c
+++ b/gcc/tree-ssa-copy.c
@@ -55,6 +55,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-scalar-evolution.h"
 #include "tree-ssa-dom.h"
 #include "tree-ssa-loop-niter.h"
+#include "omp-low.h"
 
 
 /* This file implements the copy propagation pass and provides a
@@ -110,6 +111,9 @@ stmt_may_generate_copy (gimple stmt)
   if (gimple_has_volatile_ops (stmt))
     return false;
 
+  if (gimple_stmt_omp_data_i_init_p (stmt))
+    return false;
+
   /* Statements with loads and/or stores will never generate a useful copy.  */
   if (gimple_vuse (stmt))
     return false;
-- 
1.9.1



* Re: [PATCH, 7/8] Add pass_parloops_oacc_kernels to pass_oacc_kernels
  2014-11-15 18:52 ` [PATCH, 7/8] Add pass_parloops_oacc_kernels to pass_oacc_kernels Tom de Vries
@ 2014-11-25 12:15   ` Tom de Vries
  2015-04-21 20:09     ` [PATCH, 7/8] Add pass_parallelize_loops_oacc_kernels " Thomas Schwinge
  0 siblings, 1 reply; 71+ messages in thread
From: Tom de Vries @ 2014-11-25 12:15 UTC (permalink / raw)
  To: GCC Patches; +Cc: Richard Biener, Jakub Jelinek, Thomas Schwinge

[-- Attachment #1: Type: text/plain, Size: 1759 bytes --]

On 15-11-14 18:23, Tom de Vries wrote:
> On 15-11-14 13:14, Tom de Vries wrote:
>> Hi,
>>
>> I'm submitting a patch series with initial support for the oacc kernels
>> directive.
>>
>> The patch series uses pass_parallelize_loops to implement parallelization of
>> loops in the oacc kernels region.
>>
>> The patch series consists of these 8 patches:
>> ...
>>      1  Expand oacc kernels after pass_build_ealias
>>      2  Add pass_oacc_kernels
>>      3  Add pass_ch_oacc_kernels to pass_oacc_kernels
>>      4  Add pass_tree_loop_{init,done} to pass_oacc_kernels
>>      5  Add pass_loop_im to pass_oacc_kernels
>>      6  Add pass_ccp to pass_oacc_kernels
>>      7  Add pass_parloops_oacc_kernels to pass_oacc_kernels
>>      8  Do simple omp lowering for no address taken var
>> ...
>
> This patch adds:
> - a specialized version of pass_parallelize_loops called
>      pass_parloops_oacc_kernels to pass group pass_oacc_kernels, and
> - relevant test-cases.
>
> The pass only handles loops that are in a kernels region, and skips over bits of
> pass_parallelize_loops that are already done for oacc kernels.
>
> The pass reintroduces the use of omp_expand_local, I haven't managed to make it
> work yet using the external pass pass_expand_omp_ssa.
>
> An obvious limitation of the patch is the fact that we copy over the clauses
> from the kernels directive to the generated parallel directive. We'll need to do
> something more intelligent here, f.i. setting vector_length based on the
> parallelization factor.
>
> Another limitation is that the pass still needs -ftree-parallelize-loops to
> trigger.
>

Updated to use pass_copy_prop instead of pass_ccp in pass_oacc_kernels.
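
For readers who want to see why the loops in the new test-cases are safe to
parallelize, here is a rough C sketch, not part of the patch. It splits the
iteration space of the c[ii] = a[ii] + b[ii] loop into contiguous chunks and
checks the result against the N_REF value used in the tests. The helpers
run_chunk and chunked_sum are hypothetical, the chunks run sequentially here,
and the exact division of work done by the compiler and runtime may differ.
uint32_t is used so that the wrap-around of the sum is well defined.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define N (1024 * 512)
#define N_REF UINT32_C (4293394432)

/* Hypothetical helper: run iterations [lo, hi) of the loop body, with
   a[i] = i * 2 and b[i] = i * 4 folded in.  Each iteration writes only
   c[i], so chunks do not interfere with each other.  */
static void
run_chunk (uint32_t *c, uint32_t lo, uint32_t hi)
{
  for (uint32_t i = lo; i < hi; i++)
    c[i] = i * 2 + i * 4;
}

/* Hypothetical helper: split [0, N) into n_threads contiguous chunks
   (n_threads must be nonzero), run them one after another, and return
   the sum of c[] modulo 2^32.  */
static uint32_t
chunked_sum (uint32_t n_threads)
{
  uint32_t *c = malloc (N * sizeof (uint32_t));
  if (c == NULL)
    abort ();

  /* Round up so that the last chunk covers the remainder.  */
  uint32_t chunk = (N + n_threads - 1) / n_threads;
  for (uint32_t t = 0; t < n_threads; t++)
    {
      uint32_t lo = t * chunk;
      uint32_t hi = lo + chunk < N ? lo + chunk : N;
      if (lo < hi)
        run_chunk (c, lo, hi);
    }

  uint32_t sum = 0;
  for (uint32_t i = 0; i < N; i++)
    sum += c[i];

  free (c);
  return sum;
}
```

The reference value follows from c[i] = 6 * i: the sum is 3 * N * (N - 1),
which is 4293394432 modulo 2^32 for N = 1024 * 512. chunked_sum returns the
same value for any nonzero chunk count, for example chunked_sum (32), which
matches the -ftree-parallelize-loops=32 setting in the tests.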

Bootstrapped and reg-tested as before.

OK for trunk?

Thanks,
- Tom


[-- Attachment #2: 0007-Add-pass_parloops_oacc_kernels-to-pass_oacc_kernels.patch --]
[-- Type: text/x-patch, Size: 23004 bytes --]

[PATCH 7/7] Add pass_parloops_oacc_kernels to pass_oacc_kernels

2014-11-25  Tom de Vries  <tom@codesourcery.com>

	* passes.def: Add pass_parallelize_loops_oacc_kernels in pass group
	pass_oacc_kernels.  Move pass_expand_omp_ssa into pass group
	pass_oacc_kernels.
	* tree-parloops.c (create_parallel_loop): Add function parameters
	region_entry and bool oacc_kernels_p.  Handle oacc_kernels_p.
	(gen_parallel_loop): Same.  Use omp_expand_local if oacc_kernels_p.
	Call create_parallel_loop with additional args.
	(parallelize_loops): Add function parameter oacc_kernels_p.  Calculate
	dominance info.  Skip loops that are not in a kernels region.  Call
	gen_parallel_loop with additional args.
	(pass_parallelize_loops::execute): Call parallelize_loops with false
	argument.
	(pass_data_parallelize_loops_oacc_kernels): New pass_data.
	(class pass_parallelize_loops_oacc_kernels): New pass.
	(pass_parallelize_loops_oacc_kernels::execute)
	(make_pass_parallelize_loops_oacc_kernels): New function.
	* tree-pass.h (make_pass_parallelize_loops_oacc_kernels): Declare.

	* testsuite/libgomp.oacc-c/oacc-kernels-2-run.c: New test.
	* testsuite/libgomp.oacc-c/oacc-kernels-run.c: New test.

	* gcc.dg/oacc-kernels-2.c: New test.
	* gcc.dg/oacc-kernels.c: New test.
---
 gcc/passes.def                                     |   1 +
 gcc/testsuite/gcc.dg/oacc-kernels-2.c              |  79 +++++++
 gcc/testsuite/gcc.dg/oacc-kernels.c                |  71 ++++++
 gcc/tree-parloops.c                                | 242 ++++++++++++++++-----
 gcc/tree-pass.h                                    |   2 +
 .../testsuite/libgomp.oacc-c/oacc-kernels-2-run.c  |  65 ++++++
 .../testsuite/libgomp.oacc-c/oacc-kernels-run.c    |  59 +++++
 7 files changed, 464 insertions(+), 55 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/oacc-kernels-2.c
 create mode 100644 gcc/testsuite/gcc.dg/oacc-kernels.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c/oacc-kernels-2-run.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c/oacc-kernels-run.c

diff --git a/gcc/passes.def b/gcc/passes.def
index fb0d331..d91283b 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -94,6 +94,7 @@ along with GCC; see the file COPYING3.  If not see
 	      NEXT_PASS (pass_tree_loop_init);
 	      NEXT_PASS (pass_lim);
 	      NEXT_PASS (pass_copy_prop);
+	      NEXT_PASS (pass_parallelize_loops_oacc_kernels);
 	      NEXT_PASS (pass_expand_omp_ssa);
 	      NEXT_PASS (pass_tree_loop_done);
 	  POP_INSERT_PASSES ()
diff --git a/gcc/testsuite/gcc.dg/oacc-kernels-2.c b/gcc/testsuite/gcc.dg/oacc-kernels-2.c
new file mode 100644
index 0000000..1ff4bad
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/oacc-kernels-2.c
@@ -0,0 +1,79 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target fopenacc } */
+/* { dg-options "-fopenacc -ftree-parallelize-loops=32 -O2 -std=c99 -fdump-tree-parloops_oacc_kernels-all -fdump-tree-copyrename" } */
+
+#include <stdlib.h>
+#include <stdio.h>
+
+#define N (1024 * 512)
+#define N_REF 4293394432
+
+#if 1
+#define COUNTERTYPE unsigned int
+#else
+#define COUNTERTYPE int
+#endif
+
+int
+main (void)
+{
+  unsigned int i;
+
+  unsigned int *__restrict a;
+  unsigned int *__restrict b;
+  unsigned int *__restrict c;
+
+  a = malloc (N * sizeof (unsigned int));
+  b = malloc (N * sizeof (unsigned int));
+  c = malloc (N * sizeof (unsigned int));
+
+
+#pragma acc kernels copyout (a[0:N])
+  {
+    for (COUNTERTYPE i = 0; i < N; i++)
+      a[i] = i * 2;
+  }
+
+#pragma acc kernels copyout (b[0:N])
+  {
+    for (COUNTERTYPE i = 0; i < N; i++)
+      b[i] = i * 4;
+  }
+
+#pragma acc kernels copyin (a[0:N], b[0:N]) copyout (c[0:N])
+  {
+    for (COUNTERTYPE ii = 0; ii < N; ii++)
+      c[ii] = a[ii] + b[ii];
+  }
+
+  {
+    unsigned int sum = 0;
+
+    for (COUNTERTYPE i = 0; i < N; i++)
+      sum += c[i];
+
+    printf ("sum: %u\n", sum);
+
+    if (sum != N_REF)
+      abort ();
+  }
+
+  free (a);
+  free (b);
+  free (c);
+
+  return 0;
+}
+
+/* Check that only three loops are analyzed, and that all can be parallelized.  */
+/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 3 "parloops_oacc_kernels" } } */
+/* { dg-final { scan-tree-dump-not "FAILED:" "parloops_oacc_kernels" } } */
+
+/* Check that the loop has been split off into a function.  It pops up first in
+   all_passes/pass_all_optimizations/pass_rename_ssa_copies.  */
+/* { dg-final { scan-tree-dump-times "Function main._omp_fn.0 " 1 "copyrename2" } } */
+/* { dg-final { scan-tree-dump-times "Function main._omp_fn.1 " 1 "copyrename2" } } */
+/* { dg-final { scan-tree-dump-times "Function main._omp_fn.2 " 1 "copyrename2" } } */
+
+/* { dg-final { cleanup-tree-dump "parloops_oacc_kernels" } } */
+/* { dg-final { cleanup-tree-dump "copyrename*" } } */
diff --git a/gcc/testsuite/gcc.dg/oacc-kernels.c b/gcc/testsuite/gcc.dg/oacc-kernels.c
new file mode 100644
index 0000000..de94aa9
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/oacc-kernels.c
@@ -0,0 +1,71 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target fopenacc } */
+/* { dg-options "-fopenacc -ftree-parallelize-loops=32 -O2 -std=c99 -fdump-tree-parloops_oacc_kernels-all -fdump-tree-copyrename" } */
+
+#include <stdlib.h>
+#include <stdio.h>
+
+#define N (1024 * 512)
+#define N_REF 4293394432
+
+#if 1
+#define COUNTERTYPE unsigned int
+#else
+#define COUNTERTYPE int
+#endif
+
+int
+main (void)
+{
+  unsigned int i;
+
+  unsigned int *__restrict a;
+  unsigned int *__restrict b;
+  unsigned int *__restrict c;
+
+  a = malloc (N * sizeof (unsigned int));
+  b = malloc (N * sizeof (unsigned int));
+  c = malloc (N * sizeof (unsigned int));
+
+
+  for (COUNTERTYPE i = 0; i < N; i++)
+    a[i] = i * 2;
+
+  for (COUNTERTYPE i = 0; i < N; i++)
+    b[i] = i * 4;
+
+#pragma acc kernels copyin (a[0:N], b[0:N]) copyout (c[0:N])
+  {
+    for (COUNTERTYPE ii = 0; ii < N; ii++)
+      c[ii] = a[ii] + b[ii];
+  }
+
+  {
+    unsigned int sum = 0;
+
+    for (COUNTERTYPE i = 0; i < N; i++)
+      sum += c[i];
+
+    printf ("sum: %u\n", sum);
+
+    if (sum != N_REF)
+      abort ();
+  }
+
+  free (a);
+  free (b);
+  free (c);
+
+  return 0;
+}
+
+/* Check that only one loop is analyzed, and that it can be parallelized.  */
+/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 1 "parloops_oacc_kernels" } } */
+/* { dg-final { scan-tree-dump-not "FAILED:" "parloops_oacc_kernels" } } */
+
+/* Check that the loop has been split off into a function.  It pops up first in
+   all_passes/pass_all_optimizations/pass_rename_ssa_copies.  */
+/* { dg-final { scan-tree-dump-times "Function main._omp_fn.0 " 1 "copyrename2" } } */
+
+/* { dg-final { cleanup-tree-dump "parloops_oacc_kernels" } } */
+/* { dg-final { cleanup-tree-dump "copyrename*" } } */
diff --git a/gcc/tree-parloops.c b/gcc/tree-parloops.c
index e5dca78..7bc945b 100644
--- a/gcc/tree-parloops.c
+++ b/gcc/tree-parloops.c
@@ -1611,7 +1611,8 @@ transform_to_exit_first_loop (struct loop *loop,
 
 static basic_block
 create_parallel_loop (struct loop *loop, tree loop_fn, tree data,
-		      tree new_data, unsigned n_threads, location_t loc)
+		      tree new_data, unsigned n_threads, location_t loc,
+		      basic_block region_entry, bool oacc_kernels_p)
 {
   gimple_stmt_iterator gsi;
   basic_block bb, paral_bb, for_bb, ex_bb;
@@ -1623,15 +1624,44 @@ create_parallel_loop (struct loop *loop, tree loop_fn, tree data,
   /* Prepare the GIMPLE_OMP_PARALLEL statement.  */
   bb = loop_preheader_edge (loop)->src;
   paral_bb = single_pred (bb);
-  gsi = gsi_last_bb (paral_bb);
+  if (!oacc_kernels_p)
+    gsi = gsi_last_bb (paral_bb);
+  else
+    /* Make sure the oacc parallel is inserted on top of the oacc kernels
+       region.  */
+    gsi = gsi_last_bb (region_entry);
 
-  t = build_omp_clause (loc, OMP_CLAUSE_NUM_THREADS);
-  OMP_CLAUSE_NUM_THREADS_EXPR (t)
-    = build_int_cst (integer_type_node, n_threads);
-  stmt = gimple_build_omp_parallel (NULL, t, loop_fn, data);
-  gimple_set_location (stmt, loc);
+  if (!oacc_kernels_p)
+    {
+      t = build_omp_clause (loc, OMP_CLAUSE_NUM_THREADS);
+      OMP_CLAUSE_NUM_THREADS_EXPR (t)
+	= build_int_cst (integer_type_node, n_threads);
+      stmt = gimple_build_omp_parallel (NULL, t, loop_fn, data);
+      gimple_set_location (stmt, loc);
 
-  gsi_insert_after (&gsi, stmt, GSI_NEW_STMT);
+      gsi_insert_after (&gsi, stmt, GSI_NEW_STMT);
+    }
+  else
+    {
+      /* Create oacc parallel pragma based on oacc kernels pragma.  */
+      gimple kernels = last_stmt (region_entry);
+      stmt = gimple_build_oacc_parallel (NULL,
+					 gimple_oacc_kernels_clauses (kernels));
+      tree child_fn = gimple_oacc_kernels_child_fn (kernels);
+      gimple_oacc_parallel_set_child_fn (stmt, child_fn);
+      tree data_arg = gimple_oacc_kernels_data_arg (kernels);
+      gimple_oacc_parallel_set_data_arg (stmt, data_arg);
+
+      gimple_set_location (stmt, loc);
+
+      /* Insert oacc parallel pragma after the oacc kernels pragma.  */
+      {
+	gimple_stmt_iterator gsi2;
+	gsi2 = gsi;
+	gsi_insert_after (&gsi, stmt, GSI_NEW_STMT);
+	gsi_remove (&gsi2, true);
+      }
+    }
 
   /* Initialize NEW_DATA.  */
   if (data)
@@ -1647,12 +1677,18 @@ create_parallel_loop (struct loop *loop, tree loop_fn, tree data,
       gsi_insert_before (&gsi, stmt, GSI_SAME_STMT);
     }
 
-  /* Emit GIMPLE_OMP_RETURN for GIMPLE_OMP_PARALLEL.  */
-  bb = split_loop_exit_edge (single_dom_exit (loop));
-  gsi = gsi_last_bb (bb);
-  stmt = gimple_build_omp_return (false);
-  gimple_set_location (stmt, loc);
-  gsi_insert_after (&gsi, stmt, GSI_NEW_STMT);
+  /* Skip insertion of OMP_RETURN for oacc_kernels_p.  We've already generated
+     one when lowering the oacc kernels directive in
+     pass_lower_omp/lower_omp ().  */
+  if (!oacc_kernels_p)
+    {
+      /* Emit GIMPLE_OMP_RETURN for GIMPLE_OMP_PARALLEL.  */
+      bb = split_loop_exit_edge (single_dom_exit (loop));
+      gsi = gsi_last_bb (bb);
+      stmt = gimple_build_omp_return (false);
+      gimple_set_location (stmt, loc);
+      gsi_insert_after (&gsi, stmt, GSI_NEW_STMT);
+    }
 
   /* Extract data for GIMPLE_OMP_FOR.  */
   gcc_assert (loop->header == single_dom_exit (loop)->src);
@@ -1705,7 +1741,11 @@ create_parallel_loop (struct loop *loop, tree loop_fn, tree data,
   t = build_omp_clause (loc, OMP_CLAUSE_SCHEDULE);
   OMP_CLAUSE_SCHEDULE_KIND (t) = OMP_CLAUSE_SCHEDULE_STATIC;
 
-  for_stmt = gimple_build_omp_for (NULL, GF_OMP_FOR_KIND_FOR, t, 1, NULL);
+  for_stmt = gimple_build_omp_for (NULL,
+				   (oacc_kernels_p
+				    ? GF_OMP_FOR_KIND_OACC_LOOP
+				    : GF_OMP_FOR_KIND_FOR),
+				   NULL_TREE, 1, NULL);
   gimple_set_location (for_stmt, loc);
   gimple_omp_for_set_index (for_stmt, 0, initvar);
   gimple_omp_for_set_initial (for_stmt, 0, cvar_init);
@@ -1736,7 +1776,7 @@ create_parallel_loop (struct loop *loop, tree loop_fn, tree data,
   free_dominance_info (CDI_DOMINATORS);
   calculate_dominance_info (CDI_DOMINATORS);
 
-  return paral_bb;
+  return oacc_kernels_p ? region_entry : paral_bb;
 }
 
 /* Generates code to execute the iterations of LOOP in N_THREADS
@@ -1748,11 +1788,13 @@ create_parallel_loop (struct loop *loop, tree loop_fn, tree data,
 static void
 gen_parallel_loop (struct loop *loop,
 		   reduction_info_table_type *reduction_list,
-		   unsigned n_threads, struct tree_niter_desc *niter)
+		   unsigned n_threads, struct tree_niter_desc *niter,
+		   basic_block region_entry, bool oacc_kernels_p)
 {
   tree many_iterations_cond, type, nit;
   tree arg_struct, new_arg_struct;
   gimple_seq stmts;
+  basic_block parallel_head;
   edge entry, exit;
   struct clsn_data clsn_data;
   unsigned prob;
@@ -1829,40 +1871,43 @@ gen_parallel_loop (struct loop *loop,
   if (stmts)
     gsi_insert_seq_on_edge_immediate (loop_preheader_edge (loop), stmts);
 
-  if (loop->inner)
-    m_p_thread=2;
-  else
-    m_p_thread=MIN_PER_THREAD;
-
-   many_iterations_cond =
-     fold_build2 (GE_EXPR, boolean_type_node,
-                nit, build_int_cst (type, m_p_thread * n_threads));
-
-  many_iterations_cond
-    = fold_build2 (TRUTH_AND_EXPR, boolean_type_node,
-		   invert_truthvalue (unshare_expr (niter->may_be_zero)),
-		   many_iterations_cond);
-  many_iterations_cond
-    = force_gimple_operand (many_iterations_cond, &stmts, false, NULL_TREE);
-  if (stmts)
-    gsi_insert_seq_on_edge_immediate (loop_preheader_edge (loop), stmts);
-  if (!is_gimple_condexpr (many_iterations_cond))
+  if (!oacc_kernels_p)
     {
+      if (loop->inner)
+	m_p_thread=2;
+      else
+	m_p_thread=MIN_PER_THREAD;
+
+      many_iterations_cond =
+	fold_build2 (GE_EXPR, boolean_type_node,
+		     nit, build_int_cst (type, m_p_thread * n_threads));
+
+      many_iterations_cond
+	= fold_build2 (TRUTH_AND_EXPR, boolean_type_node,
+		       invert_truthvalue (unshare_expr (niter->may_be_zero)),
+		       many_iterations_cond);
       many_iterations_cond
-	= force_gimple_operand (many_iterations_cond, &stmts,
-				true, NULL_TREE);
+	= force_gimple_operand (many_iterations_cond, &stmts, false, NULL_TREE);
       if (stmts)
 	gsi_insert_seq_on_edge_immediate (loop_preheader_edge (loop), stmts);
-    }
+      if (!is_gimple_condexpr (many_iterations_cond))
+	{
+	  many_iterations_cond
+	    = force_gimple_operand (many_iterations_cond, &stmts,
+				    true, NULL_TREE);
+	  if (stmts)
+	    gsi_insert_seq_on_edge_immediate (loop_preheader_edge (loop), stmts);
+	}
 
-  initialize_original_copy_tables ();
+      initialize_original_copy_tables ();
 
-  /* We assume that the loop usually iterates a lot.  */
-  prob = 4 * REG_BR_PROB_BASE / 5;
-  loop_version (loop, many_iterations_cond, NULL,
-		prob, prob, REG_BR_PROB_BASE - prob, true);
-  update_ssa (TODO_update_ssa);
-  free_original_copy_tables ();
+      /* We assume that the loop usually iterates a lot.  */
+      prob = 4 * REG_BR_PROB_BASE / 5;
+      loop_version (loop, many_iterations_cond, NULL,
+		    prob, prob, REG_BR_PROB_BASE - prob, true);
+      update_ssa (TODO_update_ssa);
+      free_original_copy_tables ();
+    }
 
   /* Base all the induction variables in LOOP on a single control one.  */
   canonicalize_loop_ivs (loop, &nit, true);
@@ -1879,19 +1924,31 @@ gen_parallel_loop (struct loop *loop,
   entry = loop_preheader_edge (loop);
   exit = single_dom_exit (loop);
 
-  eliminate_local_variables (entry, exit);
-  /* In the old loop, move all variables non-local to the loop to a structure
-     and back, and create separate decls for the variables used in loop.  */
-  separate_decls_in_region (entry, exit, reduction_list, &arg_struct,
-			    &new_arg_struct, &clsn_data);
+  /* This rewrites the body in terms of new variables.  This has already
+     been done for oacc_kernels_p in pass_lower_omp/lower_omp ().  */
+  if (!oacc_kernels_p)
+    {
+      eliminate_local_variables (entry, exit);
+      /* In the old loop, move all variables non-local to the loop to a
+	 structure and back, and create separate decls for the variables used in
+	 loop.  */
+      separate_decls_in_region (entry, exit, reduction_list, &arg_struct,
+				&new_arg_struct, &clsn_data);
+    }
+  else
+    {
+      arg_struct = NULL_TREE;
+      new_arg_struct = NULL_TREE;
+    }
 
   /* Create the parallel constructs.  */
   loc = UNKNOWN_LOCATION;
   cond_stmt = last_stmt (loop->header);
   if (cond_stmt)
     loc = gimple_location (cond_stmt);
-  create_parallel_loop (loop, create_loop_fn (loc), arg_struct,
-			new_arg_struct, n_threads, loc);
+  parallel_head = create_parallel_loop (loop, create_loop_fn (loc), arg_struct,
+					new_arg_struct, n_threads, loc,
+					region_entry, oacc_kernels_p);
   if (reduction_list->elements () > 0)
     create_call_for_reduction (loop, reduction_list, &clsn_data);
 
@@ -1905,6 +1962,16 @@ gen_parallel_loop (struct loop *loop,
      removed statements.  */
   FOR_EACH_LOOP (loop, 0)
     free_numbers_of_iterations_estimates_loop (loop);
+
+  if (oacc_kernels_p)
+    {
+      /* Expand the parallel constructs.  We do it directly here instead of
+	 running a separate expand_omp pass, since it is more efficient, and
+	 less likely to cause troubles with further analyses not being able to
+	 deal with the OMP trees.  */
+
+      omp_expand_local (parallel_head);
+    }
 }
 
 /* Returns true when LOOP contains vector phi nodes.  */
@@ -2131,7 +2198,7 @@ try_create_reduction_list (loop_p loop,
    otherwise.  */
 
 bool
-parallelize_loops (void)
+parallelize_loops (bool oacc_kernels_p)
 {
   unsigned n_threads = flag_tree_parallelize_loops;
   bool changed = false;
@@ -2140,6 +2207,7 @@ parallelize_loops (void)
   struct obstack parloop_obstack;
   HOST_WIDE_INT estimated;
   source_location loop_loc;
+  basic_block region_entry, region_exit;
 
   /* Do not parallelize loops in the functions created by parallelization.  */
   if (parallelized_function_p (cfun->decl))
@@ -2151,9 +2219,25 @@ parallelize_loops (void)
   reduction_info_table_type reduction_list (10);
   init_stmt_vec_info_vec ();
 
+  calculate_dominance_info (CDI_DOMINATORS);
+
   FOR_EACH_LOOP (loop, 0)
     {
       reduction_list.empty ();
+
+      if (oacc_kernels_p)
+	{
+	  if (!loop_in_oacc_kernels_region_p (loop, &region_entry, &region_exit))
+	    continue;
+	  else
+	    {
+	      if (dump_file && (dump_flags & TDF_DETAILS))
+		fprintf (dump_file,
+			 "Trying loop %d with header bb %d in oacc kernels region\n",
+			 loop->num, loop->header->index);
+	    }
+	}
+
       if (dump_file && (dump_flags & TDF_DETAILS))
       {
         fprintf (dump_file, "Trying loop %d as candidate\n",loop->num);
@@ -2223,8 +2307,9 @@ parallelize_loops (void)
 	  fprintf (dump_file, "\nloop at %s:%d: ",
 		   LOCATION_FILE (loop_loc), LOCATION_LINE (loop_loc));
       }
+
       gen_parallel_loop (loop, &reduction_list,
-			 n_threads, &niter_desc);
+			 n_threads, &niter_desc, region_entry, oacc_kernels_p);
     }
 
   free_stmt_vec_info_vec ();
@@ -2275,7 +2360,7 @@ pass_parallelize_loops::execute (function *fun)
   if (number_of_loops (fun) <= 1)
     return 0;
 
-  if (parallelize_loops ())
+  if (parallelize_loops (false))
     {
       fun->curr_properties &= ~(PROP_gimple_eomp);
       return TODO_update_ssa;
@@ -2293,4 +2378,51 @@ make_pass_parallelize_loops (gcc::context *ctxt)
 }
 
 
+namespace {
+
+const pass_data pass_data_parallelize_loops_oacc_kernels =
+{
+  GIMPLE_PASS, /* type */
+  "parloops_oacc_kernels", /* name */
+  OPTGROUP_LOOP, /* optinfo_flags */
+  TV_TREE_PARALLELIZE_LOOPS, /* tv_id */
+  ( PROP_cfg | PROP_ssa ), /* properties_required */
+  0, /* properties_provided */
+  0, /* properties_destroyed */
+  0, /* todo_flags_start */
+  0, /* todo_flags_finish */
+};
+
+class pass_parallelize_loops_oacc_kernels : public gimple_opt_pass
+{
+public:
+  pass_parallelize_loops_oacc_kernels (gcc::context *ctxt)
+    : gimple_opt_pass (pass_data_parallelize_loops_oacc_kernels, ctxt)
+  {}
+
+  /* opt_pass methods: */
+  virtual bool gate (function *) { return flag_tree_parallelize_loops > 1; }
+  virtual unsigned int execute (function *);
+
+}; // class pass_parallelize_loops_oacc_kernels
+
+unsigned
+pass_parallelize_loops_oacc_kernels::execute (function *fun)
+{
+  if (number_of_loops (fun) <= 1)
+    return 0;
+
+  if (parallelize_loops (true))
+    return TODO_cleanup_cfg | TODO_rebuild_alias;
+  return 0;
+}
+
+} // anon namespace
+
+gimple_opt_pass *
+make_pass_parallelize_loops_oacc_kernels (gcc::context *ctxt)
+{
+  return new pass_parallelize_loops_oacc_kernels (ctxt);
+}
+
 #include "gt-tree-parloops.h"
diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
index dd1f308..a5c7713 100644
--- a/gcc/tree-pass.h
+++ b/gcc/tree-pass.h
@@ -374,6 +374,8 @@ extern gimple_opt_pass *make_pass_slp_vectorize (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_complete_unroll (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_complete_unrolli (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_parallelize_loops (gcc::context *ctxt);
+extern gimple_opt_pass *
+  make_pass_parallelize_loops_oacc_kernels (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_loop_prefetch (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_iv_optimize (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_tree_loop_done (gcc::context *ctxt);
diff --git a/libgomp/testsuite/libgomp.oacc-c/oacc-kernels-2-run.c b/libgomp/testsuite/libgomp.oacc-c/oacc-kernels-2-run.c
new file mode 100644
index 0000000..5cdae0b
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c/oacc-kernels-2-run.c
@@ -0,0 +1,65 @@
+/* { dg-do run } */
+/* { dg-options "-ftree-parallelize-loops=32 -O2 -std=c99" } */
+
+#include <stdlib.h>
+#include <stdio.h>
+
+#define N (1024 * 512)
+#define N_REF 4293394432
+
+#if 1
+#define COUNTERTYPE unsigned int
+#else
+#define COUNTERTYPE int
+#endif
+
+int
+main (void)
+{
+  unsigned int i;
+
+  unsigned int *__restrict a;
+  unsigned int *__restrict b;
+  unsigned int *__restrict c;
+
+  a = malloc (N * sizeof (unsigned int));
+  b = malloc (N * sizeof (unsigned int));
+  c = malloc (N * sizeof (unsigned int));
+
+
+#pragma acc kernels copyout (a[0:N])
+  {
+    for (COUNTERTYPE i = 0; i < N; i++)
+      a[i] = i * 2;
+  }
+
+#pragma acc kernels copyout (b[0:N])
+  {
+    for (COUNTERTYPE i = 0; i < N; i++)
+      b[i] = i * 4;
+  }
+
+#pragma acc kernels copyin (a[0:N], b[0:N]) copyout (c[0:N])
+  {
+    for (COUNTERTYPE ii = 0; ii < N; ii++)
+      c[ii] = a[ii] + b[ii];
+  }
+
+  {
+    unsigned int sum = 0;
+
+    for (COUNTERTYPE i = 0; i < N; i++)
+      sum += c[i];
+
+    printf ("sum: %u\n", sum);
+
+    if (sum != N_REF)
+      abort ();
+  }
+
+  free (a);
+  free (b);
+  free (c);
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c/oacc-kernels-run.c b/libgomp/testsuite/libgomp.oacc-c/oacc-kernels-run.c
new file mode 100644
index 0000000..b9e62a0
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c/oacc-kernels-run.c
@@ -0,0 +1,59 @@
+/* { dg-do run } */
+/* { dg-options "-ftree-parallelize-loops=32 -O2 -std=c99" } */
+
+#include <stdlib.h>
+#include <stdio.h>
+
+#define N (1024 * 512)
+#define N_REF 4293394432
+
+#if 1
+#define COUNTERTYPE unsigned int
+#else
+#define COUNTERTYPE int
+#endif
+
+int
+main (void)
+{
+  unsigned int i;
+
+  unsigned int *__restrict a;
+  unsigned int *__restrict b;
+  unsigned int *__restrict c;
+
+  a = malloc (N * sizeof (unsigned int));
+  b = malloc (N * sizeof (unsigned int));
+  c = malloc (N * sizeof (unsigned int));
+
+
+  for (COUNTERTYPE i = 0; i < N; i++)
+    a[i] = i * 2;
+
+  for (COUNTERTYPE i = 0; i < N; i++)
+    b[i] = i * 4;
+
+#pragma acc kernels copyin (a[0:N], b[0:N]) copyout (c[0:N])
+  {
+    for (COUNTERTYPE ii = 0; ii < N; ii++)
+      c[ii] = a[ii] + b[ii];
+  }
+
+  {
+    unsigned int sum = 0;
+
+    for (COUNTERTYPE i = 0; i < N; i++)
+      sum += c[i];
+
+    printf ("sum: %u\n", sum);
+
+    if (sum != N_REF)
+      abort ();
+  }
+
+  free (a);
+  free (b);
+  free (c);
+
+  return 0;
+}
-- 
1.9.1
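
For reference, the N_REF constant used by both run tests can be checked by
hand.  The tests store a[i] = 2 * i and b[i] = 4 * i, so c[i] = 6 * i, and
the sum over i in [0, N) is 3 * N * (N - 1), reduced modulo 2^32 because the
accumulator is a 32-bit unsigned value.  The following is only a host-side
sketch of that arithmetic; expected_sum, looped_sum and check_n_ref are
names made up for illustration and are not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not part of the patch: closed form of
   sum (6 * i) for i in [0, n), truncated to 32 bits the way an
   unsigned 32-bit accumulator would wrap.  */
static uint32_t
expected_sum (uint32_t n)
{
  uint64_t wide = 3 * (uint64_t) n * ((uint64_t) n - 1);
  return (uint32_t) wide;
}

/* The same value computed the slow way, mirroring the loops in the
   tests (a[i] = i * 2, b[i] = i * 4, c[i] = a[i] + b[i]).  */
static uint32_t
looped_sum (uint32_t n)
{
  uint32_t sum = 0;
  for (uint32_t i = 0; i < n; i++)
    sum += i * 2 + i * 4;
  return sum;
}

/* Tie both computations to the constant used in the tests.  */
static void
check_n_ref (void)
{
  const uint32_t n = 1024 * 512;
  assert (expected_sum (n) == looped_sum (n));
  assert (expected_sum (n) == 4293394432u);
}
```

With N = 2^19 the closed form is 3 * 2^38 - 3 * 2^19; the first term vanishes
modulo 2^32, leaving 2^32 - 1572864 = 4293394432.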



* Add BUILT_IN_GOACC_KERNELS_INTERNAL (was: openacc kernels directive -- initial support)
  2014-11-15 14:08 openacc kernels directive -- initial support Tom de Vries
                   ` (8 preceding siblings ...)
  2014-11-19 20:34 ` openacc kernels directive -- initial support Tom de Vries
@ 2015-04-21 19:27 ` Thomas Schwinge
  2015-04-21 20:24 ` Handle global loop counters in fortran oacc kernels " Thomas Schwinge
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 71+ messages in thread
From: Thomas Schwinge @ 2015-04-21 19:27 UTC (permalink / raw)
  To: GCC Patches; +Cc: Tom de Vries, Richard Biener, Jakub Jelinek

[-- Attachment #1: Type: text/plain, Size: 3839 bytes --]

Hi!

On Sat, 15 Nov 2014 13:14:52 +0100, Tom de Vries <Tom_deVries@mentor.com> wrote:
> I'm submitting a patch series with initial support for the oacc kernels directive.
> 
> The patch series uses pass_parallelize_loops to implement parallelization of 
> loops in the oacc kernels region.

Committed to gomp-4_0-branch in r222278:

commit fd3add90d38d5f1b38c9cb557404542b6383b2b0
Author: tschwinge <tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4>
Date:   Tue Apr 21 19:24:57 2015 +0000

    Add BUILT_IN_GOACC_KERNELS_INTERNAL
    
    ..., a variant of the GOACC_kernels builtin.  This variant does not call the
    function passed as function pointer, and therefore is less of an optimization
    barrier than the original variant.
    
    The purpose of this variant is to allow the introduction of the GOACC_kernels
    call before splitting off the region body into a function (something that is
    currently done simultaneously).
    
    	gcc/
    	* builtin-attrs.def (DOT_DOT_DOT_r_r_r): Add DEF_ATTR_FOR_STRING.
    	(ATTR_FNSPEC_DOT_DOT_DOT_r_r_r_NOTHROW_LIST): Add
    	DEF_ATTR_TREE_LIST.
    	* omp-builtins.def (BUILT_IN_GOACC_KERNELS_INTERNAL): Add
    	DEF_GOACC_BUILTIN_FNSPEC.
    
    git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@222278 138bc75d-0d04-0410-961f-82ee72b054a4
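
As a reading aid for the "...rrr" string that this commit registers as
DOT_DOT_DOT_r_r_r: to my understanding of how GCC of this era decodes fnspec
strings, the first character describes the return value and character 1 + i
describes argument i, with '.' meaning nothing is known and 'r' meaning the
memory reachable through the argument is neither clobbered nor escapes.  The
sketch below models only that lookup in plain C, under that assumption.  The
enum, the flag names and toy_fnspec_arg_flags are invented for illustration
and are not GCC's API.

```c
#include <assert.h>
#include <string.h>

/* Illustrative flag bits.  The names are made up and only loosely
   follow the idea of GCC's per-argument escape flags.  */
enum toy_arg_flags
{
  TOY_DIRECT = 1,
  TOY_NOCLOBBER = 2,
  TOY_NOESCAPE = 4
};

/* Return the toy flags for argument ARG (0-based) of a call whose
   fnspec string is SPEC.  SPEC[0] describes the return value, so
   argument ARG is described by SPEC[1 + ARG].  Arguments past the
   end of the string get no flags.  */
static int
toy_fnspec_arg_flags (const char *spec, unsigned arg)
{
  if (1 + (size_t) arg >= strlen (spec))
    return 0;

  switch (spec[1 + arg])
    {
    case 'R':
      return TOY_DIRECT | TOY_NOCLOBBER | TOY_NOESCAPE;
    case 'r':
      return TOY_NOCLOBBER | TOY_NOESCAPE;
    case 'W':
      return TOY_DIRECT | TOY_NOESCAPE;
    case 'w':
      return TOY_NOESCAPE;
    default:
      return 0;
    }
}

/* Under this model, "...rrr" leaves arguments 0 and 1 unannotated
   and marks arguments 2, 3 and 4.  */
static void
check_dot_dot_dot_r_r_r (void)
{
  assert (toy_fnspec_arg_flags ("...rrr", 1) == 0);
  assert (toy_fnspec_arg_flags ("...rrr", 3)
	  == (TOY_NOCLOBBER | TOY_NOESCAPE));
}
```

Under this model every argument not covered by an 'r' is treated
conservatively, which is the sense in which the internal variant is "less of
an optimization barrier" rather than no barrier at all.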
---
 gcc/ChangeLog.gomp    |    6 ++++++
 gcc/builtin-attrs.def |    4 ++++
 gcc/omp-builtins.def  |    5 +++++
 3 files changed, 15 insertions(+)

diff --git gcc/ChangeLog.gomp gcc/ChangeLog.gomp
index b091dd5..7885189 100644
--- gcc/ChangeLog.gomp
+++ gcc/ChangeLog.gomp
@@ -1,5 +1,11 @@
 2015-04-21  Tom de Vries  <tom@codesourcery.com>
 
+	* builtin-attrs.def (DOT_DOT_DOT_r_r_r): Add DEF_ATTR_FOR_STRING.
+	(ATTR_FNSPEC_DOT_DOT_DOT_r_r_r_NOTHROW_LIST): Add
+	DEF_ATTR_TREE_LIST.
+	* omp-builtins.def (BUILT_IN_GOACC_KERNELS_INTERNAL): Add
+	DEF_GOACC_BUILTIN_FNSPEC.
+
 	* builtins.def (DEF_GOACC_BUILTIN_FNSPEC): Define.
 
 2015-03-21  Tom de Vries  <tom@codesourcery.com>
diff --git gcc/builtin-attrs.def gcc/builtin-attrs.def
index 1338644..8eca053 100644
--- gcc/builtin-attrs.def
+++ gcc/builtin-attrs.def
@@ -64,6 +64,7 @@ DEF_ATTR_FOR_INT (6)
   DEF_ATTR_TREE_LIST (ATTR_LIST_##ENUM, ATTR_NULL,	\
 		      ATTR_##ENUM, ATTR_NULL)
 DEF_ATTR_FOR_STRING (STR1, "1")
+DEF_ATTR_FOR_STRING (DOT_DOT_DOT_r_r_r, "...rrr")
 #undef DEF_ATTR_FOR_STRING
 
 /* Construct a tree for a list of two integers.  */
@@ -127,6 +128,9 @@ DEF_ATTR_TREE_LIST (ATTR_PURE_NOTHROW_LIST, ATTR_PURE,		\
 			ATTR_NULL, ATTR_NOTHROW_LIST)
 DEF_ATTR_TREE_LIST (ATTR_PURE_NOTHROW_LEAF_LIST, ATTR_PURE,	\
 			ATTR_NULL, ATTR_NOTHROW_LEAF_LIST)
+DEF_ATTR_TREE_LIST (ATTR_FNSPEC_DOT_DOT_DOT_r_r_r_NOTHROW_LIST, \
+		    ATTR_FNSPEC, ATTR_LIST_DOT_DOT_DOT_r_r_r, \
+		    ATTR_NOTHROW_LIST)
 DEF_ATTR_TREE_LIST (ATTR_NORETURN_NOTHROW_LIST, ATTR_NORETURN,	\
 			ATTR_NULL, ATTR_NOTHROW_LIST)
 DEF_ATTR_TREE_LIST (ATTR_NORETURN_NOTHROW_LEAF_LIST, ATTR_NORETURN,\
diff --git gcc/omp-builtins.def gcc/omp-builtins.def
index 03955c4..cd273f2 100644
--- gcc/omp-builtins.def
+++ gcc/omp-builtins.def
@@ -39,6 +39,11 @@ DEF_GOACC_BUILTIN (BUILT_IN_GOACC_DATA_END, "GOACC_data_end",
 DEF_GOACC_BUILTIN (BUILT_IN_GOACC_ENTER_EXIT_DATA, "GOACC_enter_exit_data",
 		   BT_FN_VOID_INT_SIZE_PTR_PTR_PTR_INT_INT_VAR,
 		   ATTR_NOTHROW_LIST)
+DEF_GOACC_BUILTIN_FNSPEC (BUILT_IN_GOACC_KERNELS_INTERNAL,
+			  "GOACC_kernels_internal",
+			  BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_INT_INT_INT_INT_INT_VAR,
+			  ATTR_FNSPEC_DOT_DOT_DOT_r_r_r_NOTHROW_LIST,
+			  ATTR_NOTHROW_LIST, "...rrr")
 DEF_GOACC_BUILTIN (BUILT_IN_GOACC_KERNELS, "GOACC_kernels",
 		   BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_INT_INT_INT_INT_INT_VAR,
 		   ATTR_NOTHROW_LIST)


Grüße,
 Thomas


* Expand oacc kernels after pass_fre (was: [PATCH, 1/8] Expand oacc kernels after pass_build_ealias)
  2014-11-25 11:30     ` Tom de Vries
@ 2015-04-21 19:40       ` Thomas Schwinge
  2015-04-22  7:36         ` Richard Biener
  0 siblings, 1 reply; 71+ messages in thread
From: Thomas Schwinge @ 2015-04-21 19:40 UTC (permalink / raw)
  To: GCC Patches; +Cc: Tom de Vries, Richard Biener, Jakub Jelinek

[-- Attachment #1: Type: text/plain, Size: 24444 bytes --]

Hi!

On Tue, 25 Nov 2014 12:22:02 +0100, Tom de Vries <Tom_deVries@mentor.com> wrote:
> On 24-11-14 11:56, Tom de Vries wrote:
> > On 15-11-14 18:19, Tom de Vries wrote:
> >> On 15-11-14 13:14, Tom de Vries wrote:
> >>> I'm submitting a patch series with initial support for the oacc kernels
> >>> directive.
> >>>
> >>> The patch series uses pass_parallelize_loops to implement parallelization of
> >>> loops in the oacc kernels region.
> >>>
> >>> The patch series consists of these 8 patches:
> >>> ...
> >>>      1  Expand oacc kernels after pass_build_ealias
> >>>      2  Add pass_oacc_kernels
> >>>      3  Add pass_ch_oacc_kernels to pass_oacc_kernels
> >>>      4  Add pass_tree_loop_{init,done} to pass_oacc_kernels
> >>>      5  Add pass_loop_im to pass_oacc_kernels
> >>>      6  Add pass_ccp to pass_oacc_kernels
> >>>      7  Add pass_parloops_oacc_kernels to pass_oacc_kernels
> >>>      8  Do simple omp lowering for no address taken var
> >>> ...
> >>
> >> This patch moves omp expansion of the oacc kernels directive to after
> >> pass_build_ealias.
> >>
> >> The rationale is that in order to use pass_parallelize_loops for analysis and
> >> transformation of an oacc kernels region, we postpone omp expansion of that
> >> region until the earliest point in the pass list where enough information is
> >> available to run pass_parallelize_loops, in other words, after pass_build_ealias.
> >>
> >> The patch postpones expansion in expand_omp, and ensures expansion by adding
> >> pass_expand_omp_ssa:
> >> - after pass_build_ealias, and
> >> - after pass_all_early_optimizations for the case we're not optimizing.
> >>
> >> In order to make sure the oacc kernels region arrives at pass_expand_omp_ssa,
> >> the way it left expand_omp, the patch makes pass_ccp and pass_forwprop aware of
> >> lowered omp code, to handle it conservatively.
> >>
> >> The patch contains changes in expand_omp_target to deal with ssa-code, similar
> >> to what is already present in expand_omp_taskreg.
> >>
> >> Furthermore, the patch forces the .omp_data_sizes and .omp_data_kinds to not be
> >> static for oacc kernels. It does this to get some references to .omp_data_sizes
> >> and .omp_data_kinds in the ssa code.  Without these references, the definitions
> >> will be removed. The reference of the variables in GIMPLE_OACC_KERNELS is not
> >> enough to have them not removed. [ In vries/oacc-kernels, I used a BUILT_IN_USE
> >> kludge for this purpose ].
> >>
> >> Finally, at the end of pass_expand_omp_ssa we're left with SSA_NAMEs in the
> >> original function of which the definition has been removed (as in moved to the
> >> split off function). TODO_remove_unused_locals takes care of some of them, but
> >> not the anonymous ones. So the patch iterates over all SSA_NAMEs to find these
> >> dangling SSA_NAMEs and releases them.
> >>
> >
> > Reposting with small update: I've replaced the use of the rather generic
> > gimple_stmt_omp_lowering_p with the more specific gimple_stmt_omp_data_i_init_p.
> >
> > Bootstrapped and reg-tested in the same way as before.
> >
> 
> I've moved pass_expand_omp_ssa one down in the pass list, past pass_fre.
> 
> This allows fre to unify references to the same omp variable before entering 
> pass_oacc_kernels, which helps pass_lim in pass_oacc_kernels.
> 
> For instance, this reduction fragment:
> ...
>    # VUSE <.MEM_8>
>    # PT = { D.2282 }
>    _67 = .omp_data_i_59->sumD.2270;
>    # VUSE <.MEM_8>
>    _68 = *_67;
> 
>    _70 = _66 + _68;
> 
>    # VUSE <.MEM_8>
>    # PT = { D.2282 }
>    _69 = .omp_data_i_59->sumD.2270;
>    # .MEM_71 = VDEF <.MEM_8>
>    *_69 = _70;
> ...
> 
> is transformed by fre into:
> ...
>    # VUSE <.MEM_8>
>    # PT = { D.2282 }
>    _67 = .omp_data_i_59->sumD.2270;
>    # VUSE <.MEM_8>
>    _68 = *_67;
> 
>    _70 = _66 + _68;
> 
>    # .MEM_71 = VDEF <.MEM_8>
>    *_67 = _70;
> ...
> 
> In order for pass_fre to respect the kernels region boundaries, I've added a 
> change in tree-ssa-sccvn.c:visit_use to handle the .omp_data_i init conservatively.
> 
> Bootstrapped and reg-tested as before.
> 
> OK for trunk?
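
A source-level analogue of the fre example quoted above, as a rough sketch
only: the record type and function names below are hypothetical stand-ins
(the real .omp_data_i record is compiler-generated), and the two functions
merely show the shape of the code before and after the redundant load of
the sum field is removed.

```c
#include <assert.h>

/* Hypothetical stand-in for the .omp_data_i record: it holds the
   address of the reduction variable, not the variable itself.  */
struct toy_omp_data
{
  unsigned int *sum;
};

/* Shape of the code before fre: the address is loaded from the
   record once for the read and once more for the write.  */
static void
accumulate_two_loads (struct toy_omp_data *data, unsigned int val)
{
  unsigned int *p_read = data->sum;
  unsigned int old = *p_read;
  unsigned int *p_write = data->sum;  /* Redundant second load.  */
  *p_write = old + val;
}

/* Shape after fre: nothing between the two loads changes data->sum,
   so the address loaded first is reused for the store.  */
static void
accumulate_one_load (struct toy_omp_data *data, unsigned int val)
{
  unsigned int *p = data->sum;
  *p = *p + val;
}

/* Both shapes must leave the same value in the reduction variable.  */
static void
check_same_result (void)
{
  unsigned int a = 5, b = 5;
  struct toy_omp_data da = { &a }, db = { &b };
  accumulate_two_loads (&da, 7);
  accumulate_one_load (&db, 7);
  assert (a == 12 && b == 12);
}
```

With a single address for both the load and the store, a later pass has one
pointer to reason about instead of two.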

Committed to gomp-4_0-branch in r222279:

commit 93557ac5e30c26ee1a3d1255e31265b287171a0d
Author: tschwinge <tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4>
Date:   Tue Apr 21 19:37:19 2015 +0000

    Expand oacc kernels after pass_fre
    
    	gcc/
    	* omp-low.c: Include gimple-pretty-print.h.
    	(release_first_vuse_in_edge_dest): New function.
    	(expand_omp_target): When not in ssa, don't split off oacc kernels
    	region, clear PROP_gimple_eomp in cfun->curr_properties to force later
    	expansion, and add GOACC_kernels_internal call.
    	When in ssa, split off oacc kernels and convert GOACC_kernels_internal
    	into GOACC_kernels call.  Handle ssa-code.
    	(pass_data_expand_omp): Don't set PROP_gimple_eomp unconditionally in
    	properties_provided field.
    	(pass_expand_omp::execute): Set PROP_gimple_eomp in
    	cfun->curr_properties tentatively.
    	(pass_data_expand_omp_ssa): Add TODO_remove_unused_locals to
    	todo_flags_finish field.
    	(pass_expand_omp_ssa::execute): Release dangling SSA_NAMEs after calling
    	execute_expand_omp.
    	(gimple_stmt_ssa_operand_references_var_p)
    	(gimple_stmt_omp_data_i_init_p): New functions.
    	* omp-low.h (gimple_stmt_omp_data_i_init_p): Declare.
    	* passes.def: Add pass_expand_omp_ssa after pass_fre.  Add
    	pass_expand_omp_ssa after pass_all_early_optimizations.
    	* tree-ssa-ccp.c: Include omp-low.h.
    	(surely_varying_stmt_p, ccp_visit_stmt): Handle .omp_data_i init
    	conservatively.
    	* tree-ssa-forwprop.c: Include omp-low.h.
    	(pass_forwprop::execute): Handle .omp_data_i init conservatively.
    	* tree-ssa-sccvn.c: Include omp-low.h.
    	(visit_use): Handle .omp_data_i init conservatively.
    	* cgraph.c (cgraph_node::release_body): Don't release offloadable
    	functions.
    
    git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@222279 138bc75d-0d04-0410-961f-82ee72b054a4
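
The two visits that expand_omp_target makes to an oacc kernels region, as
described in the ChangeLog entry above, can be summarized with a toy state
model.  This is not GCC code: the types and toy_expand_kernels are invented
for illustration, and only the two flags do_splitoff and
do_emit_library_call are named after the variables in the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model: which builtin the launch call currently refers to.  */
enum toy_builtin
{
  TOY_NO_CALL,
  TOY_GOACC_KERNELS_INTERNAL,
  TOY_GOACC_KERNELS
};

struct toy_kernels_region
{
  enum toy_builtin call;  /* Launch call currently in the IL.  */
  bool split_off;         /* Body moved to a child function?  */
};

/* One visit of the region by the expander.  IN_SSA is false for the
   first visit (before SSA) and true for the second (in SSA).  */
static void
toy_expand_kernels (struct toy_kernels_region *region, bool in_ssa)
{
  bool do_splitoff = true;
  bool do_emit_library_call = true;

  if (!in_ssa)
    /* Postpone the splitoff, but still emit the call.  */
    do_splitoff = false;
  else
    {
      /* The call was emitted on the first visit; retarget it.  */
      do_emit_library_call = false;
      assert (region->call == TOY_GOACC_KERNELS_INTERNAL);
      region->call = TOY_GOACC_KERNELS;
    }

  if (do_splitoff)
    region->split_off = true;

  if (do_emit_library_call)
    region->call = TOY_GOACC_KERNELS_INTERNAL;
}
```

The first visit leaves the body in place behind a GOACC_kernels_internal
call; the second visit retargets that call to GOACC_kernels and splits the
body off.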
---
 gcc/ChangeLog.gomp      |   30 +++++++
 gcc/cgraph.c            |    9 ++
 gcc/omp-low.c           |  214 ++++++++++++++++++++++++++++++++++++++++++++---
 gcc/omp-low.h           |    1 +
 gcc/passes.def          |    2 +
 gcc/tree-ssa-ccp.c      |    6 ++
 gcc/tree-ssa-forwprop.c |    4 +-
 gcc/tree-ssa-sccvn.c    |    4 +-
 8 files changed, 257 insertions(+), 13 deletions(-)

diff --git gcc/ChangeLog.gomp gcc/ChangeLog.gomp
index 7885189..1f86160 100644
--- gcc/ChangeLog.gomp
+++ gcc/ChangeLog.gomp
@@ -1,5 +1,35 @@
 2015-04-21  Tom de Vries  <tom@codesourcery.com>
 
+	* omp-low.c: Include gimple-pretty-print.h.
+	(release_first_vuse_in_edge_dest): New function.
+	(expand_omp_target): When not in ssa, don't split off oacc kernels
+	region, clear PROP_gimple_eomp in cfun->curr_properties to force later
+	expansion, and add GOACC_kernels_internal call.
+	When in ssa, split off oacc kernels and convert GOACC_kernels_internal
+	into GOACC_kernels call.  Handle ssa-code.
+	(pass_data_expand_omp): Don't set PROP_gimple_eomp unconditionally in
+	properties_provided field.
+	(pass_expand_omp::execute): Set PROP_gimple_eomp in
+	cfun->curr_properties tentatively.
+	(pass_data_expand_omp_ssa): Add TODO_remove_unused_locals to
+	todo_flags_finish field.
+	(pass_expand_omp_ssa::execute): Release dangling SSA_NAMEs after calling
+	execute_expand_omp.
+	(gimple_stmt_ssa_operand_references_var_p)
+	(gimple_stmt_omp_data_i_init_p): New functions.
+	* omp-low.h (gimple_stmt_omp_data_i_init_p): Declare.
+	* passes.def: Add pass_expand_omp_ssa after pass_fre.  Add
+	pass_expand_omp_ssa after pass_all_early_optimizations.
+	* tree-ssa-ccp.c: Include omp-low.h.
+	(surely_varying_stmt_p, ccp_visit_stmt): Handle .omp_data_i init
+	conservatively.
+	* tree-ssa-forwprop.c: Include omp-low.h.
+	(pass_forwprop::execute): Handle .omp_data_i init conservatively.
+	* tree-ssa-sccvn.c: Include omp-low.h.
+	(visit_use): Handle .omp_data_i init conservatively.
+	* cgraph.c (cgraph_node::release_body): Don't release offloadable
+	functions.
+
 	* builtin-attrs.def (DOT_DOT_DOT_r_r_r): Add DEF_ATTR_FOR_STRING.
 	(ATTR_FNSPEC_DOT_DOT_DOT_r_r_r_NOTHROW_LIST): Add
 	DEF_ATTR_TREE_LIST.
diff --git gcc/cgraph.c gcc/cgraph.c
index e099856..c608d7e 100644
--- gcc/cgraph.c
+++ gcc/cgraph.c
@@ -1706,6 +1706,15 @@ release_function_body (tree decl)
 void
 cgraph_node::release_body (bool keep_arguments)
 {
+  /* The omp-expansion of the oacc kernels directive is postponed until after
+     all_small_ipa_passes.  That means pass_ipa_free_lang_data, which tries to
+     release the body of the offload function, is run before expand_omp_target
+     can process the oacc kernels directive, and expand_omp_target would crash
+     trying to access the body.  This snippet works around this problem.
+     FIXME: This should probably be fixed in a different way.  */
+  if (offloadable)
+    return;
+
   ipa_transforms_to_apply.release ();
   if (!used_as_abstract_origin && symtab->state != PARSING)
     {
diff --git gcc/omp-low.c gcc/omp-low.c
index 4134f3d..16d9a5e 100644
--- gcc/omp-low.c
+++ gcc/omp-low.c
@@ -108,6 +108,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "context.h"
 #include "lto-section-names.h"
 #include "gomp-constants.h"
+#include "gimple-pretty-print.h"
 
 
 /* Lowering of OMP parallel and workshare constructs proceeds in two
@@ -5353,6 +5354,35 @@ expand_omp_build_assign (gimple_stmt_iterator *gsi_p, tree to, tree from)
     }
 }
 
+static void
+release_first_vuse_in_edge_dest (edge e)
+{
+  gimple_stmt_iterator i;
+  basic_block bb = e->dest;
+
+  for (i = gsi_start_phis (bb); !gsi_end_p (i); gsi_next (&i))
+    {
+      gimple phi = gsi_stmt (i);
+      tree arg = PHI_ARG_DEF_FROM_EDGE (phi, e);
+
+      if (!virtual_operand_p (arg))
+	continue;
+
+      mark_virtual_operand_for_renaming (arg);
+      return;
+    }
+
+  for (i = gsi_start_bb (bb); !gsi_end_p (i); gsi_next_nondebug (&i))
+    {
+      gimple stmt = gsi_stmt (i);
+      if (gimple_vuse (stmt) == NULL_TREE)
+	continue;
+
+      mark_virtual_operand_for_renaming (gimple_vuse (stmt));
+      return;
+    }
+}
+
 /* Expand the OpenMP parallel or task directive starting at REGION.  */
 
 static void
@@ -8770,8 +8800,11 @@ expand_omp_target (struct omp_region *region)
   gimple stmt;
   edge e;
   bool offloaded, data_region;
+  bool do_emit_library_call = true;
+  bool do_splitoff = true;
 
   entry_stmt = as_a <gomp_target *> (last_stmt (region->entry));
+
   new_bb = region->entry;
 
   offloaded = is_gimple_omp_offloaded (entry_stmt);
@@ -8804,12 +8837,48 @@ expand_omp_target (struct omp_region *region)
   /* Supported by expand_omp_taskreg, but not here.  */
   if (child_cfun != NULL)
     gcc_checking_assert (!child_cfun->cfg);
-  gcc_checking_assert (!gimple_in_ssa_p (cfun));
 
   entry_bb = region->entry;
   exit_bb = region->exit;
 
-  if (offloaded)
+  if (gimple_omp_target_kind (entry_stmt) == GF_OMP_TARGET_KIND_OACC_KERNELS)
+    {
+      if (!gimple_in_ssa_p (cfun))
+	{
+	  /* We need to do analysis and optimizations on the kernels region
+	     before splitoff.  Since that's hard to do on low gimple, we
+	     postpone the splitoff until we're in SSA.
+	     However, we do the emit of the corresponding function call already,
+	     in order to keep the arguments of the call alive until the
+	     splitoff.
+	     Since at this point the function that is called is empty, we can
+	     model the function as BUILT_IN_GOACC_KERNELS_INTERNAL, which marks
+	     some of its function arguments as non-escaping, so it acts less
+	     as an optimization barrier.  */
+	  do_splitoff = false;
+	  cfun->curr_properties &= ~PROP_gimple_eomp;
+	}
+      else
+	{
+	  /* Don't emit the library call.  We've already done that.  */
+	  do_emit_library_call = false;
+	  /* Transform BUILT_IN_GOACC_KERNELS_INTERNAL into
+	     BUILT_IN_GOACC_KERNELS.  Now that the function body will be
+	     split off, we can no longer regard the omp_data_array reference as
+	     non-escaping.  */
+	  gsi = gsi_last_bb (entry_bb);
+	  gsi_prev (&gsi);
+	  gcall *call = as_a <gcall *> (gsi_stmt (gsi));
+	  gcc_assert (gimple_call_builtin_p (call, BUILT_IN_GOACC_KERNELS_INTERNAL));
+	  tree fndecl = builtin_decl_explicit (BUILT_IN_GOACC_KERNELS);
+	  gimple_call_set_fndecl (call, fndecl);
+	  gimple_call_set_fntype (call, TREE_TYPE (fndecl));
+	  gimple_call_reset_alias_info (call);
+	}
+    }
+
+  if (offloaded
+      && do_splitoff)
     {
       unsigned srcidx, dstidx, num;
 
@@ -8831,7 +8900,7 @@ expand_omp_target (struct omp_region *region)
 	{
 	  basic_block entry_succ_bb = single_succ (entry_bb);
 	  gimple_stmt_iterator gsi;
-	  tree arg;
+	  tree arg, narg;
 	  gimple tgtcopy_stmt = NULL;
 	  tree sender = TREE_VEC_ELT (data_arg, 0);
 
@@ -8861,8 +8930,27 @@ expand_omp_target (struct omp_region *region)
 	  gcc_assert (tgtcopy_stmt != NULL);
 	  arg = DECL_ARGUMENTS (child_fn);
 
-	  gcc_assert (gimple_assign_lhs (tgtcopy_stmt) == arg);
-	  gsi_remove (&gsi, true);
+	  if (!gimple_in_ssa_p (cfun))
+	    {
+	      gcc_assert (gimple_assign_lhs (tgtcopy_stmt) == arg);
+	      gsi_remove (&gsi, true);
+	    }
+	  else
+	    {
+	      gcc_assert (SSA_NAME_VAR (gimple_assign_lhs (tgtcopy_stmt))
+			  == arg);
+
+	      /* If we are in ssa form, we must load the value from the default
+		 definition of the argument.  That should not be defined now,
+		 since the argument is not used uninitialized.  */
+	      gcc_assert (ssa_default_def (cfun, arg) == NULL);
+	      narg = make_ssa_name (arg, gimple_build_nop ());
+	      set_ssa_default_def (cfun, arg, narg);
+	      /* ?? Is setting the subcode really necessary ??  */
+	      gimple_omp_set_subcode (tgtcopy_stmt, TREE_CODE (narg));
+	      gimple_assign_set_rhs1 (tgtcopy_stmt, narg);
+	      update_stmt (tgtcopy_stmt);
+	    }
 	}
 
       /* Declare local variables needed in CHILD_CFUN.  */
@@ -8905,11 +8993,23 @@ expand_omp_target (struct omp_region *region)
 	  stmt = gimple_build_return (NULL);
 	  gsi_insert_after (&gsi, stmt, GSI_SAME_STMT);
 	  gsi_remove (&gsi, true);
+
+	  /* A vuse in single_succ (exit_bb) may use a vdef from the region
+	     which is about to be split off.  Mark the vdef for renaming.  */
+	  release_first_vuse_in_edge_dest (single_succ_edge (exit_bb));
 	}
 
       /* Move the offloading region into CHILD_CFUN.  */
 
-      block = gimple_block (entry_stmt);
+      if (gimple_in_ssa_p (cfun))
+	{
+	  init_tree_ssa (child_cfun);
+	  init_ssa_operands (child_cfun);
+	  child_cfun->gimple_df->in_ssa_p = true;
+	  block = NULL_TREE;
+	}
+      else
+	block = gimple_block (entry_stmt);
 
       new_bb = move_sese_region_to_fn (child_cfun, entry_bb, exit_bb, block);
       if (exit_bb)
@@ -8969,9 +9069,18 @@ expand_omp_target (struct omp_region *region)
 	  if (changed)
 	    cleanup_tree_cfg ();
 	}
+      if (gimple_in_ssa_p (cfun))
+	update_ssa (TODO_update_ssa);
       pop_cfun ();
     }
 
+  if (!do_emit_library_call)
+    {
+      if (gimple_in_ssa_p (cfun))
+	update_ssa (TODO_update_ssa_only_virtuals);
+      return;
+    }
+
   /* Emit a library call to launch the offloading region, or do data
      transfers.  */
   tree t1, t2, t3, t4, device, cond, c, clauses;
@@ -8993,7 +9102,7 @@ expand_omp_target (struct omp_region *region)
       start_ix = BUILT_IN_GOACC_PARALLEL;
       break;
     case GF_OMP_TARGET_KIND_OACC_KERNELS:
-      start_ix = BUILT_IN_GOACC_KERNELS;
+      start_ix = BUILT_IN_GOACC_KERNELS_INTERNAL;
       break;
     case GF_OMP_TARGET_KIND_OACC_DATA:
       start_ix = BUILT_IN_GOACC_DATA_START;
@@ -9128,6 +9237,7 @@ expand_omp_target (struct omp_region *region)
     case BUILT_IN_GOACC_DATA_START:
     case BUILT_IN_GOACC_ENTER_EXIT_DATA:
     case BUILT_IN_GOACC_KERNELS:
+    case BUILT_IN_GOACC_KERNELS_INTERNAL:
     case BUILT_IN_GOACC_PARALLEL:
     case BUILT_IN_GOACC_UPDATE:
       break;
@@ -9146,6 +9256,7 @@ expand_omp_target (struct omp_region *region)
     case BUILT_IN_GOMP_TARGET_UPDATE:
       break;
     case BUILT_IN_GOACC_KERNELS:
+    case BUILT_IN_GOACC_KERNELS_INTERNAL:
     case BUILT_IN_GOACC_PARALLEL:
       {
 	tree t_num_gangs, t_num_workers, t_vector_length;
@@ -9249,6 +9360,8 @@ expand_omp_target (struct omp_region *region)
       gcc_assert (g && gimple_code (g) == GIMPLE_OMP_RETURN);
       gsi_remove (&gsi, true);
     }
+  if (gimple_in_ssa_p (cfun))
+    update_ssa (TODO_update_ssa_only_virtuals);
 }
 
 
@@ -9503,7 +9616,7 @@ const pass_data pass_data_expand_omp =
   OPTGROUP_NONE, /* optinfo_flags */
   TV_NONE, /* tv_id */
   PROP_gimple_any, /* properties_required */
-  PROP_gimple_eomp, /* properties_provided */
+  0 /* Possibly PROP_gimple_eomp.  */, /* properties_provided */
   0, /* properties_destroyed */
   0, /* todo_flags_start */
   0, /* todo_flags_finish */
@@ -9517,12 +9630,14 @@ public:
   {}
 
   /* opt_pass methods: */
-  virtual unsigned int execute (function *)
+  virtual unsigned int execute (function *fun)
     {
       bool gate = ((flag_cilkplus != 0 || flag_openacc != 0 || flag_openmp != 0
 		    || flag_openmp_simd != 0)
 		   && !seen_error ());
 
+      fun->curr_properties |= PROP_gimple_eomp;
+
       /* This pass always runs, to provide PROP_gimple_eomp.
 	 But often, there is nothing to do.  */
       if (!gate)
@@ -9553,7 +9668,8 @@ const pass_data pass_data_expand_omp_ssa =
   PROP_gimple_eomp, /* properties_provided */
   0, /* properties_destroyed */
   0, /* todo_flags_start */
-  TODO_cleanup_cfg | TODO_rebuild_alias, /* todo_flags_finish */
+  TODO_cleanup_cfg | TODO_rebuild_alias
+  | TODO_remove_unused_locals, /* todo_flags_finish */
 };
 
 class pass_expand_omp_ssa : public gimple_opt_pass
@@ -9568,7 +9684,48 @@ public:
     {
       return !(fun->curr_properties & PROP_gimple_eomp);
     }
-  virtual unsigned int execute (function *) { return execute_expand_omp (); }
+  virtual unsigned int execute (function *)
+    {
+      unsigned res = execute_expand_omp ();
+
+      /* After running pass_expand_omp_ssa to expand the oacc kernels
+	 directive, we are left in the original function with anonymous
+	 SSA_NAMEs, with a defining statement that has been deleted.  This
+	 pass finds those SSA_NAMEs and releases them.
+	 TODO: Either fix this elsewhere, or make the fix unnecessary.  */
+      unsigned int i;
+      for (i = 1; i < num_ssa_names; ++i)
+	{
+	  tree name = ssa_name (i);
+	  if (name == NULL_TREE)
+	    continue;
+
+	  gimple stmt = SSA_NAME_DEF_STMT (name);
+	  bool found = false;
+
+	  ssa_op_iter op_iter;
+	  def_operand_p def_p;
+	  FOR_EACH_PHI_OR_STMT_DEF (def_p, stmt, op_iter, SSA_OP_ALL_DEFS)
+	    {
+	      tree def = DEF_FROM_PTR (def_p);
+	      if (def == name)
+		{
+		  found = true;
+		  break;
+		}
+	    }
+
+	  if (!found)
+	    {
+	      if (dump_file)
+		fprintf (dump_file, "Released dangling ssa name %u\n", i);
+	      release_ssa_name (name);
+	    }
+	}
+
+      return res;
+    }
+  opt_pass * clone () { return new pass_expand_omp_ssa (m_ctxt); }
 
 }; // class pass_expand_omp_ssa
 
@@ -13728,4 +13885,39 @@ omp_finish_file (void)
     }
 }
 
+static bool
+gimple_stmt_ssa_operand_references_var_p (gimple stmt, const char **varnames,
+					  unsigned int nr_varnames,
+					  unsigned int flags)
+{
+  tree use;
+  ssa_op_iter iter;
+  const char *s;
+
+  FOR_EACH_SSA_TREE_OPERAND (use, stmt, iter, flags)
+    {
+      if (SSA_NAME_IDENTIFIER (use) == NULL_TREE)
+	continue;
+      s = IDENTIFIER_POINTER (SSA_NAME_IDENTIFIER (use));
+
+      unsigned int i;
+      for (i = 0; i < nr_varnames; ++i)
+	if (strcmp (varnames[i], s) == 0)
+	  return true;
+    }
+
+  return false;
+}
+
+/* Return true if STMT is .omp_data_i init.  */
+
+bool
+gimple_stmt_omp_data_i_init_p (gimple stmt)
+{
+  const char *varnames[] = { ".omp_data_i" };
+  unsigned int nr_varnames = sizeof (varnames) / sizeof (varnames[0]);
+  return gimple_stmt_ssa_operand_references_var_p (stmt, varnames, nr_varnames,
+						   SSA_OP_DEF);
+}
+
 #include "gt-omp-low.h"
diff --git gcc/omp-low.h gcc/omp-low.h
index 8a4052e..3d30c3b 100644
--- gcc/omp-low.h
+++ gcc/omp-low.h
@@ -28,6 +28,7 @@ extern void free_omp_regions (void);
 extern tree omp_reduction_init (tree, tree);
 extern bool make_gimple_omp_edges (basic_block, struct omp_region **, int *);
 extern void omp_finish_file (void);
+extern bool gimple_stmt_omp_data_i_init_p (gimple);
 
 extern GTY(()) vec<tree, va_gc> *offload_funcs;
 extern GTY(()) vec<tree, va_gc> *offload_vars;
diff --git gcc/passes.def gcc/passes.def
index 2bc5dcd..db0dd18 100644
--- gcc/passes.def
+++ gcc/passes.def
@@ -86,6 +86,7 @@ along with GCC; see the file COPYING3.  If not see
 	     execute TODO_rebuild_alias at this point.  */
 	  NEXT_PASS (pass_build_ealias);
 	  NEXT_PASS (pass_fre);
+	  NEXT_PASS (pass_expand_omp_ssa);
 	  NEXT_PASS (pass_merge_phi);
 	  NEXT_PASS (pass_cd_dce);
 	  NEXT_PASS (pass_early_ipa_sra);
@@ -99,6 +100,7 @@ along with GCC; see the file COPYING3.  If not see
 	      late.  */
 	  NEXT_PASS (pass_split_functions);
       POP_INSERT_PASSES ()
+      NEXT_PASS (pass_expand_omp_ssa);
       NEXT_PASS (pass_release_ssa_names);
       NEXT_PASS (pass_rebuild_cgraph_edges);
       NEXT_PASS (pass_inline_parameters);
diff --git gcc/tree-ssa-ccp.c gcc/tree-ssa-ccp.c
index d45a3ff..46fe1c7 100644
--- gcc/tree-ssa-ccp.c
+++ gcc/tree-ssa-ccp.c
@@ -172,6 +172,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "wide-int-print.h"
 #include "builtins.h"
 #include "tree-chkp.h"
+#include "omp-low.h"
 
 
 /* Possible lattice values.  */
@@ -796,6 +797,9 @@ surely_varying_stmt_p (gimple stmt)
       && gimple_code (stmt) != GIMPLE_CALL)
     return true;
 
+  if (gimple_stmt_omp_data_i_init_p (stmt))
+    return true;
+
   return false;
 }
 
@@ -2329,6 +2333,8 @@ ccp_visit_stmt (gimple stmt, edge *taken_edge_p, tree *output_p)
   switch (gimple_code (stmt))
     {
       case GIMPLE_ASSIGN:
+	if (gimple_stmt_omp_data_i_init_p (stmt))
+	  break;
         /* If the statement is an assignment that produces a single
            output value, evaluate its RHS to see if the lattice value of
            its output has changed.  */
diff --git gcc/tree-ssa-forwprop.c gcc/tree-ssa-forwprop.c
index d8db20a..554a5a5 100644
--- gcc/tree-ssa-forwprop.c
+++ gcc/tree-ssa-forwprop.c
@@ -85,6 +85,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-cfgcleanup.h"
 #include "tree-into-ssa.h"
 #include "cfganal.h"
+#include "omp-low.h"
 
 /* This pass propagates the RHS of assignment statements into use
    sites of the LHS of the assignment.  It's basically a specialized
@@ -2155,7 +2156,8 @@ pass_forwprop::execute (function *fun)
 	  tree lhs, rhs;
 	  enum tree_code code;
 
-	  if (!is_gimple_assign (stmt))
+	  if (!is_gimple_assign (stmt)
+	      || gimple_stmt_omp_data_i_init_p (stmt))
 	    {
 	      gsi_next (&gsi);
 	      continue;
diff --git gcc/tree-ssa-sccvn.c gcc/tree-ssa-sccvn.c
index e417a15..449a615 100644
--- gcc/tree-ssa-sccvn.c
+++ gcc/tree-ssa-sccvn.c
@@ -85,6 +85,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "ipa-ref.h"
 #include "plugin-api.h"
 #include "cgraph.h"
+#include "omp-low.h"
 
 /* This algorithm is based on the SCC algorithm presented by Keith
    Cooper and L. Taylor Simpson in "SCC-Based Value numbering"
@@ -3542,7 +3543,8 @@ visit_use (tree use)
     {
       if (gimple_code (stmt) == GIMPLE_PHI)
 	changed = visit_phi (stmt);
-      else if (gimple_has_volatile_ops (stmt))
+      else if (gimple_has_volatile_ops (stmt)
+	       || gimple_stmt_omp_data_i_init_p (stmt))
 	changed = defs_to_varying (stmt);
       else if (is_gimple_assign (stmt))
 	{


Regards,
 Thomas


* Re: [PATCH, 2/8] Add pass_oacc_kernels
  2014-11-25 11:31   ` Tom de Vries
@ 2015-04-21 19:46     ` Thomas Schwinge
  0 siblings, 0 replies; 71+ messages in thread
From: Thomas Schwinge @ 2015-04-21 19:46 UTC (permalink / raw)
  To: GCC Patches; +Cc: Tom de Vries, Richard Biener, Jakub Jelinek

[-- Attachment #1: Type: text/plain, Size: 5446 bytes --]

Hi!

On Tue, 25 Nov 2014 12:25:35 +0100, Tom de Vries <Tom_deVries@mentor.com> wrote:
> On 15-11-14 18:20, Tom de Vries wrote:
> > On 15-11-14 13:14, Tom de Vries wrote:
> >> I'm submitting a patch series with initial support for the oacc kernels
> >> directive.
> >>
> >> The patch series uses pass_parallelize_loops to implement parallelization of
> >> loops in the oacc kernels region.
> >>
> >> The patch series consists of these 8 patches:
> >> ...
> >>      1  Expand oacc kernels after pass_build_ealias
> >>      2  Add pass_oacc_kernels
> >>      3  Add pass_ch_oacc_kernels to pass_oacc_kernels
> >>      4  Add pass_tree_loop_{init,done} to pass_oacc_kernels
> >>      5  Add pass_loop_im to pass_oacc_kernels
> >>      6  Add pass_ccp to pass_oacc_kernels
> >>      7  Add pass_parloops_oacc_kernels to pass_oacc_kernels
> >>      8  Do simple omp lowering for no address taken var
> >> ...
> >
> > This patch adds a pass group pass_oacc_kernels.
> >
> > The rationale is that we want a pass group in which to run the (optimization)
> > passes related to oacc kernels regions.
> >
> 
> Updated for moving pass_oacc_kernels down past pass_fre in the pass list.
> 
> Bootstrapped and reg-tested as before.
> 
> OK for trunk?
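
A minimal sketch of the gating idea, for readers who don't know the pass
manager: the group's gate is evaluated once, and the sub-passes run only while
the "omp expanded" property is still unset.  Everything here (Function, Pass,
PassGroup, PROP_EOMP, make_kernels_group) is a hypothetical stand-in for
illustration, not the GCC pass manager API; only the gate condition mirrors
gate_oacc_kernels from the patch.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for PROP_gimple_eomp; not the GCC definition.
constexpr unsigned PROP_EOMP = 1u << 0;

struct Function {
  unsigned curr_properties = 0;
  std::vector<std::string> ran;  // names of the sub-passes that executed
};

struct Pass {
  std::string name;
  std::function<void(Function &)> execute;
};

// A pass group: if the gate fails, none of the sub-passes run.
struct PassGroup {
  std::function<bool(const Function &)> gate;
  std::vector<Pass> sub_passes;

  void run(Function &fn) const {
    assert(gate && "a pass group needs a gate");
    if (!gate(fn))
      return;
    for (const Pass &p : sub_passes) {
      fn.ran.push_back(p.name);
      p.execute(fn);
    }
  }
};

// Same condition as gate_oacc_kernels: run only while the property is unset.
// The single sub-pass plays the role of pass_expand_omp_ssa and sets it.
PassGroup make_kernels_group() {
  PassGroup group;
  group.gate = [](const Function &fn) {
    return (fn.curr_properties & PROP_EOMP) == 0;
  };
  group.sub_passes.push_back(
      {"expand_omp_ssa",
       [](Function &fn) { fn.curr_properties |= PROP_EOMP; }});
  return group;
}
```

Running the group twice on the same Function executes the sub-pass only the
first time, since the gate sees the property set on the second attempt.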

Committed to gomp-4_0-branch in r222280:

commit 0ac5f6ae679a0cd70b197f0962d7d365e7dfbd21
Author: tschwinge <tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4>
Date:   Tue Apr 21 19:45:23 2015 +0000

    Add pass_oacc_kernels
    
    	gcc/
    	* passes.def: Add pass group pass_oacc_kernels.
    	* tree-pass.h (make_pass_oacc_kernels): Declare.
    	* tree-ssa-loop.c (gate_oacc_kernels): New static function.
    	(pass_data_oacc_kernels): New pass_data.
    	(class pass_oacc_kernels): New pass.
    	(make_pass_oacc_kernels): New function.
    
    git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@222280 138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/ChangeLog.gomp  |    7 +++++++
 gcc/passes.def      |    7 ++++++-
 gcc/tree-pass.h     |    1 +
 gcc/tree-ssa-loop.c |   45 +++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 59 insertions(+), 1 deletion(-)

diff --git gcc/ChangeLog.gomp gcc/ChangeLog.gomp
index 1f86160..8a53ad8 100644
--- gcc/ChangeLog.gomp
+++ gcc/ChangeLog.gomp
@@ -1,5 +1,12 @@
 2015-04-21  Tom de Vries  <tom@codesourcery.com>
 
+	* passes.def: Add pass group pass_oacc_kernels.
+	* tree-pass.h (make_pass_oacc_kernels): Declare.
+	* tree-ssa-loop.c (gate_oacc_kernels): New static function.
+	(pass_data_oacc_kernels): New pass_data.
+	(class pass_oacc_kernels): New pass.
+	(make_pass_oacc_kernels): New function.
+
 	* omp-low.c: Include gimple-pretty-print.h.
 	(release_first_vuse_in_edge_dest): New function.
 	(expand_omp_target): When not in ssa, don't split off oacc kernels
diff --git gcc/passes.def gcc/passes.def
index db0dd18..854c5b8 100644
--- gcc/passes.def
+++ gcc/passes.def
@@ -86,7 +86,12 @@ along with GCC; see the file COPYING3.  If not see
 	     execute TODO_rebuild_alias at this point.  */
 	  NEXT_PASS (pass_build_ealias);
 	  NEXT_PASS (pass_fre);
-	  NEXT_PASS (pass_expand_omp_ssa);
+	  /* Pass group that runs when there are oacc kernels in the
+	     function.  */
+	  NEXT_PASS (pass_oacc_kernels);
+	  PUSH_INSERT_PASSES_WITHIN (pass_oacc_kernels)
+	      NEXT_PASS (pass_expand_omp_ssa);
+	  POP_INSERT_PASSES ()
 	  NEXT_PASS (pass_merge_phi);
 	  NEXT_PASS (pass_cd_dce);
 	  NEXT_PASS (pass_early_ipa_sra);
diff --git gcc/tree-pass.h gcc/tree-pass.h
index b59ae7a..35778f2 100644
--- gcc/tree-pass.h
+++ gcc/tree-pass.h
@@ -450,6 +450,7 @@ extern gimple_opt_pass *make_pass_strength_reduction (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_vtable_verify (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_ubsan (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_sanopt (gcc::context *ctxt);
+extern gimple_opt_pass *make_pass_oacc_kernels (gcc::context *ctxt);
 
 /* IPA Passes */
 extern simple_ipa_opt_pass *make_pass_ipa_lower_emutls (gcc::context *ctxt);
diff --git gcc/tree-ssa-loop.c gcc/tree-ssa-loop.c
index ccb8f97..a041858 100644
--- gcc/tree-ssa-loop.c
+++ gcc/tree-ssa-loop.c
@@ -163,6 +163,51 @@ make_pass_tree_loop (gcc::context *ctxt)
   return new pass_tree_loop (ctxt);
 }
 
+/* Gate for oacc kernels pass group.  */
+
+static bool
+gate_oacc_kernels (function *fn)
+{
+  return (fn->curr_properties & PROP_gimple_eomp) == 0;
+}
+
+/* The oacc kernels superpass.  */
+
+namespace {
+
+const pass_data pass_data_oacc_kernels =
+{
+  GIMPLE_PASS, /* type */
+  "oacc_kernels", /* name */
+  OPTGROUP_LOOP, /* optinfo_flags */
+  TV_TREE_LOOP, /* tv_id */
+  PROP_cfg, /* properties_required */
+  0, /* properties_provided */
+  0, /* properties_destroyed */
+  0, /* todo_flags_start */
+  0, /* todo_flags_finish */
+};
+
+class pass_oacc_kernels : public gimple_opt_pass
+{
+public:
+  pass_oacc_kernels (gcc::context *ctxt)
+    : gimple_opt_pass (pass_data_oacc_kernels, ctxt)
+  {}
+
+  /* opt_pass methods: */
+  virtual bool gate (function *fn) { return gate_oacc_kernels (fn); }
+
+}; // class pass_oacc_kernels
+
+} // anon namespace
+
+gimple_opt_pass *
+make_pass_oacc_kernels (gcc::context *ctxt)
+{
+  return new pass_oacc_kernels (ctxt);
+}
+
 /* The no-loop superpass.  */
 
 namespace {


Regards,
 Thomas


* Re: [PATCH, 3/8] Add pass_ch_oacc_kernels to pass_oacc_kernels
  2014-11-25 11:39   ` Tom de Vries
@ 2015-04-21 19:49     ` Thomas Schwinge
  2015-04-22  7:39       ` Richard Biener
  0 siblings, 1 reply; 71+ messages in thread
From: Thomas Schwinge @ 2015-04-21 19:49 UTC (permalink / raw)
  To: GCC Patches; +Cc: Tom de Vries, Richard Biener, Jakub Jelinek

[-- Attachment #1: Type: text/plain, Size: 11119 bytes --]

Hi!

On Tue, 25 Nov 2014 12:27:34 +0100, Tom de Vries <Tom_deVries@mentor.com> wrote:
> On 15-11-14 18:21, Tom de Vries wrote:
> > On 15-11-14 13:14, Tom de Vries wrote:
> >> Hi,
> >>
> >> I'm submitting a patch series with initial support for the oacc kernels
> >> directive.
> >>
> >> The patch series uses pass_parallelize_loops to implement parallelization of
> >> loops in the oacc kernels region.
> >>
> >> The patch series consists of these 8 patches:
> >> ...
> >>      1  Expand oacc kernels after pass_build_ealias
> >>      2  Add pass_oacc_kernels
> >>      3  Add pass_ch_oacc_kernels to pass_oacc_kernels
> >>      4  Add pass_tree_loop_{init,done} to pass_oacc_kernels
> >>      5  Add pass_loop_im to pass_oacc_kernels
> >>      6  Add pass_ccp to pass_oacc_kernels
> >>      7  Add pass_parloops_oacc_kernels to pass_oacc_kernels
> >>      8  Do simple omp lowering for no address taken var
> >> ...
> >
> > This patch adds a pass_ch_oacc_kernels to the pass group pass_oacc_kernels.
> >
> > The idea is that pass_parallelize_loops only deals with loops for which the
> > header has been copied, so the easiest way to meet that requirement when running
> > pass_parallelize_loops in group pass_oacc_kernels, is to run pass_ch as a part
> > of pass_oacc_kernels.
> >
> > We define a separate pass, pass_ch_oacc_kernels, so that all loops that aren't
> > part of a kernels region are left alone.
> >
> 
> Updated for moving pass_oacc_kernels down past pass_fre in the pass list.
> 
> Bootstrapped and reg-tested as before.
> 
> OK for trunk?
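
For illustration, this is roughly what "the header has been copied" means at
source level.  The sketch is a hand-applied transformation, not output of
pass_ch, and the function names are made up for the example: the exit test is
duplicated in front of the loop, so the remaining loop is entered only when its
body runs at least once.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Original shape: the exit test sits in the loop header, so the header
// runs even when the body never does.
int sum_while(const std::vector<int> &v) {
  int sum = 0;
  std::size_t i = 0;
  while (i < v.size()) {
    sum += v[i];
    ++i;
  }
  return sum;
}

// After header copying (applied by hand): one copy of the test guards
// entry to the loop, the other sits at the bottom of the loop.
int sum_header_copied(const std::vector<int> &v) {
  int sum = 0;
  std::size_t i = 0;
  if (i < v.size()) {         // copied header
    do {
      sum += v[i];
      ++i;
    } while (i < v.size());   // original test, now after the body
  }
  return sum;
}
```

Both functions compute the same result for any input, including an empty
vector; only the loop shape differs.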

Committed to gomp-4_0-branch in r222281:

commit 58c33a7965c379b55b549d50e3b79b2252bcc876
Author: tschwinge <tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4>
Date:   Tue Apr 21 19:48:16 2015 +0000

    Add pass_ch_oacc_kernels to pass_oacc_kernels
    
    	gcc/
    	* omp-low.c (loop_in_oacc_kernels_region_p): New function.
    	* omp-low.h (loop_in_oacc_kernels_region_p): Declare.
    	* passes.def: Add pass_ch_oacc_kernels to pass group pass_oacc_kernels.
    	* tree-pass.h (make_pass_ch_oacc_kernels): Declare
    	* tree-ssa-loop-ch.c: Include omp-low.h.
    	(pass_ch_execute): Declare.
    	(pass_ch::execute): Factor out ...
    	(pass_ch_execute): ... this new function.  If handling oacc kernels,
    	skip loops that are not in oacc kernels region.
    	(pass_ch_oacc_kernels::execute):
    	(pass_data_ch_oacc_kernels): New pass_data.
    	(class pass_ch_oacc_kernels): New pass.
    	(pass_ch_oacc_kernels::execute, make_pass_ch_oacc_kernels): New
    	function.
    
    git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@222281 138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/ChangeLog.gomp     |   15 ++++++++
 gcc/omp-low.c          |   91 ++++++++++++++++++++++++++++++++++++++++++++++++
 gcc/omp-low.h          |    2 ++
 gcc/passes.def         |    1 +
 gcc/tree-pass.h        |    1 +
 gcc/tree-ssa-loop-ch.c |   59 +++++++++++++++++++++++++++++--
 6 files changed, 167 insertions(+), 2 deletions(-)

diff --git gcc/ChangeLog.gomp gcc/ChangeLog.gomp
index 8a53ad8..d00c5e0 100644
--- gcc/ChangeLog.gomp
+++ gcc/ChangeLog.gomp
@@ -1,5 +1,20 @@
 2015-04-21  Tom de Vries  <tom@codesourcery.com>
 
+	* omp-low.c (loop_in_oacc_kernels_region_p): New function.
+	* omp-low.h (loop_in_oacc_kernels_region_p): Declare.
+	* passes.def: Add pass_ch_oacc_kernels to pass group pass_oacc_kernels.
+	* tree-pass.h (make_pass_ch_oacc_kernels): Declare
+	* tree-ssa-loop-ch.c: Include omp-low.h.
+	(pass_ch_execute): Declare.
+	(pass_ch::execute): Factor out ...
+	(pass_ch_execute): ... this new function.  If handling oacc kernels,
+	skip loops that are not in oacc kernels region.
+	(pass_ch_oacc_kernels::execute):
+	(pass_data_ch_oacc_kernels): New pass_data.
+	(class pass_ch_oacc_kernels): New pass.
+	(pass_ch_oacc_kernels::execute, make_pass_ch_oacc_kernels): New
+	function.
+
 	* passes.def: Add pass group pass_oacc_kernels.
 	* tree-pass.h (make_pass_oacc_kernels): Declare.
 	* tree-ssa-loop.c (gate_oacc_kernels): New static function.
diff --git gcc/omp-low.c gcc/omp-low.c
index 16d9a5e..1b03ae6 100644
--- gcc/omp-low.c
+++ gcc/omp-low.c
@@ -13920,4 +13920,95 @@ gimple_stmt_omp_data_i_init_p (gimple stmt)
 						   SSA_OP_DEF);
 }
 
+/* Return true if LOOP is inside a kernels region.  */
+
+bool
+loop_in_oacc_kernels_region_p (struct loop *loop, basic_block *region_entry,
+			       basic_block *region_exit)
+{
+  bitmap excludes_bitmap = BITMAP_GGC_ALLOC ();
+  bitmap region_bitmap = BITMAP_GGC_ALLOC ();
+  bitmap_clear (region_bitmap);
+
+  if (region_entry != NULL)
+    *region_entry = NULL;
+  if (region_exit != NULL)
+    *region_exit = NULL;
+
+  basic_block bb;
+  gimple last;
+  FOR_EACH_BB_FN (bb, cfun)
+    {
+      if (bitmap_bit_p (region_bitmap, bb->index))
+	continue;
+
+      last = last_stmt (bb);
+      if (!last)
+	continue;
+
+      if (gimple_code (last) != GIMPLE_OMP_TARGET
+	  || (gimple_omp_target_kind (last) != GF_OMP_TARGET_KIND_OACC_KERNELS))
+	continue;
+
+      bitmap_clear (excludes_bitmap);
+      bitmap_set_bit (excludes_bitmap, bb->index);
+
+      vec<basic_block> dominated
+	= get_all_dominated_blocks (CDI_DOMINATORS, bb);
+
+      unsigned di;
+      basic_block dom;
+
+      basic_block end_region = NULL;
+      FOR_EACH_VEC_ELT (dominated, di, dom)
+	{
+	  if (dom == bb)
+	    continue;
+
+	  last = last_stmt (dom);
+	  if (!last)
+	    continue;
+
+	  if (gimple_code (last) != GIMPLE_OMP_RETURN)
+	    continue;
+
+	  if (end_region == NULL
+	      || dominated_by_p (CDI_DOMINATORS, end_region, dom))
+	    end_region = dom;
+	}
+
+      if (end_region == NULL)
+	{
+	  gimple kernels = last_stmt (bb);
+	  fatal_error (gimple_location (kernels),
+		       "End of kernel region unreachable");
+	}
+
+      vec<basic_block> excludes
+	= get_all_dominated_blocks (CDI_DOMINATORS, end_region);
+
+      unsigned di2;
+      basic_block exclude;
+
+      FOR_EACH_VEC_ELT (excludes, di2, exclude)
+	if (exclude != end_region)
+	  bitmap_set_bit (excludes_bitmap, exclude->index);
+
+      FOR_EACH_VEC_ELT (dominated, di, dom)
+	if (!bitmap_bit_p (excludes_bitmap, dom->index))
+	  bitmap_set_bit (region_bitmap, dom->index);
+
+      if (bitmap_bit_p (region_bitmap, loop->header->index))
+	{
+	  if (region_entry != NULL)
+	    *region_entry = bb;
+	  if (region_exit != NULL)
+	    *region_exit = end_region;
+	  return true;
+	}
+    }
+
+  return false;
+}
+
 #include "gt-omp-low.h"
diff --git gcc/omp-low.h gcc/omp-low.h
index 3d30c3b..ae63c9f 100644
--- gcc/omp-low.h
+++ gcc/omp-low.h
@@ -29,6 +29,8 @@ extern tree omp_reduction_init (tree, tree);
 extern bool make_gimple_omp_edges (basic_block, struct omp_region **, int *);
 extern void omp_finish_file (void);
 extern bool gimple_stmt_omp_data_i_init_p (gimple);
+extern bool loop_in_oacc_kernels_region_p (struct loop *, basic_block *,
+					   basic_block *);
 
 extern GTY(()) vec<tree, va_gc> *offload_funcs;
 extern GTY(()) vec<tree, va_gc> *offload_vars;
diff --git gcc/passes.def gcc/passes.def
index 854c5b8..5cdbc87 100644
--- gcc/passes.def
+++ gcc/passes.def
@@ -90,6 +90,7 @@ along with GCC; see the file COPYING3.  If not see
 	     function.  */
 	  NEXT_PASS (pass_oacc_kernels);
 	  PUSH_INSERT_PASSES_WITHIN (pass_oacc_kernels)
+	      NEXT_PASS (pass_ch_oacc_kernels);
 	      NEXT_PASS (pass_expand_omp_ssa);
 	  POP_INSERT_PASSES ()
 	  NEXT_PASS (pass_merge_phi);
diff --git gcc/tree-pass.h gcc/tree-pass.h
index 35778f2..321229a 100644
--- gcc/tree-pass.h
+++ gcc/tree-pass.h
@@ -379,6 +379,7 @@ extern gimple_opt_pass *make_pass_loop_prefetch (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_iv_optimize (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_tree_loop_done (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_ch (gcc::context *ctxt);
+extern gimple_opt_pass *make_pass_ch_oacc_kernels (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_ccp (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_phi_only_cprop (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_build_ssa (gcc::context *ctxt);
diff --git gcc/tree-ssa-loop-ch.c gcc/tree-ssa-loop-ch.c
index d759de7..5f24bcb 100644
--- gcc/tree-ssa-loop-ch.c
+++ gcc/tree-ssa-loop-ch.c
@@ -54,12 +54,15 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-inline.h"
 #include "flags.h"
 #include "tree-ssa-threadedge.h"
+#include "omp-low.h"
 
 /* Duplicates headers of loops if they are small enough, so that the statements
    in the loop body are always executed when the loop is entered.  This
    increases effectiveness of code motion optimizations, and reduces the need
    for loop preconditioning.  */
 
+static unsigned int pass_ch_execute (function *, bool);
+
 /* Check whether we should duplicate HEADER of LOOP.  At most *LIMIT
    instructions should be duplicated, limit is decreased by the actual
    amount.  */
@@ -178,6 +181,14 @@ public:
 unsigned int
 pass_ch::execute (function *fun)
 {
+  return pass_ch_execute (fun, false);
+}
+
+} // anon namespace
+
+static unsigned int
+pass_ch_execute (function *fun, bool oacc_kernels_p)
+{
   struct loop *loop;
   basic_block header;
   edge exit, entry;
@@ -211,6 +222,10 @@ pass_ch::execute (function *fun)
       if (do_while_loop_p (loop))
 	continue;
 
+      if (oacc_kernels_p
+	  && !loop_in_oacc_kernels_region_p (loop, NULL, NULL))
+	continue;
+
       /* Iterate the header copying up to limit; this takes care of the cases
 	 like while (a && b) {...}, where we want to have both of the conditions
 	 copied.  TODO -- handle while (a || b) - like cases, by not requiring
@@ -301,10 +316,50 @@ pass_ch::execute (function *fun)
   return 0;
 }
 
-} // anon namespace
-
 gimple_opt_pass *
 make_pass_ch (gcc::context *ctxt)
 {
   return new pass_ch (ctxt);
 }
+
+namespace {
+
+const pass_data pass_data_ch_oacc_kernels =
+{
+  GIMPLE_PASS, /* type */
+  "ch_oacc_kernels", /* name */
+  OPTGROUP_LOOP, /* optinfo_flags */
+  TV_TREE_CH, /* tv_id */
+  ( PROP_cfg | PROP_ssa ), /* properties_required */
+  0, /* properties_provided */
+  0, /* properties_destroyed */
+  0, /* todo_flags_start */
+  TODO_cleanup_cfg, /* todo_flags_finish */
+};
+
+ class pass_ch_oacc_kernels : public gimple_opt_pass
+{
+public:
+  pass_ch_oacc_kernels (gcc::context *ctxt)
+    : gimple_opt_pass (pass_data_ch_oacc_kernels, ctxt)
+  {}
+
+  /* opt_pass methods: */
+  virtual bool gate (function *) { return true; }
+  virtual unsigned int execute (function *);
+
+}; // class pass_ch_oacc_kernels
+
+unsigned int
+pass_ch_oacc_kernels::execute (function *fun)
+{
+  return pass_ch_execute (fun, true);
+}
+
+} // anon namespace
+
+gimple_opt_pass *
+make_pass_ch_oacc_kernels (gcc::context *ctxt)
+{
+  return new pass_ch_oacc_kernels (ctxt);
+}
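
The set arithmetic in loop_in_oacc_kernels_region_p above can be sketched on a
toy dominator tree.  This is a simplified model under stated assumptions, not
the GCC code: blocks are plain integers, the dominator tree is given as an
immediate-dominator array, the region entry and exit blocks are passed in
rather than discovered from GIMPLE_OMP_TARGET / GIMPLE_OMP_RETURN statements,
and all names (DomTree, dominated_by, region_blocks, loop_header_in_region) are
hypothetical.

```cpp
#include <cassert>
#include <set>
#include <vector>

// idom[b] is the immediate dominator of block b, or -1 for the function entry.
using DomTree = std::vector<int>;

// True if DOM dominates BLOCK (every block dominates itself).
bool dominated_by(const DomTree &idom, int block, int dom) {
  for (int b = block; b != -1; b = idom[b])
    if (b == dom)
      return true;
  return false;
}

// Blocks of the region: everything dominated by ENTRY, minus ENTRY itself,
// minus the blocks dominated by EXIT_BB other than EXIT_BB (code after the
// region).  This mirrors the excludes_bitmap / region_bitmap steps.
std::set<int> region_blocks(const DomTree &idom, int entry, int exit_bb) {
  assert(dominated_by(idom, exit_bb, entry));
  std::set<int> region;
  for (int b = 0; b < static_cast<int>(idom.size()); ++b) {
    if (!dominated_by(idom, b, entry))
      continue;  // not reached through the kernels directive
    if (b == entry)
      continue;  // the directive block itself is excluded
    if (b != exit_bb && dominated_by(idom, b, exit_bb))
      continue;  // lies beyond the end of the region
    region.insert(b);
  }
  return region;
}

// A loop counts as inside the region if its header block is.
bool loop_header_in_region(const DomTree &idom, int entry, int exit_bb,
                           int header) {
  return region_blocks(idom, entry, exit_bb).count(header) != 0;
}
```

With idom = {-1, 0, 1, 2, 2, 4}, entry block 1 and exit block 4, the region is
{2, 3, 4}: a loop headed at block 2 is inside it, one headed at block 5 is not.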


Regards,
 Thomas


* Re: [PATCH, 4/8] Add pass_tree_loop_{init,done} to pass_oacc_kernels
  2014-11-25 11:42   ` Tom de Vries
@ 2015-04-21 19:52     ` Thomas Schwinge
  2015-04-22  7:40       ` Richard Biener
  0 siblings, 1 reply; 71+ messages in thread
From: Thomas Schwinge @ 2015-04-21 19:52 UTC (permalink / raw)
  To: GCC Patches; +Cc: Tom de Vries, Richard Biener, Jakub Jelinek

[-- Attachment #1: Type: text/plain, Size: 3857 bytes --]

Hi!

On Tue, 25 Nov 2014 12:29:28 +0100, Tom de Vries <Tom_deVries@mentor.com> wrote:
> On 15-11-14 18:21, Tom de Vries wrote:
> > On 15-11-14 13:14, Tom de Vries wrote:
> >> I'm submitting a patch series with initial support for the oacc kernels
> >> directive.
> >>
> >> The patch series uses pass_parallelize_loops to implement parallelization of
> >> loops in the oacc kernels region.
> >>
> >> The patch series consists of these 8 patches:
> >> ...
> >>      1  Expand oacc kernels after pass_build_ealias
> >>      2  Add pass_oacc_kernels
> >>      3  Add pass_ch_oacc_kernels to pass_oacc_kernels
> >>      4  Add pass_tree_loop_{init,done} to pass_oacc_kernels
> >>      5  Add pass_loop_im to pass_oacc_kernels
> >>      6  Add pass_ccp to pass_oacc_kernels
> >>      7  Add pass_parloops_oacc_kernels to pass_oacc_kernels
> >>      8  Do simple omp lowering for no address taken var
> >> ...
> >
> > This patch adds pass_tree_loop_init and pass_tree_loop_done to
> > pass_oacc_kernels.
> >
> > Pass_parallelize_loops is run between these passes in the pass group
> > pass_tree_loop, since it requires loop information.  We do the same for
> > pass_oacc_kernels.
> >
> 
> Updated for moving pass_oacc_kernels down past pass_fre in the pass list.
> 
> Bootstrapped and reg-tested as before.
> 
> OK for trunk?
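
The patch below adds clone () methods, presumably because both passes already
have a position in pass_tree_loop and now get a second one in
pass_oacc_kernels.  The general idea, one instance per position created from a
prototype, can be sketched as follows.  This is a toy model and an assumption
about the motivation, not the GCC pass manager: Pass, LoopInitPass, OneShotPass
and schedule are hypothetical names.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

struct Pass {
  explicit Pass(std::string n) : name(std::move(n)) {}
  virtual ~Pass() = default;

  // By default a pass cannot be instantiated a second time.
  virtual std::unique_ptr<Pass> clone() const {
    throw std::logic_error("pass " + name + " does not support cloning");
  }

  std::string name;
  int instance = 0;  // which position in the pipeline this object serves
};

// A pass that opts in to being scheduled at several positions.
struct LoopInitPass : Pass {
  LoopInitPass() : Pass("loopinit") {}
  std::unique_ptr<Pass> clone() const override {
    return std::unique_ptr<Pass>(new LoopInitPass());
  }
};

// A pass that keeps the default and can therefore not be duplicated.
struct OneShotPass : Pass {
  OneShotPass() : Pass("oneshot") {}
};

// Give every position in the pipeline its own instance of PROTOTYPE.
std::vector<std::unique_ptr<Pass>> schedule(const Pass &prototype,
                                            int positions) {
  assert(positions >= 0);
  std::vector<std::unique_ptr<Pass>> slots;
  for (int i = 0; i < positions; ++i) {
    slots.push_back(prototype.clone());
    slots.back()->instance = i + 1;
  }
  return slots;
}
```

Scheduling LoopInitPass at two positions yields two distinct objects, while
scheduling OneShotPass fails with an exception because it never overrides
clone ().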

Committed to gomp-4_0-branch in r222282:

commit cb95b4a1efcdb96c58cda986d53b20c3537c1ab7
Author: tschwinge <tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4>
Date:   Tue Apr 21 19:51:33 2015 +0000

    Add pass_tree_loop_{init,done} to pass_oacc_kernels
    
    	gcc/
    	* passes.def: Run pass_tree_loop_init and pass_tree_loop_done in pass
    	group pass_oacc_kernels.
    	* tree-ssa-loop.c (pass_tree_loop_init::clone)
    	(pass_tree_loop_done::clone): New function.
    
    git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@222282 138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/ChangeLog.gomp  |    5 +++++
 gcc/passes.def      |    2 ++
 gcc/tree-ssa-loop.c |    2 ++
 3 files changed, 9 insertions(+)

diff --git gcc/ChangeLog.gomp gcc/ChangeLog.gomp
index d00c5e0..1fb060f 100644
--- gcc/ChangeLog.gomp
+++ gcc/ChangeLog.gomp
@@ -1,5 +1,10 @@
 2015-04-21  Tom de Vries  <tom@codesourcery.com>
 
+	* passes.def: Run pass_tree_loop_init and pass_tree_loop_done in pass
+	group pass_oacc_kernels.
+	* tree-ssa-loop.c (pass_tree_loop_init::clone)
+	(pass_tree_loop_done::clone): New function.
+
 	* omp-low.c (loop_in_oacc_kernels_region_p): New function.
 	* omp-low.h (loop_in_oacc_kernels_region_p): Declare.
 	* passes.def: Add pass_ch_oacc_kernels to pass group pass_oacc_kernels.
diff --git gcc/passes.def gcc/passes.def
index 5cdbc87..83ae04e 100644
--- gcc/passes.def
+++ gcc/passes.def
@@ -91,7 +91,9 @@ along with GCC; see the file COPYING3.  If not see
 	  NEXT_PASS (pass_oacc_kernels);
 	  PUSH_INSERT_PASSES_WITHIN (pass_oacc_kernels)
 	      NEXT_PASS (pass_ch_oacc_kernels);
+	      NEXT_PASS (pass_tree_loop_init);
 	      NEXT_PASS (pass_expand_omp_ssa);
+	      NEXT_PASS (pass_tree_loop_done);
 	  POP_INSERT_PASSES ()
 	  NEXT_PASS (pass_merge_phi);
 	  NEXT_PASS (pass_cd_dce);
diff --git gcc/tree-ssa-loop.c gcc/tree-ssa-loop.c
index a041858..2a96a39 100644
--- gcc/tree-ssa-loop.c
+++ gcc/tree-ssa-loop.c
@@ -272,6 +272,7 @@ public:
 
   /* opt_pass methods: */
   virtual unsigned int execute (function *);
+  opt_pass * clone () { return new pass_tree_loop_init (m_ctxt); }
 
 }; // class pass_tree_loop_init
 
@@ -566,6 +567,7 @@ public:
 
   /* opt_pass methods: */
   virtual unsigned int execute (function *) { return tree_ssa_loop_done (); }
+  opt_pass * clone () { return new pass_tree_loop_done (m_ctxt); }
 
 }; // class pass_tree_loop_done
 


Regards,
 Thomas


* Re: [PATCH, 5/8] Add pass_lim to pass_oacc_kernels
  2014-11-25 12:00   ` Tom de Vries
@ 2015-04-21 19:57     ` Thomas Schwinge
  0 siblings, 0 replies; 71+ messages in thread
From: Thomas Schwinge @ 2015-04-21 19:57 UTC (permalink / raw)
  To: GCC Patches; +Cc: Tom de Vries, Richard Biener, Jakub Jelinek

[-- Attachment #1: Type: text/plain, Size: 29141 bytes --]

Hi!

On Tue, 25 Nov 2014 12:30:52 +0100, Tom de Vries <Tom_deVries@mentor.com> wrote:
> On 15-11-14 18:22, Tom de Vries wrote:
> > On 15-11-14 13:14, Tom de Vries wrote:
> >> I'm submitting a patch series with initial support for the oacc kernels
> >> directive.
> >>
> >> The patch series uses pass_parallelize_loops to implement parallelization of
> >> loops in the oacc kernels region.
> >>
> >> The patch series consists of these 8 patches:
> >> ...
> >>      1  Expand oacc kernels after pass_build_ealias
> >>      2  Add pass_oacc_kernels
> >>      3  Add pass_ch_oacc_kernels to pass_oacc_kernels
> >>      4  Add pass_tree_loop_{init,done} to pass_oacc_kernels
> >>      5  Add pass_loop_im to pass_oacc_kernels
> >>      6  Add pass_ccp to pass_oacc_kernels
> >>      7  Add pass_parloops_oacc_kernels to pass_oacc_kernels
> >>      8  Do simple omp lowering for no address taken var
> >> ...
> >
> > This patch adds pass_loop_im to pass group pass_oacc_kernels.
> >
> > We need this pass to simplify the loop body, and allow pass_parloops to detect
> > that loop iterations are independent.
> >
> 
> Updated for moving pass_oacc_kernels down past pass_fre in the pass list.
> 
> Bootstrapped and reg-tested as before.
> 
> OK for trunk?
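
For readers following along, here is a minimal hand-written sketch of the kind of simplification loop-invariant motion is meant to provide. It is not taken from the patch: `struct data`, `scale_before` and `scale_after` are hypothetical names, and the struct only loosely stands in for the data block an offloaded region receives. In the first form every iteration reloads `d->a`, `d->n` and `d->scale`, and a store to `a[i]` could in principle modify `*d`, which makes dependence analysis harder. In the second form the invariant loads have been hoisted by hand, so the loop body touches nothing but `a[i]`.

```c
/* Hypothetical stand-in for a block of data handed to a loop.  */
struct data
{
  int *a;
  int n;
  int scale;
};

/* Before: d->a, d->n and d->scale are read inside the loop.  */
void
scale_before (struct data *d)
{
  for (int i = 0; i < d->n; i++)
    d->a[i] = d->a[i] * d->scale;
}

/* After: the loop-invariant loads are hoisted by hand, so the body
   only reads and writes a[i].  */
void
scale_after (struct data *d)
{
  int *a = d->a;
  int n = d->n;
  int scale = d->scale;

  for (int i = 0; i < n; i++)
    a[i] = a[i] * scale;
}
```

The two forms are equivalent only when the array does not overlap the struct itself, which is the sort of fact a compiler has to establish before it can do the hoisting on its own.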

Committed to gomp-4_0-branch in r222283:

commit 79112043cabc81c3a283585c9a28b6a1ab3826df
Author: tschwinge <tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4>
Date:   Tue Apr 21 19:55:42 2015 +0000

    Add pass_lim to pass_oacc_kernels
    
    	gcc/
    	* passes.def: Add pass_lim in pass group pass_ch_oacc_kernels.
    
    	gcc/testsuite/
    	* c-c++-common/restrict-2.c: Update for new pass_lim.
    	* c-c++-common/restrict-4.c: Same.
    	* g++.dg/tree-ssa/pr33615.C: Same.
    	* g++.dg/tree-ssa/restrict1.C: Same.
    	* gcc.dg/tm/pub-safety-1.c: Same.
    	* gcc.dg/tm/reg-promotion.c: Same.
    	* gcc.dg/tree-ssa/20050314-1.c: Same.
    	* gcc.dg/tree-ssa/loop-32.c: Same.
    	* gcc.dg/tree-ssa/loop-33.c: Same.
    	* gcc.dg/tree-ssa/loop-34.c: Same.
    	* gcc.dg/tree-ssa/loop-35.c: Same.
    	* gcc.dg/tree-ssa/loop-7.c: Same.
    	* gcc.dg/tree-ssa/pr23109.c: Same.
    	* gcc.dg/tree-ssa/restrict-3.c: Same.
    	* gcc.dg/tree-ssa/restrict-5.c: Same.
    	* gcc.dg/tree-ssa/ssa-lim-1.c: Same.
    	* gcc.dg/tree-ssa/ssa-lim-10.c: Same.
    	* gcc.dg/tree-ssa/ssa-lim-11.c: Same.
    	* gcc.dg/tree-ssa/ssa-lim-12.c: Same.
    	* gcc.dg/tree-ssa/ssa-lim-2.c: Same.
    	* gcc.dg/tree-ssa/ssa-lim-3.c: Same.
    	* gcc.dg/tree-ssa/ssa-lim-6.c: Same.
    	* gcc.dg/tree-ssa/ssa-lim-7.c: Same.
    	* gcc.dg/tree-ssa/ssa-lim-8.c: Same.
    	* gcc.dg/tree-ssa/ssa-lim-9.c: Same.
    	* gcc.dg/tree-ssa/structopt-1.c: Same.
    	* gfortran.dg/pr32921.f: Same.
    
    git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@222283 138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/ChangeLog.gomp                          |    2 ++
 gcc/passes.def                              |    1 +
 gcc/testsuite/ChangeLog.gomp                |   31 +++++++++++++++++++++++++++
 gcc/testsuite/c-c++-common/restrict-2.c     |    6 +++---
 gcc/testsuite/c-c++-common/restrict-4.c     |    6 +++---
 gcc/testsuite/g++.dg/tree-ssa/pr33615.C     |    6 +++---
 gcc/testsuite/g++.dg/tree-ssa/restrict1.C   |    6 +++---
 gcc/testsuite/gcc.dg/tm/pub-safety-1.c      |    6 +++---
 gcc/testsuite/gcc.dg/tm/reg-promotion.c     |    6 +++---
 gcc/testsuite/gcc.dg/tree-ssa/20050314-1.c  |    6 +++---
 gcc/testsuite/gcc.dg/tree-ssa/loop-32.c     |    6 +++---
 gcc/testsuite/gcc.dg/tree-ssa/loop-33.c     |    6 +++---
 gcc/testsuite/gcc.dg/tree-ssa/loop-34.c     |    6 +++---
 gcc/testsuite/gcc.dg/tree-ssa/loop-35.c     |    8 +++----
 gcc/testsuite/gcc.dg/tree-ssa/loop-7.c      |    6 +++---
 gcc/testsuite/gcc.dg/tree-ssa/pr23109.c     |    6 +++---
 gcc/testsuite/gcc.dg/tree-ssa/restrict-3.c  |    6 +++---
 gcc/testsuite/gcc.dg/tree-ssa/restrict-5.c  |    6 +++---
 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-1.c   |    6 +++---
 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-10.c  |    6 +++---
 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-11.c  |    6 +++---
 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-12.c  |    6 +++---
 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-2.c   |    6 +++---
 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-3.c   |    8 +++----
 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-6.c   |    6 +++---
 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-7.c   |    6 +++---
 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-8.c   |    6 +++---
 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-9.c   |    6 +++---
 gcc/testsuite/gcc.dg/tree-ssa/structopt-1.c |    6 +++---
 gcc/testsuite/gfortran.dg/pr32921.f         |    6 +++---
 30 files changed, 117 insertions(+), 83 deletions(-)

diff --git gcc/ChangeLog.gomp gcc/ChangeLog.gomp
index 1fb060f..98e33ad 100644
--- gcc/ChangeLog.gomp
+++ gcc/ChangeLog.gomp
@@ -1,5 +1,7 @@
 2015-04-21  Tom de Vries  <tom@codesourcery.com>
 
+	* passes.def: Add pass_lim in pass group pass_ch_oacc_kernels.
+
 	* passes.def: Run pass_tree_loop_init and pass_tree_loop_done in pass
 	group pass_oacc_kernels.
 	* tree-ssa-loop.c (pass_tree_loop_init::clone)
diff --git gcc/passes.def gcc/passes.def
index 83ae04e..e6c9287 100644
--- gcc/passes.def
+++ gcc/passes.def
@@ -92,6 +92,7 @@ along with GCC; see the file COPYING3.  If not see
 	  PUSH_INSERT_PASSES_WITHIN (pass_oacc_kernels)
 	      NEXT_PASS (pass_ch_oacc_kernels);
 	      NEXT_PASS (pass_tree_loop_init);
+	      NEXT_PASS (pass_lim);
 	      NEXT_PASS (pass_expand_omp_ssa);
 	      NEXT_PASS (pass_tree_loop_done);
 	  POP_INSERT_PASSES ()
diff --git gcc/testsuite/ChangeLog.gomp gcc/testsuite/ChangeLog.gomp
index f06a3e9..68d6d93 100644
--- gcc/testsuite/ChangeLog.gomp
+++ gcc/testsuite/ChangeLog.gomp
@@ -1,3 +1,34 @@
+2015-04-21  Tom de Vries  <tom@codesourcery.com>
+	    Thomas Schwinge  <thomas@codesourcery.com>
+
+	* c-c++-common/restrict-2.c: Update for new pass_lim.
+	* c-c++-common/restrict-4.c: Same.
+	* g++.dg/tree-ssa/pr33615.C: Same.
+	* g++.dg/tree-ssa/restrict1.C: Same.
+	* gcc.dg/tm/pub-safety-1.c: Same.
+	* gcc.dg/tm/reg-promotion.c: Same.
+	* gcc.dg/tree-ssa/20050314-1.c: Same.
+	* gcc.dg/tree-ssa/loop-32.c: Same.
+	* gcc.dg/tree-ssa/loop-33.c: Same.
+	* gcc.dg/tree-ssa/loop-34.c: Same.
+	* gcc.dg/tree-ssa/loop-35.c: Same.
+	* gcc.dg/tree-ssa/loop-7.c: Same.
+	* gcc.dg/tree-ssa/pr23109.c: Same.
+	* gcc.dg/tree-ssa/restrict-3.c: Same.
+	* gcc.dg/tree-ssa/restrict-5.c: Same.
+	* gcc.dg/tree-ssa/ssa-lim-1.c: Same.
+	* gcc.dg/tree-ssa/ssa-lim-10.c: Same.
+	* gcc.dg/tree-ssa/ssa-lim-11.c: Same.
+	* gcc.dg/tree-ssa/ssa-lim-12.c: Same.
+	* gcc.dg/tree-ssa/ssa-lim-2.c: Same.
+	* gcc.dg/tree-ssa/ssa-lim-3.c: Same.
+	* gcc.dg/tree-ssa/ssa-lim-6.c: Same.
+	* gcc.dg/tree-ssa/ssa-lim-7.c: Same.
+	* gcc.dg/tree-ssa/ssa-lim-8.c: Same.
+	* gcc.dg/tree-ssa/ssa-lim-9.c: Same.
+	* gcc.dg/tree-ssa/structopt-1.c: Same.
+	* gfortran.dg/pr32921.f: Same.
+
 2014-12-17  Thomas Schwinge  <thomas@codesourcery.com>
 	    James Norris  <jnorris@codesourcery.com>
 
diff --git gcc/testsuite/c-c++-common/restrict-2.c gcc/testsuite/c-c++-common/restrict-2.c
index 3f71b77..f0b0e15a 100644
--- gcc/testsuite/c-c++-common/restrict-2.c
+++ gcc/testsuite/c-c++-common/restrict-2.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O -fno-strict-aliasing -fdump-tree-lim1-details" } */
+/* { dg-options "-O -fno-strict-aliasing -fdump-tree-lim2-details" } */
 
 void foo (float * __restrict__ a, float * __restrict__ b, int n, int j)
 {
@@ -10,5 +10,5 @@ void foo (float * __restrict__ a, float * __restrict__ b, int n, int j)
 
 /* We should move the RHS of the store out of the loop.  */
 
-/* { dg-final { scan-tree-dump-times "Moving statement" 11 "lim1" } } */
-/* { dg-final { cleanup-tree-dump "lim1" } } */
+/* { dg-final { scan-tree-dump-times "Moving statement" 11 "lim2" } } */
+/* { dg-final { cleanup-tree-dump "lim2" } } */
diff --git gcc/testsuite/c-c++-common/restrict-4.c gcc/testsuite/c-c++-common/restrict-4.c
index 3a36def..f791533 100644
--- gcc/testsuite/c-c++-common/restrict-4.c
+++ gcc/testsuite/c-c++-common/restrict-4.c
@@ -1,5 +1,5 @@
 /* { dg-do compile }  */
-/* { dg-options "-O2 -fdump-tree-lim1-details" } */
+/* { dg-options "-O2 -fdump-tree-lim2-details" } */
 
 struct Foo
 {
@@ -15,5 +15,5 @@ void bar(struct Foo f, int * __restrict__ q)
     }
 }
 
-/* { dg-final { scan-tree-dump "Executing store motion" "lim1" } } */
-/* { dg-final { cleanup-tree-dump "lim1" } } */
+/* { dg-final { scan-tree-dump "Executing store motion" "lim2" } } */
+/* { dg-final { cleanup-tree-dump "lim2" } } */
diff --git gcc/testsuite/g++.dg/tree-ssa/pr33615.C gcc/testsuite/g++.dg/tree-ssa/pr33615.C
index 801b334..2591e00 100644
--- gcc/testsuite/g++.dg/tree-ssa/pr33615.C
+++ gcc/testsuite/g++.dg/tree-ssa/pr33615.C
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O -fnon-call-exceptions -fdump-tree-lim1-details -w" } */
+/* { dg-options "-O -fnon-call-exceptions -fdump-tree-lim2-details -w" } */
 
 extern volatile int y;
 
@@ -16,5 +16,5 @@ foo (double a, int x)
 
 // The expression 1.0 / 0.0 should not be treated as a loop invariant
 // if it may throw an exception.
-// { dg-final { scan-tree-dump-times "invariant up to" 0 "lim1" } }
-// { dg-final { cleanup-tree-dump "lim1" } }
+// { dg-final { scan-tree-dump-times "invariant up to" 0 "lim2" } }
+// { dg-final { cleanup-tree-dump "lim2" } }
diff --git gcc/testsuite/g++.dg/tree-ssa/restrict1.C gcc/testsuite/g++.dg/tree-ssa/restrict1.C
index 682de7e..761e7e2 100644
--- gcc/testsuite/g++.dg/tree-ssa/restrict1.C
+++ gcc/testsuite/g++.dg/tree-ssa/restrict1.C
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-lim1-details" } */
+/* { dg-options "-O2 -fdump-tree-lim2-details" } */
 
 struct Foo
 {
@@ -16,5 +16,5 @@ void bar(Foo f, int * __restrict__ q)
     }
 }
 
-/* { dg-final { scan-tree-dump "Executing store motion" "lim1" } } */
-/* { dg-final { cleanup-tree-dump "lim1" } } */
+/* { dg-final { scan-tree-dump "Executing store motion" "lim2" } } */
+/* { dg-final { cleanup-tree-dump "lim2" } } */
diff --git gcc/testsuite/gcc.dg/tm/pub-safety-1.c gcc/testsuite/gcc.dg/tm/pub-safety-1.c
index 660e9a6..6d99410 100644
--- gcc/testsuite/gcc.dg/tm/pub-safety-1.c
+++ gcc/testsuite/gcc.dg/tm/pub-safety-1.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-fgnu-tm -O1 -fdump-tree-lim1" } */
+/* { dg-options "-fgnu-tm -O1 -fdump-tree-lim2" } */
 
 /* Test that thread visible loads do not get hoisted out of loops if
    the load would not have occurred on each path out of the loop.  */
@@ -20,5 +20,5 @@ void reader()
     }
 }
 
-/* { dg-final { scan-tree-dump-times "Cannot hoist.*DATA_DATA because it is in a transaction" 1 "lim1" } } */
-/* { dg-final { cleanup-tree-dump "lim1" } } */
+/* { dg-final { scan-tree-dump-times "Cannot hoist.*DATA_DATA because it is in a transaction" 1 "lim2" } } */
+/* { dg-final { cleanup-tree-dump "lim2" } } */
diff --git gcc/testsuite/gcc.dg/tm/reg-promotion.c gcc/testsuite/gcc.dg/tm/reg-promotion.c
index e48bfb2..f1d2387 100644
--- gcc/testsuite/gcc.dg/tm/reg-promotion.c
+++ gcc/testsuite/gcc.dg/tm/reg-promotion.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-fgnu-tm -O2 -fdump-tree-lim1" } */
+/* { dg-options "-fgnu-tm -O2 -fdump-tree-lim2" } */
 
 /* Test that `count' is not written to unless p->data>0.  */
 
@@ -20,5 +20,5 @@ void func()
   }
 }
 
-/* { dg-final { scan-tree-dump-times "Cannot hoist conditional load of count because it is in a transaction" 1 "lim1" } } */
-/* { dg-final { cleanup-tree-dump "lim1" } } */
+/* { dg-final { scan-tree-dump-times "Cannot hoist conditional load of count because it is in a transaction" 1 "lim2" } } */
+/* { dg-final { cleanup-tree-dump "lim2" } } */
diff --git gcc/testsuite/gcc.dg/tree-ssa/20050314-1.c gcc/testsuite/gcc.dg/tree-ssa/20050314-1.c
index 8f07781..7f2e477 100644
--- gcc/testsuite/gcc.dg/tree-ssa/20050314-1.c
+++ gcc/testsuite/gcc.dg/tree-ssa/20050314-1.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O1 -fdump-tree-lim1-details --param allow-store-data-races=1" } */
+/* { dg-options "-O1 -fdump-tree-lim2-details --param allow-store-data-races=1" } */
 
 float a[100];
 
@@ -17,5 +17,5 @@ void xxx (void)
 /* Store motion may be applied to the assignment to a[k], since sinf
    cannot read nor write the memory.  */
 
-/* { dg-final { scan-tree-dump-times "Moving statement" 1 "lim1" } } */
-/* { dg-final { cleanup-tree-dump "lim1" } } */
+/* { dg-final { scan-tree-dump-times "Moving statement" 1 "lim2" } } */
+/* { dg-final { cleanup-tree-dump "lim2" } } */
diff --git gcc/testsuite/gcc.dg/tree-ssa/loop-32.c gcc/testsuite/gcc.dg/tree-ssa/loop-32.c
index f0c8d30..30b9d72 100644
--- gcc/testsuite/gcc.dg/tree-ssa/loop-32.c
+++ gcc/testsuite/gcc.dg/tree-ssa/loop-32.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-lim1-details" } */
+/* { dg-options "-O2 -fdump-tree-lim2-details" } */
 
 int x;
 int a[100];
@@ -42,5 +42,5 @@ void test3(struct a *A)
     }
 }
 
-/* { dg-final { scan-tree-dump-times "Executing store motion of" 3 "lim1" } } */
-/* { dg-final { cleanup-tree-dump "lim1" } } */
+/* { dg-final { scan-tree-dump-times "Executing store motion of" 3 "lim2" } } */
+/* { dg-final { cleanup-tree-dump "lim2" } } */
diff --git gcc/testsuite/gcc.dg/tree-ssa/loop-33.c gcc/testsuite/gcc.dg/tree-ssa/loop-33.c
index bf16b13..281b336 100644
--- gcc/testsuite/gcc.dg/tree-ssa/loop-33.c
+++ gcc/testsuite/gcc.dg/tree-ssa/loop-33.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-lim1-details" } */
+/* { dg-options "-O2 -fdump-tree-lim2-details" } */
 
 int x;
 int a[100];
@@ -36,5 +36,5 @@ void test5(struct a *A, unsigned b)
     }
 }
 
-/* { dg-final { scan-tree-dump-times "Executing store motion of" 4 "lim1" { xfail { lp64 || llp64 } } } } */
-/* { dg-final { cleanup-tree-dump "lim1" } } */
+/* { dg-final { scan-tree-dump-times "Executing store motion of" 4 "lim2" { xfail { lp64 || llp64 } } } } */
+/* { dg-final { cleanup-tree-dump "lim2" } } */
diff --git gcc/testsuite/gcc.dg/tree-ssa/loop-34.c gcc/testsuite/gcc.dg/tree-ssa/loop-34.c
index 125a220..e0ec9cf 100644
--- gcc/testsuite/gcc.dg/tree-ssa/loop-34.c
+++ gcc/testsuite/gcc.dg/tree-ssa/loop-34.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-lim1-details" } */
+/* { dg-options "-O2 -fdump-tree-lim2-details" } */
 
 int r[6];
 
@@ -17,5 +17,5 @@ void f (int n)
 }
 
 
-/* { dg-final { scan-tree-dump-times "Executing store motion of r" 6 "lim1" } } */
-/* { dg-final { cleanup-tree-dump "lim1" } } */
+/* { dg-final { scan-tree-dump-times "Executing store motion of r" 6 "lim2" } } */
+/* { dg-final { cleanup-tree-dump "lim2" } } */
diff --git gcc/testsuite/gcc.dg/tree-ssa/loop-35.c gcc/testsuite/gcc.dg/tree-ssa/loop-35.c
index 2d2db70..5a1e875 100644
--- gcc/testsuite/gcc.dg/tree-ssa/loop-35.c
+++ gcc/testsuite/gcc.dg/tree-ssa/loop-35.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-lim1-details" } */
+/* { dg-options "-O2 -fdump-tree-lim2-details" } */
 
 int x;
 int a[100];
@@ -67,6 +67,6 @@ void test4(struct a *A, unsigned LONG b)
     }
 }
 /* long index not hoisted for avr target PR 36561 */
-/* { dg-final { scan-tree-dump-times "Executing store motion of" 8 "lim1" { xfail { "avr-*-*" } } } } */
-/* { dg-final { scan-tree-dump-times "Executing store motion of" 6 "lim1" { target { "avr-*-*" } } } } */
-/* { dg-final { cleanup-tree-dump "lim1" } } */
+/* { dg-final { scan-tree-dump-times "Executing store motion of" 8 "lim2" { xfail { "avr-*-*" } } } } */
+/* { dg-final { scan-tree-dump-times "Executing store motion of" 6 "lim2" { target { "avr-*-*" } } } } */
+/* { dg-final { cleanup-tree-dump "lim2" } } */
diff --git gcc/testsuite/gcc.dg/tree-ssa/loop-7.c gcc/testsuite/gcc.dg/tree-ssa/loop-7.c
index 38e19e6..4e83170 100644
--- gcc/testsuite/gcc.dg/tree-ssa/loop-7.c
+++ gcc/testsuite/gcc.dg/tree-ssa/loop-7.c
@@ -1,6 +1,6 @@
 /* PR tree-optimization/19828 */
 /* { dg-do compile } */
-/* { dg-options "-O1 -fdump-tree-lim1-details" } */
+/* { dg-options "-O1 -fdump-tree-lim2-details" } */
 
 int cst_fun1 (int) __attribute__((__const__));
 int cst_fun2 (int) __attribute__((__const__));
@@ -31,5 +31,5 @@ int xxx (void)
    Calls to cst_fun2 and pure_fun2 should not be, since calling
    with k = 0 may be invalid.  */
 
-/* { dg-final { scan-tree-dump-times "Moving statement" 2 "lim1" } } */
-/* { dg-final { cleanup-tree-dump "lim1" } } */
+/* { dg-final { scan-tree-dump-times "Moving statement" 2 "lim2" } } */
+/* { dg-final { cleanup-tree-dump "lim2" } } */
diff --git gcc/testsuite/gcc.dg/tree-ssa/pr23109.c gcc/testsuite/gcc.dg/tree-ssa/pr23109.c
index 73fd84d..0f92311 100644
--- gcc/testsuite/gcc.dg/tree-ssa/pr23109.c
+++ gcc/testsuite/gcc.dg/tree-ssa/pr23109.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -funsafe-math-optimizations -ftrapping-math -fdump-tree-recip -fdump-tree-lim1" } */
+/* { dg-options "-O2 -funsafe-math-optimizations -ftrapping-math -fdump-tree-recip -fdump-tree-lim2" } */
 /* { dg-warning "-fassociative-math disabled" "" { target *-*-* } 1 } */
 
 double F[2] = { 0., 0. }, e = 0.;
@@ -29,8 +29,8 @@ int main()
 /* LIM only performs the transformation in the no-trapping-math case.  In
    the future we will do it for trapping-math as well in recip, check that
    this is not wrongly optimized.  */
-/* { dg-final { scan-tree-dump-not "reciptmp" "lim1" } } */
+/* { dg-final { scan-tree-dump-not "reciptmp" "lim2" } } */
 /* { dg-final { scan-tree-dump-not "reciptmp" "recip" } } */
 /* { dg-final { cleanup-tree-dump "recip" } } */
-/* { dg-final { cleanup-tree-dump "lim1" } } */
+/* { dg-final { cleanup-tree-dump "lim2" } } */
 
diff --git gcc/testsuite/gcc.dg/tree-ssa/restrict-3.c gcc/testsuite/gcc.dg/tree-ssa/restrict-3.c
index 95cc1a2..c3ca462 100644
--- gcc/testsuite/gcc.dg/tree-ssa/restrict-3.c
+++ gcc/testsuite/gcc.dg/tree-ssa/restrict-3.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O -fno-strict-aliasing -fdump-tree-lim1-details" } */
+/* { dg-options "-O -fno-strict-aliasing -fdump-tree-lim2-details" } */
 
 void f(int * __restrict__ r,
        int a[__restrict__ 16][16],
@@ -14,5 +14,5 @@ void f(int * __restrict__ r,
 
 /* We should apply store motion to the store to *r.  */
 
-/* { dg-final { scan-tree-dump "Executing store motion of \\\*r" "lim1" } } */
-/* { dg-final { cleanup-tree-dump "lim1" } } */
+/* { dg-final { scan-tree-dump "Executing store motion of \\\*r" "lim2" } } */
+/* { dg-final { cleanup-tree-dump "lim2" } } */
diff --git gcc/testsuite/gcc.dg/tree-ssa/restrict-5.c gcc/testsuite/gcc.dg/tree-ssa/restrict-5.c
index d6c240a..74337e3 100644
--- gcc/testsuite/gcc.dg/tree-ssa/restrict-5.c
+++ gcc/testsuite/gcc.dg/tree-ssa/restrict-5.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O -fno-strict-aliasing -fdump-tree-lim1-details" } */
+/* { dg-options "-O -fno-strict-aliasing -fdump-tree-lim2-details" } */
 
 static inline __attribute__((always_inline))
 void f(int * __restrict__ r,
@@ -20,5 +20,5 @@ void g(int *r, int a[16][16], int b[16][16], int i, int j)
 
 /* We should apply store motion to the store to *r.  */
 
-/* { dg-final { scan-tree-dump "Executing store motion of \\\*r" "lim1" } } */
-/* { dg-final { cleanup-tree-dump "lim1" } } */
+/* { dg-final { scan-tree-dump "Executing store motion of \\\*r" "lim2" } } */
+/* { dg-final { cleanup-tree-dump "lim2" } } */
diff --git gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-1.c gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-1.c
index 3952a9a..0b22fc3 100644
--- gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-1.c
+++ gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-1.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O -fdump-tree-lim1" } */
+/* { dg-options "-O -fdump-tree-lim2" } */
 
 /* This is a variant that does cause fold to place a cast to
    int before testing bit 1.  */
@@ -18,5 +18,5 @@ quantum_toffoli (int control1, int control2, int target,
     }
 }
 
-/* { dg-final { scan-tree-dump-times "1 <<" 3 "lim1" } } */
-/* { dg-final { cleanup-tree-dump "lim1" } } */
+/* { dg-final { scan-tree-dump-times "1 <<" 3 "lim2" } } */
+/* { dg-final { cleanup-tree-dump "lim2" } } */
diff --git gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-10.c gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-10.c
index bc14926..4a218e0 100644
--- gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-10.c
+++ gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-10.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-lim1-details" } */
+/* { dg-options "-O2 -fdump-tree-lim2-details" } */
 
 int *l, *r;
 int test_func(void)
@@ -27,5 +27,5 @@ int test_func(void)
   return i;
 }
 
-/* { dg-final { scan-tree-dump "Executing store motion of pos" "lim1" } } */
-/* { dg-final { cleanup-tree-dump "lim1" } } */
+/* { dg-final { scan-tree-dump "Executing store motion of pos" "lim2" } } */
+/* { dg-final { cleanup-tree-dump "lim2" } } */
diff --git gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-11.c gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-11.c
index ea91a61..7315025 100644
--- gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-11.c
+++ gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-11.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O -fprofile-arcs -fdump-tree-lim1-details" } */
+/* { dg-options "-O -fprofile-arcs -fdump-tree-lim2-details" } */
 
 struct thread_param
 {
@@ -21,5 +21,5 @@ void access_buf(struct thread_param* p)
     }
 }
 
-/* { dg-final { scan-tree-dump-times "Executing store motion of __gcov0.access_buf\\\[\[01\]\\\] from loop 1" 2 "lim1" } } */
-/* { dg-final { cleanup-tree-dump "lim1" } } */
+/* { dg-final { scan-tree-dump-times "Executing store motion of __gcov0.access_buf\\\[\[01\]\\\] from loop 1" 2 "lim2" } } */
+/* { dg-final { cleanup-tree-dump "lim2" } } */
diff --git gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-12.c gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-12.c
index e0d93a9..07855bb 100644
--- gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-12.c
+++ gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-12.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O -fdump-tree-lim1" } */
+/* { dg-options "-O -fdump-tree-lim2" } */
 
 int a[1024];
 
@@ -23,5 +23,5 @@ void bar (int x, int z)
     }
 }
 
-/* { dg-final { scan-tree-dump-times "!= 0 ? " 2 "lim1" } } */
-/* { dg-final { cleanup-tree-dump "lim1" } } */
+/* { dg-final { scan-tree-dump-times "!= 0 ? " 2 "lim2" } } */
+/* { dg-final { cleanup-tree-dump "lim2" } } */
diff --git gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-2.c gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-2.c
index 2106b62..652d1ba 100644
--- gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-2.c
+++ gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-2.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O -fdump-tree-lim1" } */
+/* { dg-options "-O -fdump-tree-lim2" } */
 
 /* This is a variant that doesn't cause fold to place a cast to
    int before testing bit 1.  */
@@ -18,5 +18,5 @@ int size)
     }
 }
 
-/* { dg-final { scan-tree-dump-times "1 <<" 3 "lim1" } } */
-/* { dg-final { cleanup-tree-dump "lim1" } } */
+/* { dg-final { scan-tree-dump-times "1 <<" 3 "lim2" } } */
+/* { dg-final { cleanup-tree-dump "lim2" } } */
diff --git gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-3.c gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-3.c
index a81857c..29539fa 100644
--- gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-3.c
+++ gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-3.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O -fdump-tree-lim1-details" } */
+/* { dg-options "-O -fdump-tree-lim2-details" } */
 
 struct { int x; int y; } global;
 void foo(int n)
@@ -9,6 +9,6 @@ void foo(int n)
     global.y += global.x*global.x;
 }
 
-/* { dg-final { scan-tree-dump "Executing store motion of global.y" "lim1" } } */
-/* { dg-final { scan-tree-dump "Moving statement.*global.x.*out of loop 1" "lim1" } } */
-/* { dg-final { cleanup-tree-dump "lim1" } } */
+/* { dg-final { scan-tree-dump "Executing store motion of global.y" "lim2" } } */
+/* { dg-final { scan-tree-dump "Moving statement.*global.x.*out of loop 1" "lim2" } } */
+/* { dg-final { cleanup-tree-dump "lim2" } } */
diff --git gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-6.c gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-6.c
index 100a230..a70bb2e 100644
--- gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-6.c
+++ gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-6.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-lim1-details" } */
+/* { dg-options "-O2 -fdump-tree-lim2-details" } */
 
 double a[16][64], y[64], x[16];
 void foo(void)
@@ -10,5 +10,5 @@ void foo(void)
       y[j] = y[j] + a[i][j] * x[i];
 }
 
-/* { dg-final { scan-tree-dump "Executing store motion of y" "lim1" } } */
-/* { dg-final { cleanup-tree-dump "lim1" } } */
+/* { dg-final { scan-tree-dump "Executing store motion of y" "lim2" } } */
+/* { dg-final { cleanup-tree-dump "lim2" } } */
diff --git gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-7.c gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-7.c
index f8e15f3..6a67234 100644
--- gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-7.c
+++ gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-7.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O -fdump-tree-lim1-details" } */
+/* { dg-options "-O -fdump-tree-lim2-details" } */
 
 extern const int srcshift;
 
@@ -11,5 +11,5 @@ void foo (int *srcdata, int *dstdata)
     dstdata[i] = srcdata[i] << srcshift;
 }
 
-/* { dg-final { scan-tree-dump "Moving statement" "lim1" } } */
-/* { dg-final { cleanup-tree-dump "lim1" } } */
+/* { dg-final { scan-tree-dump "Moving statement" "lim2" } } */
+/* { dg-final { cleanup-tree-dump "lim2" } } */
diff --git gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-8.c gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-8.c
index 551b68f..c6f56ec 100644
--- gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-8.c
+++ gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-8.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O -fdump-tree-lim1-details" } */
+/* { dg-options "-O -fdump-tree-lim2-details" } */
 
 void bar (int);
 void foo (int n, int m)
@@ -16,5 +16,5 @@ void foo (int n, int m)
     }
 }
 
-/* { dg-final { scan-tree-dump-times "Moving PHI node" 1 "lim1"  } } */
-/* { dg-final { cleanup-tree-dump "lim1" } } */
+/* { dg-final { scan-tree-dump-times "Moving PHI node" 1 "lim2"  } } */
+/* { dg-final { cleanup-tree-dump "lim2" } } */
diff --git gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-9.c gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-9.c
index c5a6765..2233c90 100644
--- gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-9.c
+++ gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-9.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O -fdump-tree-lim1-details" } */
+/* { dg-options "-O -fdump-tree-lim2-details" } */
 
 void bar (int);
 void foo (int n, int m)
@@ -16,5 +16,5 @@ void foo (int n, int m)
     }
 }
 
-/* { dg-final { scan-tree-dump-times "Moving PHI node" 1 "lim1"  } } */
-/* { dg-final { cleanup-tree-dump "lim1" } } */
+/* { dg-final { scan-tree-dump-times "Moving PHI node" 1 "lim2"  } } */
+/* { dg-final { cleanup-tree-dump "lim2" } } */
diff --git gcc/testsuite/gcc.dg/tree-ssa/structopt-1.c gcc/testsuite/gcc.dg/tree-ssa/structopt-1.c
index e5fe291..54cf44c 100644
--- gcc/testsuite/gcc.dg/tree-ssa/structopt-1.c
+++ gcc/testsuite/gcc.dg/tree-ssa/structopt-1.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-lim1-details" } */
+/* { dg-options "-O2 -fdump-tree-lim2-details" } */
 int x; int y;
 struct { int x; int y; } global;
 int foo() {
@@ -10,6 +10,6 @@ int foo() {
 		global.y += global.x*global.x;
 }
 
-/* { dg-final { scan-tree-dump-times "Executing store motion of global.y" 1 "lim1" } } */
+/* { dg-final { scan-tree-dump-times "Executing store motion of global.y" 1 "lim2" } } */
 /* XXX: We should also check for the load motion of global.x, but there is no easy way to do this.  */
-/* { dg-final { cleanup-tree-dump "lim1" } } */
+/* { dg-final { cleanup-tree-dump "lim2" } } */
diff --git gcc/testsuite/gfortran.dg/pr32921.f gcc/testsuite/gfortran.dg/pr32921.f
index 45ea647..55b5604 100644
--- gcc/testsuite/gfortran.dg/pr32921.f
+++ gcc/testsuite/gfortran.dg/pr32921.f
@@ -1,5 +1,5 @@
 ! { dg-do compile }
-! { dg-options "-O2 -fdump-tree-lim1" }
+! { dg-options "-O2 -fdump-tree-lim2" }
 ! gfortran -c -m32 -O2 -S junk.f
 !
       MODULE LES3D_DATA
@@ -45,5 +45,5 @@
 
       RETURN
       END
-! { dg-final { scan-tree-dump-times "stride" 4 "lim1" } }
-! { dg-final { cleanup-tree-dump "lim1" } }
+! { dg-final { scan-tree-dump-times "stride" 4 "lim2" } }
+! { dg-final { cleanup-tree-dump "lim2" } }
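
The lim1 to lim2 renames in the testsuite changes above all follow one pattern. The numeric suffix in -fdump-tree-limN appears to count the instances of the pass in pass-list order, so adding a pass_lim instance to pass group pass_oacc_kernels, ahead of the existing one, would shift the instance these tests scan from lim1 to lim2. A made-up test in the same style, written against lim2, might look like the sketch below. The function `hoist_me` is hypothetical and not part of the commit, the scan string is simply copied from the existing tests, and the sketch has not been run through the testsuite.

```c
/* { dg-do compile } */
/* { dg-options "-O2 -fdump-tree-lim2-details" } */

/* Hypothetical example: n * m does not change inside the loop, so it
   is a candidate for being moved out of it.  */
void
hoist_me (int *a, int n, int m)
{
  for (int i = 0; i < n; i++)
    a[i] = n * m + i;
}

/* { dg-final { scan-tree-dump "Moving statement" "lim2" } } */
/* { dg-final { cleanup-tree-dump "lim2" } } */
```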


Regards,
 Thomas


* Re: [PATCH, 6/8] Add pass_copy_prop in pass_oacc_kernels
  2014-11-25 12:03   ` Tom de Vries
@ 2015-04-21 20:01     ` Thomas Schwinge
  2015-04-22  7:42       ` Richard Biener
  0 siblings, 1 reply; 71+ messages in thread
From: Thomas Schwinge @ 2015-04-21 20:01 UTC (permalink / raw)
  To: GCC Patches; +Cc: Tom de Vries, Richard Biener, Jakub Jelinek

Hi!

On Tue, 25 Nov 2014 12:38:55 +0100, Tom de Vries <Tom_deVries@mentor.com> wrote:
> On 15-11-14 18:22, Tom de Vries wrote:
> > On 15-11-14 13:14, Tom de Vries wrote:
> >> I'm submitting a patch series with initial support for the oacc kernels
> >> directive.
> >>
> >> The patch series uses pass_parallelize_loops to implement parallelization of
> >> loops in the oacc kernels region.
> >>
> >> The patch series consists of these 8 patches:
> >> ...
> >>      1  Expand oacc kernels after pass_build_ealias
> >>      2  Add pass_oacc_kernels
> >>      3  Add pass_ch_oacc_kernels to pass_oacc_kernels
> >>      4  Add pass_tree_loop_{init,done} to pass_oacc_kernels
> >>      5  Add pass_loop_im to pass_oacc_kernels
> >>      6  Add pass_ccp to pass_oacc_kernels
> >>      7  Add pass_parloops_oacc_kernels to pass_oacc_kernels
> >>      8  Do simple omp lowering for no address taken var
> >> ...
> >
> > This patch adds pass_loop_ccp to pass group pass_oacc_kernels.
> >
> > We need this pass to simplify the loop body, and allow pass_parloops to detect
> > that loop iterations are independent.
> >
> 
> As suggested here ( https://gcc.gnu.org/ml/gcc-patches/2014-11/msg02993.html ) 
> I've replaced the pass_ccp with pass_copyprop, which performs trivial constant 
> propagation in addition to copy propagation.
> 
> Bootstrapped and reg-tested as before.
> 
> OK for trunk?
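
As a small illustration of what "trivial constant propagation in addition to copy propagation" means at the source level, here is a hand-written sketch. It is not code from the patch, and `index_before` and `index_after` are hypothetical names. The first function carries a chain of plain copies plus one constant; the second shows what is left once those have been propagated by hand.

```c
/* Before: t and u are plain copies of i, and k is trivially constant.  */
int
index_before (const int *a, int i)
{
  int k = 4;	/* constant */
  int t = i;	/* copy of i */
  int u = t;	/* copy of a copy */
  return a[u] + k;
}

/* After: copies and the constant propagated by hand.  */
int
index_after (const int *a, int i)
{
  return a[i] + 4;
}
```

With the copies gone, the array index is expressed directly in terms of `i`, which is the simpler form a later loop analysis would prefer to see.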

Committed to gomp-4_0-branch in r222284:

commit 1c2529b64620811cbff4a50374af797ee52ef5f8
Author: tschwinge <tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4>
Date:   Tue Apr 21 19:58:54 2015 +0000

    Add pass_copy_prop in pass_oacc_kernels
    
    	gcc/
    	* passes.def: Add pass_copy_prop to pass group pass_oacc_kernels.
    	* tree-ssa-copy.c (stmt_may_generate_copy): Handle .omp_data_i init
    	conservatively.
    
    git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@222284 138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/ChangeLog.gomp  |    4 ++++
 gcc/passes.def      |    1 +
 gcc/tree-ssa-copy.c |    4 ++++
 3 files changed, 9 insertions(+)

diff --git gcc/ChangeLog.gomp gcc/ChangeLog.gomp
index 98e33ad..0be9191 100644
--- gcc/ChangeLog.gomp
+++ gcc/ChangeLog.gomp
@@ -1,5 +1,9 @@
 2015-04-21  Tom de Vries  <tom@codesourcery.com>
 
+	* passes.def: Add pass_copy_prop to pass group pass_oacc_kernels.
+	* tree-ssa-copy.c (stmt_may_generate_copy): Handle .omp_data_i init
+	conservatively.
+
 	* passes.def: Add pass_lim in pass group pass_ch_oacc_kernels.
 
 	* passes.def: Run pass_tree_loop_init and pass_tree_loop_done in pass
diff --git gcc/passes.def gcc/passes.def
index e6c9287..e6f1c33 100644
--- gcc/passes.def
+++ gcc/passes.def
@@ -93,6 +93,7 @@ along with GCC; see the file COPYING3.  If not see
 	      NEXT_PASS (pass_ch_oacc_kernels);
 	      NEXT_PASS (pass_tree_loop_init);
 	      NEXT_PASS (pass_lim);
+	      NEXT_PASS (pass_copy_prop);
 	      NEXT_PASS (pass_expand_omp_ssa);
 	      NEXT_PASS (pass_tree_loop_done);
 	  POP_INSERT_PASSES ()
diff --git gcc/tree-ssa-copy.c gcc/tree-ssa-copy.c
index 5ae8e6c..6f35f99 100644
--- gcc/tree-ssa-copy.c
+++ gcc/tree-ssa-copy.c
@@ -61,6 +61,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-scalar-evolution.h"
 #include "tree-ssa-dom.h"
 #include "tree-ssa-loop-niter.h"
+#include "omp-low.h"
 
 
 /* This file implements the copy propagation pass and provides a
@@ -116,6 +117,9 @@ stmt_may_generate_copy (gimple stmt)
   if (gimple_has_volatile_ops (stmt))
     return false;
 
+  if (gimple_stmt_omp_data_i_init_p (stmt))
+    return false;
+
   /* Statements with loads and/or stores will never generate a useful copy.  */
   if (gimple_vuse (stmt))
     return false;


Regards,
 Thomas


* Re: [PATCH, 7/8] Add pass_parallelize_loops_oacc_kernels to pass_oacc_kernels
  2014-11-25 12:15   ` Tom de Vries
@ 2015-04-21 20:09     ` Thomas Schwinge
  0 siblings, 0 replies; 71+ messages in thread
From: Thomas Schwinge @ 2015-04-21 20:09 UTC (permalink / raw)
  To: GCC Patches; +Cc: Tom de Vries, Richard Biener, Jakub Jelinek

Hi!

On Tue, 25 Nov 2014 12:42:28 +0100, Tom de Vries <Tom_deVries@mentor.com> wrote:
> On 15-11-14 18:23, Tom de Vries wrote:
> > On 15-11-14 13:14, Tom de Vries wrote:
> >> I'm submitting a patch series with initial support for the oacc kernels
> >> directive.
> >>
> >> The patch series uses pass_parallelize_loops to implement parallelization of
> >> loops in the oacc kernels region.
> >>
> >> The patch series consists of these 8 patches:
> >> ...
> >>      1  Expand oacc kernels after pass_build_ealias
> >>      2  Add pass_oacc_kernels
> >>      3  Add pass_ch_oacc_kernels to pass_oacc_kernels
> >>      4  Add pass_tree_loop_{init,done} to pass_oacc_kernels
> >>      5  Add pass_loop_im to pass_oacc_kernels
> >>      6  Add pass_ccp to pass_oacc_kernels
> >>      7  Add pass_parloops_oacc_kernels to pass_oacc_kernels
> >>      8  Do simple omp lowering for no address taken var
> >> ...
> >
> > This patch adds:
> > - a specialized version of pass_parallelize_loops called
> >      pass_parloops_oacc_kernels to pass group pass_oacc_kernels, and
> > - relevant test-cases.
> >
> > The pass only handles loops that are in a kernels region, and skips over bits of
> > pass_parallelize_loops that are already done for oacc kernels.
> >
> > The pass reintroduces the use of omp_expand_local; I haven't managed to make it
> > work yet using the external pass pass_expand_omp_ssa.
> >
> > An obvious limitation of the patch is the fact that we copy over the clauses
> > from the kernels directive to the generated parallel directive. We'll need to do
> > something more intelligent here, for instance setting vector_length based on the
> > parallelization factor.
> >
> > Another limitation is that the pass still needs -ftree-parallelize-loops to
> > trigger.
> >
> 
> Updated to use pass_copy_prop instead of pass_ccp in pass_oacc_kernels.
> 
> Bootstrapped and reg-tested as before.
> 
> OK for trunk?
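
For illustration only, here is a minimal sketch (not part of the patch) of
the region shapes the new pass is expected to accept or skip.  It assumes the
restrictions marked TODO in parallelize_loops: the kernels region holds
exactly one innermost loop, and the loop bound is known at compile time.  The
names single_loop, runtime_bound, nested and check, and the macros N, ROWS
and COLS, are made up for this example.  Built without -fopenacc, the pragmas
are ignored and the loops simply run sequentially on the host.

```c
#include <stdlib.h>

#define ROWS 32
#define COLS 32
#define N (ROWS * COLS)

/* Expected to be handled: one innermost loop, constant bound
   (the shape used in kernels-loop.c).  */
static void
single_loop (unsigned int *__restrict c, const unsigned int *__restrict a,
	     const unsigned int *__restrict b)
{
#pragma acc kernels copyin (a[0:N], b[0:N]) copyout (c[0:N])
  {
    for (unsigned int i = 0; i < N; i++)
      c[i] = a[i] + b[i];
  }
}

/* Expected to be skipped for now: the bound is only known at run time, so
   loop entry is conditional (the shape used in kernels-loop-n.c).  */
static void
runtime_bound (unsigned int *__restrict c, const unsigned int *__restrict a,
	       const unsigned int *__restrict b, unsigned int n)
{
#pragma acc kernels copyin (a[0:n], b[0:n]) copyout (c[0:n])
  {
    for (unsigned int i = 0; i < n; i++)
      c[i] = a[i] + b[i];
  }
}

/* Expected to be skipped for now: the outer loop contains an inner loop.  */
static void
nested (unsigned int *__restrict c, const unsigned int *__restrict a,
	const unsigned int *__restrict b)
{
#pragma acc kernels copyin (a[0:N], b[0:N]) copyout (c[0:N])
  {
    for (unsigned int i = 0; i < ROWS; i++)
      for (unsigned int j = 0; j < COLS; j++)
	c[i * COLS + j] = a[i * COLS + j] + b[i * COLS + j];
  }
}

/* Host-side verification in the style of the new test cases.  */
static void
check (const unsigned int *c, const unsigned int *a, const unsigned int *b,
       unsigned int n)
{
  for (unsigned int i = 0; i < n; i++)
    if (c[i] != a[i] + b[i])
      abort ();
}
```

A driver would fill a and b, call each function on a fresh c, and run check
afterwards.  All three variants must compute the same result whether or not
the pass parallelizes them; only the first is expected to show up as
"SUCCESS: may be parallelized" in the parloops_oacc_kernels dump.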

Committed to gomp-4_0-branch in r222285:

commit 74e09b9dbbe43321fb20b0174f926893bf2111bc
Author: tschwinge <tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4>
Date:   Tue Apr 21 20:06:16 2015 +0000

    Add pass_parallelize_loops_oacc_kernels to pass_oacc_kernels
    
    	gcc/
    	* passes.def: Add pass_parallelize_loops_oacc_kernels in pass group
    	pass_oacc_kernels.
    	* tree-parloops.c (create_parallel_loop, gen_parallel_loop): Add
    	function parameters region_entry and bool oacc_kernels_p.  Handle
    	oacc_kernels_p.
    	Call create_parallel_loop with additional args.
    	(parallelize_loops): Add function parameter oacc_kernels_p.  Calculate
    	dominance info.  Skip loops that are not in a kernels region.  Call
    	gen_parallel_loop with additional args.
    	(pass_parallelize_loops::execute): Call parallelize_loops with false
    	argument.
    	(pass_data_parallelize_loops_oacc_kernels): New pass_data.
    	(class pass_parallelize_loops_oacc_kernels): New pass.
    	(pass_parallelize_loops_oacc_kernels::execute)
    	(make_pass_parallelize_loops_oacc_kernels): New function.
    	* tree-pass.h (make_pass_parallelize_loops_oacc_kernels): Declare.
    
    	gcc/testsuite/
    	* c-c++-common/goacc/kernels-loop-2.c: New test.
    	* c-c++-common/goacc/kernels-loop.c: New test.
    	* c-c++-common/goacc/kernels-loop-n.c: New test.
    	* c-c++-common/goacc/kernels-loop-mod-not-zero.c: New test.
    
    	libgomp/
    	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-2.c: New test.
    	* testsuite/libgomp.oacc-c-c++-common/kernels-loop.c: New test.
    	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-n.c: New test.
    	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-mod-not-zero.c:
    	New test.
    
    git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@222285 138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/ChangeLog.gomp                                 |   17 ++
 gcc/passes.def                                     |    1 +
 gcc/testsuite/ChangeLog.gomp                       |    5 +
 gcc/testsuite/c-c++-common/goacc/kernels-loop-2.c  |   62 +++++
 .../c-c++-common/goacc/kernels-loop-mod-not-zero.c |   53 ++++
 gcc/testsuite/c-c++-common/goacc/kernels-loop-n.c  |   48 ++++
 gcc/testsuite/c-c++-common/goacc/kernels-loop.c    |   53 ++++
 gcc/tree-parloops.c                                |  282 ++++++++++++++++----
 gcc/tree-pass.h                                    |    2 +
 libgomp/ChangeLog.gomp                             |    9 +
 .../libgomp.oacc-c-c++-common/kernels-loop-2.c     |   47 ++++
 .../kernels-loop-mod-not-zero.c                    |   41 +++
 .../libgomp.oacc-c-c++-common/kernels-loop-n.c     |   47 ++++
 .../libgomp.oacc-c-c++-common/kernels-loop.c       |   41 +++
 14 files changed, 650 insertions(+), 58 deletions(-)

diff --git gcc/ChangeLog.gomp gcc/ChangeLog.gomp
index 0be9191..bf0ee52 100644
--- gcc/ChangeLog.gomp
+++ gcc/ChangeLog.gomp
@@ -1,5 +1,22 @@
 2015-04-21  Tom de Vries  <tom@codesourcery.com>
 
+	* passes.def: Add pass_parallelize_loops_oacc_kernels in pass group
+	pass_oacc_kernels.
+	* tree-parloops.c (create_parallel_loop, gen_parallel_loop): Add
+	function parameters region_entry and bool oacc_kernels_p.  Handle
+	oacc_kernels_p.
+	Call create_parallel_loop with additional args.
+	(parallelize_loops): Add function parameter oacc_kernels_p.  Calculate
+	dominance info.  Skip loops that are not in a kernels region.  Call
+	gen_parallel_loop with additional args.
+	(pass_parallelize_loops::execute): Call parallelize_loops with false
+	argument.
+	(pass_data_parallelize_loops_oacc_kernels): New pass_data.
+	(class pass_parallelize_loops_oacc_kernels): New pass.
+	(pass_parallelize_loops_oacc_kernels::execute)
+	(make_pass_parallelize_loops_oacc_kernels): New function.
+	* tree-pass.h (make_pass_parallelize_loops_oacc_kernels): Declare.
+
 	* passes.def: Add pass_copy_prop to pass group pass_oacc_kernels.
 	* tree-ssa-copy.c (stmt_may_generate_copy): Handle .omp_data_i init
 	conservatively.
diff --git gcc/passes.def gcc/passes.def
index e6f1c33..2d2e286 100644
--- gcc/passes.def
+++ gcc/passes.def
@@ -94,6 +94,7 @@ along with GCC; see the file COPYING3.  If not see
 	      NEXT_PASS (pass_tree_loop_init);
 	      NEXT_PASS (pass_lim);
 	      NEXT_PASS (pass_copy_prop);
+	      NEXT_PASS (pass_parallelize_loops_oacc_kernels);
 	      NEXT_PASS (pass_expand_omp_ssa);
 	      NEXT_PASS (pass_tree_loop_done);
 	  POP_INSERT_PASSES ()
diff --git gcc/testsuite/ChangeLog.gomp gcc/testsuite/ChangeLog.gomp
index 68d6d93..2c6abff 100644
--- gcc/testsuite/ChangeLog.gomp
+++ gcc/testsuite/ChangeLog.gomp
@@ -1,6 +1,11 @@
 2015-04-21  Tom de Vries  <tom@codesourcery.com>
 	    Thomas Schwinge  <thomas@codesourcery.com>
 
+	* c-c++-common/goacc/kernels-loop-2.c: New test.
+	* c-c++-common/goacc/kernels-loop.c: New test.
+	* c-c++-common/goacc/kernels-loop-n.c: New test.
+	* c-c++-common/goacc/kernels-loop-mod-not-zero.c: New test.
+
 	* c-c++-common/restrict-2.c: Update for new pass_lim.
 	* c-c++-common/restrict-4.c: Same.
 	* g++.dg/tree-ssa/pr33615.C: Same.
diff --git gcc/testsuite/c-c++-common/goacc/kernels-loop-2.c gcc/testsuite/c-c++-common/goacc/kernels-loop-2.c
new file mode 100644
index 0000000..ab69fe9
--- /dev/null
+++ gcc/testsuite/c-c++-common/goacc/kernels-loop-2.c
@@ -0,0 +1,62 @@
+/* { dg-additional-options "-O2" } */
+/* { dg-additional-options "-ftree-parallelize-loops=32" } */
+/* { dg-additional-options "-fdump-tree-parloops_oacc_kernels-all" } */
+/* { dg-additional-options "-fdump-tree-optimized" } */
+
+#include <stdlib.h>
+
+#define N (1024 * 512)
+#define COUNTERTYPE unsigned int
+
+int
+main (void)
+{
+  unsigned int *__restrict a;
+  unsigned int *__restrict b;
+  unsigned int *__restrict c;
+
+  a = (unsigned int *)malloc (N * sizeof (unsigned int));
+  b = (unsigned int *)malloc (N * sizeof (unsigned int));
+  c = (unsigned int *)malloc (N * sizeof (unsigned int));
+
+#pragma acc kernels copyout (a[0:N])
+  {
+    for (COUNTERTYPE i = 0; i < N; i++)
+      a[i] = i * 2;
+  }
+
+#pragma acc kernels copyout (b[0:N])
+  {
+    for (COUNTERTYPE i = 0; i < N; i++)
+      b[i] = i * 4;
+  }
+
+#pragma acc kernels copyin (a[0:N], b[0:N]) copyout (c[0:N])
+  {
+    for (COUNTERTYPE ii = 0; ii < N; ii++)
+      c[ii] = a[ii] + b[ii];
+  }
+
+  for (COUNTERTYPE i = 0; i < N; i++)
+    if (c[i] != a[i] + b[i])
+      abort ();
+
+  free (a);
+  free (b);
+  free (c);
+
+  return 0;
+}
+
+/* Check that only three loops are analyzed, and that all can be
+   parallelized.  */
+/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 3 "parloops_oacc_kernels" } } */
+/* { dg-final { scan-tree-dump-not "FAILED:" "parloops_oacc_kernels" } } */
+
+/* Check that the loop has been split off into a function.  */
+/* { dg-final { scan-tree-dump-times "(?n);; Function .*main._omp_fn.0" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "(?n);; Function .*main._omp_fn.1" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "(?n);; Function .*main._omp_fn.2" 1 "optimized" } } */
+
+/* { dg-final { cleanup-tree-dump "parloops_oacc_kernels" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
diff --git gcc/testsuite/c-c++-common/goacc/kernels-loop-mod-not-zero.c gcc/testsuite/c-c++-common/goacc/kernels-loop-mod-not-zero.c
new file mode 100644
index 0000000..261d213
--- /dev/null
+++ gcc/testsuite/c-c++-common/goacc/kernels-loop-mod-not-zero.c
@@ -0,0 +1,53 @@
+/* { dg-additional-options "-O2" } */
+/* { dg-additional-options "-ftree-parallelize-loops=32" } */
+/* { dg-additional-options "-fdump-tree-parloops_oacc_kernels-all" } */
+/* { dg-additional-options "-fdump-tree-optimized" } */
+
+#include <stdlib.h>
+
+#define N ((1024 * 512) + 1)
+#define COUNTERTYPE unsigned int
+
+int
+main (void)
+{
+  unsigned int *__restrict a;
+  unsigned int *__restrict b;
+  unsigned int *__restrict c;
+
+  a = (unsigned int *__restrict)malloc (N * sizeof (unsigned int));
+  b = (unsigned int *__restrict)malloc (N * sizeof (unsigned int));
+  c = (unsigned int *__restrict)malloc (N * sizeof (unsigned int));
+
+  for (COUNTERTYPE i = 0; i < N; i++)
+    a[i] = i * 2;
+
+  for (COUNTERTYPE i = 0; i < N; i++)
+    b[i] = i * 4;
+
+#pragma acc kernels copyin (a[0:N], b[0:N]) copyout (c[0:N])
+  {
+    for (COUNTERTYPE ii = 0; ii < N; ii++)
+      c[ii] = a[ii] + b[ii];
+  }
+
+  for (COUNTERTYPE i = 0; i < N; i++)
+    if (c[i] != a[i] + b[i])
+      abort ();
+
+  free (a);
+  free (b);
+  free (c);
+
+  return 0;
+}
+
+/* Check that only one loop is analyzed, and that it can be parallelized.  */
+/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 1 "parloops_oacc_kernels" } } */
+/* { dg-final { scan-tree-dump-not "FAILED:" "parloops_oacc_kernels" } } */
+
+/* Check that the loop has been split off into a function.  */
+/* { dg-final { scan-tree-dump-times "(?n);; Function .*main._omp_fn.0" 1 "optimized" } } */
+
+/* { dg-final { cleanup-tree-dump "parloops_oacc_kernels" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
diff --git gcc/testsuite/c-c++-common/goacc/kernels-loop-n.c gcc/testsuite/c-c++-common/goacc/kernels-loop-n.c
new file mode 100644
index 0000000..7bf744e
--- /dev/null
+++ gcc/testsuite/c-c++-common/goacc/kernels-loop-n.c
@@ -0,0 +1,48 @@
+/* { dg-additional-options "-O2" } */
+/* { dg-additional-options "-ftree-parallelize-loops=32" } */
+/* TODO: parallelize this example.  */
+
+#include <stdlib.h>
+
+#define N ((1024 * 512) + 1)
+#define COUNTERTYPE unsigned int
+
+static int __attribute__((noinline,noclone))
+foo (COUNTERTYPE n)
+{
+  unsigned int *__restrict a;
+  unsigned int *__restrict b;
+  unsigned int *__restrict c;
+
+  a = (unsigned int *__restrict)malloc (n * sizeof (unsigned int));
+  b = (unsigned int *__restrict)malloc (n * sizeof (unsigned int));
+  c = (unsigned int *__restrict)malloc (n * sizeof (unsigned int));
+
+  for (COUNTERTYPE i = 0; i < n; i++)
+    a[i] = i * 2;
+
+  for (COUNTERTYPE i = 0; i < n; i++)
+    b[i] = i * 4;
+
+#pragma acc kernels copyin (a[0:n], b[0:n]) copyout (c[0:n])
+  {
+    for (COUNTERTYPE ii = 0; ii < n; ii++)
+      c[ii] = a[ii] + b[ii];
+  }
+
+  for (COUNTERTYPE i = 0; i < n; i++)
+    if (c[i] != a[i] + b[i])
+      abort ();
+
+  free (a);
+  free (b);
+  free (c);
+
+  return 0;
+}
+
+int
+main (void)
+{
+  return foo (N);
+}
diff --git gcc/testsuite/c-c++-common/goacc/kernels-loop.c gcc/testsuite/c-c++-common/goacc/kernels-loop.c
new file mode 100644
index 0000000..2391148
--- /dev/null
+++ gcc/testsuite/c-c++-common/goacc/kernels-loop.c
@@ -0,0 +1,53 @@
+/* { dg-additional-options "-O2" } */
+/* { dg-additional-options "-ftree-parallelize-loops=32" } */
+/* { dg-additional-options "-fdump-tree-parloops_oacc_kernels-all" } */
+/* { dg-additional-options "-fdump-tree-optimized" } */
+
+#include <stdlib.h>
+
+#define N (1024 * 512)
+#define COUNTERTYPE unsigned int
+
+int
+main (void)
+{
+  unsigned int *__restrict a;
+  unsigned int *__restrict b;
+  unsigned int *__restrict c;
+
+  a = (unsigned int *)malloc (N * sizeof (unsigned int));
+  b = (unsigned int *)malloc (N * sizeof (unsigned int));
+  c = (unsigned int *)malloc (N * sizeof (unsigned int));
+
+  for (COUNTERTYPE i = 0; i < N; i++)
+    a[i] = i * 2;
+
+  for (COUNTERTYPE i = 0; i < N; i++)
+    b[i] = i * 4;
+
+#pragma acc kernels copyin (a[0:N], b[0:N]) copyout (c[0:N])
+  {
+    for (COUNTERTYPE ii = 0; ii < N; ii++)
+      c[ii] = a[ii] + b[ii];
+  }
+
+  for (COUNTERTYPE i = 0; i < N; i++)
+    if (c[i] != a[i] + b[i])
+      abort ();
+
+  free (a);
+  free (b);
+  free (c);
+
+  return 0;
+}
+
+/* Check that only one loop is analyzed, and that it can be parallelized.  */
+/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 1 "parloops_oacc_kernels" } } */
+/* { dg-final { scan-tree-dump-not "FAILED:" "parloops_oacc_kernels" } } */
+
+/* Check that the loop has been split off into a function.  */
+/* { dg-final { scan-tree-dump-times "(?n);; Function .*main._omp_fn.0" 1 "optimized" } } */
+
+/* { dg-final { cleanup-tree-dump "parloops_oacc_kernels" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
diff --git gcc/tree-parloops.c gcc/tree-parloops.c
index 9a233f4..e218a90 100644
--- gcc/tree-parloops.c
+++ gcc/tree-parloops.c
@@ -1612,9 +1612,10 @@ transform_to_exit_first_loop (struct loop *loop,
    of LOOP_FN.  N_THREADS is the requested number of threads.  Returns the
    basic block containing GIMPLE_OMP_PARALLEL tree.  */
 
-static basic_block
+static void
 create_parallel_loop (struct loop *loop, tree loop_fn, tree data,
-		      tree new_data, unsigned n_threads, location_t loc)
+		      tree new_data, unsigned n_threads, location_t loc,
+		      basic_block region_entry, bool oacc_kernels_p)
 {
   gimple_stmt_iterator gsi;
   basic_block bb, paral_bb, for_bb, ex_bb;
@@ -1631,15 +1632,69 @@ create_parallel_loop (struct loop *loop, tree loop_fn, tree data,
   /* Prepare the GIMPLE_OMP_PARALLEL statement.  */
   bb = loop_preheader_edge (loop)->src;
   paral_bb = single_pred (bb);
-  gsi = gsi_last_bb (paral_bb);
+  if (!oacc_kernels_p)
+    gsi = gsi_last_bb (paral_bb);
+  else
+    /* Make sure the oacc parallel is inserted on top of the oacc kernels
+       region.  */
+    gsi = gsi_last_bb (region_entry);
 
-  t = build_omp_clause (loc, OMP_CLAUSE_NUM_THREADS);
-  OMP_CLAUSE_NUM_THREADS_EXPR (t)
-    = build_int_cst (integer_type_node, n_threads);
-  omp_par_stmt = gimple_build_omp_parallel (NULL, t, loop_fn, data);
-  gimple_set_location (omp_par_stmt, loc);
+  if (!oacc_kernels_p)
+    {
+      t = build_omp_clause (loc, OMP_CLAUSE_NUM_THREADS);
+      OMP_CLAUSE_NUM_THREADS_EXPR (t)
+	= build_int_cst (integer_type_node, n_threads);
+      omp_par_stmt = gimple_build_omp_parallel (NULL, t, loop_fn, data);
+      gimple_set_location (omp_par_stmt, loc);
 
-  gsi_insert_after (&gsi, omp_par_stmt, GSI_NEW_STMT);
+      gsi_insert_after (&gsi, omp_par_stmt, GSI_NEW_STMT);
+    }
+  else
+    {
+      /* Create oacc parallel pragma based on oacc kernels pragma.  */
+      gomp_target *kernels = as_a <gomp_target *> (gsi_stmt (gsi));
+
+      tree clauses = gimple_omp_target_clauses (kernels);
+      /* FIXME: We need a more intelligent mapping onto vector, gangs,
+	 workers.  */
+      if (1)
+	{
+	  tree clause = build_omp_clause (gimple_location (kernels),
+					  OMP_CLAUSE_VECTOR_LENGTH);
+	  OMP_CLAUSE_VECTOR_LENGTH_EXPR (clause)
+	    = build_int_cst (integer_type_node, n_threads);
+	  OMP_CLAUSE_CHAIN (clause) = clauses;
+	  clauses = clause;
+	}
+      gomp_target *stmt
+	= gimple_build_omp_target (NULL, GF_OMP_TARGET_KIND_OACC_PARALLEL,
+				   clauses);
+      tree child_fn = gimple_omp_target_child_fn (kernels);
+      gimple_omp_target_set_child_fn (stmt, child_fn);
+      tree data_arg = gimple_omp_target_data_arg (kernels);
+      gimple_omp_target_set_data_arg (stmt, data_arg);
+
+      gimple_set_location (stmt, loc);
+
+      /* Insert oacc parallel pragma after the oacc kernels pragma.  */
+      {
+	gimple_stmt_iterator gsi2;
+	gsi = gsi_last_bb (region_entry);
+	gsi2 = gsi;
+	gsi_prev (&gsi2);
+
+	/* Insert pragma acc parallel.  */
+	gsi_insert_after (&gsi, stmt, GSI_NEW_STMT);
+
+	/* Remove GOACC_kernels.  */
+	replace_uses_by (gimple_vdef (gsi_stmt (gsi2)),
+			 gimple_vuse (gsi_stmt (gsi2)));
+	gsi_remove (&gsi2, true);
+
+	/* Remove pragma acc kernels.  */
+	gsi_remove (&gsi2, true);
+      }
+    }
 
   /* Initialize NEW_DATA.  */
   if (data)
@@ -1657,12 +1712,18 @@ create_parallel_loop (struct loop *loop, tree loop_fn, tree data,
       gsi_insert_before (&gsi, assign_stmt, GSI_SAME_STMT);
     }
 
-  /* Emit GIMPLE_OMP_RETURN for GIMPLE_OMP_PARALLEL.  */
-  bb = split_loop_exit_edge (single_dom_exit (loop));
-  gsi = gsi_last_bb (bb);
-  omp_return_stmt1 = gimple_build_omp_return (false);
-  gimple_set_location (omp_return_stmt1, loc);
-  gsi_insert_after (&gsi, omp_return_stmt1, GSI_NEW_STMT);
+  /* Skip insertion of OMP_RETURN for oacc_kernels_p.  We've already generated
+     one when lowering the oacc kernels directive in
+     pass_lower_omp/lower_omp ().  */
+  if (!oacc_kernels_p)
+    {
+      /* Emit GIMPLE_OMP_RETURN for GIMPLE_OMP_PARALLEL.  */
+      bb = split_loop_exit_edge (single_dom_exit (loop));
+      gsi = gsi_last_bb (bb);
+      omp_return_stmt1 = gimple_build_omp_return (false);
+      gimple_set_location (omp_return_stmt1, loc);
+      gsi_insert_after (&gsi, omp_return_stmt1, GSI_NEW_STMT);
+    }
 
   /* Extract data for GIMPLE_OMP_FOR.  */
   gcc_assert (loop->header == single_dom_exit (loop)->src);
@@ -1719,7 +1780,11 @@ create_parallel_loop (struct loop *loop, tree loop_fn, tree data,
   t = build_omp_clause (loc, OMP_CLAUSE_SCHEDULE);
   OMP_CLAUSE_SCHEDULE_KIND (t) = OMP_CLAUSE_SCHEDULE_STATIC;
 
-  for_stmt = gimple_build_omp_for (NULL, GF_OMP_FOR_KIND_FOR, t, 1, NULL);
+  for_stmt = gimple_build_omp_for (NULL,
+				   (oacc_kernels_p
+				    ? GF_OMP_FOR_KIND_OACC_LOOP
+				    : GF_OMP_FOR_KIND_FOR),
+				   NULL_TREE, 1, NULL);
   gimple_set_location (for_stmt, loc);
   gimple_omp_for_set_index (for_stmt, 0, initvar);
   gimple_omp_for_set_initial (for_stmt, 0, cvar_init);
@@ -1749,8 +1814,6 @@ create_parallel_loop (struct loop *loop, tree loop_fn, tree data,
   /* After the above dom info is hosed.  Re-compute it.  */
   free_dominance_info (CDI_DOMINATORS);
   calculate_dominance_info (CDI_DOMINATORS);
-
-  return paral_bb;
 }
 
 /* Generates code to execute the iterations of LOOP in N_THREADS
@@ -1762,7 +1825,8 @@ create_parallel_loop (struct loop *loop, tree loop_fn, tree data,
 static void
 gen_parallel_loop (struct loop *loop,
 		   reduction_info_table_type *reduction_list,
-		   unsigned n_threads, struct tree_niter_desc *niter)
+		   unsigned n_threads, struct tree_niter_desc *niter,
+		   basic_block region_entry, bool oacc_kernels_p)
 {
   tree many_iterations_cond, type, nit;
   tree arg_struct, new_arg_struct;
@@ -1843,41 +1907,44 @@ gen_parallel_loop (struct loop *loop,
   if (stmts)
     gsi_insert_seq_on_edge_immediate (loop_preheader_edge (loop), stmts);
 
-  if (loop->inner)
-    m_p_thread=2;
-  else
-    m_p_thread=MIN_PER_THREAD;
-
-   many_iterations_cond =
-     fold_build2 (GE_EXPR, boolean_type_node,
-                nit, build_int_cst (type, m_p_thread * n_threads));
-
-  many_iterations_cond
-    = fold_build2 (TRUTH_AND_EXPR, boolean_type_node,
-		   invert_truthvalue (unshare_expr (niter->may_be_zero)),
-		   many_iterations_cond);
-  many_iterations_cond
-    = force_gimple_operand (many_iterations_cond, &stmts, false, NULL_TREE);
-  if (stmts)
-    gsi_insert_seq_on_edge_immediate (loop_preheader_edge (loop), stmts);
-  if (!is_gimple_condexpr (many_iterations_cond))
+  if (!oacc_kernels_p)
     {
+      if (loop->inner)
+	m_p_thread=2;
+      else
+	m_p_thread=MIN_PER_THREAD;
+
+      many_iterations_cond =
+	fold_build2 (GE_EXPR, boolean_type_node,
+		     nit, build_int_cst (type, m_p_thread * n_threads));
+
       many_iterations_cond
-	= force_gimple_operand (many_iterations_cond, &stmts,
-				true, NULL_TREE);
+	= fold_build2 (TRUTH_AND_EXPR, boolean_type_node,
+		       invert_truthvalue (unshare_expr (niter->may_be_zero)),
+		       many_iterations_cond);
+      many_iterations_cond
+	= force_gimple_operand (many_iterations_cond, &stmts, false, NULL_TREE);
       if (stmts)
 	gsi_insert_seq_on_edge_immediate (loop_preheader_edge (loop), stmts);
+      if (!is_gimple_condexpr (many_iterations_cond))
+	{
+	  many_iterations_cond
+	    = force_gimple_operand (many_iterations_cond, &stmts,
+				    true, NULL_TREE);
+	  if (stmts)
+	    gsi_insert_seq_on_edge_immediate (loop_preheader_edge (loop), stmts);
+	}
+
+      initialize_original_copy_tables ();
+
+      /* We assume that the loop usually iterates a lot.  */
+      prob = 4 * REG_BR_PROB_BASE / 5;
+      loop_version (loop, many_iterations_cond, NULL,
+		    prob, prob, REG_BR_PROB_BASE - prob, true);
+      update_ssa (TODO_update_ssa);
+      free_original_copy_tables ();
     }
 
-  initialize_original_copy_tables ();
-
-  /* We assume that the loop usually iterates a lot.  */
-  prob = 4 * REG_BR_PROB_BASE / 5;
-  loop_version (loop, many_iterations_cond, NULL,
-		prob, prob, REG_BR_PROB_BASE - prob, true);
-  update_ssa (TODO_update_ssa);
-  free_original_copy_tables ();
-
   /* Base all the induction variables in LOOP on a single control one.  */
   canonicalize_loop_ivs (loop, &nit, true);
 
@@ -1893,19 +1960,30 @@ gen_parallel_loop (struct loop *loop,
   entry = loop_preheader_edge (loop);
   exit = single_dom_exit (loop);
 
-  eliminate_local_variables (entry, exit);
-  /* In the old loop, move all variables non-local to the loop to a structure
-     and back, and create separate decls for the variables used in loop.  */
-  separate_decls_in_region (entry, exit, reduction_list, &arg_struct,
-			    &new_arg_struct, &clsn_data);
+  /* This rewrites the body in terms of new variables.  This has already
+     been done for oacc_kernels_p in pass_lower_omp/lower_omp ().  */
+  if (!oacc_kernels_p)
+    {
+      eliminate_local_variables (entry, exit);
+      /* In the old loop, move all variables non-local to the loop to a
+	 structure and back, and create separate decls for the variables used in
+	 loop.  */
+      separate_decls_in_region (entry, exit, reduction_list, &arg_struct,
+				&new_arg_struct, &clsn_data);
+    }
+  else
+    {
+      arg_struct = NULL_TREE;
+      new_arg_struct = NULL_TREE;
+    }
 
   /* Create the parallel constructs.  */
   loc = UNKNOWN_LOCATION;
   cond_stmt = last_stmt (loop->header);
   if (cond_stmt)
     loc = gimple_location (cond_stmt);
-  create_parallel_loop (loop, create_loop_fn (loc), arg_struct,
-			new_arg_struct, n_threads, loc);
+  create_parallel_loop (loop, create_loop_fn (loc), arg_struct, new_arg_struct,
+			n_threads, loc, region_entry, oacc_kernels_p);
   if (reduction_list->elements () > 0)
     create_call_for_reduction (loop, reduction_list, &clsn_data);
 
@@ -2145,7 +2223,7 @@ try_create_reduction_list (loop_p loop,
    otherwise.  */
 
 static bool
-parallelize_loops (void)
+parallelize_loops (bool oacc_kernels_p)
 {
   unsigned n_threads = flag_tree_parallelize_loops;
   bool changed = false;
@@ -2154,6 +2232,7 @@ parallelize_loops (void)
   struct obstack parloop_obstack;
   HOST_WIDE_INT estimated;
   source_location loop_loc;
+  basic_block region_entry = NULL, region_exit = NULL;
 
   /* Do not parallelize loops in the functions created by parallelization.  */
   if (parallelized_function_p (cfun->decl))
@@ -2165,9 +2244,46 @@ parallelize_loops (void)
   reduction_info_table_type reduction_list (10);
   init_stmt_vec_info_vec ();
 
+  calculate_dominance_info (CDI_DOMINATORS);
+
   FOR_EACH_LOOP (loop, 0)
     {
       reduction_list.empty ();
+
+      if (oacc_kernels_p)
+	{
+	  if (!loop_in_oacc_kernels_region_p (loop, &region_entry,
+					      &region_exit))
+	    continue;
+
+	  /* TODO: Allow nested loops.  */
+	  if (loop->inner)
+	    continue;
+
+	  gcc_assert (single_succ_p (region_entry));
+	  basic_block first = single_succ (region_entry);
+
+	  /* TODO: Allow conditional loop entry.  This test triggers when the
+	     loop bound is not known at compile time.  */
+	  if (!single_succ_p (first))
+	    continue;
+
+	  /* TODO: allow more complex loops.  */
+	  if (single_exit (loop) == NULL)
+	    continue;
+
+	  /* TODO: Allow other code than a single loop inside a kernels
+	     region.  */
+	  if (loop->header != single_succ (first)
+	      || single_exit (loop)->dest != region_exit)
+	    continue;
+
+	  if (dump_file && (dump_flags & TDF_DETAILS))
+	    fprintf (dump_file,
+		     "Trying loop %d with header bb %d in oacc kernels region\n",
+		     loop->num, loop->header->index);
+	}
+
       if (dump_file && (dump_flags & TDF_DETAILS))
       {
         fprintf (dump_file, "Trying loop %d as candidate\n",loop->num);
@@ -2209,6 +2325,7 @@ parallelize_loops (void)
       /* FIXME: Bypass this check as graphite doesn't update the
 	 count and frequency correctly now.  */
       if (!flag_loop_parallelize_all
+	  && !oacc_kernels_p
 	  && ((estimated != -1
 	       && estimated <= (HOST_WIDE_INT) n_threads * MIN_PER_THREAD)
 	      /* Do not bother with loops in cold areas.  */
@@ -2237,8 +2354,9 @@ parallelize_loops (void)
 	  fprintf (dump_file, "\nloop at %s:%d: ",
 		   LOCATION_FILE (loop_loc), LOCATION_LINE (loop_loc));
       }
+
       gen_parallel_loop (loop, &reduction_list,
-			 n_threads, &niter_desc);
+			 n_threads, &niter_desc, region_entry, oacc_kernels_p);
     }
 
   free_stmt_vec_info_vec ();
@@ -2289,7 +2407,7 @@ pass_parallelize_loops::execute (function *fun)
   if (number_of_loops (fun) <= 1)
     return 0;
 
-  if (parallelize_loops ())
+  if (parallelize_loops (false))
     {
       fun->curr_properties &= ~(PROP_gimple_eomp);
       return TODO_update_ssa;
@@ -2305,3 +2423,51 @@ make_pass_parallelize_loops (gcc::context *ctxt)
 {
   return new pass_parallelize_loops (ctxt);
 }
+
+namespace {
+
+const pass_data pass_data_parallelize_loops_oacc_kernels =
+{
+  GIMPLE_PASS, /* type */
+  "parloops_oacc_kernels", /* name */
+  OPTGROUP_LOOP, /* optinfo_flags */
+  TV_TREE_PARALLELIZE_LOOPS, /* tv_id */
+  ( PROP_cfg | PROP_ssa ), /* properties_required */
+  0, /* properties_provided */
+  0, /* properties_destroyed */
+  0, /* todo_flags_start */
+  0, /* todo_flags_finish */
+};
+
+class pass_parallelize_loops_oacc_kernels : public gimple_opt_pass
+{
+public:
+  pass_parallelize_loops_oacc_kernels (gcc::context *ctxt)
+    : gimple_opt_pass (pass_data_parallelize_loops_oacc_kernels, ctxt)
+  {}
+
+  /* opt_pass methods: */
+  virtual bool gate (function *) { return flag_tree_parallelize_loops > 1; }
+  virtual unsigned int execute (function *);
+
+}; // class pass_parallelize_loops_oacc_kernels
+
+unsigned
+pass_parallelize_loops_oacc_kernels::execute (function *fun)
+{
+  if (number_of_loops (fun) <= 1)
+    return 0;
+
+  if (parallelize_loops (true))
+    return TODO_update_ssa;
+
+  return 0;
+}
+
+} // anon namespace
+
+gimple_opt_pass *
+make_pass_parallelize_loops_oacc_kernels (gcc::context *ctxt)
+{
+  return new pass_parallelize_loops_oacc_kernels (ctxt);
+}
diff --git gcc/tree-pass.h gcc/tree-pass.h
index 321229a..effcb50 100644
--- gcc/tree-pass.h
+++ gcc/tree-pass.h
@@ -375,6 +375,8 @@ extern gimple_opt_pass *make_pass_slp_vectorize (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_complete_unroll (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_complete_unrolli (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_parallelize_loops (gcc::context *ctxt);
+extern gimple_opt_pass *
+  make_pass_parallelize_loops_oacc_kernels (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_loop_prefetch (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_iv_optimize (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_tree_loop_done (gcc::context *ctxt);
diff --git libgomp/ChangeLog.gomp libgomp/ChangeLog.gomp
index f052d3e..f6968b8 100644
--- libgomp/ChangeLog.gomp
+++ libgomp/ChangeLog.gomp
@@ -1,3 +1,12 @@
+2015-04-21  Tom de Vries  <tom@codesourcery.com>
+	    Thomas Schwinge  <thomas@codesourcery.com>
+
+	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-2.c: New test.
+	* testsuite/libgomp.oacc-c-c++-common/kernels-loop.c: New test.
+	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-n.c: New test.
+	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-mod-not-zero.c:
+	New test.
+
 2015-03-13  Thomas Schwinge  <thomas@codesourcery.com>
 
 	* testsuite/libgomp.fortran/fortran.exp (DG_TORTURE_OPTIONS): Add
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-2.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-2.c
new file mode 100644
index 0000000..0a0d754
--- /dev/null
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-2.c
@@ -0,0 +1,47 @@
+/* { dg-do run } */
+/* { dg-options "-ftree-parallelize-loops=32 -O2" } */
+
+#include <stdlib.h>
+
+#define N (1024 * 512)
+#define COUNTERTYPE unsigned int
+
+int
+main (void)
+{
+  unsigned int *__restrict a;
+  unsigned int *__restrict b;
+  unsigned int *__restrict c;
+
+  a = (unsigned int *__restrict)malloc (N * sizeof (unsigned int));
+  b = (unsigned int *__restrict)malloc (N * sizeof (unsigned int));
+  c = (unsigned int *__restrict)malloc (N * sizeof (unsigned int));
+
+#pragma acc kernels copyout (a[0:N])
+  {
+    for (COUNTERTYPE i = 0; i < N; i++)
+      a[i] = i * 2;
+  }
+
+#pragma acc kernels copyout (b[0:N])
+  {
+    for (COUNTERTYPE i = 0; i < N; i++)
+      b[i] = i * 4;
+  }
+
+#pragma acc kernels copyin (a[0:N], b[0:N]) copyout (c[0:N])
+  {
+    for (COUNTERTYPE ii = 0; ii < N; ii++)
+      c[ii] = a[ii] + b[ii];
+  }
+
+  for (COUNTERTYPE i = 0; i < N; i++)
+    if (c[i] != a[i] + b[i])
+      abort ();
+
+  free (a);
+  free (b);
+  free (c);
+
+  return 0;
+}
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-mod-not-zero.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-mod-not-zero.c
new file mode 100644
index 0000000..fdd6256
--- /dev/null
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-mod-not-zero.c
@@ -0,0 +1,41 @@
+/* { dg-do run } */
+/* { dg-options "-ftree-parallelize-loops=32 -O2" } */
+
+#include <stdlib.h>
+
+#define N ((1024 * 512) + 1)
+#define COUNTERTYPE unsigned int
+
+int
+main (void)
+{
+  unsigned int *__restrict a;
+  unsigned int *__restrict b;
+  unsigned int *__restrict c;
+
+  a = (unsigned int *__restrict)malloc (N * sizeof (unsigned int));
+  b = (unsigned int *__restrict)malloc (N * sizeof (unsigned int));
+  c = (unsigned int *__restrict)malloc (N * sizeof (unsigned int));
+
+  for (COUNTERTYPE i = 0; i < N; i++)
+    a[i] = i * 2;
+
+  for (COUNTERTYPE i = 0; i < N; i++)
+    b[i] = i * 4;
+
+#pragma acc kernels copyin (a[0:N], b[0:N]) copyout (c[0:N])
+  {
+    for (COUNTERTYPE ii = 0; ii < N; ii++)
+      c[ii] = a[ii] + b[ii];
+  }
+
+  for (COUNTERTYPE i = 0; i < N; i++)
+    if (c[i] != a[i] + b[i])
+      abort ();
+
+  free (a);
+  free (b);
+  free (c);
+
+  return 0;
+}
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-n.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-n.c
new file mode 100644
index 0000000..52d8e24
--- /dev/null
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-n.c
@@ -0,0 +1,47 @@
+/* { dg-do run } */
+/* { dg-options "-ftree-parallelize-loops=32 -O2" } */
+
+#include <stdlib.h>
+
+#define N ((1024 * 512) + 1)
+#define COUNTERTYPE unsigned int
+
+static int __attribute__((noinline,noclone))
+foo (COUNTERTYPE n)
+{
+  unsigned int *__restrict a;
+  unsigned int *__restrict b;
+  unsigned int *__restrict c;
+
+  a = (unsigned int *__restrict)malloc (n * sizeof (unsigned int));
+  b = (unsigned int *__restrict)malloc (n * sizeof (unsigned int));
+  c = (unsigned int *__restrict)malloc (n * sizeof (unsigned int));
+
+  for (COUNTERTYPE i = 0; i < n; i++)
+    a[i] = i * 2;
+
+  for (COUNTERTYPE i = 0; i < n; i++)
+    b[i] = i * 4;
+
+#pragma acc kernels copyin (a[0:n], b[0:n]) copyout (c[0:n])
+  {
+    for (COUNTERTYPE ii = 0; ii < n; ii++)
+      c[ii] = a[ii] + b[ii];
+  }
+
+  for (COUNTERTYPE i = 0; i < n; i++)
+    if (c[i] != a[i] + b[i])
+      abort ();
+
+  free (a);
+  free (b);
+  free (c);
+
+  return 0;
+}
+
+int
+main (void)
+{
+  return foo (N);
+}
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop.c
new file mode 100644
index 0000000..294a5bf
--- /dev/null
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop.c
@@ -0,0 +1,41 @@
+/* { dg-do run } */
+/* { dg-options "-ftree-parallelize-loops=32 -O2" } */
+
+#include <stdlib.h>
+
+#define N (1024 * 512)
+#define COUNTERTYPE unsigned int
+
+int
+main (void)
+{
+  unsigned int *__restrict a;
+  unsigned int *__restrict b;
+  unsigned int *__restrict c;
+
+  a = (unsigned int *__restrict)malloc (N * sizeof (unsigned int));
+  b = (unsigned int *__restrict)malloc (N * sizeof (unsigned int));
+  c = (unsigned int *__restrict)malloc (N * sizeof (unsigned int));
+
+  for (COUNTERTYPE i = 0; i < N; i++)
+    a[i] = i * 2;
+
+  for (COUNTERTYPE i = 0; i < N; i++)
+    b[i] = i * 4;
+
+#pragma acc kernels copyin (a[0:N], b[0:N]) copyout (c[0:N])
+  {
+    for (COUNTERTYPE ii = 0; ii < N; ii++)
+      c[ii] = a[ii] + b[ii];
+  }
+
+  for (COUNTERTYPE i = 0; i < N; i++)
+    if (c[i] != a[i] + b[i])
+      abort ();
+
+  free (a);
+  free (b);
+  free (c);
+
+  return 0;
+}


Regards,
 Thomas

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 472 bytes --]


* Handle global loop counters in fortran oacc kernels (was: openacc kernels directive -- initial support)
  2014-11-15 14:08 openacc kernels directive -- initial support Tom de Vries
                   ` (9 preceding siblings ...)
  2015-04-21 19:27 ` Add BUILT_IN_GOACC_KERNELS_INTERNAL (was: openacc kernels directive -- initial support) Thomas Schwinge
@ 2015-04-21 20:24 ` Thomas Schwinge
  2015-04-21 20:29 ` Handle global loop counters in c/c++ " Thomas Schwinge
  2015-04-21 20:33 ` Handle oacc kernels with other directives " Thomas Schwinge
  12 siblings, 0 replies; 71+ messages in thread
From: Thomas Schwinge @ 2015-04-21 20:24 UTC (permalink / raw)
  To: GCC Patches; +Cc: Tom de Vries, Richard Biener, Jakub Jelinek

[-- Attachment #1: Type: text/plain, Size: 13833 bytes --]

Hi!

On Sat, 15 Nov 2014 13:14:52 +0100, Tom de Vries <Tom_deVries@mentor.com> wrote:
> I'm submitting a patch series with initial support for the oacc kernels directive.

Committed to gomp-4_0-branch in r222286:

commit 0c33234340aa17536c2c86e0982c42070c89226b
Author: tschwinge <tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4>
Date:   Tue Apr 21 20:22:54 2015 +0000

    Handle global loop counters in fortran oacc kernels
    
    Fortran loop counters cannot be given a scope limited to the kernels
    region, and a function-scope loop counter inhibits parallelization.  At the
    technical level, it seems possible to do DCE and get rid of the dead code
    that is inhibiting parallelization (in other words, the code copying the
    loop iterator value out of the region), but that would probably take some
    effort.
    
    Another possibility is to add an assignment of the final value of the loop
    iteration variable after the loop to cut the dependency, though this will
    only work for loops where that value is known at compile time -- which is
    exactly what pass_scev_cprop does.
    
    	gcc/
    	* passes.def: Add pass_scev_cprop to pass_oacc_kernels.
    	* tree-ssa-loop.c (pass_scev_cprop::clone): New function.
    
    	gcc/testsuite/
    	* gcc.dg/pr41488.c: Update for new pass_scev_cprop.
    	* gcc.dg/tree-ssa/loop-17.c: Likewise.
    	* gcc.dg/tree-ssa/loop-39.c: Likewise.
    	* gcc.dg/tree-ssa/scev-7.c: Likewise.
    	* gfortran.dg/goacc/kernels-loop-2.f95: New test.
    	* gfortran.dg/goacc/kernels-loop.f95: New test.
    
    	libgomp/
    	* testsuite/libgomp.oacc-fortran/kernels-loop-2.f95: New test.
    	* testsuite/libgomp.oacc-fortran/kernels-loop.f95: New test.
    
    git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@222286 138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/ChangeLog.gomp                                 |    3 ++
 gcc/passes.def                                     |    1 +
 gcc/testsuite/ChangeLog.gomp                       |    7 +++
 gcc/testsuite/gcc.dg/pr41488.c                     |    6 +--
 gcc/testsuite/gcc.dg/tree-ssa/loop-17.c            |    6 +--
 gcc/testsuite/gcc.dg/tree-ssa/loop-39.c            |    6 +--
 gcc/testsuite/gcc.dg/tree-ssa/scev-7.c             |    6 +--
 gcc/testsuite/gfortran.dg/goacc/kernels-loop-2.f95 |   46 ++++++++++++++++++++
 gcc/testsuite/gfortran.dg/goacc/kernels-loop.f95   |   40 +++++++++++++++++
 gcc/tree-ssa-loop.c                                |    1 +
 libgomp/ChangeLog.gomp                             |    3 ++
 .../libgomp.oacc-fortran/kernels-loop-2.f95        |   32 ++++++++++++++
 .../libgomp.oacc-fortran/kernels-loop.f95          |   28 ++++++++++++
 13 files changed, 173 insertions(+), 12 deletions(-)

diff --git gcc/ChangeLog.gomp gcc/ChangeLog.gomp
index bf0ee52..f14c3718 100644
--- gcc/ChangeLog.gomp
+++ gcc/ChangeLog.gomp
@@ -1,5 +1,8 @@
 2015-04-21  Tom de Vries  <tom@codesourcery.com>
 
+	* passes.def: Add pass_scev_cprop to pass_oacc_kernels.
+	* tree-ssa-loop.c (pass_scev_cprop::clone): New function.
+
 	* passes.def: Add pass_parallelize_loops_oacc_kernels in pass group
 	pass_oacc_kernels.
 	* tree-parloops.c (create_parallel_loop, gen_parallel_loop): Add
diff --git gcc/passes.def gcc/passes.def
index 2d2e286..3e85808 100644
--- gcc/passes.def
+++ gcc/passes.def
@@ -94,6 +94,7 @@ along with GCC; see the file COPYING3.  If not see
 	      NEXT_PASS (pass_tree_loop_init);
 	      NEXT_PASS (pass_lim);
 	      NEXT_PASS (pass_copy_prop);
+	      NEXT_PASS (pass_scev_cprop);
       	      NEXT_PASS (pass_parallelize_loops_oacc_kernels);
 	      NEXT_PASS (pass_expand_omp_ssa);
 	      NEXT_PASS (pass_tree_loop_done);
diff --git gcc/testsuite/ChangeLog.gomp gcc/testsuite/ChangeLog.gomp
index 2c6abff..eed22e2 100644
--- gcc/testsuite/ChangeLog.gomp
+++ gcc/testsuite/ChangeLog.gomp
@@ -1,6 +1,13 @@
 2015-04-21  Tom de Vries  <tom@codesourcery.com>
 	    Thomas Schwinge  <thomas@codesourcery.com>
 
+	* gcc.dg/pr41488.c: Update for new pass_scev_cprop.
+	* gcc.dg/tree-ssa/loop-17.c: Likewise.
+	* gcc.dg/tree-ssa/loop-39.c: Likewise.
+	* gcc.dg/tree-ssa/scev-7.c: Likewise.
+	* gfortran.dg/goacc/kernels-loop-2.f95: New test.
+	* gfortran.dg/goacc/kernels-loop.f95: New test.
+
 	* c-c++-common/goacc/kernels-loop-2.c: New test.
 	* c-c++-common/goacc/kernels-loop.c: New test.
 	* c-c++-common/goacc/kernels-loop-n.c: New test.
diff --git gcc/testsuite/gcc.dg/pr41488.c gcc/testsuite/gcc.dg/pr41488.c
index c4bc428..1f306b4 100644
--- gcc/testsuite/gcc.dg/pr41488.c
+++ gcc/testsuite/gcc.dg/pr41488.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-sccp-scev" } */
+/* { dg-options "-O2 -fdump-tree-sccp2-scev" } */
 
 struct struct_t
 {
@@ -14,5 +14,5 @@ void foo (struct struct_t* sp, int start, int end)
     sp->data[i+start] = 0;
 }
 
-/* { dg-final { scan-tree-dump-times "Simplify PEELED_CHREC into POLYNOMIAL_CHREC" 1 "sccp" } } */
-/* { dg-final { cleanup-tree-dump "sccp" } } */
+/* { dg-final { scan-tree-dump-times "Simplify PEELED_CHREC into POLYNOMIAL_CHREC" 1 "sccp2" } } */
+/* { dg-final { cleanup-tree-dump "sccp2" } } */
diff --git gcc/testsuite/gcc.dg/tree-ssa/loop-17.c gcc/testsuite/gcc.dg/tree-ssa/loop-17.c
index 0e856d8..d51fe57 100644
--- gcc/testsuite/gcc.dg/tree-ssa/loop-17.c
+++ gcc/testsuite/gcc.dg/tree-ssa/loop-17.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O -fdump-tree-sccp-details" } */
+/* { dg-options "-O -fdump-tree-sccp2-details" } */
 
 /* To determine the number of iterations in this loop we need to fold
    p_4 + 4B > p_4 + 8B to false.  This transformation has caused
@@ -15,5 +15,5 @@ int foo (int *p)
   return i;
 }
 
-/* { dg-final { scan-tree-dump "# of iterations 1, bounded by 1" "sccp" } } */
-/* { dg-final { cleanup-tree-dump "sccp" } } */
+/* { dg-final { scan-tree-dump "# of iterations 1, bounded by 1" "sccp2" } } */
+/* { dg-final { cleanup-tree-dump "sccp2" } } */
diff --git gcc/testsuite/gcc.dg/tree-ssa/loop-39.c gcc/testsuite/gcc.dg/tree-ssa/loop-39.c
index 1f6bba4..5c56f00 100644
--- gcc/testsuite/gcc.dg/tree-ssa/loop-39.c
+++ gcc/testsuite/gcc.dg/tree-ssa/loop-39.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-sccp-details" } */
+/* { dg-options "-O2 -fdump-tree-sccp2-details" } */
 
 int
 foo (unsigned int n)
@@ -22,5 +22,5 @@ foo (unsigned int n)
   return r + n;
 }
 
-/* { dg-final { scan-tree-dump "# of iterations \[^\n\r]*, bounded by 8" "sccp" } } */
-/* { dg-final { cleanup-tree-dump "sccp" } } */
+/* { dg-final { scan-tree-dump "# of iterations \[^\n\r]*, bounded by 8" "sccp2" } } */
+/* { dg-final { cleanup-tree-dump "sccp2" } } */
diff --git gcc/testsuite/gcc.dg/tree-ssa/scev-7.c gcc/testsuite/gcc.dg/tree-ssa/scev-7.c
index d6ceb20..0b3928f 100644
--- gcc/testsuite/gcc.dg/tree-ssa/scev-7.c
+++ gcc/testsuite/gcc.dg/tree-ssa/scev-7.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-sccp-scev" } */
+/* { dg-options "-O2 -fdump-tree-sccp2-scev" } */
 
 struct struct_t
 {
@@ -14,5 +14,5 @@ void foo (struct struct_t* sp, int start, int end)
     sp->data[i+start] = 0;
 }
 
-/* { dg-final { scan-tree-dump-times "Simplify PEELED_CHREC into POLYNOMIAL_CHREC" 1 "sccp" } } */
-/* { dg-final { cleanup-tree-dump "sccp" } } */
+/* { dg-final { scan-tree-dump-times "Simplify PEELED_CHREC into POLYNOMIAL_CHREC" 1 "sccp2" } } */
+/* { dg-final { cleanup-tree-dump "sccp2" } } */
diff --git gcc/testsuite/gfortran.dg/goacc/kernels-loop-2.f95 gcc/testsuite/gfortran.dg/goacc/kernels-loop-2.f95
new file mode 100644
index 0000000..bef69f8
--- /dev/null
+++ gcc/testsuite/gfortran.dg/goacc/kernels-loop-2.f95
@@ -0,0 +1,46 @@
+! { dg-additional-options "-O2" }
+! { dg-additional-options "-ftree-parallelize-loops=32" }
+! { dg-additional-options "-fdump-tree-parloops_oacc_kernels-all" }
+! { dg-additional-options "-fdump-tree-optimized" }
+
+program main
+  implicit none
+  integer, parameter         :: n = 1024
+  integer, dimension (0:n-1) :: a, b, c
+  integer                    :: i, ii
+
+  !$acc kernels copyout (a(0:n-1))
+  do i = 0, n - 1
+     a(i) = i * 2
+  end do
+  !$acc end kernels
+
+  !$acc kernels copyout (b(0:n-1))
+  do i = 0, n -1
+     b(i) = i * 4
+  end do
+  !$acc end kernels
+
+  !$acc kernels copyin (a(0:n-1), b(0:n-1)) copyout (c(0:n-1))
+  do ii = 0, n - 1
+     c(ii) = a(ii) + b(ii)
+  end do
+  !$acc end kernels
+
+  do i = 0, n - 1
+     if (c(i) .ne. a(i) + b(i)) call abort
+  end do
+
+end program main
+
+! Check that only three loops are analyzed, and that all can be parallelized.
+! { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 3 "parloops_oacc_kernels" } }
+! { dg-final { scan-tree-dump-not "FAILED:" "parloops_oacc_kernels" } }
+
+! Check that the loop has been split off into a function.
+! { dg-final { scan-tree-dump-times "(?n);; Function MAIN__._omp_fn.0 " 1 "optimized" } }
+! { dg-final { scan-tree-dump-times "(?n);; Function MAIN__._omp_fn.1 " 1 "optimized" } }
+! { dg-final { scan-tree-dump-times "(?n);; Function MAIN__._omp_fn.2 " 1 "optimized" } }
+
+! { dg-final { cleanup-tree-dump "parloops_oacc_kernels" } }
+! { dg-final { cleanup-tree-dump "optimized" } }
diff --git gcc/testsuite/gfortran.dg/goacc/kernels-loop.f95 gcc/testsuite/gfortran.dg/goacc/kernels-loop.f95
new file mode 100644
index 0000000..be5f26d
--- /dev/null
+++ gcc/testsuite/gfortran.dg/goacc/kernels-loop.f95
@@ -0,0 +1,40 @@
+! { dg-additional-options "-O2" }
+! { dg-additional-options "-ftree-parallelize-loops=32" }
+! { dg-additional-options "-fdump-tree-parloops_oacc_kernels-all" }
+! { dg-additional-options "-fdump-tree-optimized" }
+
+program main
+  implicit none
+  integer, parameter         :: n = 1024
+  integer, dimension (0:n-1) :: a, b, c
+  integer                    :: i, ii
+
+  do i = 0, n - 1
+     a(i) = i * 2
+  end do
+
+  do i = 0, n -1
+     b(i) = i * 4
+  end do
+
+  !$acc kernels copyin (a(0:n-1), b(0:n-1)) copyout (c(0:n-1))
+  do ii = 0, n - 1
+     c(ii) = a(ii) + b(ii)
+  end do
+  !$acc end kernels
+
+  do i = 0, n - 1
+     if (c(i) .ne. a(i) + b(i)) call abort
+  end do
+
+end program main
+
+! Check that only one loop is analyzed, and that it can be parallelized.
+! { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 1 "parloops_oacc_kernels" } }
+! { dg-final { scan-tree-dump-not "FAILED:" "parloops_oacc_kernels" } }
+
+! Check that the loop has been split off into a function.
+! { dg-final { scan-tree-dump-times "(?n);; Function MAIN__._omp_fn.0 " 1 "optimized" } }
+
+! { dg-final { cleanup-tree-dump "parloops_oacc_kernels" } }
+! { dg-final { cleanup-tree-dump "optimized" } }
diff --git gcc/tree-ssa-loop.c gcc/tree-ssa-loop.c
index 2a96a39..0915cee 100644
--- gcc/tree-ssa-loop.c
+++ gcc/tree-ssa-loop.c
@@ -425,6 +425,7 @@ public:
   /* opt_pass methods: */
   virtual bool gate (function *) { return flag_tree_scev_cprop; }
   virtual unsigned int execute (function *) { return scev_const_prop (); }
+  opt_pass * clone () { return new pass_scev_cprop (m_ctxt); }
 
 }; // class pass_scev_cprop
 
diff --git libgomp/ChangeLog.gomp libgomp/ChangeLog.gomp
index f6968b8..bcb3340 100644
--- libgomp/ChangeLog.gomp
+++ libgomp/ChangeLog.gomp
@@ -1,6 +1,9 @@
 2015-04-21  Tom de Vries  <tom@codesourcery.com>
 	    Thomas Schwinge  <thomas@codesourcery.com>
 
+	* testsuite/libgomp.oacc-fortran/kernels-loop-2.f95: New test.
+	* testsuite/libgomp.oacc-fortran/kernels-loop.f95: New test.
+
 	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-2.c: New test.
 	* testsuite/libgomp.oacc-c-c++-common/kernels-loop.c: New test.
 	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-n.c: New test.
diff --git libgomp/testsuite/libgomp.oacc-fortran/kernels-loop-2.f95 libgomp/testsuite/libgomp.oacc-fortran/kernels-loop-2.f95
new file mode 100644
index 0000000..1fb40ee
--- /dev/null
+++ libgomp/testsuite/libgomp.oacc-fortran/kernels-loop-2.f95
@@ -0,0 +1,32 @@
+! { dg-do run }
+! { dg-options "-ftree-parallelize-loops=32" }
+
+program main
+  implicit none
+  integer, parameter         :: n = 1024
+  integer, dimension (0:n-1) :: a, b, c
+  integer                    :: i, ii
+
+  !$acc kernels copyout (a(0:n-1))
+  do i = 0, n - 1
+     a(i) = i * 2
+  end do
+  !$acc end kernels
+
+  !$acc kernels copyout (b(0:n-1))
+  do i = 0, n -1
+     b(i) = i * 4
+  end do
+  !$acc end kernels
+
+  !$acc kernels copyin (a(0:n-1), b(0:n-1)) copyout (c(0:n-1))
+  do ii = 0, n - 1
+     c(ii) = a(ii) + b(ii)
+  end do
+  !$acc end kernels
+
+  do i = 0, n - 1
+     if (c(i) .ne. a(i) + b(i)) call abort
+  end do
+
+end program main
diff --git libgomp/testsuite/libgomp.oacc-fortran/kernels-loop.f95 libgomp/testsuite/libgomp.oacc-fortran/kernels-loop.f95
new file mode 100644
index 0000000..b02dd57
--- /dev/null
+++ libgomp/testsuite/libgomp.oacc-fortran/kernels-loop.f95
@@ -0,0 +1,28 @@
+! { dg-do run }
+! { dg-options "-ftree-parallelize-loops=32" }
+
+program main
+  implicit none
+  integer, parameter         :: n = 1024
+  integer, dimension (0:n-1) :: a, b, c
+  integer                    :: i, ii
+
+  do i = 0, n - 1
+     a(i) = i * 2
+  end do
+
+  do i = 0, n -1
+     b(i) = i * 4
+  end do
+
+  !$acc kernels copyin (a(0:n-1), b(0:n-1)) copyout (c(0:n-1))
+  do ii = 0, n - 1
+     c(ii) = a(ii) + b(ii)
+  end do
+  !$acc end kernels
+
+  do i = 0, n - 1
+     if (c(i) .ne. a(i) + b(i)) call abort
+  end do
+
+end program main
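
To illustrate the final-value assignment described in the commit message,
here is a minimal C sketch. It is not taken from the patch: the function
names fill_before and fill_after are hypothetical, and fill_after is a
hand-written approximation of the effect, not actual compiler output.

```c
#include <assert.h>

#define N 1024

/* Hypothetical illustration: the counter `i` has function scope, so the
   value the loop leaves in it is still read after the loop.  */
static unsigned int
fill_before (unsigned int *a)
{
  unsigned int i;

  for (i = 0; i < N; i++)
    a[i] = i * 2;

  return i;	/* Depends on the loop's last write to i.  */
}

/* Same loop, with the final value of the counter assigned explicitly
   after the loop.  The later use now reads a compile-time constant
   instead of a value carried out of the loop.  */
static unsigned int
fill_after (unsigned int *a)
{
  unsigned int i;

  for (i = 0; i < N; i++)
    a[i] = i * 2;

  i = N;	/* Final value, known at compile time.  */
  return i;
}
```

Both functions fill the array identically and return the same value; the
only difference is where the value read after the loop comes from.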


Regards,
 Thomas

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 472 bytes --]


* Handle global loop counters in c/c++ oacc kernels (was: openacc kernels directive -- initial support)
  2014-11-15 14:08 openacc kernels directive -- initial support Tom de Vries
                   ` (10 preceding siblings ...)
  2015-04-21 20:24 ` Handle global loop counters in fortran oacc kernels " Thomas Schwinge
@ 2015-04-21 20:29 ` Thomas Schwinge
  2015-04-21 20:33 ` Handle oacc kernels with other directives " Thomas Schwinge
  12 siblings, 0 replies; 71+ messages in thread
From: Thomas Schwinge @ 2015-04-21 20:29 UTC (permalink / raw)
  To: GCC Patches; +Cc: Tom de Vries, Richard Biener, Jakub Jelinek

[-- Attachment #1: Type: text/plain, Size: 13493 bytes --]

Hi!

On Sat, 15 Nov 2014 13:14:52 +0100, Tom de Vries <Tom_deVries@mentor.com> wrote:
> I'm submitting a patch series with initial support for the oacc kernels directive.

Committed to gomp-4_0-branch in r222287:

commit abaf92b2db3c0799edac63cfb846af2dbde47423
Author: tschwinge <tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4>
Date:   Tue Apr 21 20:27:40 2015 +0000

    Handle global loop counters in c/c++ oacc kernels
    
    	gcc/
    	* passes.def: Add pass_fre after pass_ch_oacc_kernels.
    
    	gcc/testsuite/
    	* c-c++-common/goacc/kernels-counter-vars-function-scope.c: New test.
    	* c-c++-common/goacc/kernels-one-counter-var.c: New test.
    	* g++.dg/ipa/devirt-37.C: Update for new pass_fre.
    	* g++.dg/ipa/devirt-40.C: Likewise.
    	* g++.dg/tree-ssa/pr61034.C: Likewise.
    	* gcc.dg/ipa/ipa-pta-13.c: Likewise.
    	* gcc.dg/ipa/ipa-pta-3.c: Likewise.
    	* gcc.dg/ipa/ipa-pta-4.c: Likewise.
    
    git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@222287 138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/ChangeLog.gomp                                 |    2 +
 gcc/passes.def                                     |    1 +
 gcc/testsuite/ChangeLog.gomp                       |    9 ++++
 .../goacc/kernels-counter-vars-function-scope.c    |   55 ++++++++++++++++++++
 .../c-c++-common/goacc/kernels-one-counter-var.c   |   54 +++++++++++++++++++
 gcc/testsuite/g++.dg/ipa/devirt-37.C               |   12 ++---
 gcc/testsuite/g++.dg/ipa/devirt-40.C               |    6 +--
 gcc/testsuite/g++.dg/tree-ssa/pr61034.C            |   10 ++--
 gcc/testsuite/gcc.dg/ipa/ipa-pta-13.c              |    6 +--
 gcc/testsuite/gcc.dg/ipa/ipa-pta-3.c               |    6 +--
 gcc/testsuite/gcc.dg/ipa/ipa-pta-4.c               |    6 +--
 11 files changed, 144 insertions(+), 23 deletions(-)

diff --git gcc/ChangeLog.gomp gcc/ChangeLog.gomp
index f14c3718..b1933ba 100644
--- gcc/ChangeLog.gomp
+++ gcc/ChangeLog.gomp
@@ -1,5 +1,7 @@
 2015-04-21  Tom de Vries  <tom@codesourcery.com>
 
+	* passes.def: Add pass_fre after pass_ch_oacc_kernels.
+
 	* passes.def: Add pass_scev_cprop to pass_oacc_kernels.
 	* tree-ssa-loop.c (pass_scev_cprop::clone): New function.
 
diff --git gcc/passes.def gcc/passes.def
index 3e85808..04cbba0 100644
--- gcc/passes.def
+++ gcc/passes.def
@@ -91,6 +91,7 @@ along with GCC; see the file COPYING3.  If not see
 	  NEXT_PASS (pass_oacc_kernels);
 	  PUSH_INSERT_PASSES_WITHIN (pass_oacc_kernels)
 	      NEXT_PASS (pass_ch_oacc_kernels);
+	      NEXT_PASS (pass_fre);
 	      NEXT_PASS (pass_tree_loop_init);
 	      NEXT_PASS (pass_lim);
 	      NEXT_PASS (pass_copy_prop);
diff --git gcc/testsuite/ChangeLog.gomp gcc/testsuite/ChangeLog.gomp
index eed22e2..ed80f5b 100644
--- gcc/testsuite/ChangeLog.gomp
+++ gcc/testsuite/ChangeLog.gomp
@@ -1,6 +1,15 @@
 2015-04-21  Tom de Vries  <tom@codesourcery.com>
 	    Thomas Schwinge  <thomas@codesourcery.com>
 
+	* c-c++-common/goacc/kernels-counter-vars-function-scope.c: New test.
+	* c-c++-common/goacc/kernels-one-counter-var.c: New test.
+	* g++.dg/ipa/devirt-37.C: Update for new pass_fre.
+	* g++.dg/ipa/devirt-40.C: Likewise.
+	* g++.dg/tree-ssa/pr61034.C: Likewise.
+	* gcc.dg/ipa/ipa-pta-13.c: Likewise.
+	* gcc.dg/ipa/ipa-pta-3.c: Likewise.
+	* gcc.dg/ipa/ipa-pta-4.c: Likewise.
+
 	* gcc.dg/pr41488.c: Update for new pass_scev_cprop.
 	* gcc.dg/tree-ssa/loop-17.c: Likewise.
 	* gcc.dg/tree-ssa/loop-39.c: Likewise.
diff --git gcc/testsuite/c-c++-common/goacc/kernels-counter-vars-function-scope.c gcc/testsuite/c-c++-common/goacc/kernels-counter-vars-function-scope.c
new file mode 100644
index 0000000..06cdb29
--- /dev/null
+++ gcc/testsuite/c-c++-common/goacc/kernels-counter-vars-function-scope.c
@@ -0,0 +1,55 @@
+/* { dg-additional-options "-O2" } */
+/* { dg-additional-options "-ftree-parallelize-loops=32" } */
+/* { dg-additional-options "-fdump-tree-parloops_oacc_kernels-all" } */
+/* { dg-additional-options "-fdump-tree-optimized" } */
+
+#include <stdlib.h>
+
+#define N (1024 * 512)
+#define COUNTERTYPE unsigned int
+
+int
+main (void)
+{
+  unsigned int *__restrict a;
+  unsigned int *__restrict b;
+  unsigned int *__restrict c;
+  COUNTERTYPE i;
+  COUNTERTYPE ii;
+
+  a = (unsigned int *)malloc (N * sizeof (unsigned int));
+  b = (unsigned int *)malloc (N * sizeof (unsigned int));
+  c = (unsigned int *)malloc (N * sizeof (unsigned int));
+
+  for (i = 0; i < N; i++)
+    a[i] = i * 2;
+
+  for (i = 0; i < N; i++)
+    b[i] = i * 4;
+
+#pragma acc kernels copyin (a[0:N], b[0:N]) copyout (c[0:N])
+  {
+    for (ii = 0; ii < N; ii++)
+      c[ii] = a[ii] + b[ii];
+  }
+
+  for (i = 0; i < N; i++)
+    if (c[i] != a[i] + b[i])
+      abort ();
+
+  free (a);
+  free (b);
+  free (c);
+
+  return 0;
+}
+
+/* Check that only one loop is analyzed, and that it can be parallelized.  */
+/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 1 "parloops_oacc_kernels" } } */
+/* { dg-final { scan-tree-dump-not "FAILED:" "parloops_oacc_kernels" } } */
+
+/* Check that the loop has been split off into a function.  */
+/* { dg-final { scan-tree-dump-times "(?n);; Function .*main._omp_fn.0" 1 "optimized" } } */
+
+/* { dg-final { cleanup-tree-dump "parloops_oacc_kernels" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
diff --git gcc/testsuite/c-c++-common/goacc/kernels-one-counter-var.c gcc/testsuite/c-c++-common/goacc/kernels-one-counter-var.c
new file mode 100644
index 0000000..2699437
--- /dev/null
+++ gcc/testsuite/c-c++-common/goacc/kernels-one-counter-var.c
@@ -0,0 +1,54 @@
+/* { dg-additional-options "-O2" } */
+/* { dg-additional-options "-ftree-parallelize-loops=32" } */
+/* { dg-additional-options "-fdump-tree-parloops_oacc_kernels-all" } */
+/* { dg-additional-options "-fdump-tree-optimized" } */
+
+#include <stdlib.h>
+
+#define N (1024 * 512)
+#define COUNTERTYPE unsigned int
+
+int
+main (void)
+{
+  unsigned int *__restrict a;
+  unsigned int *__restrict b;
+  unsigned int *__restrict c;
+  COUNTERTYPE i;
+
+  a = (unsigned int *)malloc (N * sizeof (unsigned int));
+  b = (unsigned int *)malloc (N * sizeof (unsigned int));
+  c = (unsigned int *)malloc (N * sizeof (unsigned int));
+
+  for (i = 0; i < N; i++)
+    a[i] = i * 2;
+
+  for (i = 0; i < N; i++)
+    b[i] = i * 4;
+
+#pragma acc kernels copyin (a[0:N], b[0:N]) copyout (c[0:N])
+  {
+    for (i = 0; i < N; i++)
+      c[i] = a[i] + b[i];
+  }
+
+  for (i = 0; i < N; i++)
+    if (c[i] != a[i] + b[i])
+      abort ();
+
+  free (a);
+  free (b);
+  free (c);
+
+  return 0;
+}
+
+/* Check that only one loop is analyzed, and that it can be parallelized.  */
+/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 1 "parloops_oacc_kernels" } } */
+/* { dg-final { scan-tree-dump-not "FAILED:" "parloops_oacc_kernels" } } */
+
+/* Check that the loop has been split off into a function.  */
+/* { dg-final { scan-tree-dump-times "(?n);; Function .*main._omp_fn.0" 1 "optimized" } } */
+
+/* { dg-final { cleanup-tree-dump "parloops_oacc_kernels" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
diff --git gcc/testsuite/g++.dg/ipa/devirt-37.C gcc/testsuite/g++.dg/ipa/devirt-37.C
index 7e1acdc..eb2c7f2 100644
--- gcc/testsuite/g++.dg/ipa/devirt-37.C
+++ gcc/testsuite/g++.dg/ipa/devirt-37.C
@@ -1,4 +1,4 @@
-/* { dg-options "-fpermissive -O2 -fno-indirect-inlining -fno-devirtualize-speculatively -fdump-tree-fre2-details -fno-early-inlining"  } */
+/* { dg-options "-fpermissive -O2 -fno-indirect-inlining -fno-devirtualize-speculatively -fdump-tree-fre3-details -fno-early-inlining"  } */
 #include <stdlib.h>
 struct A {virtual void test() {abort ();}};
 struct B:A
@@ -30,8 +30,8 @@ t()
 /* After inlining the call within constructor needs to be checked to not go into a basetype.
    We should see the vtbl store and we should notice extcall as possibly clobbering the
    type but ignore it because b is in static storage.  */
-/* { dg-final { scan-tree-dump "No dynamic type change found."  "fre2"  } } */
-/* { dg-final { scan-tree-dump "Checking vtbl store:"  "fre2"  } } */
-/* { dg-final { scan-tree-dump "Function call may change dynamic type:extcall"  "fre2"  } } */
-/* { dg-final { scan-tree-dump "converting indirect call to function virtual void"  "fre2"  } } */
-/* { dg-final { cleanup-tree-dump "fre2" } } */
+/* { dg-final { scan-tree-dump "No dynamic type change found."  "fre3"  } } */
+/* { dg-final { scan-tree-dump "Checking vtbl store:"  "fre3"  } } */
+/* { dg-final { scan-tree-dump "Function call may change dynamic type:extcall"  "fre3"  } } */
+/* { dg-final { scan-tree-dump "converting indirect call to function virtual void"  "fre3"  } } */
+/* { dg-final { cleanup-tree-dump "fre3" } } */
diff --git gcc/testsuite/g++.dg/ipa/devirt-40.C gcc/testsuite/g++.dg/ipa/devirt-40.C
index 79cb129..7e4ae7c 100644
--- gcc/testsuite/g++.dg/ipa/devirt-40.C
+++ gcc/testsuite/g++.dg/ipa/devirt-40.C
@@ -1,4 +1,4 @@
-/* { dg-options "-O2 -fdump-tree-fre2-details"  } */
+/* { dg-options "-O2 -fdump-tree-fre3-details"  } */
 typedef enum
 {
 } UErrorCode;
@@ -19,5 +19,5 @@ A::m_fn1 (UnicodeString &, int &p2, UErrorCode &) const
   UnicodeString a[2];
 }
 
-/* { dg-final { scan-tree-dump-not "\\n  OBJ_TYPE_REF" "fre2"  } } */
-/* { dg-final { cleanup-tree-dump "fre2" } } */
+/* { dg-final { scan-tree-dump-not "\\n  OBJ_TYPE_REF" "fre3"  } } */
+/* { dg-final { cleanup-tree-dump "fre3" } } */
diff --git gcc/testsuite/g++.dg/tree-ssa/pr61034.C gcc/testsuite/g++.dg/tree-ssa/pr61034.C
index 9ec3995..78417a1 100644
--- gcc/testsuite/g++.dg/tree-ssa/pr61034.C
+++ gcc/testsuite/g++.dg/tree-ssa/pr61034.C
@@ -1,5 +1,5 @@
 // { dg-do compile }
-// { dg-options "-O3 -fdump-tree-fre2" }
+// { dg-options "-O3 -fdump-tree-fre3" }
 
 #define assume(x) if(!(x))__builtin_unreachable()
 
@@ -41,7 +41,7 @@ bool f(I a, I b, I c, I d) {
 // a bunch of conditional free()s and unreachable()s.
 // This works only if everything is inlined into 'f'.
 
-// { dg-final { scan-tree-dump-times ";; Function" 1 "fre2" } }
-// { dg-final { scan-tree-dump-times "free" 19 "fre2" } }
-// { dg-final { scan-tree-dump-times "unreachable" 11 "fre2" } }
-// { dg-final { cleanup-tree-dump "fre2" } }
+// { dg-final { scan-tree-dump-times ";; Function" 1 "fre3" } }
+// { dg-final { scan-tree-dump-times "free" 19 "fre3" } }
+// { dg-final { scan-tree-dump-times "unreachable" 11 "fre3" } }
+// { dg-final { cleanup-tree-dump "fre3" } }
diff --git gcc/testsuite/gcc.dg/ipa/ipa-pta-13.c gcc/testsuite/gcc.dg/ipa/ipa-pta-13.c
index f7f95f4..8d73900 100644
--- gcc/testsuite/gcc.dg/ipa/ipa-pta-13.c
+++ gcc/testsuite/gcc.dg/ipa/ipa-pta-13.c
@@ -1,5 +1,5 @@
 /* { dg-do link } */
-/* { dg-options "-O2 -fipa-pta -fdump-ipa-pta-details -fdump-tree-fre2 -fno-ipa-icf" } */
+/* { dg-options "-O2 -fipa-pta -fdump-ipa-pta-details -fdump-tree-fre3 -fno-ipa-icf" } */
 
 static int x, y;
 
@@ -54,9 +54,9 @@ int main()
   local_address_taken (&y);
   /* As we are computing flow- and context-insensitive we may not
      CSE the load of x here.  */
-  /* { dg-final { scan-tree-dump " = x;" "fre2" } } */
+  /* { dg-final { scan-tree-dump " = x;" "fre3" } } */
   return x;
 }
 
 /* { dg-final { cleanup-ipa-dump "pta" } } */
-/* { dg-final { cleanup-tree-dump "fre2" } } */
+/* { dg-final { cleanup-tree-dump "fre3" } } */
diff --git gcc/testsuite/gcc.dg/ipa/ipa-pta-3.c gcc/testsuite/gcc.dg/ipa/ipa-pta-3.c
index 4790080..2398a21 100644
--- gcc/testsuite/gcc.dg/ipa/ipa-pta-3.c
+++ gcc/testsuite/gcc.dg/ipa/ipa-pta-3.c
@@ -1,5 +1,5 @@
 /* { dg-do run } */
-/* { dg-options "-O2 -fipa-pta -fdump-ipa-pta-details -fdump-tree-fre2-details" } */
+/* { dg-options "-O2 -fipa-pta -fdump-ipa-pta-details -fdump-tree-fre3-details" } */
 
 static int __attribute__((noinline,noclone))
 foo (int *p, int *q)
@@ -23,6 +23,6 @@ int main()
 
 /* { dg-final { scan-ipa-dump "foo.arg0 = &a" "pta" } } */
 /* { dg-final { scan-ipa-dump "foo.arg1 = &b" "pta" } } */
-/* { dg-final { scan-tree-dump "Replaced \\\*p_2\\\(D\\\) with 1" "fre2" } } */
-/* { dg-final { cleanup-tree-dump "fre2" } } */
+/* { dg-final { scan-tree-dump "Replaced \\\*p_2\\\(D\\\) with 1" "fre3" } } */
+/* { dg-final { cleanup-tree-dump "fre3" } } */
 /* { dg-final { cleanup-ipa-dump "pta" } } */
diff --git gcc/testsuite/gcc.dg/ipa/ipa-pta-4.c gcc/testsuite/gcc.dg/ipa/ipa-pta-4.c
index bf6fa28..b72489f 100644
--- gcc/testsuite/gcc.dg/ipa/ipa-pta-4.c
+++ gcc/testsuite/gcc.dg/ipa/ipa-pta-4.c
@@ -1,5 +1,5 @@
 /* { dg-do run } */
-/* { dg-options "-O2 -fipa-pta -fdump-ipa-pta-details -fdump-tree-fre2-details" } */
+/* { dg-options "-O2 -fipa-pta -fdump-ipa-pta-details -fdump-tree-fre3-details" } */
 
 int a, b;
 
@@ -28,6 +28,6 @@ int main()
 
 /* { dg-final { scan-ipa-dump "foo.arg0 = &a" "pta" } } */
 /* { dg-final { scan-ipa-dump "foo.arg1 = &b" "pta" } } */
-/* { dg-final { scan-tree-dump "Replaced \\\*p_2\\\(D\\\) with 1" "fre2" } } */
-/* { dg-final { cleanup-tree-dump "fre2" } } */
+/* { dg-final { scan-tree-dump "Replaced \\\*p_2\\\(D\\\) with 1" "fre3" } } */
+/* { dg-final { cleanup-tree-dump "fre3" } } */
 /* { dg-final { cleanup-ipa-dump "pta" } } */


Regards,
 Thomas

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 472 bytes --]


* Handle oacc kernels with other directives (was: openacc kernels directive -- initial support)
  2014-11-15 14:08 openacc kernels directive -- initial support Tom de Vries
                   ` (11 preceding siblings ...)
  2015-04-21 20:29 ` Handle global loop counters in c/c++ " Thomas Schwinge
@ 2015-04-21 20:33 ` Thomas Schwinge
  12 siblings, 0 replies; 71+ messages in thread
From: Thomas Schwinge @ 2015-04-21 20:33 UTC (permalink / raw)
  To: GCC Patches; +Cc: Tom de Vries, Richard Biener, Jakub Jelinek

Hi!

On Sat, 15 Nov 2014 13:14:52 +0100, Tom de Vries <Tom_deVries@mentor.com> wrote:
> I'm submitting a patch series with initial support for the oacc kernels directive.

Committed to gomp-4_0-branch in r222288:

commit 7109b39defb87bc839983339c9fb4cdcb3891238
Author: tschwinge <tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4>
Date:   Tue Apr 21 20:32:01 2015 +0000

    Handle oacc kernels with other directives
    
    Mark directives with fn spec attributes to prevent them from acting as
    optimization barriers.
    
    	gcc/
    	* builtin-attrs.def (DOT_DOT_r_r_r): Add DEF_ATTR_FOR_STRING.
    	(ATTR_FNSPEC_DOT_DOT_r_r_r_NOTHROW_LIST): Add DEF_ATTR_TREE_LIST.
    	* omp-builtins.def (BUILT_IN_GOACC_DATA_START)
    	(BUILT_IN_GOACC_ENTER_EXIT_DATA, BUILT_IN_GOACC_UPDATE): Use
    	DEF_GOACC_BUILTIN_FNSPEC instead of DEF_GOACC_BUILTIN.
    
    	gcc/testsuite/
    	* c-c++-common/goacc/kernels-loop-data-2.c: New test.
    	* c-c++-common/goacc/kernels-loop-data-enter-exit-2.c: New test.
    	* c-c++-common/goacc/kernels-loop-data-enter-exit.c: New test.
    	* c-c++-common/goacc/kernels-loop-data-update.c: New test.
    	* c-c++-common/goacc/kernels-loop-data.c: New test.
    	* c-c++-common/goacc/kernels-parallel-loop-data-enter-exit.c: New
    	test.
    	* gfortran.dg/goacc/kernels-loop-data-2.f95: New test.
    	* gfortran.dg/goacc/kernels-loop-data-enter-exit-2.f95: New test.
    	* gfortran.dg/goacc/kernels-loop-data-enter-exit.f95: New test.
    	* gfortran.dg/goacc/kernels-loop-data-update.f95: New test.
    	* gfortran.dg/goacc/kernels-loop-data.f95: New test.
    	* gfortran.dg/goacc/kernels-parallel-loop-data-enter-exit.f95: New
    	test.
    
    	libgomp/
    	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-data-2.c: New
    	test.
    	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-data-enter-exit-2.c:
    	New test.
    	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-data-enter-exit.c:
    	New test.
    	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-data-update.c:
    	New test.
    	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-data.c: New
    	test.
    	* testsuite/libgomp.oacc-c-c++-common/kernels-parallel-loop-data-enter-exit.c:
    	New test.
    	* testsuite/libgomp.oacc-fortran/kernels-loop-data-2.f95: New
    	test.
    	* testsuite/libgomp.oacc-fortran/kernels-loop-data-enter-exit-2.f95:
    	New test.
    	* testsuite/libgomp.oacc-fortran/kernels-loop-data-enter-exit.f95:
    	New test.
    	* testsuite/libgomp.oacc-fortran/kernels-loop-data-update.f95: New
    	test.
    	* testsuite/libgomp.oacc-fortran/kernels-loop-data.f95: New test.
    	* testsuite/libgomp.oacc-fortran/kernels-parallel-loop-data-enter-exit.f95:
    	New test.
    
    git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@222288 138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/ChangeLog.gomp                                 |    6 ++
 gcc/builtin-attrs.def                              |    3 +
 gcc/omp-builtins.def                               |   21 +++---
 gcc/testsuite/ChangeLog.gomp                       |   15 +++++
 .../c-c++-common/goacc/kernels-loop-data-2.c       |   71 ++++++++++++++++++++
 .../goacc/kernels-loop-data-enter-exit-2.c         |   69 +++++++++++++++++++
 .../goacc/kernels-loop-data-enter-exit.c           |   66 ++++++++++++++++++
 .../c-c++-common/goacc/kernels-loop-data-update.c  |   66 ++++++++++++++++++
 .../c-c++-common/goacc/kernels-loop-data.c         |   65 ++++++++++++++++++
 .../goacc/kernels-parallel-loop-data-enter-exit.c  |   67 ++++++++++++++++++
 .../gfortran.dg/goacc/kernels-loop-data-2.f95      |   52 ++++++++++++++
 .../goacc/kernels-loop-data-enter-exit-2.f95       |   52 ++++++++++++++
 .../goacc/kernels-loop-data-enter-exit.f95         |   50 ++++++++++++++
 .../gfortran.dg/goacc/kernels-loop-data-update.f95 |   49 ++++++++++++++
 .../gfortran.dg/goacc/kernels-loop-data.f95        |   50 ++++++++++++++
 .../kernels-parallel-loop-data-enter-exit.f95      |   51 ++++++++++++++
 libgomp/ChangeLog.gomp                             |   24 +++++++
 .../kernels-loop-data-2.c                          |   56 +++++++++++++++
 .../kernels-loop-data-enter-exit-2.c               |   54 +++++++++++++++
 .../kernels-loop-data-enter-exit.c                 |   51 ++++++++++++++
 .../kernels-loop-data-update.c                     |   53 +++++++++++++++
 .../libgomp.oacc-c-c++-common/kernels-loop-data.c  |   50 ++++++++++++++
 .../kernels-parallel-loop-data-enter-exit.c        |   52 ++++++++++++++
 .../libgomp.oacc-fortran/kernels-loop-data-2.f95   |   38 +++++++++++
 .../kernels-loop-data-enter-exit-2.f95             |   38 +++++++++++
 .../kernels-loop-data-enter-exit.f95               |   36 ++++++++++
 .../kernels-loop-data-update.f95                   |   36 ++++++++++
 .../libgomp.oacc-fortran/kernels-loop-data.f95     |   36 ++++++++++
 .../kernels-parallel-loop-data-enter-exit.f95      |   37 ++++++++++
 29 files changed, 1306 insertions(+), 8 deletions(-)

diff --git gcc/ChangeLog.gomp gcc/ChangeLog.gomp
index b1933ba..1e12554 100644
--- gcc/ChangeLog.gomp
+++ gcc/ChangeLog.gomp
@@ -1,5 +1,11 @@
 2015-04-21  Tom de Vries  <tom@codesourcery.com>
 
+	* builtin-attrs.def (DOT_DOT_r_r_r): Add DEF_ATTR_FOR_STRING.
+	(ATTR_FNSPEC_DOT_DOT_r_r_r_NOTHROW_LIST): Add DEF_ATTR_TREE_LIST.
+	* omp-builtins.def (BUILT_IN_GOACC_DATA_START)
+	(BUILT_IN_GOACC_ENTER_EXIT_DATA, BUILT_IN_GOACC_UPDATE): Use
+	DEF_GOACC_BUILTIN_FNSPEC instead of DEF_GOACC_BUILTIN.
+
 	* passes.def: Add pass_fre after pass_ch_oacc_kernels.
 
 	* passes.def: Add pass_scev_cprop to pass_oacc_kernels.
diff --git gcc/builtin-attrs.def gcc/builtin-attrs.def
index 8eca053..2897c19 100644
--- gcc/builtin-attrs.def
+++ gcc/builtin-attrs.def
@@ -65,6 +65,7 @@ DEF_ATTR_FOR_INT (6)
 		      ATTR_##ENUM, ATTR_NULL)
 DEF_ATTR_FOR_STRING (STR1, "1")
 DEF_ATTR_FOR_STRING (DOT_DOT_DOT_r_r_r, "...rrr")
+DEF_ATTR_FOR_STRING (DOT_DOT_r_r_r, "..rrr")
 #undef DEF_ATTR_FOR_STRING
 
 /* Construct a tree for a list of two integers.  */
@@ -131,6 +132,8 @@ DEF_ATTR_TREE_LIST (ATTR_PURE_NOTHROW_LEAF_LIST, ATTR_PURE,	\
 DEF_ATTR_TREE_LIST (ATTR_FNSPEC_DOT_DOT_DOT_r_r_r_NOTHROW_LIST, \
 		    ATTR_FNSPEC, ATTR_LIST_DOT_DOT_DOT_r_r_r, \
 		    ATTR_NOTHROW_LIST)
+DEF_ATTR_TREE_LIST (ATTR_FNSPEC_DOT_DOT_r_r_r_NOTHROW_LIST, \
+		    ATTR_FNSPEC, ATTR_LIST_DOT_DOT_r_r_r, ATTR_NOTHROW_LIST)
 DEF_ATTR_TREE_LIST (ATTR_NORETURN_NOTHROW_LIST, ATTR_NORETURN,	\
 			ATTR_NULL, ATTR_NOTHROW_LIST)
 DEF_ATTR_TREE_LIST (ATTR_NORETURN_NOTHROW_LEAF_LIST, ATTR_NORETURN,\
diff --git gcc/omp-builtins.def gcc/omp-builtins.def
index cd273f2..ba64976 100644
--- gcc/omp-builtins.def
+++ gcc/omp-builtins.def
@@ -32,13 +32,17 @@ along with GCC; see the file COPYING3.  If not see
 
 DEF_GOACC_BUILTIN (BUILT_IN_ACC_GET_DEVICE_TYPE, "acc_get_device_type",
 		   BT_FN_INT, ATTR_NOTHROW_LIST)
-DEF_GOACC_BUILTIN (BUILT_IN_GOACC_DATA_START, "GOACC_data_start",
-		   BT_FN_VOID_INT_SIZE_PTR_PTR_PTR, ATTR_NOTHROW_LIST)
+DEF_GOACC_BUILTIN_FNSPEC (BUILT_IN_GOACC_DATA_START, "GOACC_data_start",
+		   	  BT_FN_VOID_INT_SIZE_PTR_PTR_PTR,
+		   	  ATTR_FNSPEC_DOT_DOT_r_r_r_NOTHROW_LIST,
+		   	  ATTR_NOTHROW_LIST, "..rrr")
 DEF_GOACC_BUILTIN (BUILT_IN_GOACC_DATA_END, "GOACC_data_end",
 		   BT_FN_VOID, ATTR_NOTHROW_LIST)
-DEF_GOACC_BUILTIN (BUILT_IN_GOACC_ENTER_EXIT_DATA, "GOACC_enter_exit_data",
-		   BT_FN_VOID_INT_SIZE_PTR_PTR_PTR_INT_INT_VAR,
-		   ATTR_NOTHROW_LIST)
+DEF_GOACC_BUILTIN_FNSPEC (BUILT_IN_GOACC_ENTER_EXIT_DATA,
+			  "GOACC_enter_exit_data",
+			  BT_FN_VOID_INT_SIZE_PTR_PTR_PTR_INT_INT_VAR,
+			  ATTR_FNSPEC_DOT_DOT_r_r_r_NOTHROW_LIST,
+			  ATTR_NOTHROW_LIST, "..rrr")
 DEF_GOACC_BUILTIN_FNSPEC (BUILT_IN_GOACC_KERNELS_INTERNAL,
 			  "GOACC_kernels_internal",
 			  BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_INT_INT_INT_INT_INT_VAR,
@@ -50,9 +54,10 @@ DEF_GOACC_BUILTIN (BUILT_IN_GOACC_KERNELS, "GOACC_kernels",
 DEF_GOACC_BUILTIN (BUILT_IN_GOACC_PARALLEL, "GOACC_parallel",
 		   BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_INT_INT_INT_INT_INT_VAR,
 		   ATTR_NOTHROW_LIST)
-DEF_GOACC_BUILTIN (BUILT_IN_GOACC_UPDATE, "GOACC_update",
-		   BT_FN_VOID_INT_SIZE_PTR_PTR_PTR_INT_INT_VAR,
-		   ATTR_NOTHROW_LIST)
+DEF_GOACC_BUILTIN_FNSPEC (BUILT_IN_GOACC_UPDATE, "GOACC_update",
+			  BT_FN_VOID_INT_SIZE_PTR_PTR_PTR_INT_INT_VAR,
+			  ATTR_FNSPEC_DOT_DOT_r_r_r_NOTHROW_LIST,
+			  ATTR_NOTHROW_LIST, "..rrr")
 DEF_GOACC_BUILTIN (BUILT_IN_GOACC_WAIT, "GOACC_wait",
 		   BT_FN_VOID_INT_INT_VAR,
 		   ATTR_NOTHROW_LIST)
diff --git gcc/testsuite/ChangeLog.gomp gcc/testsuite/ChangeLog.gomp
index ed80f5b..4c2928b 100644
--- gcc/testsuite/ChangeLog.gomp
+++ gcc/testsuite/ChangeLog.gomp
@@ -1,6 +1,21 @@
 2015-04-21  Tom de Vries  <tom@codesourcery.com>
 	    Thomas Schwinge  <thomas@codesourcery.com>
 
+	* c-c++-common/goacc/kernels-loop-data-2.c: New test.
+	* c-c++-common/goacc/kernels-loop-data-enter-exit-2.c: New test.
+	* c-c++-common/goacc/kernels-loop-data-enter-exit.c: New test.
+	* c-c++-common/goacc/kernels-loop-data-update.c: New test.
+	* c-c++-common/goacc/kernels-loop-data.c: New test.
+	* c-c++-common/goacc/kernels-parallel-loop-data-enter-exit.c: New
+	test.
+	* gfortran.dg/goacc/kernels-loop-data-2.f95: New test.
+	* gfortran.dg/goacc/kernels-loop-data-enter-exit-2.f95: New test.
+	* gfortran.dg/goacc/kernels-loop-data-enter-exit.f95: New test.
+	* gfortran.dg/goacc/kernels-loop-data-update.f95: New test.
+	* gfortran.dg/goacc/kernels-loop-data.f95: New test.
+	* gfortran.dg/goacc/kernels-parallel-loop-data-enter-exit.f95: New
+	test.
+
 	* c-c++-common/goacc/kernels-counter-vars-function-scope.c: New test.
 	* c-c++-common/goacc/kernels-one-counter-var.c: New test.
 	* g++.dg/ipa/devirt-37.C: Update for new pass_fre.
diff --git gcc/testsuite/c-c++-common/goacc/kernels-loop-data-2.c gcc/testsuite/c-c++-common/goacc/kernels-loop-data-2.c
new file mode 100644
index 0000000..fc6da6e
--- /dev/null
+++ gcc/testsuite/c-c++-common/goacc/kernels-loop-data-2.c
@@ -0,0 +1,71 @@
+/* { dg-additional-options "-O2" } */
+/* { dg-additional-options "-ftree-parallelize-loops=32" } */
+/* { dg-additional-options "-fdump-tree-parloops_oacc_kernels-all" } */
+/* { dg-additional-options "-fdump-tree-optimized" } */
+
+#include <stdlib.h>
+
+#define N (1024 * 512)
+#define COUNTERTYPE unsigned int
+
+int
+main (void)
+{
+  unsigned int *__restrict a;
+  unsigned int *__restrict b;
+  unsigned int *__restrict c;
+
+  a = (unsigned int *)malloc (N * sizeof (unsigned int));
+  b = (unsigned int *)malloc (N * sizeof (unsigned int));
+  c = (unsigned int *)malloc (N * sizeof (unsigned int));
+
+#pragma acc data copyout (a[0:N])
+  {
+#pragma acc kernels present (a[0:N])
+    {
+      for (COUNTERTYPE i = 0; i < N; i++)
+	a[i] = i * 2;
+    }
+  }
+
+#pragma acc data copyout (b[0:N])
+  {
+#pragma acc kernels present (b[0:N])
+    {
+      for (COUNTERTYPE i = 0; i < N; i++)
+	b[i] = i * 4;
+    }
+  }
+
+#pragma acc data copyin (a[0:N], b[0:N]) copyout (c[0:N])
+  {
+#pragma acc kernels present (a[0:N], b[0:N], c[0:N])
+    {
+      for (COUNTERTYPE ii = 0; ii < N; ii++)
+	c[ii] = a[ii] + b[ii];
+    }
+  }
+
+  for (COUNTERTYPE i = 0; i < N; i++)
+    if (c[i] != a[i] + b[i])
+      abort ();
+
+  free (a);
+  free (b);
+  free (c);
+
+  return 0;
+}
+
+/* Check that only three loops are analyzed, and that all can be
+   parallelized.  */
+/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 3 "parloops_oacc_kernels" } } */
+/* { dg-final { scan-tree-dump-not "FAILED:" "parloops_oacc_kernels" } } */
+
+/* Check that the loop has been split off into a function.  */
+/* { dg-final { scan-tree-dump-times "(?n);; Function .*main._omp_fn.0" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "(?n);; Function .*main._omp_fn.1" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "(?n);; Function .*main._omp_fn.2" 1 "optimized" } } */
+
+/* { dg-final { cleanup-tree-dump "parloops_oacc_kernels" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
diff --git gcc/testsuite/c-c++-common/goacc/kernels-loop-data-enter-exit-2.c gcc/testsuite/c-c++-common/goacc/kernels-loop-data-enter-exit-2.c
new file mode 100644
index 0000000..945359f
--- /dev/null
+++ gcc/testsuite/c-c++-common/goacc/kernels-loop-data-enter-exit-2.c
@@ -0,0 +1,69 @@
+/* { dg-additional-options "-O2" } */
+/* { dg-additional-options "-ftree-parallelize-loops=32" } */
+/* { dg-additional-options "-fdump-tree-parloops_oacc_kernels-all" } */
+/* { dg-additional-options "-fdump-tree-optimized" } */
+
+#include <stdlib.h>
+
+#define N (1024 * 512)
+#define COUNTERTYPE unsigned int
+
+int
+main (void)
+{
+  unsigned int *__restrict a;
+  unsigned int *__restrict b;
+  unsigned int *__restrict c;
+
+  a = (unsigned int *)malloc (N * sizeof (unsigned int));
+  b = (unsigned int *)malloc (N * sizeof (unsigned int));
+  c = (unsigned int *)malloc (N * sizeof (unsigned int));
+
+#pragma acc enter data create (a[0:N])
+#pragma acc kernels present (a[0:N])
+  {
+    for (COUNTERTYPE i = 0; i < N; i++)
+      a[i] = i * 2;
+  }
+#pragma acc exit data copyout (a[0:N])
+
+#pragma acc enter data create (b[0:N])
+#pragma acc kernels present (b[0:N])
+  {
+    for (COUNTERTYPE i = 0; i < N; i++)
+      b[i] = i * 4;
+  }
+#pragma acc exit data copyout (b[0:N])
+
+
+#pragma acc enter data copyin (a[0:N], b[0:N]) create (c[0:N])
+#pragma acc kernels present (a[0:N], b[0:N], c[0:N])
+  {
+    for (COUNTERTYPE ii = 0; ii < N; ii++)
+      c[ii] = a[ii] + b[ii];
+  }
+#pragma acc exit data copyout (c[0:N])
+
+  for (COUNTERTYPE i = 0; i < N; i++)
+    if (c[i] != a[i] + b[i])
+      abort ();
+
+  free (a);
+  free (b);
+  free (c);
+
+  return 0;
+}
+
+/* Check that only three loops are analyzed, and that all can be
+   parallelized.  */
+/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 3 "parloops_oacc_kernels" } } */
+/* { dg-final { scan-tree-dump-not "FAILED:" "parloops_oacc_kernels" } } */
+
+/* Check that the loop has been split off into a function.  */
+/* { dg-final { scan-tree-dump-times "(?n);; Function .*main._omp_fn.0" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "(?n);; Function .*main._omp_fn.1" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "(?n);; Function .*main._omp_fn.2" 1 "optimized" } } */
+
+/* { dg-final { cleanup-tree-dump "parloops_oacc_kernels" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
diff --git gcc/testsuite/c-c++-common/goacc/kernels-loop-data-enter-exit.c gcc/testsuite/c-c++-common/goacc/kernels-loop-data-enter-exit.c
new file mode 100644
index 0000000..2d6e5e3
--- /dev/null
+++ gcc/testsuite/c-c++-common/goacc/kernels-loop-data-enter-exit.c
@@ -0,0 +1,66 @@
+/* { dg-additional-options "-O2" } */
+/* { dg-additional-options "-ftree-parallelize-loops=32" } */
+/* { dg-additional-options "-fdump-tree-parloops_oacc_kernels-all" } */
+/* { dg-additional-options "-fdump-tree-optimized" } */
+
+#include <stdlib.h>
+
+#define N (1024 * 512)
+#define COUNTERTYPE unsigned int
+
+int
+main (void)
+{
+  unsigned int *__restrict a;
+  unsigned int *__restrict b;
+  unsigned int *__restrict c;
+
+  a = (unsigned int *)malloc (N * sizeof (unsigned int));
+  b = (unsigned int *)malloc (N * sizeof (unsigned int));
+  c = (unsigned int *)malloc (N * sizeof (unsigned int));
+
+#pragma acc enter data create (a[0:N], b[0:N], c[0:N])
+
+#pragma acc kernels present (a[0:N])
+  {
+    for (COUNTERTYPE i = 0; i < N; i++)
+      a[i] = i * 2;
+  }
+
+#pragma acc kernels present (b[0:N])
+  {
+    for (COUNTERTYPE i = 0; i < N; i++)
+      b[i] = i * 4;
+  }
+
+#pragma acc kernels present (a[0:N], b[0:N], c[0:N])
+  {
+    for (COUNTERTYPE ii = 0; ii < N; ii++)
+      c[ii] = a[ii] + b[ii];
+  }
+
+#pragma acc exit data copyout (a[0:N], c[0:N])
+
+  for (COUNTERTYPE i = 0; i < N; i++)
+    if (c[i] != a[i] + b[i])
+      abort ();
+
+  free (a);
+  free (b);
+  free (c);
+
+  return 0;
+}
+
+/* Check that only three loops are analyzed, and that all can be
+   parallelized.  */
+/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 3 "parloops_oacc_kernels" } } */
+/* { dg-final { scan-tree-dump-not "FAILED:" "parloops_oacc_kernels" } } */
+
+/* Check that the loop has been split off into a function.  */
+/* { dg-final { scan-tree-dump-times "(?n);; Function .*main._omp_fn.0" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "(?n);; Function .*main._omp_fn.1" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "(?n);; Function .*main._omp_fn.2" 1 "optimized" } } */
+
+/* { dg-final { cleanup-tree-dump "parloops_oacc_kernels" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
diff --git gcc/testsuite/c-c++-common/goacc/kernels-loop-data-update.c gcc/testsuite/c-c++-common/goacc/kernels-loop-data-update.c
new file mode 100644
index 0000000..c7aaf0f
--- /dev/null
+++ gcc/testsuite/c-c++-common/goacc/kernels-loop-data-update.c
@@ -0,0 +1,66 @@
+/* { dg-additional-options "-O2" } */
+/* { dg-additional-options "-ftree-parallelize-loops=32" } */
+/* { dg-additional-options "-fdump-tree-parloops_oacc_kernels-all" } */
+/* { dg-additional-options "-fdump-tree-optimized" } */
+
+#include <stdlib.h>
+
+#define N (1024 * 512)
+#define COUNTERTYPE unsigned int
+
+int
+main (void)
+{
+  unsigned int *__restrict a;
+  unsigned int *__restrict b;
+  unsigned int *__restrict c;
+
+  a = (unsigned int *)malloc (N * sizeof (unsigned int));
+  b = (unsigned int *)malloc (N * sizeof (unsigned int));
+  c = (unsigned int *)malloc (N * sizeof (unsigned int));
+
+#pragma acc enter data create (a[0:N], b[0:N], c[0:N])
+
+#pragma acc kernels present (a[0:N])
+  {
+    for (COUNTERTYPE i = 0; i < N; i++)
+      a[i] = i * 2;
+  }
+
+  {
+    for (COUNTERTYPE i = 0; i < N; i++)
+      b[i] = i * 4;
+  }
+
+#pragma acc update device (b[0:N])
+
+#pragma acc kernels present (a[0:N], b[0:N], c[0:N])
+  {
+    for (COUNTERTYPE ii = 0; ii < N; ii++)
+      c[ii] = a[ii] + b[ii];
+  }
+
+#pragma acc exit data copyout (a[0:N], c[0:N])
+
+  for (COUNTERTYPE i = 0; i < N; i++)
+    if (c[i] != a[i] + b[i])
+      abort ();
+
+  free (a);
+  free (b);
+  free (c);
+
+  return 0;
+}
+
+/* Check that only two loops are analyzed, and that both can be
+   parallelized.  */
+/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 2 "parloops_oacc_kernels" } } */
+/* { dg-final { scan-tree-dump-not "FAILED:" "parloops_oacc_kernels" } } */
+
+/* Check that the loop has been split off into a function.  */
+/* { dg-final { scan-tree-dump-times "(?n);; Function .*main._omp_fn.0" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "(?n);; Function .*main._omp_fn.1" 1 "optimized" } } */
+
+/* { dg-final { cleanup-tree-dump "parloops_oacc_kernels" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
diff --git gcc/testsuite/c-c++-common/goacc/kernels-loop-data.c gcc/testsuite/c-c++-common/goacc/kernels-loop-data.c
new file mode 100644
index 0000000..46ca9c5
--- /dev/null
+++ gcc/testsuite/c-c++-common/goacc/kernels-loop-data.c
@@ -0,0 +1,65 @@
+/* { dg-additional-options "-O2" } */
+/* { dg-additional-options "-ftree-parallelize-loops=32" } */
+/* { dg-additional-options "-fdump-tree-parloops_oacc_kernels-all" } */
+/* { dg-additional-options "-fdump-tree-optimized" } */
+
+#include <stdlib.h>
+
+#define N (1024 * 512)
+#define COUNTERTYPE unsigned int
+
+int
+main (void)
+{
+  unsigned int *__restrict a;
+  unsigned int *__restrict b;
+  unsigned int *__restrict c;
+
+  a = (unsigned int *)malloc (N * sizeof (unsigned int));
+  b = (unsigned int *)malloc (N * sizeof (unsigned int));
+  c = (unsigned int *)malloc (N * sizeof (unsigned int));
+
+#pragma acc data copyout (a[0:N], b[0:N], c[0:N])
+  {
+#pragma acc kernels present (a[0:N])
+    {
+      for (COUNTERTYPE i = 0; i < N; i++)
+	a[i] = i * 2;
+    }
+
+#pragma acc kernels present (b[0:N])
+    {
+      for (COUNTERTYPE i = 0; i < N; i++)
+	b[i] = i * 4;
+    }
+
+#pragma acc kernels present (a[0:N], b[0:N], c[0:N])
+    {
+      for (COUNTERTYPE ii = 0; ii < N; ii++)
+	c[ii] = a[ii] + b[ii];
+    }
+  }
+
+  for (COUNTERTYPE i = 0; i < N; i++)
+    if (c[i] != a[i] + b[i])
+      abort ();
+
+  free (a);
+  free (b);
+  free (c);
+
+  return 0;
+}
+
+/* Check that only three loops are analyzed, and that all can be
+   parallelized.  */
+/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 3 "parloops_oacc_kernels" } } */
+/* { dg-final { scan-tree-dump-not "FAILED:" "parloops_oacc_kernels" } } */
+
+/* Check that the loop has been split off into a function.  */
+/* { dg-final { scan-tree-dump-times "(?n);; Function .*main._omp_fn.0" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "(?n);; Function .*main._omp_fn.1" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "(?n);; Function .*main._omp_fn.2" 1 "optimized" } } */
+
+/* { dg-final { cleanup-tree-dump "parloops_oacc_kernels" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
diff --git gcc/testsuite/c-c++-common/goacc/kernels-parallel-loop-data-enter-exit.c gcc/testsuite/c-c++-common/goacc/kernels-parallel-loop-data-enter-exit.c
new file mode 100644
index 0000000..3e799ed
--- /dev/null
+++ gcc/testsuite/c-c++-common/goacc/kernels-parallel-loop-data-enter-exit.c
@@ -0,0 +1,67 @@
+/* { dg-additional-options "-O2" } */
+/* { dg-additional-options "-ftree-parallelize-loops=32" } */
+/* { dg-additional-options "-fdump-tree-parloops_oacc_kernels-all" } */
+/* { dg-additional-options "-fdump-tree-optimized" } */
+
+#include <stdlib.h>
+
+#define N (1024 * 512)
+#define COUNTERTYPE unsigned int
+
+int
+main (void)
+{
+  unsigned int *__restrict a;
+  unsigned int *__restrict b;
+  unsigned int *__restrict c;
+
+  a = (unsigned int *)malloc (N * sizeof (unsigned int));
+  b = (unsigned int *)malloc (N * sizeof (unsigned int));
+  c = (unsigned int *)malloc (N * sizeof (unsigned int));
+
+#pragma acc enter data create (a[0:N], b[0:N], c[0:N])
+
+#pragma acc kernels present (a[0:N])
+  {
+    for (COUNTERTYPE i = 0; i < N; i++)
+      a[i] = i * 2;
+  }
+
+#pragma acc parallel present (b[0:N])
+  {
+#pragma acc loop
+    for (COUNTERTYPE i = 0; i < N; i++)
+      b[i] = i * 4;
+  }
+
+#pragma acc kernels present (a[0:N], b[0:N], c[0:N])
+  {
+    for (COUNTERTYPE ii = 0; ii < N; ii++)
+      c[ii] = a[ii] + b[ii];
+  }
+
+#pragma acc exit data copyout (a[0:N], b[0:N], c[0:N])
+
+  for (COUNTERTYPE i = 0; i < N; i++)
+    if (c[i] != a[i] + b[i])
+      abort ();
+
+  free (a);
+  free (b);
+  free (c);
+
+  return 0;
+}
+
+/* Check that only two loops are analyzed, and that both can be
+   parallelized.  */
+/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 2 "parloops_oacc_kernels" } } */
+/* { dg-final { scan-tree-dump-not "FAILED:" "parloops_oacc_kernels" } } */
+
+/* Check that the loop has been split off into a function.  */
+/* { dg-final { scan-tree-dump-times "(?n);; Function .*main._omp_fn.0" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "(?n);; Function .*main._omp_fn.1" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "(?n);; Function .*main._omp_fn.2" 1 "optimized" } } */
+
+/* { dg-final { cleanup-tree-dump "parloops_oacc_kernels" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
diff --git gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-2.f95 gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-2.f95
new file mode 100644
index 0000000..1b75a23
--- /dev/null
+++ gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-2.f95
@@ -0,0 +1,52 @@
+! { dg-additional-options "-O2" }
+! { dg-additional-options "-ftree-parallelize-loops=32" }
+! { dg-additional-options "-fdump-tree-parloops_oacc_kernels-all" }
+! { dg-additional-options "-fdump-tree-optimized" }
+
+program main
+  implicit none
+  integer, parameter         :: n = 1024
+  integer, dimension (0:n-1) :: a, b, c
+  integer                    :: i, ii
+
+  !$acc data copyout (a(0:n-1))
+  !$acc kernels present (a(0:n-1))
+  do i = 0, n - 1
+     a(i) = i * 2
+  end do
+  !$acc end kernels
+  !$acc end data
+
+  !$acc data copyout (b(0:n-1))
+  !$acc kernels present (b(0:n-1))
+  do i = 0, n -1
+     b(i) = i * 4
+  end do
+  !$acc end kernels
+  !$acc end data
+
+  !$acc data copyin (a(0:n-1), b(0:n-1)) copyout (c(0:n-1))
+  !$acc kernels present (a(0:n-1), b(0:n-1), c(0:n-1))
+  do ii = 0, n - 1
+     c(ii) = a(ii) + b(ii)
+  end do
+  !$acc end kernels
+  !$acc end data
+
+  do i = 0, n - 1
+     if (c(i) .ne. a(i) + b(i)) call abort
+  end do
+
+end program main
+
+! Check that only three loops are analyzed, and that all can be parallelized.
+! { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 3 "parloops_oacc_kernels" } }
+! { dg-final { scan-tree-dump-not "FAILED:" "parloops_oacc_kernels" } }
+
+! Check that the loop has been split off into a function.
+! { dg-final { scan-tree-dump-times "(?n);; Function MAIN__._omp_fn.0 " 1 "optimized" } }
+! { dg-final { scan-tree-dump-times "(?n);; Function MAIN__._omp_fn.1 " 1 "optimized" } }
+! { dg-final { scan-tree-dump-times "(?n);; Function MAIN__._omp_fn.2 " 1 "optimized" } }
+
+! { dg-final { cleanup-tree-dump "parloops_oacc_kernels" } }
+! { dg-final { cleanup-tree-dump "optimized" } }
diff --git gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-enter-exit-2.f95 gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-enter-exit-2.f95
new file mode 100644
index 0000000..4ba83b6
--- /dev/null
+++ gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-enter-exit-2.f95
@@ -0,0 +1,52 @@
+! { dg-additional-options "-O2" }
+! { dg-additional-options "-ftree-parallelize-loops=32" }
+! { dg-additional-options "-fdump-tree-parloops_oacc_kernels-all" }
+! { dg-additional-options "-fdump-tree-optimized" }
+
+program main
+  implicit none
+  integer, parameter         :: n = 1024
+  integer, dimension (0:n-1) :: a, b, c
+  integer                    :: i, ii
+
+  !$acc enter data create (a(0:n-1))
+  !$acc kernels present (a(0:n-1))
+  do i = 0, n - 1
+     a(i) = i * 2
+  end do
+  !$acc end kernels
+  !$acc exit data copyout (a(0:n-1))
+
+  !$acc enter data create (b(0:n-1))
+  !$acc kernels present (b(0:n-1))
+  do i = 0, n -1
+     b(i) = i * 4
+  end do
+  !$acc end kernels
+  !$acc exit data copyout (b(0:n-1))
+
+  !$acc enter data copyin (a(0:n-1), b(0:n-1)) create (c(0:n-1))
+  !$acc kernels present (a(0:n-1), b(0:n-1), c(0:n-1))
+  do ii = 0, n - 1
+     c(ii) = a(ii) + b(ii)
+  end do
+  !$acc end kernels
+  !$acc exit data copyout (c(0:n-1))
+
+  do i = 0, n - 1
+     if (c(i) .ne. a(i) + b(i)) call abort
+  end do
+
+end program main
+
+! Check that only three loops are analyzed, and that all can be parallelized.
+! { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 3 "parloops_oacc_kernels" } }
+! { dg-final { scan-tree-dump-not "FAILED:" "parloops_oacc_kernels" } }
+
+! Check that the loop has been split off into a function.
+! { dg-final { scan-tree-dump-times "(?n);; Function MAIN__._omp_fn.0 " 1 "optimized" } }
+! { dg-final { scan-tree-dump-times "(?n);; Function MAIN__._omp_fn.1 " 1 "optimized" } }
+! { dg-final { scan-tree-dump-times "(?n);; Function MAIN__._omp_fn.2 " 1 "optimized" } }
+
+! { dg-final { cleanup-tree-dump "parloops_oacc_kernels" } }
+! { dg-final { cleanup-tree-dump "optimized" } }
diff --git gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-enter-exit.f95 gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-enter-exit.f95
new file mode 100644
index 0000000..2b05b33
--- /dev/null
+++ gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-enter-exit.f95
@@ -0,0 +1,50 @@
+! { dg-additional-options "-O2" }
+! { dg-additional-options "-ftree-parallelize-loops=32" }
+! { dg-additional-options "-fdump-tree-parloops_oacc_kernels-all" }
+! { dg-additional-options "-fdump-tree-optimized" }
+
+program main
+  implicit none
+  integer, parameter         :: n = 1024
+  integer, dimension (0:n-1) :: a, b, c
+  integer                    :: i, ii
+
+  !$acc enter data create (a(0:n-1), b(0:n-1), c(0:n-1))
+
+  !$acc kernels present (a(0:n-1))
+  do i = 0, n - 1
+     a(i) = i * 2
+  end do
+  !$acc end kernels
+
+  !$acc kernels present (b(0:n-1))
+  do i = 0, n -1
+     b(i) = i * 4
+  end do
+  !$acc end kernels
+
+  !$acc kernels present (a(0:n-1), b(0:n-1), c(0:n-1))
+  do ii = 0, n - 1
+     c(ii) = a(ii) + b(ii)
+  end do
+  !$acc end kernels
+
+  !$acc exit data copyout (a(0:n-1), b(0:n-1), c(0:n-1))
+
+  do i = 0, n - 1
+     if (c(i) .ne. a(i) + b(i)) call abort
+  end do
+
+end program main
+
+! Check that only three loops are analyzed, and that all can be parallelized.
+! { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 3 "parloops_oacc_kernels" } }
+! { dg-final { scan-tree-dump-not "FAILED:" "parloops_oacc_kernels" } }
+
+! Check that the loop has been split off into a function.
+! { dg-final { scan-tree-dump-times "(?n);; Function MAIN__._omp_fn.0 " 1 "optimized" } }
+! { dg-final { scan-tree-dump-times "(?n);; Function MAIN__._omp_fn.1 " 1 "optimized" } }
+! { dg-final { scan-tree-dump-times "(?n);; Function MAIN__._omp_fn.2 " 1 "optimized" } }
+
+! { dg-final { cleanup-tree-dump "parloops_oacc_kernels" } }
+! { dg-final { cleanup-tree-dump "optimized" } }
diff --git gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-update.f95 gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-update.f95
new file mode 100644
index 0000000..b3c80dc
--- /dev/null
+++ gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-update.f95
@@ -0,0 +1,49 @@
+! { dg-additional-options "-O2" }
+! { dg-additional-options "-ftree-parallelize-loops=32" }
+! { dg-additional-options "-fdump-tree-parloops_oacc_kernels-all" }
+! { dg-additional-options "-fdump-tree-optimized" }
+
+program main
+  implicit none
+  integer, parameter         :: n = 1024
+  integer, dimension (0:n-1) :: a, b, c
+  integer                    :: i, ii
+
+  !$acc enter data create (a(0:n-1), b(0:n-1), c(0:n-1))
+
+  !$acc kernels present (a(0:n-1))
+  do i = 0, n - 1
+     a(i) = i * 2
+  end do
+  !$acc end kernels
+
+  do i = 0, n -1
+     b(i) = i * 4
+  end do
+
+  !$acc update device (b(0:n-1))
+
+  !$acc kernels present (a(0:n-1), b(0:n-1), c(0:n-1))
+  do ii = 0, n - 1
+     c(ii) = a(ii) + b(ii)
+  end do
+  !$acc end kernels
+
+  !$acc exit data copyout (a(0:n-1), c(0:n-1))
+
+  do i = 0, n - 1
+     if (c(i) .ne. a(i) + b(i)) call abort
+  end do
+
+end program main
+
+! Check that only two loops are analyzed, and that both can be parallelized.
+! { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 2 "parloops_oacc_kernels" } }
+! { dg-final { scan-tree-dump-not "FAILED:" "parloops_oacc_kernels" } }
+
+! Check that the loop has been split off into a function.
+! { dg-final { scan-tree-dump-times "(?n);; Function MAIN__._omp_fn.0 " 1 "optimized" } }
+! { dg-final { scan-tree-dump-times "(?n);; Function MAIN__._omp_fn.1 " 1 "optimized" } }
+
+! { dg-final { cleanup-tree-dump "parloops_oacc_kernels" } }
+! { dg-final { cleanup-tree-dump "optimized" } }
diff --git gcc/testsuite/gfortran.dg/goacc/kernels-loop-data.f95 gcc/testsuite/gfortran.dg/goacc/kernels-loop-data.f95
new file mode 100644
index 0000000..98c5e7a
--- /dev/null
+++ gcc/testsuite/gfortran.dg/goacc/kernels-loop-data.f95
@@ -0,0 +1,50 @@
+! { dg-additional-options "-O2" }
+! { dg-additional-options "-ftree-parallelize-loops=32" }
+! { dg-additional-options "-fdump-tree-parloops_oacc_kernels-all" }
+! { dg-additional-options "-fdump-tree-optimized" }
+
+program main
+  implicit none
+  integer, parameter         :: n = 1024
+  integer, dimension (0:n-1) :: a, b, c
+  integer                    :: i, ii
+
+  !$acc data copyout (a(0:n-1), b(0:n-1), c(0:n-1))
+
+  !$acc kernels present (a(0:n-1))
+  do i = 0, n - 1
+     a(i) = i * 2
+  end do
+  !$acc end kernels
+
+  !$acc kernels present (b(0:n-1))
+  do i = 0, n -1
+     b(i) = i * 4
+  end do
+  !$acc end kernels
+
+  !$acc kernels present (a(0:n-1), b(0:n-1), c(0:n-1))
+  do ii = 0, n - 1
+     c(ii) = a(ii) + b(ii)
+  end do
+  !$acc end kernels
+
+  !$acc end data
+
+  do i = 0, n - 1
+     if (c(i) .ne. a(i) + b(i)) call abort
+  end do
+
+end program main
+
+! Check that only three loops are analyzed, and that all can be parallelized.
+! { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 3 "parloops_oacc_kernels" } }
+! { dg-final { scan-tree-dump-not "FAILED:" "parloops_oacc_kernels" } }
+
+! Check that the loop has been split off into a function.
+! { dg-final { scan-tree-dump-times "(?n);; Function MAIN__._omp_fn.0 " 1 "optimized" } }
+! { dg-final { scan-tree-dump-times "(?n);; Function MAIN__._omp_fn.1 " 1 "optimized" } }
+! { dg-final { scan-tree-dump-times "(?n);; Function MAIN__._omp_fn.2 " 1 "optimized" } }
+
+! { dg-final { cleanup-tree-dump "parloops_oacc_kernels" } }
+! { dg-final { cleanup-tree-dump "optimized" } }
diff --git gcc/testsuite/gfortran.dg/goacc/kernels-parallel-loop-data-enter-exit.f95 gcc/testsuite/gfortran.dg/goacc/kernels-parallel-loop-data-enter-exit.f95
new file mode 100644
index 0000000..7ea2b49
--- /dev/null
+++ gcc/testsuite/gfortran.dg/goacc/kernels-parallel-loop-data-enter-exit.f95
@@ -0,0 +1,51 @@
+! { dg-additional-options "-O2" }
+! { dg-additional-options "-ftree-parallelize-loops=32" }
+! { dg-additional-options "-fdump-tree-parloops_oacc_kernels-all" }
+! { dg-additional-options "-fdump-tree-optimized" }
+
+program main
+  implicit none
+  integer, parameter         :: n = 1024
+  integer, dimension (0:n-1) :: a, b, c
+  integer                    :: i, ii
+
+  !$acc enter data create (a(0:n-1), b(0:n-1), c(0:n-1))
+
+  !$acc kernels present (a(0:n-1))
+  do i = 0, n - 1
+     a(i) = i * 2
+  end do
+  !$acc end kernels
+
+  !$acc parallel present (b(0:n-1))
+  !$acc loop
+  do i = 0, n -1
+     b(i) = i * 4
+  end do
+  !$acc end parallel
+
+  !$acc kernels present (a(0:n-1), b(0:n-1), c(0:n-1))
+  do ii = 0, n - 1
+     c(ii) = a(ii) + b(ii)
+  end do
+  !$acc end kernels
+
+  !$acc exit data copyout (a(0:n-1), b(0:n-1), c(0:n-1))
+
+  do i = 0, n - 1
+     if (c(i) .ne. a(i) + b(i)) call abort
+  end do
+
+end program main
+
+! Check that only two loops are analyzed, and that both can be parallelized.
+! { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 2 "parloops_oacc_kernels" } }
+! { dg-final { scan-tree-dump-not "FAILED:" "parloops_oacc_kernels" } }
+
+! Check that the loop has been split off into a function.
+! { dg-final { scan-tree-dump-times "(?n);; Function MAIN__._omp_fn.0 " 1 "optimized" } }
+! { dg-final { scan-tree-dump-times "(?n);; Function MAIN__._omp_fn.1 " 1 "optimized" } }
+! { dg-final { scan-tree-dump-times "(?n);; Function MAIN__._omp_fn.2 " 1 "optimized" } }
+
+! { dg-final { cleanup-tree-dump "parloops_oacc_kernels" } }
+! { dg-final { cleanup-tree-dump "optimized" } }
diff --git libgomp/ChangeLog.gomp libgomp/ChangeLog.gomp
index bcb3340..3d762bd 100644
--- libgomp/ChangeLog.gomp
+++ libgomp/ChangeLog.gomp
@@ -1,6 +1,30 @@
 2015-04-21  Tom de Vries  <tom@codesourcery.com>
 	    Thomas Schwinge  <thomas@codesourcery.com>
 
+	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-data-2.c: New
+	test.
+	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-data-enter-exit-2.c:
+	New test.
+	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-data-enter-exit.c:
+	New test.
+	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-data-update.c:
+	New test.
+	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-data.c: New
+	test.
+	* testsuite/libgomp.oacc-c-c++-common/kernels-parallel-loop-data-enter-exit.c:
+	New test.
+	* testsuite/libgomp.oacc-fortran/kernels-loop-data-2.f95: New
+	test.
+	* testsuite/libgomp.oacc-fortran/kernels-loop-data-enter-exit-2.f95:
+	New test.
+	* testsuite/libgomp.oacc-fortran/kernels-loop-data-enter-exit.f95:
+	New test.
+	* testsuite/libgomp.oacc-fortran/kernels-loop-data-update.f95: New
+	test.
+	* testsuite/libgomp.oacc-fortran/kernels-loop-data.f95: New test.
+	* testsuite/libgomp.oacc-fortran/kernels-parallel-loop-data-enter-exit.f95:
+	New test.
+
 	* testsuite/libgomp.oacc-fortran/kernels-loop-2.f95: New test.
 	* testsuite/libgomp.oacc-fortran/kernels-loop.f95: New test.
 
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-data-2.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-data-2.c
new file mode 100644
index 0000000..325ea7d
--- /dev/null
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-data-2.c
@@ -0,0 +1,56 @@
+/* { dg-do run } */
+/* { dg-options "-ftree-parallelize-loops=32 -O2" } */
+
+#include <stdlib.h>
+
+#define N (1024 * 512)
+#define COUNTERTYPE unsigned int
+
+int
+main (void)
+{
+  unsigned int *__restrict a;
+  unsigned int *__restrict b;
+  unsigned int *__restrict c;
+
+  a = (unsigned int *__restrict)malloc (N * sizeof (unsigned int));
+  b = (unsigned int *__restrict)malloc (N * sizeof (unsigned int));
+  c = (unsigned int *__restrict)malloc (N * sizeof (unsigned int));
+
+#pragma acc data copyout (a[0:N])
+  {
+#pragma acc kernels present (a[0:N])
+    {
+      for (COUNTERTYPE i = 0; i < N; i++)
+	a[i] = i * 2;
+    }
+  }
+
+#pragma acc data copyout (b[0:N])
+  {
+#pragma acc kernels present (b[0:N])
+    {
+      for (COUNTERTYPE i = 0; i < N; i++)
+	b[i] = i * 4;
+    }
+  }
+
+#pragma acc data copyin (a[0:N], b[0:N]) copyout (c[0:N])
+  {
+#pragma acc kernels present (a[0:N], b[0:N], c[0:N])
+    {
+      for (COUNTERTYPE ii = 0; ii < N; ii++)
+	c[ii] = a[ii] + b[ii];
+    }
+  }
+
+  for (COUNTERTYPE i = 0; i < N; i++)
+    if (c[i] != a[i] + b[i])
+      abort ();
+
+  free (a);
+  free (b);
+  free (c);
+
+  return 0;
+}
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-data-enter-exit-2.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-data-enter-exit-2.c
new file mode 100644
index 0000000..9c378a2
--- /dev/null
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-data-enter-exit-2.c
@@ -0,0 +1,54 @@
+/* { dg-do run } */
+/* { dg-options "-ftree-parallelize-loops=32 -O2" } */
+
+#include <stdlib.h>
+
+#define N (1024 * 512)
+#define COUNTERTYPE unsigned int
+
+int
+main (void)
+{
+  unsigned int *__restrict a;
+  unsigned int *__restrict b;
+  unsigned int *__restrict c;
+
+  a = (unsigned int *__restrict)malloc (N * sizeof (unsigned int));
+  b = (unsigned int *__restrict)malloc (N * sizeof (unsigned int));
+  c = (unsigned int *__restrict)malloc (N * sizeof (unsigned int));
+
+#pragma acc enter data create (a[0:N])
+#pragma acc kernels present (a[0:N])
+  {
+    for (COUNTERTYPE i = 0; i < N; i++)
+      a[i] = i * 2;
+  }
+#pragma acc exit data copyout (a[0:N])
+
+#pragma acc enter data create (b[0:N])
+#pragma acc kernels present (b[0:N])
+  {
+    for (COUNTERTYPE i = 0; i < N; i++)
+      b[i] = i * 4;
+  }
+#pragma acc exit data copyout (b[0:N])
+
+
+#pragma acc enter data copyin (a[0:N], b[0:N]) create (c[0:N])
+#pragma acc kernels present (a[0:N], b[0:N], c[0:N])
+  {
+    for (COUNTERTYPE ii = 0; ii < N; ii++)
+      c[ii] = a[ii] + b[ii];
+  }
+#pragma acc exit data copyout (c[0:N])
+
+  for (COUNTERTYPE i = 0; i < N; i++)
+    if (c[i] != a[i] + b[i])
+      abort ();
+
+  free (a);
+  free (b);
+  free (c);
+
+  return 0;
+}
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-data-enter-exit.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-data-enter-exit.c
new file mode 100644
index 0000000..78cf4c1
--- /dev/null
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-data-enter-exit.c
@@ -0,0 +1,51 @@
+/* { dg-do run } */
+/* { dg-options "-ftree-parallelize-loops=32 -O2" } */
+
+#include <stdlib.h>
+
+#define N (1024 * 512)
+#define COUNTERTYPE unsigned int
+
+int
+main (void)
+{
+  unsigned int *__restrict a;
+  unsigned int *__restrict b;
+  unsigned int *__restrict c;
+
+  a = (unsigned int *__restrict)malloc (N * sizeof (unsigned int));
+  b = (unsigned int *__restrict)malloc (N * sizeof (unsigned int));
+  c = (unsigned int *__restrict)malloc (N * sizeof (unsigned int));
+
+#pragma acc enter data create (a[0:N], b[0:N], c[0:N])
+
+#pragma acc kernels present (a[0:N])
+  {
+    for (COUNTERTYPE i = 0; i < N; i++)
+      a[i] = i * 2;
+  }
+
+#pragma acc kernels present (b[0:N])
+  {
+    for (COUNTERTYPE i = 0; i < N; i++)
+      b[i] = i * 4;
+  }
+
+#pragma acc kernels present (a[0:N], b[0:N], c[0:N])
+  {
+    for (COUNTERTYPE ii = 0; ii < N; ii++)
+      c[ii] = a[ii] + b[ii];
+  }
+
+#pragma acc exit data copyout (a[0:N], b[0:N], c[0:N])
+
+  for (COUNTERTYPE i = 0; i < N; i++)
+    if (c[i] != a[i] + b[i])
+      abort ();
+
+  free (a);
+  free (b);
+  free (c);
+
+  return 0;
+}
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-data-update.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-data-update.c
new file mode 100644
index 0000000..67c2c36
--- /dev/null
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-data-update.c
@@ -0,0 +1,53 @@
+/* { dg-do run } */
+/* { dg-options "-ftree-parallelize-loops=32 -O2" } */
+
+#include <stdlib.h>
+
+#define N (1024 * 512)
+#define COUNTERTYPE unsigned int
+
+int
+main (void)
+{
+  unsigned int *__restrict a;
+  unsigned int *__restrict b;
+  unsigned int *__restrict c;
+
+  a = (unsigned int *__restrict)malloc (N * sizeof (unsigned int));
+  b = (unsigned int *__restrict)malloc (N * sizeof (unsigned int));
+  c = (unsigned int *__restrict)malloc (N * sizeof (unsigned int));
+
+#pragma acc enter data create (a[0:N], b[0:N], c[0:N])
+
+#pragma acc kernels present (a[0:N])
+  {
+    for (COUNTERTYPE i = 0; i < N; i++)
+      a[i] = i * 2;
+  }
+
+  {
+    for (COUNTERTYPE i = 0; i < N; i++)
+      b[i] = i * 4;
+  }
+
+#pragma acc update device (b[0:N])
+
+#pragma acc kernels present (a[0:N], b[0:N], c[0:N])
+  {
+    for (COUNTERTYPE ii = 0; ii < N; ii++)
+      c[ii] = a[ii] + b[ii];
+  }
+
+#pragma acc exit data copyout (a[0:N], c[0:N])
+
+  for (COUNTERTYPE i = 0; i < N; i++)
+    if (c[i] != a[i] + b[i])
+      abort ();
+
+  free (a);
+  free (b);
+  free (c);
+
+  return 0;
+}
+
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-data.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-data.c
new file mode 100644
index 0000000..acd7f30
--- /dev/null
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-data.c
@@ -0,0 +1,50 @@
+/* { dg-do run } */
+/* { dg-options "-ftree-parallelize-loops=32 -O2" } */
+
+#include <stdlib.h>
+
+#define N (1024 * 512)
+#define COUNTERTYPE unsigned int
+
+int
+main (void)
+{
+  unsigned int *__restrict a;
+  unsigned int *__restrict b;
+  unsigned int *__restrict c;
+
+  a = (unsigned int *__restrict)malloc (N * sizeof (unsigned int));
+  b = (unsigned int *__restrict)malloc (N * sizeof (unsigned int));
+  c = (unsigned int *__restrict)malloc (N * sizeof (unsigned int));
+
+#pragma acc data copyout (a[0:N], b[0:N], c[0:N])
+  {
+#pragma acc kernels present (a[0:N])
+    {
+      for (COUNTERTYPE i = 0; i < N; i++)
+	a[i] = i * 2;
+    }
+
+#pragma acc kernels present (b[0:N])
+    {
+      for (COUNTERTYPE i = 0; i < N; i++)
+	b[i] = i * 4;
+    }
+
+#pragma acc kernels present (a[0:N], b[0:N], c[0:N])
+    {
+      for (COUNTERTYPE ii = 0; ii < N; ii++)
+	c[ii] = a[ii] + b[ii];
+    }
+  }
+
+  for (COUNTERTYPE i = 0; i < N; i++)
+    if (c[i] != a[i] + b[i])
+      abort ();
+
+  free (a);
+  free (b);
+  free (c);
+
+  return 0;
+}
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-parallel-loop-data-enter-exit.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-parallel-loop-data-enter-exit.c
new file mode 100644
index 0000000..cab10df
--- /dev/null
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-parallel-loop-data-enter-exit.c
@@ -0,0 +1,52 @@
+/* { dg-do run } */
+/* { dg-options "-ftree-parallelize-loops=32 -O2" } */
+
+#include <stdlib.h>
+
+#define N (1024 * 512)
+#define COUNTERTYPE unsigned int
+
+int
+main (void)
+{
+  unsigned int *__restrict a;
+  unsigned int *__restrict b;
+  unsigned int *__restrict c;
+
+  a = (unsigned int *__restrict)malloc (N * sizeof (unsigned int));
+  b = (unsigned int *__restrict)malloc (N * sizeof (unsigned int));
+  c = (unsigned int *__restrict)malloc (N * sizeof (unsigned int));
+
+#pragma acc enter data create (a[0:N], b[0:N], c[0:N])
+
+#pragma acc kernels present (a[0:N])
+  {
+    for (COUNTERTYPE i = 0; i < N; i++)
+      a[i] = i * 2;
+  }
+
+#pragma acc parallel present (b[0:N])
+  {
+#pragma acc loop
+    for (COUNTERTYPE i = 0; i < N; i++)
+      b[i] = i * 4;
+  }
+
+#pragma acc kernels present (a[0:N], b[0:N], c[0:N])
+  {
+    for (COUNTERTYPE ii = 0; ii < N; ii++)
+      c[ii] = a[ii] + b[ii];
+  }
+
+#pragma acc exit data copyout (a[0:N], b[0:N], c[0:N])
+
+  for (COUNTERTYPE i = 0; i < N; i++)
+    if (c[i] != a[i] + b[i])
+      abort ();
+
+  free (a);
+  free (b);
+  free (c);
+
+  return 0;
+}
diff --git libgomp/testsuite/libgomp.oacc-fortran/kernels-loop-data-2.f95 libgomp/testsuite/libgomp.oacc-fortran/kernels-loop-data-2.f95
new file mode 100644
index 0000000..7b52253
--- /dev/null
+++ libgomp/testsuite/libgomp.oacc-fortran/kernels-loop-data-2.f95
@@ -0,0 +1,38 @@
+! { dg-do run }
+! { dg-options "-ftree-parallelize-loops=32" }
+
+program main
+  implicit none
+  integer, parameter         :: n = 1024
+  integer, dimension (0:n-1) :: a, b, c
+  integer                    :: i, ii
+
+  !$acc data copyout (a(0:n-1))
+  !$acc kernels present (a(0:n-1))
+  do i = 0, n - 1
+     a(i) = i * 2
+  end do
+  !$acc end kernels
+  !$acc end data
+
+  !$acc data copyout (b(0:n-1))
+  !$acc kernels present (b(0:n-1))
+  do i = 0, n -1
+     b(i) = i * 4
+  end do
+  !$acc end kernels
+  !$acc end data
+
+  !$acc data copyin (a(0:n-1), b(0:n-1)) copyout (c(0:n-1))
+  !$acc kernels present (a(0:n-1), b(0:n-1), c(0:n-1))
+  do ii = 0, n - 1
+     c(ii) = a(ii) + b(ii)
+  end do
+  !$acc end kernels
+  !$acc end data
+
+  do i = 0, n - 1
+     if (c(i) .ne. a(i) + b(i)) call abort
+  end do
+
+end program main
diff --git libgomp/testsuite/libgomp.oacc-fortran/kernels-loop-data-enter-exit-2.f95 libgomp/testsuite/libgomp.oacc-fortran/kernels-loop-data-enter-exit-2.f95
new file mode 100644
index 0000000..af98efa
--- /dev/null
+++ libgomp/testsuite/libgomp.oacc-fortran/kernels-loop-data-enter-exit-2.f95
@@ -0,0 +1,38 @@
+! { dg-do run }
+! { dg-options "-ftree-parallelize-loops=32" }
+
+program main
+  implicit none
+  integer, parameter         :: n = 1024
+  integer, dimension (0:n-1) :: a, b, c
+  integer                    :: i, ii
+
+  !$acc enter data create (a(0:n-1))
+  !$acc kernels present (a(0:n-1))
+  do i = 0, n - 1
+     a(i) = i * 2
+  end do
+  !$acc end kernels
+  !$acc exit data copyout (a(0:n-1))
+
+  !$acc enter data create (b(0:n-1))
+  !$acc kernels present (b(0:n-1))
+  do i = 0, n -1
+     b(i) = i * 4
+  end do
+  !$acc end kernels
+  !$acc exit data copyout (b(0:n-1))
+
+  !$acc enter data copyin (a(0:n-1), b(0:n-1)) create (c(0:n-1))
+  !$acc kernels present (a(0:n-1), b(0:n-1), c(0:n-1))
+  do ii = 0, n - 1
+     c(ii) = a(ii) + b(ii)
+  end do
+  !$acc end kernels
+  !$acc exit data copyout (c(0:n-1))
+
+  do i = 0, n - 1
+     if (c(i) .ne. a(i) + b(i)) call abort
+  end do
+
+end program main
diff --git libgomp/testsuite/libgomp.oacc-fortran/kernels-loop-data-enter-exit.f95 libgomp/testsuite/libgomp.oacc-fortran/kernels-loop-data-enter-exit.f95
new file mode 100644
index 0000000..bb6f8dc
--- /dev/null
+++ libgomp/testsuite/libgomp.oacc-fortran/kernels-loop-data-enter-exit.f95
@@ -0,0 +1,36 @@
+! { dg-do run }
+! { dg-options "-ftree-parallelize-loops=32" }
+
+program main
+  implicit none
+  integer, parameter         :: n = 1024
+  integer, dimension (0:n-1) :: a, b, c
+  integer                    :: i, ii
+
+  !$acc enter data create (a(0:n-1), b(0:n-1), c(0:n-1))
+
+  !$acc kernels present (a(0:n-1))
+  do i = 0, n - 1
+     a(i) = i * 2
+  end do
+  !$acc end kernels
+
+  !$acc kernels present (b(0:n-1))
+  do i = 0, n -1
+     b(i) = i * 4
+  end do
+  !$acc end kernels
+
+  !$acc kernels present (a(0:n-1), b(0:n-1), c(0:n-1))
+  do ii = 0, n - 1
+     c(ii) = a(ii) + b(ii)
+  end do
+  !$acc end kernels
+
+  !$acc exit data copyout (a(0:n-1), b(0:n-1), c(0:n-1))
+
+  do i = 0, n - 1
+     if (c(i) .ne. a(i) + b(i)) call abort
+  end do
+
+end program main
diff --git libgomp/testsuite/libgomp.oacc-fortran/kernels-loop-data-update.f95 libgomp/testsuite/libgomp.oacc-fortran/kernels-loop-data-update.f95
new file mode 100644
index 0000000..cab1f2c
--- /dev/null
+++ libgomp/testsuite/libgomp.oacc-fortran/kernels-loop-data-update.f95
@@ -0,0 +1,36 @@
+! { dg-do run }
+! { dg-options "-ftree-parallelize-loops=32" }
+
+program main
+  implicit none
+  integer, parameter         :: n = 1024
+  integer, dimension (0:n-1) :: a, b, c
+  integer                    :: i, ii
+
+  !$acc enter data create (a(0:n-1), b(0:n-1), c(0:n-1))
+
+  !$acc kernels present (a(0:n-1))
+  do i = 0, n - 1
+     a(i) = i * 2
+  end do
+  !$acc end kernels
+
+  do i = 0, n -1
+     b(i) = i * 4
+  end do
+
+  !$acc update device (b(0:n-1))
+
+  !$acc kernels present (a(0:n-1), b(0:n-1), c(0:n-1))
+  do ii = 0, n - 1
+     c(ii) = a(ii) + b(ii)
+  end do
+  !$acc end kernels
+
+  !$acc exit data copyout (a(0:n-1), c(0:n-1))
+
+  do i = 0, n - 1
+     if (c(i) .ne. a(i) + b(i)) call abort
+  end do
+
+end program main
diff --git libgomp/testsuite/libgomp.oacc-fortran/kernels-loop-data.f95 libgomp/testsuite/libgomp.oacc-fortran/kernels-loop-data.f95
new file mode 100644
index 0000000..f26671d
--- /dev/null
+++ libgomp/testsuite/libgomp.oacc-fortran/kernels-loop-data.f95
@@ -0,0 +1,36 @@
+! { dg-do run }
+! { dg-options "-ftree-parallelize-loops=32" }
+
+program main
+  implicit none
+  integer, parameter         :: n = 1024
+  integer, dimension (0:n-1) :: a, b, c
+  integer                    :: i, ii
+
+  !$acc data copyout (a(0:n-1), b(0:n-1), c(0:n-1))
+
+  !$acc kernels present (a(0:n-1))
+  do i = 0, n - 1
+     a(i) = i * 2
+  end do
+  !$acc end kernels
+
+  !$acc kernels present (b(0:n-1))
+  do i = 0, n -1
+     b(i) = i * 4
+  end do
+  !$acc end kernels
+
+  !$acc kernels present (a(0:n-1), b(0:n-1), c(0:n-1))
+  do ii = 0, n - 1
+     c(ii) = a(ii) + b(ii)
+  end do
+  !$acc end kernels
+
+  !$acc end data
+
+  do i = 0, n - 1
+     if (c(i) .ne. a(i) + b(i)) call abort
+  end do
+
+end program main
diff --git libgomp/testsuite/libgomp.oacc-fortran/kernels-parallel-loop-data-enter-exit.f95 libgomp/testsuite/libgomp.oacc-fortran/kernels-parallel-loop-data-enter-exit.f95
new file mode 100644
index 0000000..2322152
--- /dev/null
+++ libgomp/testsuite/libgomp.oacc-fortran/kernels-parallel-loop-data-enter-exit.f95
@@ -0,0 +1,37 @@
+! { dg-do run }
+! { dg-options "-ftree-parallelize-loops=32" }
+
+program main
+  implicit none
+  integer, parameter         :: n = 1024
+  integer, dimension (0:n-1) :: a, b, c
+  integer                    :: i, ii
+
+  !$acc enter data create (a(0:n-1), b(0:n-1), c(0:n-1))
+
+  !$acc kernels present (a(0:n-1))
+  do i = 0, n - 1
+     a(i) = i * 2
+  end do
+  !$acc end kernels
+
+  !$acc parallel present (b(0:n-1))
+  !$acc loop
+  do i = 0, n -1
+     b(i) = i * 4
+  end do
+  !$acc end parallel
+
+  !$acc kernels present (a(0:n-1), b(0:n-1), c(0:n-1))
+  do ii = 0, n - 1
+     c(ii) = a(ii) + b(ii)
+  end do
+  !$acc end kernels
+
+  !$acc exit data copyout (a(0:n-1), b(0:n-1), c(0:n-1))
+
+  do i = 0, n - 1
+     if (c(i) .ne. a(i) + b(i)) call abort
+  end do
+
+end program main


Regards,
 Thomas


* Re: Expand oacc kernels after pass_fre (was: [PATCH, 1/8] Expand oacc kernels after pass_build_ealias)
  2015-04-21 19:40       ` Expand oacc kernels after pass_fre (was: [PATCH, 1/8] Expand oacc kernels after pass_build_ealias) Thomas Schwinge
@ 2015-04-22  7:36         ` Richard Biener
  2015-06-04 16:50           ` Expand oacc kernels after pass_fre Tom de Vries
  0 siblings, 1 reply; 71+ messages in thread
From: Richard Biener @ 2015-04-22  7:36 UTC (permalink / raw)
  To: Thomas Schwinge; +Cc: GCC Patches, Tom de Vries, Jakub Jelinek

On Tue, 21 Apr 2015, Thomas Schwinge wrote:

> Hi!
> 
> On Tue, 25 Nov 2014 12:22:02 +0100, Tom de Vries <Tom_deVries@mentor.com> wrote:
> > On 24-11-14 11:56, Tom de Vries wrote:
> > > On 15-11-14 18:19, Tom de Vries wrote:
> > >> On 15-11-14 13:14, Tom de Vries wrote:
> > >>> I'm submitting a patch series with initial support for the oacc kernels
> > >>> directive.
> > >>>
> > >>> The patch series uses pass_parallelize_loops to implement parallelization of
> > >>> loops in the oacc kernels region.
> > >>>
> > >>> The patch series consists of these 8 patches:
> > >>> ...
> > >>>      1  Expand oacc kernels after pass_build_ealias
> > >>>      2  Add pass_oacc_kernels
> > >>>      3  Add pass_ch_oacc_kernels to pass_oacc_kernels
> > >>>      4  Add pass_tree_loop_{init,done} to pass_oacc_kernels
> > >>>      5  Add pass_loop_im to pass_oacc_kernels
> > >>>      6  Add pass_ccp to pass_oacc_kernels
> > >>>      7  Add pass_parloops_oacc_kernels to pass_oacc_kernels
> > >>>      8  Do simple omp lowering for no address taken var
> > >>> ...
> > >>
> > >> This patch moves omp expansion of the oacc kernels directive to after
> > >> pass_build_ealias.
> > >>
> > >> The rationale is that in order to use pass_parallelize_loops for analysis and
> > >> transformation of an oacc kernels region, we postpone omp expansion of that
> > >> region until the earliest point in the pass list where enough information is
> > >> available to run pass_parallelize_loops, in other words, after pass_build_ealias.
> > >>
> > >> The patch postpones expansion in expand_omp, and ensures expansion by adding
> > >> pass_expand_omp_ssa:
> > >> - after pass_build_ealias, and
> > >> - after pass_all_early_optimizations for the case we're not optimizing.
> > >>
> > >> In order to make sure the oacc kernels region arrives at pass_expand_omp_ssa,
> > >> the way it left expand_omp, the patch makes pass_ccp and pass_forwprop aware of
> > >> lowered omp code, to handle it conservatively.
> > >>
> > >> The patch contains changes in expand_omp_target to deal with ssa-code, similar
> > >> to what is already present in expand_omp_taskreg.
> > >>
> > >> Furthermore, the patch forces the .omp_data_sizes and .omp_data_kinds to not be
> > >> static for oacc kernels. It does this to get some references to .omp_data_sizes
> > >> and .omp_data_kinds in the ssa code.  Without these references, the definitions
> > >> will be removed. The reference of the variables in GIMPLE_OACC_KERNELS is not
> > >> enough to have them not removed. [ In vries/oacc-kernels, I used a BUILT_IN_USE
> > >> kludge for this purpose ].
> > >>
> > >> Finally, at the end of pass_expand_omp_ssa we're left with SSA_NAMEs in the
> > >> original function of which the definition has been removed (as in moved to the
> > >> split off function). TODO_remove_unused_locals takes care of some of them, but
> > >> not the anonymous ones. So the patch iterates over all SSA_NAMEs to find these
> > >> dangling SSA_NAMEs and releases them.
> > >>
> > >
> > > Reposting with small update: I've replaced the use of the rather generic
> > > gimple_stmt_omp_lowering_p with the more specific gimple_stmt_omp_data_i_init_p.
> > >
> > > Bootstrapped and reg-tested in the same way as before.
> > >
> > 
> > I've moved pass_expand_omp_ssa one down in the pass list, past pass_fre.
> > 
> > This allows fre to unify references to the same omp variable before entering 
> > pass_oacc_kernels, which helps pass_lim in pass_oacc_kernels.
> > 
> > For instance, this reduction fragment:
> > ...
> >    # VUSE <.MEM_8>
> >    # PT = { D.2282 }
> >    _67 = .omp_data_i_59->sumD.2270;
> >    # VUSE <.MEM_8>
> >    _68 = *_67;
> > 
> >    _70 = _66 + _68;
> > 
> >    # VUSE <.MEM_8>
> >    # PT = { D.2282 }
> >    _69 = .omp_data_i_59->sumD.2270;
> >    # .MEM_71 = VDEF <.MEM_8>
> >    *_69 = _70;
> > ...
> > 
> > is transformed by fre into:
> > ...
> >    # VUSE <.MEM_8>
> >    # PT = { D.2282 }
> >    _67 = .omp_data_i_59->sumD.2270;
> >    # VUSE <.MEM_8>
> >    _68 = *_67;
> > 
> >    _70 = _66 + _68;
> > 
> >    # .MEM_71 = VDEF <.MEM_8>
> >    *_67 = _70;
> > ...
> > 
> > In order for pass_fre to respect the kernels region boundaries, I've added a 
> > change in tree-ssa-sccvn.c:visit_use to handle the .omp_data_i init conservatively.
> > 
> > Bootstrapped and reg-tested as before.
> > 
> > OK for trunk?
> 
> Committed to gomp-4_0-branch in r222279:
> 
> commit 93557ac5e30c26ee1a3d1255e31265b287171a0d
> Author: tschwinge <tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4>
> Date:   Tue Apr 21 19:37:19 2015 +0000
> 
>     Expand oacc kernels after pass_fre
>     
>     	gcc/
>     	* omp-low.c: Include gimple-pretty-print.h.
>     	(release_first_vuse_in_edge_dest): New function.
>     	(expand_omp_target): When not in ssa, don't split off oacc kernels
>     	region, clear PROP_gimple_eomp in cfun->curr_properties to force later
>     	expansion, and add GOACC_kernels_internal call.
>     	When in ssa, split off oacc kernels and convert GOACC_kernels_internal
>     	into GOACC_kernels call.  Handle ssa-code.
>     	(pass_data_expand_omp): Don't set PROP_gimple_eomp unconditionally in
>     	properties_provided field.
>     	(pass_expand_omp::execute): Set PROP_gimple_eomp in
>     	cfun->curr_properties tentatively.
>     	(pass_data_expand_omp_ssa): Add TODO_remove_unused_locals to
>     	todo_flags_finish field.
>     	(pass_expand_omp_ssa::execute): Release dangling SSA_NAMEs after calling
>     	execute_expand_omp.
>     	(gimple_stmt_ssa_operand_references_var_p)
>     	(gimple_stmt_omp_data_i_init_p): New functions.
>     	* omp-low.h (gimple_stmt_omp_data_i_init_p): Declare.
>     	* passes.def: Add pass_expand_omp_ssa after pass_fre.  Add
>     	pass_expand_omp_ssa after pass_all_early_optimizations.
>     	* tree-ssa-ccp.c: Include omp-low.h.
>     	(surely_varying_stmt_p, ccp_visit_stmt): Handle .omp_data_i init
>     	conservatively.
>     	* tree-ssa-forwprop.c: Include omp-low.h.
>     	(pass_forwprop::execute): Handle .omp_data_i init conservatively.
>     	* tree-ssa-sccvn.c: Include omp-low.h.
>     	(visit_use): Handle .omp_data_i init conservatively.
>     	* cgraph.c (cgraph_node::release_body): Don't release offloadable
>     	functions.
>     
>     git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@222279 138bc75d-0d04-0410-961f-82ee72b054a4
> ---
>  gcc/ChangeLog.gomp      |   30 +++++++
>  gcc/cgraph.c            |    9 ++
>  gcc/omp-low.c           |  214 ++++++++++++++++++++++++++++++++++++++++++++---
>  gcc/omp-low.h           |    1 +
>  gcc/passes.def          |    2 +
>  gcc/tree-ssa-ccp.c      |    6 ++
>  gcc/tree-ssa-forwprop.c |    4 +-
>  gcc/tree-ssa-sccvn.c    |    4 +-
>  8 files changed, 257 insertions(+), 13 deletions(-)
> 
> diff --git gcc/ChangeLog.gomp gcc/ChangeLog.gomp
> index 7885189..1f86160 100644
> --- gcc/ChangeLog.gomp
> +++ gcc/ChangeLog.gomp
> @@ -1,5 +1,35 @@
>  2015-04-21  Tom de Vries  <tom@codesourcery.com>
>  
> +	* omp-low.c: Include gimple-pretty-print.h.
> +	(release_first_vuse_in_edge_dest): New function.
> +	(expand_omp_target): When not in ssa, don't split off oacc kernels
> +	region, clear PROP_gimple_eomp in cfun->curr_properties to force later
> +	expansion, and add GOACC_kernels_internal call.
> +	When in ssa, split off oacc kernels and convert GOACC_kernels_internal
> +	into GOACC_kernels call.  Handle ssa-code.
> +	(pass_data_expand_omp): Don't set PROP_gimple_eomp unconditionally in
> +	properties_provided field.
> +	(pass_expand_omp::execute): Set PROP_gimple_eomp in
> +	cfun->curr_properties tentatively.
> +	(pass_data_expand_omp_ssa): Add TODO_remove_unused_locals to
> +	todo_flags_finish field.
> +	(pass_expand_omp_ssa::execute): Release dangling SSA_NAMEs after calling
> +	execute_expand_omp.
> +	(gimple_stmt_ssa_operand_references_var_p)
> +	(gimple_stmt_omp_data_i_init_p): New functions.
> +	* omp-low.h (gimple_stmt_omp_data_i_init_p): Declare.
> +	* passes.def: Add pass_expand_omp_ssa after pass_fre.  Add
> +	pass_expand_omp_ssa after pass_all_early_optimizations.
> +	* tree-ssa-ccp.c: Include omp-low.h.
> +	(surely_varying_stmt_p, ccp_visit_stmt): Handle .omp_data_i init
> +	conservatively.
> +	* tree-ssa-forwprop.c: Include omp-low.h.
> +	(pass_forwprop::execute): Handle .omp_data_i init conservatively.
> +	* tree-ssa-sccvn.c: Include omp-low.h.
> +	(visit_use): Handle .omp_data_i init conservatively.
> +	* cgraph.c (cgraph_node::release_body): Don't release offloadable
> +	functions.
> +
>  	* builtin-attrs.def (DOT_DOT_DOT_r_r_r): Add DEF_ATTR_FOR_STRING.
>  	(ATTR_FNSPEC_DOT_DOT_DOT_r_r_r_NOTHROW_LIST): Add
>  	DEF_ATTR_TREE_LIST.
> diff --git gcc/cgraph.c gcc/cgraph.c
> index e099856..c608d7e 100644
> --- gcc/cgraph.c
> +++ gcc/cgraph.c
> @@ -1706,6 +1706,15 @@ release_function_body (tree decl)
>  void
>  cgraph_node::release_body (bool keep_arguments)
>  {
> +  /* The omp-expansion of the oacc kernels directive is postponed until after
> +     all_small_ipa_passes.  That means pass_ipa_free_lang_data, which tries to
> +     release the body of the offload function, is run before expand_omp_target
> +     can process the oacc kernels directive, and expand_omp_target would crash
> +     trying to access the body.  This snippet works around this problem.
> +     FIXME: This should probably be fixed in a different way.  */
> +  if (offloadable)
> +    return;
> +
>    ipa_transforms_to_apply.release ();
>    if (!used_as_abstract_origin && symtab->state != PARSING)
>      {
> diff --git gcc/omp-low.c gcc/omp-low.c
> index 4134f3d..16d9a5e 100644
> --- gcc/omp-low.c
> +++ gcc/omp-low.c
> @@ -108,6 +108,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "context.h"
>  #include "lto-section-names.h"
>  #include "gomp-constants.h"
> +#include "gimple-pretty-print.h"
>  
>  
>  /* Lowering of OMP parallel and workshare constructs proceeds in two
> @@ -5353,6 +5354,35 @@ expand_omp_build_assign (gimple_stmt_iterator *gsi_p, tree to, tree from)
>      }
>  }
>  
> +static void
> +release_first_vuse_in_edge_dest (edge e)

All functions need a comment with documentation.

> +{
> +  gimple_stmt_iterator i;
> +  basic_block bb = e->dest;
> +
> +  for (i = gsi_start_phis (bb); !gsi_end_p (i); gsi_next (&i))
> +    {
> +      gimple phi = gsi_stmt (i);
> +      tree arg = PHI_ARG_DEF_FROM_EDGE (phi, e);
> +
> +      if (!virtual_operand_p (arg))
> +	continue;
> +
> +      mark_virtual_operand_for_renaming (arg);
> +      return;
> +    }
> +
> +  for (i = gsi_start_bb (bb); !gsi_end_p (i); gsi_next_nondebug (&i))
> +    {
> +      gimple stmt = gsi_stmt (i);
> +      if (gimple_vuse (stmt) == NULL_TREE)
> +	continue;
> +
> +      mark_virtual_operand_for_renaming (gimple_vuse (stmt));
> +      return;
> +    }
> +}
> +
>  /* Expand the OpenMP parallel or task directive starting at REGION.  */
>  
>  static void
> @@ -8770,8 +8800,11 @@ expand_omp_target (struct omp_region *region)
>    gimple stmt;
>    edge e;
>    bool offloaded, data_region;
> +  bool do_emit_library_call = true;
> +  bool do_splitoff = true;
>  
>    entry_stmt = as_a <gomp_target *> (last_stmt (region->entry));
> +
>    new_bb = region->entry;
>  
>    offloaded = is_gimple_omp_offloaded (entry_stmt);
> @@ -8804,12 +8837,48 @@ expand_omp_target (struct omp_region *region)
>    /* Supported by expand_omp_taskreg, but not here.  */
>    if (child_cfun != NULL)
>      gcc_checking_assert (!child_cfun->cfg);
> -  gcc_checking_assert (!gimple_in_ssa_p (cfun));
>  
>    entry_bb = region->entry;
>    exit_bb = region->exit;
>  
> -  if (offloaded)
> +  if (gimple_omp_target_kind (entry_stmt) == GF_OMP_TARGET_KIND_OACC_KERNELS)
> +    {
> +      if (!gimple_in_ssa_p (cfun))
> +	{
> +	  /* We need to do analysis and optimizations on the kernels region
> +	     before splitoff.  Since that's hard to do on low gimple, we
> +	     postpone the splitoff until we're in SSA.
> +	     However, we do the emit of the corresponding function call already,
> +	     in order to keep the arguments of the call alive until the
> +	     splitoff.
> +	     Since at this point the function that is called is empty, we can
> +	     model the function as BUILT_IN_GOACC_KERNELS_INTERNAL, which marks
> +	     some of its function arguments as non-escaping, so it acts less
> +	     as an optimization barrier.  */
> +	  do_splitoff = false;
> +	  cfun->curr_properties &= ~PROP_gimple_eomp;
> +	}
> +      else
> +	{
> +	  /* Don't emit the library call.  We've already done that.  */
> +	  do_emit_library_call = false;
> +	  /* Transform BUILT_IN_GOACC_KERNELS_INTERNAL into
> +	     BUILT_IN_GOACC_KERNELS.  Now that the function body will be
> +	     split off, we can no longer regard the omp_data_array reference as
> +	     non-escaping.  */
> +	  gsi = gsi_last_bb (entry_bb);
> +	  gsi_prev (&gsi);
> +	  gcall *call = as_a <gcall *> (gsi_stmt (gsi));
> +	  gcc_assert (gimple_call_builtin_p (call, BUILT_IN_GOACC_KERNELS_INTERNAL));
> +	  tree fndecl = builtin_decl_explicit (BUILT_IN_GOACC_KERNELS);
> +	  gimple_call_set_fndecl (call, fndecl);
> +	  gimple_call_set_fntype (call, TREE_TYPE (fndecl));
> +	  gimple_call_reset_alias_info (call);
> +	}
> +    }
> +
> +  if (offloaded
> +      && do_splitoff)
>      {
>        unsigned srcidx, dstidx, num;
>  
> @@ -8831,7 +8900,7 @@ expand_omp_target (struct omp_region *region)
>  	{
>  	  basic_block entry_succ_bb = single_succ (entry_bb);
>  	  gimple_stmt_iterator gsi;
> -	  tree arg;
> +	  tree arg, narg;
>  	  gimple tgtcopy_stmt = NULL;
>  	  tree sender = TREE_VEC_ELT (data_arg, 0);
>  
> @@ -8861,8 +8930,27 @@ expand_omp_target (struct omp_region *region)
>  	  gcc_assert (tgtcopy_stmt != NULL);
>  	  arg = DECL_ARGUMENTS (child_fn);
>  
> -	  gcc_assert (gimple_assign_lhs (tgtcopy_stmt) == arg);
> -	  gsi_remove (&gsi, true);
> +	  if (!gimple_in_ssa_p (cfun))
> +	    {
> +	      gcc_assert (gimple_assign_lhs (tgtcopy_stmt) == arg);
> +	      gsi_remove (&gsi, true);
> +	    }
> +	  else
> +	    {
> +	      gcc_assert (SSA_NAME_VAR (gimple_assign_lhs (tgtcopy_stmt))
> +			  == arg);
> +
> +	      /* If we are in ssa form, we must load the value from the default
> +		 definition of the argument.  That should not be defined now,
> +		 since the argument is not used uninitialized.  */
> +	      gcc_assert (ssa_default_def (cfun, arg) == NULL);
> +	      narg = make_ssa_name (arg, gimple_build_nop ());
> +	      set_ssa_default_def (cfun, arg, narg);
> +	      /* ?? Is setting the subcode really necessary ??  */
> +	      gimple_omp_set_subcode (tgtcopy_stmt, TREE_CODE (narg));
> +	      gimple_assign_set_rhs1 (tgtcopy_stmt, narg);
> +	      update_stmt (tgtcopy_stmt);
> +	    }
>  	}
>  
>        /* Declare local variables needed in CHILD_CFUN.  */
> @@ -8905,11 +8993,23 @@ expand_omp_target (struct omp_region *region)
>  	  stmt = gimple_build_return (NULL);
>  	  gsi_insert_after (&gsi, stmt, GSI_SAME_STMT);
>  	  gsi_remove (&gsi, true);
> +
> +	  /* A vuse in single_succ (exit_bb) may use a vdef from the region
> +	     which is about to be split off.  Mark the vdef for renaming.  */
> +	  release_first_vuse_in_edge_dest (single_succ_edge (exit_bb));
>  	}
>  
>        /* Move the offloading region into CHILD_CFUN.  */
>  
> -      block = gimple_block (entry_stmt);
> +      if (gimple_in_ssa_p (cfun))
> +	{
> +	  init_tree_ssa (child_cfun);
> +	  init_ssa_operands (child_cfun);
> +	  child_cfun->gimple_df->in_ssa_p = true;
> +	  block = NULL_TREE;
> +	}
> +      else
> +	block = gimple_block (entry_stmt);
>  
>        new_bb = move_sese_region_to_fn (child_cfun, entry_bb, exit_bb, block);
>        if (exit_bb)
> @@ -8969,9 +9069,18 @@ expand_omp_target (struct omp_region *region)
>  	  if (changed)
>  	    cleanup_tree_cfg ();
>  	}
> +      if (gimple_in_ssa_p (cfun))
> +	update_ssa (TODO_update_ssa);
>        pop_cfun ();
>      }
>  
> +  if (!do_emit_library_call)
> +    {
> +      if (gimple_in_ssa_p (cfun))
> +	update_ssa (TODO_update_ssa_only_virtuals);
> +      return;
> +    }
> +
>    /* Emit a library call to launch the offloading region, or do data
>       transfers.  */
>    tree t1, t2, t3, t4, device, cond, c, clauses;
> @@ -8993,7 +9102,7 @@ expand_omp_target (struct omp_region *region)
>        start_ix = BUILT_IN_GOACC_PARALLEL;
>        break;
>      case GF_OMP_TARGET_KIND_OACC_KERNELS:
> -      start_ix = BUILT_IN_GOACC_KERNELS;
> +      start_ix = BUILT_IN_GOACC_KERNELS_INTERNAL;
>        break;
>      case GF_OMP_TARGET_KIND_OACC_DATA:
>        start_ix = BUILT_IN_GOACC_DATA_START;
> @@ -9128,6 +9237,7 @@ expand_omp_target (struct omp_region *region)
>      case BUILT_IN_GOACC_DATA_START:
>      case BUILT_IN_GOACC_ENTER_EXIT_DATA:
>      case BUILT_IN_GOACC_KERNELS:
> +    case BUILT_IN_GOACC_KERNELS_INTERNAL:
>      case BUILT_IN_GOACC_PARALLEL:
>      case BUILT_IN_GOACC_UPDATE:
>        break;
> @@ -9146,6 +9256,7 @@ expand_omp_target (struct omp_region *region)
>      case BUILT_IN_GOMP_TARGET_UPDATE:
>        break;
>      case BUILT_IN_GOACC_KERNELS:
> +    case BUILT_IN_GOACC_KERNELS_INTERNAL:
>      case BUILT_IN_GOACC_PARALLEL:
>        {
>  	tree t_num_gangs, t_num_workers, t_vector_length;
> @@ -9249,6 +9360,8 @@ expand_omp_target (struct omp_region *region)
>        gcc_assert (g && gimple_code (g) == GIMPLE_OMP_RETURN);
>        gsi_remove (&gsi, true);
>      }
> +  if (gimple_in_ssa_p (cfun))
> +    update_ssa (TODO_update_ssa_only_virtuals);
>  }
>  
>  
> @@ -9503,7 +9616,7 @@ const pass_data pass_data_expand_omp =
>    OPTGROUP_NONE, /* optinfo_flags */
>    TV_NONE, /* tv_id */
>    PROP_gimple_any, /* properties_required */
> -  PROP_gimple_eomp, /* properties_provided */
> +  0 /* Possibly PROP_gimple_eomp.  */, /* properties_provided */
>    0, /* properties_destroyed */
>    0, /* todo_flags_start */
>    0, /* todo_flags_finish */
> @@ -9517,12 +9630,14 @@ public:
>    {}
>  
>    /* opt_pass methods: */
> -  virtual unsigned int execute (function *)
> +  virtual unsigned int execute (function *fun)
>      {
>        bool gate = ((flag_cilkplus != 0 || flag_openacc != 0 || flag_openmp != 0
>  		    || flag_openmp_simd != 0)
>  		   && !seen_error ());
>  
> +      fun->curr_properties |= PROP_gimple_eomp;
> +
>        /* This pass always runs, to provide PROP_gimple_eomp.
>  	 But often, there is nothing to do.  */
>        if (!gate)
> @@ -9553,7 +9668,8 @@ const pass_data pass_data_expand_omp_ssa =
>    PROP_gimple_eomp, /* properties_provided */
>    0, /* properties_destroyed */
>    0, /* todo_flags_start */
> -  TODO_cleanup_cfg | TODO_rebuild_alias, /* todo_flags_finish */
> +  TODO_cleanup_cfg | TODO_rebuild_alias
> +  | TODO_remove_unused_locals, /* todo_flags_finish */
>  };
>  
>  class pass_expand_omp_ssa : public gimple_opt_pass
> @@ -9568,7 +9684,48 @@ public:
>      {
>        return !(fun->curr_properties & PROP_gimple_eomp);
>      }
> -  virtual unsigned int execute (function *) { return execute_expand_omp (); }
> +  virtual unsigned int execute (function *)

Please move this out of the class body.

> +    {
> +      unsigned res = execute_expand_omp ();
> +
> +      /* After running pass_expand_omp_ssa to expand the oacc kernels
> +	 directive, we are left in the original function with anonymous
> +	 SSA_NAMEs, with a defining statement that has been deleted.  This
> +	 pass finds those SSA_NAMEs and releases them.
> +	 TODO: Either fix this elsewhere, or make the fix unnecessary.  */
> +      unsigned int i;
> +      for (i = 1; i < num_ssa_names; ++i)
> +	{
> +	  tree name = ssa_name (i);
> +	  if (name == NULL_TREE)
> +	    continue;
> +
> +	  gimple stmt = SSA_NAME_DEF_STMT (name);
> +	  bool found = false;
> +
> +	  ssa_op_iter op_iter;
> +	  def_operand_p def_p;
> +	  FOR_EACH_PHI_OR_STMT_DEF (def_p, stmt, op_iter, SSA_OP_ALL_DEFS)
> +	    {
> +	      tree def = DEF_FROM_PTR (def_p);
> +	      if (def == name)
> +		{
> +		  found = true;
> +		  break;
> +		}
> +	    }
> +
> +	  if (!found)
> +	    {
> +	      if (dump_file)
> +		fprintf (dump_file, "Released dangling ssa name %u\n", i);
> +	      release_ssa_name (name);
> +	    }
> +	}
> +
> +      return res;
> +    }
> +  opt_pass * clone () { return new pass_expand_omp_ssa (m_ctxt); }
>  
>  }; // class pass_expand_omp_ssa
>  
> @@ -13728,4 +13885,39 @@ omp_finish_file (void)
>      }
>  }
>  
> +static bool
> +gimple_stmt_ssa_operand_references_var_p (gimple stmt, const char **varnames,
> +					  unsigned int nr_varnames,
> +					  unsigned int flags)

Missing comment.

> +{
> +  tree use;
> +  ssa_op_iter iter;
> +  const char *s;
> +
> +  FOR_EACH_SSA_TREE_OPERAND (use, stmt, iter, flags)
> +    {
> +      if (SSA_NAME_IDENTIFIER (use) == NULL_TREE)
> +	continue;
> +      s = IDENTIFIER_POINTER (SSA_NAME_IDENTIFIER (use));
> +
> +      unsigned int i;
> +      for (i = 0; i < nr_varnames; ++i)
> +	if (strcmp (varnames[i], s) == 0)
> +	  return true;

Eh?  This surely is crap - you can't ever have semantics depend on
identifiers.
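
To make the objection concrete, here is a minimal standalone sketch, not
GCC code: Decl, is_omp_data_i, make_receiver_decl and both predicates are
hypothetical names invented for illustration.  It assumes the property is
recorded at the point where the variable is created, so that a later copy
or rename cannot change the answer.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a compiler-internal variable declaration.
struct Decl
{
  std::string name;            // only meant for dumps and debugging
  bool is_omp_data_i = false;  // hypothetical marker, set at creation time
};

// Assumed creation point: the only place that knows what the decl is for.
Decl
make_receiver_decl ()
{
  Decl d;
  d.name = ".omp_data_i";
  d.is_omp_data_i = true;
  return d;
}

// Fragile: the answer depends on how the variable happens to be spelled.
bool
is_receiver_by_name (const Decl &d)
{
  return d.name == ".omp_data_i";
}

// Robust: the answer depends on an explicitly recorded property.
bool
is_receiver_by_flag (const Decl &d)
{
  return d.is_omp_data_i;
}
```

In this sketch a copy whose name gained a suffix is no longer matched by
the name-based predicate, while the flag-based one still recognizes it.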

> +    }
> +
> +  return false;
> +}
> +
> +/* Return true if STMT is .omp_data_i init.  */
> +
> +bool
> +gimple_stmt_omp_data_i_init_p (gimple stmt)
> +{
> +  const char *varnames[] = { ".omp_data_i" };
> +  unsigned int nr_varnames = sizeof (varnames) / sizeof (varnames[0]);
> +  return gimple_stmt_ssa_operand_references_var_p (stmt, varnames, nr_varnames,
> +						   SSA_OP_DEF);

So no - this isn't possible this way and I suspect it's not reliable 
anyway.

> +}
> +
>  #include "gt-omp-low.h"
> diff --git gcc/omp-low.h gcc/omp-low.h
> index 8a4052e..3d30c3b 100644
> --- gcc/omp-low.h
> +++ gcc/omp-low.h
> @@ -28,6 +28,7 @@ extern void free_omp_regions (void);
>  extern tree omp_reduction_init (tree, tree);
>  extern bool make_gimple_omp_edges (basic_block, struct omp_region **, int *);
>  extern void omp_finish_file (void);
> +extern bool gimple_stmt_omp_data_i_init_p (gimple);
>  
>  extern GTY(()) vec<tree, va_gc> *offload_funcs;
>  extern GTY(()) vec<tree, va_gc> *offload_vars;
> diff --git gcc/passes.def gcc/passes.def
> index 2bc5dcd..db0dd18 100644
> --- gcc/passes.def
> +++ gcc/passes.def
> @@ -86,6 +86,7 @@ along with GCC; see the file COPYING3.  If not see
>  	     execute TODO_rebuild_alias at this point.  */
>  	  NEXT_PASS (pass_build_ealias);
>  	  NEXT_PASS (pass_fre);
> +	  NEXT_PASS (pass_expand_omp_ssa);
>  	  NEXT_PASS (pass_merge_phi);
>  	  NEXT_PASS (pass_cd_dce);
>  	  NEXT_PASS (pass_early_ipa_sra);
> @@ -99,6 +100,7 @@ along with GCC; see the file COPYING3.  If not see
>  	      late.  */
>  	  NEXT_PASS (pass_split_functions);
>        POP_INSERT_PASSES ()
> +      NEXT_PASS (pass_expand_omp_ssa);
>        NEXT_PASS (pass_release_ssa_names);
>        NEXT_PASS (pass_rebuild_cgraph_edges);
>        NEXT_PASS (pass_inline_parameters);
> diff --git gcc/tree-ssa-ccp.c gcc/tree-ssa-ccp.c
> index d45a3ff..46fe1c7 100644
> --- gcc/tree-ssa-ccp.c
> +++ gcc/tree-ssa-ccp.c
> @@ -172,6 +172,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "wide-int-print.h"
>  #include "builtins.h"
>  #include "tree-chkp.h"
> +#include "omp-low.h"
>  
>  
>  /* Possible lattice values.  */
> @@ -796,6 +797,9 @@ surely_varying_stmt_p (gimple stmt)
>        && gimple_code (stmt) != GIMPLE_CALL)
>      return true;
>  
> +  if (gimple_stmt_omp_data_i_init_p (stmt))
> +    return true;
> +

No.

>    return false;
>  }
>  
> @@ -2329,6 +2333,8 @@ ccp_visit_stmt (gimple stmt, edge *taken_edge_p, tree *output_p)
>    switch (gimple_code (stmt))
>      {
>        case GIMPLE_ASSIGN:
> +	if (gimple_stmt_omp_data_i_init_p (stmt))
> +	  break;
>          /* If the statement is an assignment that produces a single
>             output value, evaluate its RHS to see if the lattice value of
>             its output has changed.  */
> diff --git gcc/tree-ssa-forwprop.c gcc/tree-ssa-forwprop.c
> index d8db20a..554a5a5 100644
> --- gcc/tree-ssa-forwprop.c
> +++ gcc/tree-ssa-forwprop.c
> @@ -85,6 +85,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "tree-cfgcleanup.h"
>  #include "tree-into-ssa.h"
>  #include "cfganal.h"
> +#include "omp-low.h"
>  
>  /* This pass propagates the RHS of assignment statements into use
>     sites of the LHS of the assignment.  It's basically a specialized
> @@ -2155,7 +2156,8 @@ pass_forwprop::execute (function *fun)
>  	  tree lhs, rhs;
>  	  enum tree_code code;
>  
> -	  if (!is_gimple_assign (stmt))
> +	  if (!is_gimple_assign (stmt)
> +	      || gimple_stmt_omp_data_i_init_p (stmt))

No.

>  	    {
>  	      gsi_next (&gsi);
>  	      continue;
> diff --git gcc/tree-ssa-sccvn.c gcc/tree-ssa-sccvn.c
> index e417a15..449a615 100644
> --- gcc/tree-ssa-sccvn.c
> +++ gcc/tree-ssa-sccvn.c
> @@ -85,6 +85,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "ipa-ref.h"
>  #include "plugin-api.h"
>  #include "cgraph.h"
> +#include "omp-low.h"
>  
>  /* This algorithm is based on the SCC algorithm presented by Keith
>     Cooper and L. Taylor Simpson in "SCC-Based Value numbering"
> @@ -3542,7 +3543,8 @@ visit_use (tree use)
>      {
>        if (gimple_code (stmt) == GIMPLE_PHI)
>  	changed = visit_phi (stmt);
> -      else if (gimple_has_volatile_ops (stmt))
> +      else if (gimple_has_volatile_ops (stmt)
> +	       || gimple_stmt_omp_data_i_init_p (stmt))

No.

What is the intent of these changes?

Richard.


>  	changed = defs_to_varying (stmt);
>        else if (is_gimple_assign (stmt))
>  	{
> 
> 
> Grüße,
>  Thomas
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Jennifer Guild,
Dilip Upmanyu, Graham Norton HRB 21284 (AG Nuernberg)

* Re: [PATCH, 3/8] Add pass_ch_oacc_kernels to pass_oacc_kernels
  2015-04-21 19:49     ` Thomas Schwinge
@ 2015-04-22  7:39       ` Richard Biener
  2015-06-03  9:22         ` Tom de Vries
  2015-06-03 10:05         ` Tom de Vries
  0 siblings, 2 replies; 71+ messages in thread
From: Richard Biener @ 2015-04-22  7:39 UTC (permalink / raw)
  To: Thomas Schwinge; +Cc: GCC Patches, Tom de Vries, Jakub Jelinek

On Tue, 21 Apr 2015, Thomas Schwinge wrote:

> Hi!
> 
> On Tue, 25 Nov 2014 12:27:34 +0100, Tom de Vries <Tom_deVries@mentor.com> wrote:
> > On 15-11-14 18:21, Tom de Vries wrote:
> > > On 15-11-14 13:14, Tom de Vries wrote:
> > >> Hi,
> > >>
> > >> I'm submitting a patch series with initial support for the oacc kernels
> > >> directive.
> > >>
> > >> The patch series uses pass_parallelize_loops to implement parallelization of
> > >> loops in the oacc kernels region.
> > >>
> > >> The patch series consists of these 8 patches:
> > >> ...
> > >>      1  Expand oacc kernels after pass_build_ealias
> > >>      2  Add pass_oacc_kernels
> > >>      3  Add pass_ch_oacc_kernels to pass_oacc_kernels
> > >>      4  Add pass_tree_loop_{init,done} to pass_oacc_kernels
> > >>      5  Add pass_loop_im to pass_oacc_kernels
> > >>      6  Add pass_ccp to pass_oacc_kernels
> > >>      7  Add pass_parloops_oacc_kernels to pass_oacc_kernels
> > >>      8  Do simple omp lowering for no address taken var
> > >> ...
> > >
> > > This patch adds a pass_ch_oacc_kernels to the pass group pass_oacc_kernels.
> > >
> > > The idea is that pass_parallelize_loops only deals with loops for which the
> > > header has been copied, so the easiest way to meet that requirement when running
> > > pass_parallelize_loops in group pass_oacc_kernels, is to run pass_ch as a part
> > > of pass_oacc_kernels.
> > >
> > > We define a separate pass pass_ch_oacc_kernels, to leave all loops that aren't
> > > part of a kernels region alone.
> > >
> > 
> > Updated for moving pass_oacc_kernels down past pass_fre in the pass list.
> > 
> > Bootstrapped and reg-tested as before.
> > 
> > OK for trunk?
> 
> Committed to gomp-4_0-branch in r222281:
> 
> commit 58c33a7965c379b55b549d50e3b79b2252bcc876
> Author: tschwinge <tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4>
> Date:   Tue Apr 21 19:48:16 2015 +0000
> 
>     Add pass_ch_oacc_kernels to pass_oacc_kernels
>     
>     	gcc/
>     	* omp-low.c (loop_in_oacc_kernels_region_p): New function.
>     	* omp-low.h (loop_in_oacc_kernels_region_p): Declare.
>     	* passes.def: Add pass_ch_oacc_kernels to pass group pass_oacc_kernels.
>     	* tree-pass.h (make_pass_ch_oacc_kernels): Declare
>     	* tree-ssa-loop-ch.c: Include omp-low.h.
>     	(pass_ch_execute): Declare.
>     	(pass_ch::execute): Factor out ...
>     	(pass_ch_execute): ... this new function.  If handling oacc kernels,
>     	skip loops that are not in oacc kernels region.
>     	(pass_ch_oacc_kernels::execute):
>     	(pass_data_ch_oacc_kernels): New pass_data.
>     	(class pass_ch_oacc_kernels): New pass.
>     	(pass_ch_oacc_kernels::execute, make_pass_ch_oacc_kernels): New
>     	function.
>     
>     git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@222281 138bc75d-0d04-0410-961f-82ee72b054a4
> ---
>  gcc/ChangeLog.gomp     |   15 ++++++++
>  gcc/omp-low.c          |   91 ++++++++++++++++++++++++++++++++++++++++++++++++
>  gcc/omp-low.h          |    2 ++
>  gcc/passes.def         |    1 +
>  gcc/tree-pass.h        |    1 +
>  gcc/tree-ssa-loop-ch.c |   59 +++++++++++++++++++++++++++++--
>  6 files changed, 167 insertions(+), 2 deletions(-)
> 
> diff --git gcc/ChangeLog.gomp gcc/ChangeLog.gomp
> index 8a53ad8..d00c5e0 100644
> --- gcc/ChangeLog.gomp
> +++ gcc/ChangeLog.gomp
> @@ -1,5 +1,20 @@
>  2015-04-21  Tom de Vries  <tom@codesourcery.com>
>  
> +	* omp-low.c (loop_in_oacc_kernels_region_p): New function.
> +	* omp-low.h (loop_in_oacc_kernels_region_p): Declare.
> +	* passes.def: Add pass_ch_oacc_kernels to pass group pass_oacc_kernels.
> +	* tree-pass.h (make_pass_ch_oacc_kernels): Declare
> +	* tree-ssa-loop-ch.c: Include omp-low.h.
> +	(pass_ch_execute): Declare.
> +	(pass_ch::execute): Factor out ...
> +	(pass_ch_execute): ... this new function.  If handling oacc kernels,
> +	skip loops that are not in oacc kernels region.
> +	(pass_ch_oacc_kernels::execute):
> +	(pass_data_ch_oacc_kernels): New pass_data.
> +	(class pass_ch_oacc_kernels): New pass.
> +	(pass_ch_oacc_kernels::execute, make_pass_ch_oacc_kernels): New
> +	function.
> +
>  	* passes.def: Add pass group pass_oacc_kernels.
>  	* tree-pass.h (make_pass_oacc_kernels): Declare.
>  	* tree-ssa-loop.c (gate_oacc_kernels): New static function.
> diff --git gcc/omp-low.c gcc/omp-low.c
> index 16d9a5e..1b03ae6 100644
> --- gcc/omp-low.c
> +++ gcc/omp-low.c
> @@ -13920,4 +13920,95 @@ gimple_stmt_omp_data_i_init_p (gimple stmt)
>  						   SSA_OP_DEF);
>  }
>  
> +/* Return true if LOOP is inside a kernels region.  */
> +
> +bool
> +loop_in_oacc_kernels_region_p (struct loop *loop, basic_block *region_entry,
> +			       basic_block *region_exit)

Ehm.  So why not simply add a flag to struct loop instead and set it
during OMP region parsing/lowering?

It's also very odd that you disable transforms on OMP regions but at
the same time do all the OMP processing _after_ those transforms.
Something feels backward here.
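
A rough standalone sketch of the flag idea, under stated assumptions: Loop
is a much-reduced stand-in for struct loop, in_oacc_kernels_region is a
hypothetical new field (it does not exist today), and the two functions
only model where the flag would be set and where it would be tested.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, much-reduced stand-in for struct loop.
struct Loop
{
  explicit Loop (int n) : num (n), in_oacc_kernels_region (false) {}

  int num;
  bool in_oacc_kernels_region;  // assumed new flag, not an existing field
};

// Models the point in OMP region lowering where the loops belonging to a
// kernels region are known: set the flag once, there.
void
mark_kernels_loops (std::vector<Loop> &loops,
		    const std::vector<int> &nums_in_region)
{
  for (Loop &l : loops)
    for (int n : nums_in_region)
      if (l.num == n)
	l.in_oacc_kernels_region = true;
}

// Models the filter in the header-copying pass: a single flag test per
// loop instead of recomputing dominance-based region membership.
std::vector<int>
loops_to_process (const std::vector<Loop> &loops, bool oacc_kernels_p)
{
  std::vector<int> out;
  for (const Loop &l : loops)
    {
      if (oacc_kernels_p && !l.in_oacc_kernels_region)
	continue;
      out.push_back (l.num);
    }
  return out;
}
```

The per-loop query then becomes a constant-time check, at the cost of
having to keep the flag valid when loops are copied or removed.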

Richard.


> +{
> +  bitmap excludes_bitmap = BITMAP_GGC_ALLOC ();
> +  bitmap region_bitmap = BITMAP_GGC_ALLOC ();
> +  bitmap_clear (region_bitmap);
> +
> +  if (region_entry != NULL)
> +    *region_entry = NULL;
> +  if (region_exit != NULL)
> +    *region_exit = NULL;
> +
> +  basic_block bb;
> +  gimple last;
> +  FOR_EACH_BB_FN (bb, cfun)
> +    {
> +      if (bitmap_bit_p (region_bitmap, bb->index))
> +	continue;
> +
> +      last = last_stmt (bb);
> +      if (!last)
> +	continue;
> +
> +      if (gimple_code (last) != GIMPLE_OMP_TARGET
> +	  || (gimple_omp_target_kind (last) != GF_OMP_TARGET_KIND_OACC_KERNELS))
> +	continue;
> +
> +      bitmap_clear (excludes_bitmap);
> +      bitmap_set_bit (excludes_bitmap, bb->index);
> +
> +      vec<basic_block> dominated
> +	= get_all_dominated_blocks (CDI_DOMINATORS, bb);
> +
> +      unsigned di;
> +      basic_block dom;
> +
> +      basic_block end_region = NULL;
> +      FOR_EACH_VEC_ELT (dominated, di, dom)
> +	{
> +	  if (dom == bb)
> +	    continue;
> +
> +	  last = last_stmt (dom);
> +	  if (!last)
> +	    continue;
> +
> +	  if (gimple_code (last) != GIMPLE_OMP_RETURN)
> +	    continue;
> +
> +	  if (end_region == NULL
> +	      || dominated_by_p (CDI_DOMINATORS, end_region, dom))
> +	    end_region = dom;
> +	}
> +
> +      if (end_region == NULL)
> +	{
> +	  gimple kernels = last_stmt (bb);
> +	  fatal_error (gimple_location (kernels),
> +		       "End of kernel region unreachable");
> +	}
> +
> +      vec<basic_block> excludes
> +	= get_all_dominated_blocks (CDI_DOMINATORS, end_region);
> +
> +      unsigned di2;
> +      basic_block exclude;
> +
> +      FOR_EACH_VEC_ELT (excludes, di2, exclude)
> +	if (exclude != end_region)
> +	  bitmap_set_bit (excludes_bitmap, exclude->index);
> +
> +      FOR_EACH_VEC_ELT (dominated, di, dom)
> +	if (!bitmap_bit_p (excludes_bitmap, dom->index))
> +	  bitmap_set_bit (region_bitmap, dom->index);
> +
> +      if (bitmap_bit_p (region_bitmap, loop->header->index))
> +	{
> +	  if (region_entry != NULL)
> +	    *region_entry = bb;
> +	  if (region_exit != NULL)
> +	    *region_exit = end_region;
> +	  return true;
> +	}
> +    }
> +
> +  return false;
> +}
> +
>  #include "gt-omp-low.h"
> diff --git gcc/omp-low.h gcc/omp-low.h
> index 3d30c3b..ae63c9f 100644
> --- gcc/omp-low.h
> +++ gcc/omp-low.h
> @@ -29,6 +29,8 @@ extern tree omp_reduction_init (tree, tree);
>  extern bool make_gimple_omp_edges (basic_block, struct omp_region **, int *);
>  extern void omp_finish_file (void);
>  extern bool gimple_stmt_omp_data_i_init_p (gimple);
> +extern bool loop_in_oacc_kernels_region_p (struct loop *, basic_block *,
> +					   basic_block *);
>  
>  extern GTY(()) vec<tree, va_gc> *offload_funcs;
>  extern GTY(()) vec<tree, va_gc> *offload_vars;
> diff --git gcc/passes.def gcc/passes.def
> index 854c5b8..5cdbc87 100644
> --- gcc/passes.def
> +++ gcc/passes.def
> @@ -90,6 +90,7 @@ along with GCC; see the file COPYING3.  If not see
>  	     function.  */
>  	  NEXT_PASS (pass_oacc_kernels);
>  	  PUSH_INSERT_PASSES_WITHIN (pass_oacc_kernels)
> +	      NEXT_PASS (pass_ch_oacc_kernels);
>  	      NEXT_PASS (pass_expand_omp_ssa);
>  	  POP_INSERT_PASSES ()
>  	  NEXT_PASS (pass_merge_phi);
> diff --git gcc/tree-pass.h gcc/tree-pass.h
> index 35778f2..321229a 100644
> --- gcc/tree-pass.h
> +++ gcc/tree-pass.h
> @@ -379,6 +379,7 @@ extern gimple_opt_pass *make_pass_loop_prefetch (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_iv_optimize (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_tree_loop_done (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_ch (gcc::context *ctxt);
> +extern gimple_opt_pass *make_pass_ch_oacc_kernels (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_ccp (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_phi_only_cprop (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_build_ssa (gcc::context *ctxt);
> diff --git gcc/tree-ssa-loop-ch.c gcc/tree-ssa-loop-ch.c
> index d759de7..5f24bcb 100644
> --- gcc/tree-ssa-loop-ch.c
> +++ gcc/tree-ssa-loop-ch.c
> @@ -54,12 +54,15 @@ along with GCC; see the file COPYING3.  If not see
>  #include "tree-inline.h"
>  #include "flags.h"
>  #include "tree-ssa-threadedge.h"
> +#include "omp-low.h"
>  
>  /* Duplicates headers of loops if they are small enough, so that the statements
>     in the loop body are always executed when the loop is entered.  This
>     increases effectiveness of code motion optimizations, and reduces the need
>     for loop preconditioning.  */
>  
> +static unsigned int pass_ch_execute (function *, bool);
> +
>  /* Check whether we should duplicate HEADER of LOOP.  At most *LIMIT
>     instructions should be duplicated, limit is decreased by the actual
>     amount.  */
> @@ -178,6 +181,14 @@ public:
>  unsigned int
>  pass_ch::execute (function *fun)
>  {
> +  return pass_ch_execute (fun, false);
> +}
> +
> +} // anon namespace
> +
> +static unsigned int
> +pass_ch_execute (function *fun, bool oacc_kernels_p)
> +{
>    struct loop *loop;
>    basic_block header;
>    edge exit, entry;
> @@ -211,6 +222,10 @@ pass_ch::execute (function *fun)
>        if (do_while_loop_p (loop))
>  	continue;
>  
> +      if (oacc_kernels_p
> +	  && !loop_in_oacc_kernels_region_p (loop, NULL, NULL))
> +	continue;
> +
>        /* Iterate the header copying up to limit; this takes care of the cases
>  	 like while (a && b) {...}, where we want to have both of the conditions
>  	 copied.  TODO -- handle while (a || b) - like cases, by not requiring
> @@ -301,10 +316,50 @@ pass_ch::execute (function *fun)
>    return 0;
>  }
>  
> -} // anon namespace
> -
>  gimple_opt_pass *
>  make_pass_ch (gcc::context *ctxt)
>  {
>    return new pass_ch (ctxt);
>  }
> +
> +namespace {
> +
> +const pass_data pass_data_ch_oacc_kernels =
> +{
> +  GIMPLE_PASS, /* type */
> +  "ch_oacc_kernels", /* name */
> +  OPTGROUP_LOOP, /* optinfo_flags */
> +  TV_TREE_CH, /* tv_id */
> +  ( PROP_cfg | PROP_ssa ), /* properties_required */
> +  0, /* properties_provided */
> +  0, /* properties_destroyed */
> +  0, /* todo_flags_start */
> +  TODO_cleanup_cfg, /* todo_flags_finish */
> +};
> +
> + class pass_ch_oacc_kernels : public gimple_opt_pass
> +{
> +public:
> +  pass_ch_oacc_kernels (gcc::context *ctxt)
> +    : gimple_opt_pass (pass_data_ch_oacc_kernels, ctxt)
> +  {}
> +
> +  /* opt_pass methods: */
> +  virtual bool gate (function *) { return true; }
> +  virtual unsigned int execute (function *);
> +
> +}; // class pass_ch_oacc_kernels
> +
> +unsigned int
> +pass_ch_oacc_kernels::execute (function *fun)
> +{
> +  return pass_ch_execute (fun, true);
> +}
> +
> +} // anon namespace
> +
> +gimple_opt_pass *
> +make_pass_ch_oacc_kernels (gcc::context *ctxt)
> +{
> +  return new pass_ch_oacc_kernels (ctxt);
> +}
> 
> 
> Grüße,
>  Thomas
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Jennifer Guild,
Dilip Upmanyu, Graham Norton HRB 21284 (AG Nuernberg)

* Re: [PATCH, 4/8] Add pass_tree_loop_{init,done} to pass_oacc_kernels
  2015-04-21 19:52     ` Thomas Schwinge
@ 2015-04-22  7:40       ` Richard Biener
  2015-06-02 13:52         ` Tom de Vries
  0 siblings, 1 reply; 71+ messages in thread
From: Richard Biener @ 2015-04-22  7:40 UTC (permalink / raw)
  To: Thomas Schwinge; +Cc: GCC Patches, Tom de Vries, Jakub Jelinek

On Tue, 21 Apr 2015, Thomas Schwinge wrote:

> Hi!
> 
> On Tue, 25 Nov 2014 12:29:28 +0100, Tom de Vries <Tom_deVries@mentor.com> wrote:
> > On 15-11-14 18:21, Tom de Vries wrote:
> > > On 15-11-14 13:14, Tom de Vries wrote:
> > >> I'm submitting a patch series with initial support for the oacc kernels
> > >> directive.
> > >>
> > >> The patch series uses pass_parallelize_loops to implement parallelization of
> > >> loops in the oacc kernels region.
> > >>
> > >> The patch series consists of these 8 patches:
> > >> ...
> > >>      1  Expand oacc kernels after pass_build_ealias
> > >>      2  Add pass_oacc_kernels
> > >>      3  Add pass_ch_oacc_kernels to pass_oacc_kernels
> > >>      4  Add pass_tree_loop_{init,done} to pass_oacc_kernels
> > >>      5  Add pass_loop_im to pass_oacc_kernels
> > >>      6  Add pass_ccp to pass_oacc_kernels
> > >>      7  Add pass_parloops_oacc_kernels to pass_oacc_kernels
> > >>      8  Do simple omp lowering for no address taken var
> > >> ...
> > >
> > > This patch adds pass_tree_loop_init and pass_tree_loop_done to
> > > pass_oacc_kernels.
> > >
> > > Pass_parallelize_loops is run between these passes in the pass group
> > > pass_tree_loop, since it requires loop information.  We do the same for
> > > pass_oacc_kernels.
> > >
> > 
> > Updated for moving pass_oacc_kernels down past pass_fre in the pass list.
> > 
> > Bootstrapped and reg-tested as before.
> > 
> > OK for trunk?

Both passes should be basically no-ops.  Why not call 
loop_optimizer_init/finalize from expand_omp_ssa instead?
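
Roughly what I have in mind, as a standalone model rather than a patch:
every stub_* function and execute_expand_omp_ssa_sketch are hypothetical
stand-ins.  The real loop_optimizer_init takes a flags argument, which is
omitted here.

```cpp
#include <cassert>

// Models whether loop structures are currently available.
static int loops_state_depth = 0;

// Records what the expansion step observed when it ran.
static bool saw_loop_info_during_expand = false;

// Stand-in for loop_optimizer_init (flags argument omitted).
static void
stub_loop_optimizer_init ()
{
  ++loops_state_depth;
}

// Stand-in for loop_optimizer_finalize.
static void
stub_loop_optimizer_finalize ()
{
  --loops_state_depth;
}

// Stand-in for the existing expansion work, which wants loop info.
static unsigned int
stub_execute_expand_omp ()
{
  saw_loop_info_during_expand = loops_state_depth > 0;
  return 0;
}

// The pass brackets its own work, so no separate init/done passes need
// to be cloned into the pass group.
unsigned int
execute_expand_omp_ssa_sketch ()
{
  stub_loop_optimizer_init ();
  unsigned int res = stub_execute_expand_omp ();
  stub_loop_optimizer_finalize ();
  return res;
}
```

Whether anything else that pass_tree_loop_init sets up is needed in
between is not something this sketch answers.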

> Committed to gomp-4_0-branch in r222282:
> 
> commit cb95b4a1efcdb96c58cda986d53b20c3537c1ab7
> Author: tschwinge <tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4>
> Date:   Tue Apr 21 19:51:33 2015 +0000
> 
>     Add pass_tree_loop_{init,done} to pass_oacc_kernels
>     
>     	gcc/
>     	* passes.def: Run pass_tree_loop_init and pass_tree_loop_done in pass
>     	group pass_oacc_kernels.
>     	* tree-ssa-loop.c (pass_tree_loop_init::clone)
>     	(pass_tree_loop_done::clone): New function.
>     
>     git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@222282 138bc75d-0d04-0410-961f-82ee72b054a4
> ---
>  gcc/ChangeLog.gomp  |    5 +++++
>  gcc/passes.def      |    2 ++
>  gcc/tree-ssa-loop.c |    2 ++
>  3 files changed, 9 insertions(+)
> 
> diff --git gcc/ChangeLog.gomp gcc/ChangeLog.gomp
> index d00c5e0..1fb060f 100644
> --- gcc/ChangeLog.gomp
> +++ gcc/ChangeLog.gomp
> @@ -1,5 +1,10 @@
>  2015-04-21  Tom de Vries  <tom@codesourcery.com>
>  
> +	* passes.def: Run pass_tree_loop_init and pass_tree_loop_done in pass
> +	group pass_oacc_kernels.
> +	* tree-ssa-loop.c (pass_tree_loop_init::clone)
> +	(pass_tree_loop_done::clone): New function.
> +
>  	* omp-low.c (loop_in_oacc_kernels_region_p): New function.
>  	* omp-low.h (loop_in_oacc_kernels_region_p): Declare.
>  	* passes.def: Add pass_ch_oacc_kernels to pass group pass_oacc_kernels.
> diff --git gcc/passes.def gcc/passes.def
> index 5cdbc87..83ae04e 100644
> --- gcc/passes.def
> +++ gcc/passes.def
> @@ -91,7 +91,9 @@ along with GCC; see the file COPYING3.  If not see
>  	  NEXT_PASS (pass_oacc_kernels);
>  	  PUSH_INSERT_PASSES_WITHIN (pass_oacc_kernels)
>  	      NEXT_PASS (pass_ch_oacc_kernels);
> +	      NEXT_PASS (pass_tree_loop_init);
>  	      NEXT_PASS (pass_expand_omp_ssa);
> +	      NEXT_PASS (pass_tree_loop_done);
>  	  POP_INSERT_PASSES ()
>  	  NEXT_PASS (pass_merge_phi);
>  	  NEXT_PASS (pass_cd_dce);
> diff --git gcc/tree-ssa-loop.c gcc/tree-ssa-loop.c
> index a041858..2a96a39 100644
> --- gcc/tree-ssa-loop.c
> +++ gcc/tree-ssa-loop.c
> @@ -272,6 +272,7 @@ public:
>  
>    /* opt_pass methods: */
>    virtual unsigned int execute (function *);
> +  opt_pass * clone () { return new pass_tree_loop_init (m_ctxt); }
>  
>  }; // class pass_tree_loop_init
>  
> @@ -566,6 +567,7 @@ public:
>  
>    /* opt_pass methods: */
>    virtual unsigned int execute (function *) { return tree_ssa_loop_done (); }
> +  opt_pass * clone () { return new pass_tree_loop_done (m_ctxt); }
>  
>  }; // class pass_tree_loop_done
>  
> 
> 
> Grüße,
>  Thomas
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Jennifer Guild,
Dilip Upmanyu, Graham Norton HRB 21284 (AG Nuernberg)

* Re: [PATCH, 6/8] Add pass_copy_prop in pass_oacc_kernels
  2015-04-21 20:01     ` [PATCH, 6/8] Add pass_copy_prop in pass_oacc_kernels Thomas Schwinge
@ 2015-04-22  7:42       ` Richard Biener
  2015-06-02 13:04         ` Tom de Vries
  0 siblings, 1 reply; 71+ messages in thread
From: Richard Biener @ 2015-04-22  7:42 UTC (permalink / raw)
  To: Thomas Schwinge; +Cc: GCC Patches, Tom de Vries, Jakub Jelinek

On Tue, 21 Apr 2015, Thomas Schwinge wrote:

> Hi!
> 
> On Tue, 25 Nov 2014 12:38:55 +0100, Tom de Vries <Tom_deVries@mentor.com> wrote:
> > On 15-11-14 18:22, Tom de Vries wrote:
> > > On 15-11-14 13:14, Tom de Vries wrote:
> > >> I'm submitting a patch series with initial support for the oacc kernels
> > >> directive.
> > >>
> > >> The patch series uses pass_parallelize_loops to implement parallelization of
> > >> loops in the oacc kernels region.
> > >>
> > >> The patch series consists of these 8 patches:
> > >> ...
> > >>      1  Expand oacc kernels after pass_build_ealias
> > >>      2  Add pass_oacc_kernels
> > >>      3  Add pass_ch_oacc_kernels to pass_oacc_kernels
> > >>      4  Add pass_tree_loop_{init,done} to pass_oacc_kernels
> > >>      5  Add pass_loop_im to pass_oacc_kernels
> > >>      6  Add pass_ccp to pass_oacc_kernels
> > >>      7  Add pass_parloops_oacc_kernels to pass_oacc_kernels
> > >>      8  Do simple omp lowering for no address taken var
> > >> ...
> > >
> > > This patch adds pass_loop_ccp to pass group pass_oacc_kernels.
> > >
> > > We need this pass to simplify the loop body, and allow pass_parloops to detect
> > > that loop iterations are independent.
> > >
> > 
> > As suggested here ( https://gcc.gnu.org/ml/gcc-patches/2014-11/msg02993.html ) 
> > I've replaced the pass_ccp with pass_copyprop, which performs trivial constant 
> > propagation in addition to copy propagation.
> > 
> > Bootstrapped and reg-tested as before.
> > 
> > OK for trunk?

I've recently wondered why we do copy propagation after LIM and I don't
remember.  Can you remind me?  Can you add testcases that fail before
this kind of patches and pass afterwards?

Richard.

> Committed to gomp-4_0-branch in r222284:
> 
> commit 1c2529b64620811cbff4a50374af797ee52ef5f8
> Author: tschwinge <tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4>
> Date:   Tue Apr 21 19:58:54 2015 +0000
> 
>     Add pass_copy_prop in pass_oacc_kernels
>     
>     	gcc/
>     	* passes.def: Add pass_copy_prop to pass group pass_oacc_kernels.
>     	* tree-ssa-copy.c (stmt_may_generate_copy): Handle .omp_data_i init
>     	conservatively.
>     
>     git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@222284 138bc75d-0d04-0410-961f-82ee72b054a4
> ---
>  gcc/ChangeLog.gomp  |    4 ++++
>  gcc/passes.def      |    1 +
>  gcc/tree-ssa-copy.c |    4 ++++
>  3 files changed, 9 insertions(+)
> 
> diff --git gcc/ChangeLog.gomp gcc/ChangeLog.gomp
> index 98e33ad..0be9191 100644
> --- gcc/ChangeLog.gomp
> +++ gcc/ChangeLog.gomp
> @@ -1,5 +1,9 @@
>  2015-04-21  Tom de Vries  <tom@codesourcery.com>
>  
> +	* passes.def: Add pass_copy_prop to pass group pass_oacc_kernels.
> +	* tree-ssa-copy.c (stmt_may_generate_copy): Handle .omp_data_i init
> +	conservatively.
> +
>  	* passes.def: Add pass_lim in pass group pass_ch_oacc_kernels.
>  
>  	* passes.def: Run pass_tree_loop_init and pass_tree_loop_done in pass
> diff --git gcc/passes.def gcc/passes.def
> index e6c9287..e6f1c33 100644
> --- gcc/passes.def
> +++ gcc/passes.def
> @@ -93,6 +93,7 @@ along with GCC; see the file COPYING3.  If not see
>  	      NEXT_PASS (pass_ch_oacc_kernels);
>  	      NEXT_PASS (pass_tree_loop_init);
>  	      NEXT_PASS (pass_lim);
> +	      NEXT_PASS (pass_copy_prop);
>  	      NEXT_PASS (pass_expand_omp_ssa);
>  	      NEXT_PASS (pass_tree_loop_done);
>  	  POP_INSERT_PASSES ()
> diff --git gcc/tree-ssa-copy.c gcc/tree-ssa-copy.c
> index 5ae8e6c..6f35f99 100644
> --- gcc/tree-ssa-copy.c
> +++ gcc/tree-ssa-copy.c
> @@ -61,6 +61,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "tree-scalar-evolution.h"
>  #include "tree-ssa-dom.h"
>  #include "tree-ssa-loop-niter.h"
> +#include "omp-low.h"
>  
>  
>  /* This file implements the copy propagation pass and provides a
> @@ -116,6 +117,9 @@ stmt_may_generate_copy (gimple stmt)
>    if (gimple_has_volatile_ops (stmt))
>      return false;
>  
> +  if (gimple_stmt_omp_data_i_init_p (stmt))
> +    return false;
> +
>    /* Statements with loads and/or stores will never generate a useful copy.  */
>    if (gimple_vuse (stmt))
>      return false;
> 
> 
> Regards,
>  Thomas
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Jennifer Guild,
Dilip Upmanyu, Graham Norton HRB 21284 (AG Nuernberg)

* Re: [PATCH, 6/8] Add pass_copy_prop in pass_oacc_kernels
  2015-04-22  7:42       ` Richard Biener
@ 2015-06-02 13:04         ` Tom de Vries
  0 siblings, 0 replies; 71+ messages in thread
From: Tom de Vries @ 2015-06-02 13:04 UTC (permalink / raw)
  To: Richard Biener; +Cc: Thomas Schwinge, GCC Patches, Jakub Jelinek

[-- Attachment #1: Type: text/plain, Size: 2320 bytes --]

On 22-04-15 09:42, Richard Biener wrote:
>>>> This patch adds pass_loop_ccp to pass group pass_oacc_kernels.
>>>>
>>>> We need this pass to simplify the loop body, and allow pass_parloops to detect
>>>> that loop iterations are independent.
>>>>
>>>
>>> As suggested here ( https://gcc.gnu.org/ml/gcc-patches/2014-11/msg02993.html )
>>> I've replaced the pass_ccp with pass_copyprop, which performs trivial constant
>>> propagation in addition to copy propagation.
>>>
>>> Bootstrapped and reg-tested as before.
>>>
>>> OK for trunk?
> I've recently wondered why we do copy propagation after LIM and I don't
> remember.  Can you remind me?  Can you add testcases that fail before
> this kind of patches and pass afterwards?

For the attached test-case, we manage to parallelize with pass_copy_prop (but
then run into an ICE):
...
PASS: c-c++-common/goacc/kernels-loop-reduction.c scan-tree-dump-not 
parloops_oacc_kernels "FAILED:"
PASS: c-c++-common/goacc/kernels-loop-reduction.c scan-tree-dump-times 
parloops_oacc_kernels "SUCCESS: may be parallelized" 1
FAIL: c-c++-common/goacc/kernels-loop-reduction.c (internal compiler error)
FAIL: c-c++-common/goacc/kernels-loop-reduction.c (test for excess errors)
...

Without pass_copy_prop we don't manage to parallelize:
...
FAIL: c-c++-common/goacc/kernels-loop-reduction.c scan-tree-dump-not 
parloops_oacc_kernels "FAILED:"
FAIL: c-c++-common/goacc/kernels-loop-reduction.c scan-tree-dump-times 
parloops_oacc_kernels "SUCCESS: may be parallelized" 1
PASS: c-c++-common/goacc/kernels-loop-reduction.c (test for excess errors)
...

In more detail, before pass_copy_prop, we have:
...
   <bb 7>:
   # D__lsm.14_3 = PHI <D__lsm.14_9(15), D__lsm.14_21(6)>
   ...
   sum.3_39 = D__lsm.14_3;
   sum.4_40 = _37 + sum.3_39;
   D__lsm.14_9 = sum.4_40;
   ...
   if (ii_43 <= 524287)
     goto <bb 15>;
   else
     goto <bb 8>;

   <bb 15>:
   goto <bb 7>;
...

And after pass_copy_prop, we have:
...
   <bb 7>:
   # D__lsm.14_3 = PHI <sum.4_40(8), D__lsm.14_21(6)>
   ...
   sum.4_40 = D__lsm.14_3 + _37;
   ...
   if (ii_43 <= 524287)
     goto <bb 8>;
   else
     goto <bb 9>;

   <bb 8>:
   goto <bb 7>;
...


The testcase is not committed yet, because reductions are not handled yet (which 
explains the ICE).
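
As a rough source-level illustration of the two dumps above (a sketch only: the
function and variable names are made up for this example and do not come from
the compiler), the loop body before copy propagation routes the running sum
through two copies, while afterwards a single add feeds the loop-carried value
directly:

```c
#include <assert.h>
#include <stddef.h>

/* Shape before copy propagation: the running sum is round-tripped through
   temporaries, mirroring D__lsm.14 / sum.3 / sum.4 in the first dump.  */
static unsigned int
sum_with_copies (const unsigned int *a, size_t n)
{
  unsigned int lsm = 0;                    /* plays the role of D__lsm.14 */
  for (size_t ii = 0; ii < n; ii++)
    {
      unsigned int sum_3 = lsm;            /* sum.3_39 = D__lsm.14_3 */
      unsigned int sum_4 = a[ii] + sum_3;  /* sum.4_40 = _37 + sum.3_39 */
      lsm = sum_4;                         /* D__lsm.14_9 = sum.4_40 */
    }
  return lsm;
}

/* Shape after copy propagation: one add on the loop-carried value,
   mirroring sum.4_40 = D__lsm.14_3 + _37 in the second dump.  */
static unsigned int
sum_direct (const unsigned int *a, size_t n)
{
  unsigned int lsm = 0;
  for (size_t ii = 0; ii < n; ii++)
    lsm = lsm + a[ii];
  return lsm;
}
```

Both functions compute the same value; only the second has the plain
"phi, add, back edge" chain that shows up in the dump after pass_copy_prop.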

Thanks,
- Tom

[-- Attachment #2: kernels-loop-reduction.c --]
[-- Type: text/x-csrc, Size: 1027 bytes --]

/* { dg-additional-options "-O2" } */
/* { dg-additional-options "-ftree-parallelize-loops=32" } */
/* { dg-additional-options "-fdump-tree-parloops_oacc_kernels-all" } */

#include <stdlib.h>

#define N (1024 * 512)
#define COUNTERTYPE unsigned int

int
main (void)
{
  unsigned int *__restrict a;
  unsigned int sum = 0;
  unsigned int sum2 = 0;

  a = (unsigned int *)malloc (N * sizeof (unsigned int));

  for (COUNTERTYPE i = 0; i < N; i++)
    a[i] = i * 2;

#pragma acc kernels copy (sum) copyin (a[0:N])
  {
    for (COUNTERTYPE ii = 0; ii < N; ii++)
      sum += a[ii];
  }

  for (COUNTERTYPE i = 0; i < N; i++)
    sum2 += a[i];

  if (sum != sum2)
      abort ();

  free (a);

  return 0;
}

/* Check that only one loop is analyzed, and that it can be parallelized.  */
/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 1 "parloops_oacc_kernels" } } */
/* { dg-final { scan-tree-dump-not "FAILED:" "parloops_oacc_kernels" } } */

/* { dg-final { cleanup-tree-dump "parloops_oacc_kernels" } } */


* Re: [PATCH, 4/8] Add pass_tree_loop_{init,done} to pass_oacc_kernels
  2015-04-22  7:40       ` Richard Biener
@ 2015-06-02 13:52         ` Tom de Vries
  2015-06-02 13:58           ` Richard Biener
  0 siblings, 1 reply; 71+ messages in thread
From: Tom de Vries @ 2015-06-02 13:52 UTC (permalink / raw)
  To: Richard Biener, Thomas Schwinge; +Cc: GCC Patches, Jakub Jelinek

On 22-04-15 09:40, Richard Biener wrote:
> On Tue, 21 Apr 2015, Thomas Schwinge wrote:
>
>> Hi!
>>
>> On Tue, 25 Nov 2014 12:29:28 +0100, Tom de Vries <Tom_deVries@mentor.com> wrote:
>>> On 15-11-14 18:21, Tom de Vries wrote:
>>>> On 15-11-14 13:14, Tom de Vries wrote:
>>>>> I'm submitting a patch series with initial support for the oacc kernels
>>>>> directive.
>>>>>
>>>>> The patch series uses pass_parallelize_loops to implement parallelization of
>>>>> loops in the oacc kernels region.
>>>>>
>>>>> The patch series consists of these 8 patches:
>>>>> ...
>>>>>       1  Expand oacc kernels after pass_build_ealias
>>>>>       2  Add pass_oacc_kernels
>>>>>       3  Add pass_ch_oacc_kernels to pass_oacc_kernels
>>>>>       4  Add pass_tree_loop_{init,done} to pass_oacc_kernels
>>>>>       5  Add pass_loop_im to pass_oacc_kernels
>>>>>       6  Add pass_ccp to pass_oacc_kernels
>>>>>       7  Add pass_parloops_oacc_kernels to pass_oacc_kernels
>>>>>       8  Do simple omp lowering for no address taken var
>>>>> ...
>>>>
>>>> This patch adds pass_tree_loop_init and pass_tree_loop_done to
>>>> pass_oacc_kernels.
>>>>
>>>> Pass_parallelize_loops is run between these passes in the pass group
>>>> pass_tree_loop, since it requires loop information.  We do the same for
>>>> pass_oacc_kernels.
>>>>
>>>
>>> Updated for moving pass_oacc_kernels down past pass_fre in the pass list.
>>>
>>> Bootstrapped and reg-tested as before.
>>>
>>> OK for trunk?
>
> Both passes should be basically no-ops.  Why not call
> loop_optimizer_init/finalize from expand_omp_ssa instead?
>

The current pass list is:
...
           NEXT_PASS (pass_build_ealias);
           NEXT_PASS (pass_fre);
           /* Pass group that runs when there are oacc kernels in the
              function.  */
           NEXT_PASS (pass_oacc_kernels);
           PUSH_INSERT_PASSES_WITHIN (pass_oacc_kernels)
               NEXT_PASS (pass_ch_oacc_kernels);
               NEXT_PASS (pass_fre);
               NEXT_PASS (pass_tree_loop_init);
               NEXT_PASS (pass_lim);
               NEXT_PASS (pass_copy_prop);
               NEXT_PASS (pass_scev_cprop);
               NEXT_PASS (pass_parallelize_loops_oacc_kernels);
               NEXT_PASS (pass_expand_omp_ssa);
               NEXT_PASS (pass_tree_loop_done);
           POP_INSERT_PASSES ()
           NEXT_PASS (pass_merge_phi);
           NEXT_PASS (pass_dse);
...

Do you want to call loop_optimizer_init from pass_lim and 
loop_optimizer_finalize from pass_expand_omp_ssa, or are things ok as they are?

Thanks,
- Tom

>> Committed to gomp-4_0-branch in r222282:
>>
>> commit cb95b4a1efcdb96c58cda986d53b20c3537c1ab7
>> Author: tschwinge <tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4>
>> Date:   Tue Apr 21 19:51:33 2015 +0000
>>
>>      Add pass_tree_loop_{init,done} to pass_oacc_kernels
>>
>>      	gcc/
>>      	* passes.def: Run pass_tree_loop_init and pass_tree_loop_done in pass
>>      	group pass_oacc_kernels.
>>      	* tree-ssa-loop.c (pass_tree_loop_init::clone)
>>      	(pass_tree_loop_done::clone): New function.
>>
>>      git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@222282 138bc75d-0d04-0410-961f-82ee72b054a4
>> ---
>>   gcc/ChangeLog.gomp  |    5 +++++
>>   gcc/passes.def      |    2 ++
>>   gcc/tree-ssa-loop.c |    2 ++
>>   3 files changed, 9 insertions(+)
>>
>> diff --git gcc/ChangeLog.gomp gcc/ChangeLog.gomp
>> index d00c5e0..1fb060f 100644
>> --- gcc/ChangeLog.gomp
>> +++ gcc/ChangeLog.gomp
>> @@ -1,5 +1,10 @@
>>   2015-04-21  Tom de Vries  <tom@codesourcery.com>
>>
>> +	* passes.def: Run pass_tree_loop_init and pass_tree_loop_done in pass
>> +	group pass_oacc_kernels.
>> +	* tree-ssa-loop.c (pass_tree_loop_init::clone)
>> +	(pass_tree_loop_done::clone): New function.
>> +
>>   	* omp-low.c (loop_in_oacc_kernels_region_p): New function.
>>   	* omp-low.h (loop_in_oacc_kernels_region_p): Declare.
>>   	* passes.def: Add pass_ch_oacc_kernels to pass group pass_oacc_kernels.
>> diff --git gcc/passes.def gcc/passes.def
>> index 5cdbc87..83ae04e 100644
>> --- gcc/passes.def
>> +++ gcc/passes.def
>> @@ -91,7 +91,9 @@ along with GCC; see the file COPYING3.  If not see
>>   	  NEXT_PASS (pass_oacc_kernels);
>>   	  PUSH_INSERT_PASSES_WITHIN (pass_oacc_kernels)
>>   	      NEXT_PASS (pass_ch_oacc_kernels);
>> +	      NEXT_PASS (pass_tree_loop_init);
>>   	      NEXT_PASS (pass_expand_omp_ssa);
>> +	      NEXT_PASS (pass_tree_loop_done);
>>   	  POP_INSERT_PASSES ()
>>   	  NEXT_PASS (pass_merge_phi);
>>   	  NEXT_PASS (pass_cd_dce);
>> diff --git gcc/tree-ssa-loop.c gcc/tree-ssa-loop.c
>> index a041858..2a96a39 100644
>> --- gcc/tree-ssa-loop.c
>> +++ gcc/tree-ssa-loop.c
>> @@ -272,6 +272,7 @@ public:
>>
>>     /* opt_pass methods: */
>>     virtual unsigned int execute (function *);
>> +  opt_pass * clone () { return new pass_tree_loop_init (m_ctxt); }
>>
>>   }; // class pass_tree_loop_init
>>
>> @@ -566,6 +567,7 @@ public:
>>
>>     /* opt_pass methods: */
>>     virtual unsigned int execute (function *) { return tree_ssa_loop_done (); }
>> +  opt_pass * clone () { return new pass_tree_loop_done (m_ctxt); }
>>
>>   }; // class pass_tree_loop_done
>>
>>
>>
>> Regards,
>>   Thomas
>>
>

* Re: [PATCH, 4/8] Add pass_tree_loop_{init,done} to pass_oacc_kernels
  2015-06-02 13:52         ` Tom de Vries
@ 2015-06-02 13:58           ` Richard Biener
  2015-06-02 15:40             ` Tom de Vries
  0 siblings, 1 reply; 71+ messages in thread
From: Richard Biener @ 2015-06-02 13:58 UTC (permalink / raw)
  To: Tom de Vries; +Cc: Thomas Schwinge, GCC Patches, Jakub Jelinek

[-- Attachment #1: Type: TEXT/PLAIN, Size: 6547 bytes --]

On Tue, 2 Jun 2015, Tom de Vries wrote:

> On 22-04-15 09:40, Richard Biener wrote:
> > On Tue, 21 Apr 2015, Thomas Schwinge wrote:
> > 
> > > Hi!
> > > 
> > > On Tue, 25 Nov 2014 12:29:28 +0100, Tom de Vries <Tom_deVries@mentor.com>
> > > wrote:
> > > > On 15-11-14 18:21, Tom de Vries wrote:
> > > > > On 15-11-14 13:14, Tom de Vries wrote:
> > > > > > I'm submitting a patch series with initial support for the oacc
> > > > > > kernels
> > > > > > directive.
> > > > > > 
> > > > > > The patch series uses pass_parallelize_loops to implement
> > > > > > parallelization of
> > > > > > loops in the oacc kernels region.
> > > > > > 
> > > > > > The patch series consists of these 8 patches:
> > > > > > ...
> > > > > >       1  Expand oacc kernels after pass_build_ealias
> > > > > >       2  Add pass_oacc_kernels
> > > > > >       3  Add pass_ch_oacc_kernels to pass_oacc_kernels
> > > > > >       4  Add pass_tree_loop_{init,done} to pass_oacc_kernels
> > > > > >       5  Add pass_loop_im to pass_oacc_kernels
> > > > > >       6  Add pass_ccp to pass_oacc_kernels
> > > > > >       7  Add pass_parloops_oacc_kernels to pass_oacc_kernels
> > > > > >       8  Do simple omp lowering for no address taken var
> > > > > > ...
> > > > > 
> > > > > > This patch adds pass_tree_loop_init and pass_tree_loop_done to
> > > > > pass_oacc_kernels.
> > > > > 
> > > > > Pass_parallelize_loops is run between these passes in the pass group
> > > > > pass_tree_loop, since it requires loop information.  We do the same
> > > > > for
> > > > > pass_oacc_kernels.
> > > > > 
> > > > 
> > > > Updated for moving pass_oacc_kernels down past pass_fre in the pass
> > > > list.
> > > > 
> > > > Bootstrapped and reg-tested as before.
> > > > 
> > > > OK for trunk?
> > 
> > Both passes should be basically no-ops.  Why not call
> > loop_optimizer_init/finalize from expand_omp_ssa instead?
> > 
> 
> The current pass list is:
> ...
>           NEXT_PASS (pass_build_ealias);
>           NEXT_PASS (pass_fre);
>           /* Pass group that runs when there are oacc kernels in the
>              function.  */
>           NEXT_PASS (pass_oacc_kernels);
>           PUSH_INSERT_PASSES_WITHIN (pass_oacc_kernels)
>               NEXT_PASS (pass_ch_oacc_kernels);
>               NEXT_PASS (pass_fre);
>               NEXT_PASS (pass_tree_loop_init);
>               NEXT_PASS (pass_lim);
>               NEXT_PASS (pass_copy_prop);
>               NEXT_PASS (pass_scev_cprop);
>               NEXT_PASS (pass_parallelize_loops_oacc_kernels);
>               NEXT_PASS (pass_expand_omp_ssa);
>               NEXT_PASS (pass_tree_loop_done);
>           POP_INSERT_PASSES ()
>           NEXT_PASS (pass_merge_phi);
>           NEXT_PASS (pass_dse);
> ...
> 
> Do you want to call loop_optimizer_init from pass_lim and
> loop_optimizer_finalize from pass_expand_omp_ssa, or are things ok as they
> are?

No, Jakub probably means to call loop_optimizer_init/finalize in 
each of the passes.  Note that keeping loops initialized keeps
you in loop-closed SSA form and also preserves some more loop
properties during cfg-cleanup.  So I think things are ok as they
are.

As far as I understand at least SCEV-cprop and parloops need
loop-closed SSA form to work (LIM doesn't need anything fancy,
apart from disambiguated latches).
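
A toy model of the two options, purely illustrative: none of these names are
GCC functions, and the flags only stand in for the loop structures and the
loop-closed SSA form mentioned above.  Bracketing the group sets the state up
once and keeps it valid for every pass in between; doing it per pass rebuilds
and drops it each time.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for per-function loop state; not GCC's data.  */
struct toy_fn
{
  bool loops_ready;      /* loop structures are available */
  bool loop_closed_ssa;  /* loop-closed SSA form is established */
  int setups;            /* how often the loop state was built */
};

static void
toy_loop_init (struct toy_fn *fn)
{
  fn->loops_ready = true;
  fn->loop_closed_ssa = true;
  fn->setups++;
}

static void
toy_loop_done (struct toy_fn *fn)
{
  fn->loops_ready = false;
  fn->loop_closed_ssa = false;
}

/* A pass that relies on loop-closed SSA; reports whether it could run.  */
static bool
toy_pass_needing_lcssa (const struct toy_fn *fn)
{
  return fn->loops_ready && fn->loop_closed_ssa;
}

/* Option A: one init/done pair around the whole pass group.  */
static bool
run_group_bracketed (struct toy_fn *fn, int n_passes)
{
  bool ok = true;
  toy_loop_init (fn);
  for (int i = 0; i < n_passes; i++)
    ok = ok && toy_pass_needing_lcssa (fn);
  toy_loop_done (fn);
  return ok;
}

/* Option B: every pass builds and tears down its own loop state.  */
static bool
run_group_per_pass (struct toy_fn *fn, int n_passes)
{
  bool ok = true;
  for (int i = 0; i < n_passes; i++)
    {
      toy_loop_init (fn);
      ok = ok && toy_pass_needing_lcssa (fn);
      toy_loop_done (fn);
    }
  return ok;
}
```

In this model both variants let the passes run, but only the bracketed one
builds the state a single time and carries it from pass to pass.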

Btw, I wonder why you don't organize the oacc-kernel passes in
a new simple-IPA group after pass_local_optimization_passes.

Richard.

> Thanks,
> - Tom
> 
> > > Committed to gomp-4_0-branch in r222282:
> > > 
> > > commit cb95b4a1efcdb96c58cda986d53b20c3537c1ab7
> > > Author: tschwinge <tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4>
> > > Date:   Tue Apr 21 19:51:33 2015 +0000
> > > 
> > >      Add pass_tree_loop_{init,done} to pass_oacc_kernels
> > > 
> > >      	gcc/
> > >      	* passes.def: Run pass_tree_loop_init and pass_tree_loop_done
> > > in pass
> > >      	group pass_oacc_kernels.
> > >      	* tree-ssa-loop.c (pass_tree_loop_init::clone)
> > >      	(pass_tree_loop_done::clone): New function.
> > > 
> > >      git-svn-id:
> > > svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@222282
> > > 138bc75d-0d04-0410-961f-82ee72b054a4
> > > ---
> > >   gcc/ChangeLog.gomp  |    5 +++++
> > >   gcc/passes.def      |    2 ++
> > >   gcc/tree-ssa-loop.c |    2 ++
> > >   3 files changed, 9 insertions(+)
> > > 
> > > diff --git gcc/ChangeLog.gomp gcc/ChangeLog.gomp
> > > index d00c5e0..1fb060f 100644
> > > --- gcc/ChangeLog.gomp
> > > +++ gcc/ChangeLog.gomp
> > > @@ -1,5 +1,10 @@
> > >   2015-04-21  Tom de Vries  <tom@codesourcery.com>
> > > 
> > > +	* passes.def: Run pass_tree_loop_init and pass_tree_loop_done in pass
> > > +	group pass_oacc_kernels.
> > > +	* tree-ssa-loop.c (pass_tree_loop_init::clone)
> > > +	(pass_tree_loop_done::clone): New function.
> > > +
> > >   	* omp-low.c (loop_in_oacc_kernels_region_p): New function.
> > >   	* omp-low.h (loop_in_oacc_kernels_region_p): Declare.
> > >   	* passes.def: Add pass_ch_oacc_kernels to pass group
> > > pass_oacc_kernels.
> > > diff --git gcc/passes.def gcc/passes.def
> > > index 5cdbc87..83ae04e 100644
> > > --- gcc/passes.def
> > > +++ gcc/passes.def
> > > @@ -91,7 +91,9 @@ along with GCC; see the file COPYING3.  If not see
> > >   	  NEXT_PASS (pass_oacc_kernels);
> > >   	  PUSH_INSERT_PASSES_WITHIN (pass_oacc_kernels)
> > >   	      NEXT_PASS (pass_ch_oacc_kernels);
> > > +	      NEXT_PASS (pass_tree_loop_init);
> > >   	      NEXT_PASS (pass_expand_omp_ssa);
> > > +	      NEXT_PASS (pass_tree_loop_done);
> > >   	  POP_INSERT_PASSES ()
> > >   	  NEXT_PASS (pass_merge_phi);
> > >   	  NEXT_PASS (pass_cd_dce);
> > > diff --git gcc/tree-ssa-loop.c gcc/tree-ssa-loop.c
> > > index a041858..2a96a39 100644
> > > --- gcc/tree-ssa-loop.c
> > > +++ gcc/tree-ssa-loop.c
> > > @@ -272,6 +272,7 @@ public:
> > > 
> > >     /* opt_pass methods: */
> > >     virtual unsigned int execute (function *);
> > > +  opt_pass * clone () { return new pass_tree_loop_init (m_ctxt); }
> > > 
> > >   }; // class pass_tree_loop_init
> > > 
> > > @@ -566,6 +567,7 @@ public:
> > > 
> > >     /* opt_pass methods: */
> > >     virtual unsigned int execute (function *) { return tree_ssa_loop_done
> > > (); }
> > > +  opt_pass * clone () { return new pass_tree_loop_done (m_ctxt); }
> > > 
> > >   }; // class pass_tree_loop_done
> > > 
> > > 
> > > 
> > > Regards,
> > >   Thomas
> > > 
> > 
> 
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Dilip Upmanyu, Graham Norton, HRB 21284 (AG Nuernberg)

* Re: [PATCH, 4/8] Add pass_tree_loop_{init,done} to pass_oacc_kernels
  2015-06-02 13:58           ` Richard Biener
@ 2015-06-02 15:40             ` Tom de Vries
  2015-06-03 11:26               ` Richard Biener
  0 siblings, 1 reply; 71+ messages in thread
From: Tom de Vries @ 2015-06-02 15:40 UTC (permalink / raw)
  To: Richard Biener; +Cc: Thomas Schwinge, GCC Patches, Jakub Jelinek

On 02-06-15 15:58, Richard Biener wrote:
> Btw, I wonder why you don't organize the oacc-kernel passes in
> a new simple-IPA group after pass_local_optimization_passes.

I've placed the pass group as early as possible (meaning after ealias) and put
passes in front only when that served a purpose for parallelization (pass_fre).
The idea there was to minimize the number of passes that have to be modified to
deal (conservatively) with a kernels region.

So AFAICT, there's nothing against placing the pass group after
pass_local_optimization_passes, other than that it's more work in more passes to
keep the region intact.

What would be the benefit of doing so?

Thanks,
- Tom

* Re: [PATCH, 3/8] Add pass_ch_oacc_kernels to pass_oacc_kernels
  2015-04-22  7:39       ` Richard Biener
@ 2015-06-03  9:22         ` Tom de Vries
  2015-06-03 11:21           ` Richard Biener
  2015-06-03 10:05         ` Tom de Vries
  1 sibling, 1 reply; 71+ messages in thread
From: Tom de Vries @ 2015-06-03  9:22 UTC (permalink / raw)
  To: Richard Biener, Thomas Schwinge; +Cc: GCC Patches, Jakub Jelinek

[-- Attachment #1: Type: text/plain, Size: 3433 bytes --]

On 22/04/15 09:39, Richard Biener wrote:
>> Committed to gomp-4_0-branch in r222281:
>>
>> commit 58c33a7965c379b55b549d50e3b79b2252bcc876
>> Author: tschwinge <tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4>
>> Date:   Tue Apr 21 19:48:16 2015 +0000
>>
>>     Add pass_ch_oacc_kernels to pass_oacc_kernels
>>
>>     	gcc/
>>     	* omp-low.c (loop_in_oacc_kernels_region_p): New function.
>>     	* omp-low.h (loop_in_oacc_kernels_region_p): Declare.
>>     	* passes.def: Add pass_ch_oacc_kernels to pass group pass_oacc_kernels.
>>     	* tree-pass.h (make_pass_ch_oacc_kernels): Declare
>>     	* tree-ssa-loop-ch.c: Include omp-low.h.
>>     	(pass_ch_execute): Declare.
>>     	(pass_ch::execute): Factor out ...
>>     	(pass_ch_execute): ... this new function.  If handling oacc kernels,
>>     	skip loops that are not in oacc kernels region.
>>     	(pass_ch_oacc_kernels::execute):
>>     	(pass_data_ch_oacc_kernels): New pass_data.
>>     	(class pass_ch_oacc_kernels): New pass.
>>     	(pass_ch_oacc_kernels::execute, make_pass_ch_oacc_kernels): New
>>     	function.
>>
>>     git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@222281 138bc75d-0d04-0410-961f-82ee72b054a4
>> ---
>>  gcc/ChangeLog.gomp     |   15 ++++++++
>>  gcc/omp-low.c          |   91 ++++++++++++++++++++++++++++++++++++++++++++++++
>>  gcc/omp-low.h          |    2 ++
>>  gcc/passes.def         |    1 +
>>  gcc/tree-pass.h        |    1 +
>>  gcc/tree-ssa-loop-ch.c |   59 +++++++++++++++++++++++++++++--
>>  6 files changed, 167 insertions(+), 2 deletions(-)
>>
>> diff --git gcc/ChangeLog.gomp gcc/ChangeLog.gomp
>> index 8a53ad8..d00c5e0 100644
>> --- gcc/ChangeLog.gomp
>> +++ gcc/ChangeLog.gomp
>> @@ -1,5 +1,20 @@
>>  2015-04-21  Tom de Vries  <tom@codesourcery.com>
>>
>> +	* omp-low.c (loop_in_oacc_kernels_region_p): New function.
>> +	* omp-low.h (loop_in_oacc_kernels_region_p): Declare.
>> +	* passes.def: Add pass_ch_oacc_kernels to pass group pass_oacc_kernels.
>> +	* tree-pass.h (make_pass_ch_oacc_kernels): Declare
>> +	* tree-ssa-loop-ch.c: Include omp-low.h.
>> +	(pass_ch_execute): Declare.
>> +	(pass_ch::execute): Factor out ...
>> +	(pass_ch_execute): ... this new function.  If handling oacc kernels,
>> +	skip loops that are not in oacc kernels region.
>> +	(pass_ch_oacc_kernels::execute):
>> +	(pass_data_ch_oacc_kernels): New pass_data.
>> +	(class pass_ch_oacc_kernels): New pass.
>> +	(pass_ch_oacc_kernels::execute, make_pass_ch_oacc_kernels): New
>> +	function.
>> +
>>  	* passes.def: Add pass group pass_oacc_kernels.
>>  	* tree-pass.h (make_pass_oacc_kernels): Declare.
>>  	* tree-ssa-loop.c (gate_oacc_kernels): New static function.
>> diff --git gcc/omp-low.c gcc/omp-low.c
>> index 16d9a5e..1b03ae6 100644
>> --- gcc/omp-low.c
>> +++ gcc/omp-low.c
>> @@ -13920,4 +13920,95 @@ gimple_stmt_omp_data_i_init_p (gimple stmt)
>>  						   SSA_OP_DEF);
>>  }
>>
>> +/* Return true if LOOP is inside a kernels region.  */
>> +
>> +bool
>> +loop_in_oacc_kernels_region_p (struct loop *loop, basic_block *region_entry,
>> +			       basic_block *region_exit)
> Ehm.  So why not simply add a flag to struct loop instead and set it
> during OMP region parsing/lowering?

Attached patch adds an in_oacc_kernels_region flag to struct loop, and 
uses it. OK for gomp-4_0-branch?
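
For reference, the marking logic in the patch boils down to "dominated by the
region entry, but not dominated by the region exit", and the entry lookup is a
walk up the immediate dominators.  A self-contained sketch on a toy CFG,
assuming a hand-written bitmask dominator computation and a made-up
is_kernels_entry table in place of GCC's dominance info and the
GIMPLE_OMP_TARGET check (all names below are hypothetical):

```c
#include <assert.h>
#include <stdbool.h>

#define N_BLOCKS 8

/* Toy CFG: 0->1->2, loop {2,3}, 2->4->5, loop {5,6}, 5->7.
   Block 1 is the assumed kernels region entry, block 4 its exit.  */
static const int edges[][2] = {
  {0, 1}, {1, 2}, {2, 3}, {3, 2}, {2, 4}, {4, 5}, {5, 6}, {6, 5}, {5, 7}
};
#define N_EDGES ((int) (sizeof edges / sizeof edges[0]))

static const bool is_kernels_entry[N_BLOCKS] = { [1] = true };

/* dom[b] becomes a bitmask of the blocks that dominate b, computed with
   the usual iterative data-flow scheme.  Block 0 is the function entry.  */
static void
compute_dominators (unsigned dom[N_BLOCKS])
{
  const unsigned all = (1u << N_BLOCKS) - 1;
  dom[0] = 1u;
  for (int b = 1; b < N_BLOCKS; b++)
    dom[b] = all;

  for (bool changed = true; changed; )
    {
      changed = false;
      for (int b = 1; b < N_BLOCKS; b++)
        {
          unsigned meet = all;
          for (int e = 0; e < N_EDGES; e++)
            if (edges[e][1] == b)
              meet &= dom[edges[e][0]];
          unsigned next = meet | (1u << b);
          if (next != dom[b])
            {
              dom[b] = next;
              changed = true;
            }
        }
    }
}

static bool
dominated_by (const unsigned dom[N_BLOCKS], int block, int by)
{
  return ((dom[block] >> by) & 1u) != 0;
}

/* In the region: dominated by the entry, but not by the exit.  */
static bool
header_in_region (const unsigned dom[N_BLOCKS], int header,
                  int region_entry, int region_exit)
{
  return dominated_by (dom, header, region_entry)
         && !dominated_by (dom, header, region_exit);
}

/* Immediate dominator: the strict dominator that all other strict
   dominators dominate.  Returns -1 for the function entry.  */
static int
immediate_dominator (const unsigned dom[N_BLOCKS], int block)
{
  int best = -1;
  for (int d = 0; d < N_BLOCKS; d++)
    if (d != block && dominated_by (dom, block, d)
        && (best < 0 || dominated_by (dom, d, best)))
      best = d;
  return best;
}

/* Walk up the dominator tree from HEADER until a kernels entry is found.  */
static int
find_region_entry (const unsigned dom[N_BLOCKS], int header)
{
  for (int bb = immediate_dominator (dom, header); bb >= 0;
       bb = immediate_dominator (dom, bb))
    if (is_kernels_entry[bb])
      return bb;
  return -1;
}
```

With this CFG, the loop headed by block 2 counts as inside the region and the
loop headed by block 5 does not, since block 5 is dominated by the exit block.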

Thanks,
- Tom


[-- Attachment #2: 0001-Add-in_oacc_kernels_region-field-to-struct-loop.patch --]
[-- Type: text/x-patch, Size: 8493 bytes --]

Add in_oacc_kernels_region field to struct loop

2015-06-03  Tom de Vries  <tom@codesourcery.com>

	* cfgloop.h (struct loop): Add in_oacc_kernels_region field.
	* omp-low.c (mark_loops_in_oacc_kernels_region): New function.
	(loop_get_oacc_kernels_region_entry): New function.
	(expand_omp_target): Call mark_loops_in_oacc_kernels_region.
	(loop_in_oacc_kernels_region_p): Remove function.
	* omp-low.h (loop_in_oacc_kernels_region_p): Remove declaration.
	(loop_get_oacc_kernels_region_entry): Declare.
	* tree-parloops.c (parallelize_loops): Use in_oacc_kernels_region field and
	loop_get_oacc_kernels_region_entry.
	* tree-ssa-loop-ch.c (pass_ch_execute): Use in_oacc_kernels_region field.
---
 gcc/cfgloop.h          |   3 +
 gcc/omp-low.c          | 155 ++++++++++++++++++++-----------------------------
 gcc/omp-low.h          |   3 +-
 gcc/tree-parloops.c    |   7 ++-
 gcc/tree-ssa-loop-ch.c |   2 +-
 5 files changed, 73 insertions(+), 97 deletions(-)

diff --git a/gcc/cfgloop.h b/gcc/cfgloop.h
index 1d84572..a3654d9 100644
--- a/gcc/cfgloop.h
+++ b/gcc/cfgloop.h
@@ -195,6 +195,9 @@ struct GTY ((chain_next ("%h.next"))) loop {
   /* True if we should try harder to vectorize this loop.  */
   bool force_vectorize;
 
+  /* True if the loop is part of an oacc kernels region.  */
+  bool in_oacc_kernels_region;
+
   /* For SIMD loops, this is a unique identifier of the loop, referenced
      by IFN_GOMP_SIMD_VF, IFN_GOMP_SIMD_LANE and IFN_GOMP_SIMD_LAST_LANE
      builtins.  */
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 01e5d4b..04c1981 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -9421,6 +9421,68 @@ oacc_alloc_broadcast_storage (omp_context *ctx, tree clauses)
 			   ctx, TYPE_SIZE_UNIT (long_long_unsigned_type_node));
 }
 
+/* Mark the loops inside the kernels region starting at REGION_ENTRY and ending
+   at REGION_EXIT.  */
+
+static void
+mark_loops_in_oacc_kernels_region (basic_block region_entry,
+				   basic_block region_exit)
+{
+  bitmap dominated_bitmap = BITMAP_GGC_ALLOC ();
+  bitmap excludes_bitmap = BITMAP_GGC_ALLOC ();
+  unsigned di;
+  basic_block bb;
+
+  bitmap_clear (dominated_bitmap);
+  bitmap_clear (excludes_bitmap);
+
+  /* Get all the blocks dominated by the region entry.  That will include the
+     entire region.  */
+  vec<basic_block> dominated
+    = get_all_dominated_blocks (CDI_DOMINATORS, region_entry);
+  FOR_EACH_VEC_ELT (dominated, di, bb)
+      bitmap_set_bit (dominated_bitmap, bb->index);
+
+  /* Exclude all the blocks which are not in the region: the blocks dominated by
+     the region exit.  */
+  if (region_exit != NULL)
+    {
+      vec<basic_block> excludes
+	= get_all_dominated_blocks (CDI_DOMINATORS, region_exit);
+      FOR_EACH_VEC_ELT (excludes, di, bb)
+	bitmap_set_bit (excludes_bitmap, bb->index);
+    }
+
+  /* Mark the loops in the region.  */
+  struct loop *loop;
+  FOR_EACH_LOOP (loop, 0)
+    if (bitmap_bit_p (dominated_bitmap, loop->header->index)
+	&& !bitmap_bit_p (excludes_bitmap, loop->header->index))
+      loop->in_oacc_kernels_region = true;
+}
+
+/* Return the entry basic block of the oacc kernels region containing LOOP.  */
+
+basic_block
+loop_get_oacc_kernels_region_entry (struct loop *loop)
+{
+  if (!loop->in_oacc_kernels_region)
+    return NULL;
+
+  basic_block bb = loop->header;
+  while (true)
+    {
+      bb = get_immediate_dominator (CDI_DOMINATORS, bb);
+      gcc_assert (bb != NULL);
+
+      gimple last = last_stmt (bb);
+      if (last != NULL
+	  && gimple_code (last) == GIMPLE_OMP_TARGET
+	  && gimple_omp_target_kind (last) == GF_OMP_TARGET_KIND_OACC_KERNELS)
+	return bb;
+    }
+}
+
 /* Expand the GIMPLE_OMP_TARGET starting at REGION.  */
 
 static void
@@ -9491,6 +9553,8 @@ expand_omp_target (struct omp_region *region)
 	     as an optimization barrier.  */
 	  do_splitoff = false;
 	  cfun->curr_properties &= ~PROP_gimple_eomp;
+
+	  mark_loops_in_oacc_kernels_region (region->entry, region->exit);
 	}
       else
 	{
@@ -15164,97 +15228,6 @@ gimple_stmt_omp_data_i_init_p (gimple stmt)
 						   SSA_OP_DEF);
 }
 
-/* Return true if LOOP is inside a kernels region.  */
-
-bool
-loop_in_oacc_kernels_region_p (struct loop *loop, basic_block *region_entry,
-			       basic_block *region_exit)
-{
-  bitmap excludes_bitmap = BITMAP_GGC_ALLOC ();
-  bitmap region_bitmap = BITMAP_GGC_ALLOC ();
-  bitmap_clear (region_bitmap);
-
-  if (region_entry != NULL)
-    *region_entry = NULL;
-  if (region_exit != NULL)
-    *region_exit = NULL;
-
-  basic_block bb;
-  gimple last;
-  FOR_EACH_BB_FN (bb, cfun)
-    {
-      if (bitmap_bit_p (region_bitmap, bb->index))
-	continue;
-
-      last = last_stmt (bb);
-      if (!last)
-	continue;
-
-      if (gimple_code (last) != GIMPLE_OMP_TARGET
-	  || (gimple_omp_target_kind (last) != GF_OMP_TARGET_KIND_OACC_KERNELS))
-	continue;
-
-      bitmap_clear (excludes_bitmap);
-      bitmap_set_bit (excludes_bitmap, bb->index);
-
-      vec<basic_block> dominated
-	= get_all_dominated_blocks (CDI_DOMINATORS, bb);
-
-      unsigned di;
-      basic_block dom;
-
-      basic_block end_region = NULL;
-      FOR_EACH_VEC_ELT (dominated, di, dom)
-	{
-	  if (dom == bb)
-	    continue;
-
-	  last = last_stmt (dom);
-	  if (!last)
-	    continue;
-
-	  if (gimple_code (last) != GIMPLE_OMP_RETURN)
-	    continue;
-
-	  if (end_region == NULL
-	      || dominated_by_p (CDI_DOMINATORS, end_region, dom))
-	    end_region = dom;
-	}
-
-      if (end_region == NULL)
-	{
-	  gimple kernels = last_stmt (bb);
-	  fatal_error (gimple_location (kernels),
-		       "End of kernel region unreachable");
-	}
-
-      vec<basic_block> excludes
-	= get_all_dominated_blocks (CDI_DOMINATORS, end_region);
-
-      unsigned di2;
-      basic_block exclude;
-
-      FOR_EACH_VEC_ELT (excludes, di2, exclude)
-	if (exclude != end_region)
-	  bitmap_set_bit (excludes_bitmap, exclude->index);
-
-      FOR_EACH_VEC_ELT (dominated, di, dom)
-	if (!bitmap_bit_p (excludes_bitmap, dom->index))
-	  bitmap_set_bit (region_bitmap, dom->index);
-
-      if (bitmap_bit_p (region_bitmap, loop->header->index))
-	{
-	  if (region_entry != NULL)
-	    *region_entry = bb;
-	  if (region_exit != NULL)
-	    *region_exit = end_region;
-	  return true;
-	}
-    }
-
-  return false;
-}
-
 namespace {
 
 const pass_data pass_data_late_lower_omp =
diff --git a/gcc/omp-low.h b/gcc/omp-low.h
index ae63c9f..fbc8416 100644
--- a/gcc/omp-low.h
+++ b/gcc/omp-low.h
@@ -29,8 +29,7 @@ extern tree omp_reduction_init (tree, tree);
 extern bool make_gimple_omp_edges (basic_block, struct omp_region **, int *);
 extern void omp_finish_file (void);
 extern bool gimple_stmt_omp_data_i_init_p (gimple);
-extern bool loop_in_oacc_kernels_region_p (struct loop *, basic_block *,
-					   basic_block *);
+extern basic_block loop_get_oacc_kernels_region_entry (struct loop *);
 
 extern GTY(()) vec<tree, va_gc> *offload_funcs;
 extern GTY(()) vec<tree, va_gc> *offload_vars;
diff --git a/gcc/tree-parloops.c b/gcc/tree-parloops.c
index 72877ee..4f193e6 100644
--- a/gcc/tree-parloops.c
+++ b/gcc/tree-parloops.c
@@ -2629,7 +2629,7 @@ parallelize_loops (bool oacc_kernels_p)
   struct obstack parloop_obstack;
   HOST_WIDE_INT estimated;
   source_location loop_loc;
-  basic_block region_entry, region_exit;
+  basic_block region_entry;
 
   /* Do not parallelize loops in the functions created by parallelization.  */
   if (parallelized_function_p (cfun->decl))
@@ -2649,8 +2649,7 @@ parallelize_loops (bool oacc_kernels_p)
 
       if (oacc_kernels_p)
 	{
-	  if (!loop_in_oacc_kernels_region_p (loop, &region_entry,
-					      &region_exit))
+	  if (!loop->in_oacc_kernels_region)
 	    continue;
 
 	  /* TODO: Allow nested loops.  */
@@ -2661,6 +2660,8 @@ parallelize_loops (bool oacc_kernels_p)
 	    fprintf (dump_file,
 		     "Trying loop %d with header bb %d in oacc kernels region\n",
 		     loop->num, loop->header->index);
+
+	  region_entry = loop_get_oacc_kernels_region_entry (loop);
 	}
 
       if (dump_file && (dump_flags & TDF_DETAILS))
diff --git a/gcc/tree-ssa-loop-ch.c b/gcc/tree-ssa-loop-ch.c
index 1cd77e6..7527efd 100644
--- a/gcc/tree-ssa-loop-ch.c
+++ b/gcc/tree-ssa-loop-ch.c
@@ -225,7 +225,7 @@ pass_ch_execute (function *fun, bool oacc_kernels_p)
 	continue;
 
       if (oacc_kernels_p
-	  && !loop_in_oacc_kernels_region_p (loop, NULL, NULL))
+	  && !loop->in_oacc_kernels_region)
 	continue;
 
       /* Iterate the header copying up to limit; this takes care of the cases
-- 
1.9.1



* Re: [PATCH, 3/8] Add pass_ch_oacc_kernels to pass_oacc_kernels
  2015-04-22  7:39       ` Richard Biener
  2015-06-03  9:22         ` Tom de Vries
@ 2015-06-03 10:05         ` Tom de Vries
  2015-06-03 11:22           ` Richard Biener
  1 sibling, 1 reply; 71+ messages in thread
From: Tom de Vries @ 2015-06-03 10:05 UTC (permalink / raw)
  To: Richard Biener, Thomas Schwinge; +Cc: GCC Patches, Jakub Jelinek

On 22/04/15 09:39, Richard Biener wrote:
>> Committed to gomp-4_0-branch in r222281:
>> >
>> >commit 58c33a7965c379b55b549d50e3b79b2252bcc876
>> >Author: tschwinge <tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4>
>> >Date:   Tue Apr 21 19:48:16 2015 +0000
>> >
>> >     Add pass_ch_oacc_kernels to pass_oacc_kernels
>> >
>> >     	gcc/
>> >     	* omp-low.c (loop_in_oacc_kernels_region_p): New function.
>> >     	* omp-low.h (loop_in_oacc_kernels_region_p): Declare.
>> >     	* passes.def: Add pass_ch_oacc_kernels to pass group pass_oacc_kernels.
>> >     	* tree-pass.h (make_pass_ch_oacc_kernels): Declare
>> >     	* tree-ssa-loop-ch.c: Include omp-low.h.
>> >     	(pass_ch_execute): Declare.
>> >     	(pass_ch::execute): Factor out ...
>> >     	(pass_ch_execute): ... this new function.  If handling oacc kernels,
>> >     	skip loops that are not in oacc kernels region.
>> >     	(pass_ch_oacc_kernels::execute):
>> >     	(pass_data_ch_oacc_kernels): New pass_data.
>> >     	(class pass_ch_oacc_kernels): New pass.
>> >     	(pass_ch_oacc_kernels::execute, make_pass_ch_oacc_kernels): New
>> >     	function.
>> >
>> >     git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@222281 138bc75d-0d04-0410-961f-82ee72b054a4
>> >---
>> >  gcc/ChangeLog.gomp     |   15 ++++++++
>> >  gcc/omp-low.c          |   91 ++++++++++++++++++++++++++++++++++++++++++++++++
>> >  gcc/omp-low.h          |    2 ++
>> >  gcc/passes.def         |    1 +
>> >  gcc/tree-pass.h        |    1 +
>> >  gcc/tree-ssa-loop-ch.c |   59 +++++++++++++++++++++++++++++--
>> >  6 files changed, 167 insertions(+), 2 deletions(-)
>> >
>> >diff --git gcc/ChangeLog.gomp gcc/ChangeLog.gomp
>> >index 8a53ad8..d00c5e0 100644
>> >--- gcc/ChangeLog.gomp
>> >+++ gcc/ChangeLog.gomp
>> >@@ -1,5 +1,20 @@
>> >  2015-04-21  Tom de Vries<tom@codesourcery.com>
>> >
>> >+	* omp-low.c (loop_in_oacc_kernels_region_p): New function.
>> >+	* omp-low.h (loop_in_oacc_kernels_region_p): Declare.
>> >+	* passes.def: Add pass_ch_oacc_kernels to pass group pass_oacc_kernels.
>> >+	* tree-pass.h (make_pass_ch_oacc_kernels): Declare
>> >+	* tree-ssa-loop-ch.c: Include omp-low.h.
>> >+	(pass_ch_execute): Declare.
>> >+	(pass_ch::execute): Factor out ...
>> >+	(pass_ch_execute): ... this new function.  If handling oacc kernels,
>> >+	skip loops that are not in oacc kernels region.
>> >+	(pass_ch_oacc_kernels::execute):
>> >+	(pass_data_ch_oacc_kernels): New pass_data.
>> >+	(class pass_ch_oacc_kernels): New pass.
>> >+	(pass_ch_oacc_kernels::execute, make_pass_ch_oacc_kernels): New
>> >+	function.
>> >+
>> >  	* passes.def: Add pass group pass_oacc_kernels.
>> >  	* tree-pass.h (make_pass_oacc_kernels): Declare.
>> >  	* tree-ssa-loop.c (gate_oacc_kernels): New static function.
>> >diff --git gcc/omp-low.c gcc/omp-low.c
>> >index 16d9a5e..1b03ae6 100644
>> >--- gcc/omp-low.c
>> >+++ gcc/omp-low.c
>> >@@ -13920,4 +13920,95 @@ gimple_stmt_omp_data_i_init_p (gimple stmt)
>> >  						   SSA_OP_DEF);
>> >  }
>> >
>> >+/* Return true if LOOP is inside a kernels region.  */
>> >+
>> >+bool
>> >+loop_in_oacc_kernels_region_p (struct loop *loop, basic_block *region_entry,
>> >+			       basic_block *region_exit)

<SNIP>

> It's also very odd that you disable transforms on OMP regions but at
> the same time do all the OMP processing _after_ those transforms.
> Something feels backward here.

I'm not sure I understand your remark in the context of this patch. 
All this patch does is make pass_ch_oacc_kernels skip loops that are 
not part of a kernels region.

Thanks,
- Tom
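
For readers following the thread, the gating described above can be pictured with a small stand-alone sketch. It is not the GCC code: ToyLoop, its in_kernels_region member and loops_to_transform are hypothetical names invented for illustration, and the predicate is reduced to a plain boolean. The only point is that the restriction applies solely when the pass runs in its oacc-kernels variant.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a loop; only the two fields used below.
struct ToyLoop
{
  int num;                 // loop number
  bool in_kernels_region;  // assumed to be known for each loop
};

// Return the numbers of the loops the pass would process.  When
// oacc_kernels_p is set, loops outside a kernels region are skipped;
// otherwise every loop is a candidate.
std::vector<int>
loops_to_transform (const std::vector<ToyLoop> &loops, bool oacc_kernels_p)
{
  std::vector<int> selected;
  for (const ToyLoop &loop : loops)
    {
      if (oacc_kernels_p && !loop.in_kernels_region)
	continue;
      selected.push_back (loop.num);
    }
  return selected;
}
```

Called with oacc_kernels_p set to false, the sketch selects every loop, which mirrors the unchanged behaviour of the regular pass.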


* Re: [PATCH, 3/8] Add pass_ch_oacc_kernels to pass_oacc_kernels
  2015-06-03  9:22         ` Tom de Vries
@ 2015-06-03 11:21           ` Richard Biener
  2015-06-04 15:59             ` Tom de Vries
  0 siblings, 1 reply; 71+ messages in thread
From: Richard Biener @ 2015-06-03 11:21 UTC (permalink / raw)
  To: Tom de Vries; +Cc: Thomas Schwinge, GCC Patches, Jakub Jelinek

On Wed, 3 Jun 2015, Tom de Vries wrote:

> On 22/04/15 09:39, Richard Biener wrote:
> > > Committed to gomp-4_0-branch in r222281:
> > > >
> > > >commit 58c33a7965c379b55b549d50e3b79b2252bcc876
> > > >Author: tschwinge <tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4>
> > > >Date:   Tue Apr 21 19:48:16 2015 +0000
> > > >
> > > >     Add pass_ch_oacc_kernels to pass_oacc_kernels
> > > >
> > > >     	gcc/
> > > >     	* omp-low.c (loop_in_oacc_kernels_region_p): New function.
> > > >     	* omp-low.h (loop_in_oacc_kernels_region_p): Declare.
> > > >     	* passes.def: Add pass_ch_oacc_kernels to pass group
> > > pass_oacc_kernels.
> > > >     	* tree-pass.h (make_pass_ch_oacc_kernels): Declare
> > > >     	* tree-ssa-loop-ch.c: Include omp-low.h.
> > > >     	(pass_ch_execute): Declare.
> > > >     	(pass_ch::execute): Factor out ...
> > > >     	(pass_ch_execute): ... this new function.  If handling oacc
> > > kernels,
> > > >     	skip loops that are not in oacc kernels region.
> > > >     	(pass_ch_oacc_kernels::execute):
> > > >     	(pass_data_ch_oacc_kernels): New pass_data.
> > > >     	(class pass_ch_oacc_kernels): New pass.
> > > >     	(pass_ch_oacc_kernels::execute, make_pass_ch_oacc_kernels):
> > > New
> > > >     	function.
> > > >
> > > >     git-svn-id:
> > > svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@222281
> > > 138bc75d-0d04-0410-961f-82ee72b054a4
> > > >---
> > > >  gcc/ChangeLog.gomp     |   15 ++++++++
> > > >  gcc/omp-low.c          |   91
> > > ++++++++++++++++++++++++++++++++++++++++++++++++
> > > >  gcc/omp-low.h          |    2 ++
> > > >  gcc/passes.def         |    1 +
> > > >  gcc/tree-pass.h        |    1 +
> > > >  gcc/tree-ssa-loop-ch.c |   59 +++++++++++++++++++++++++++++--
> > > >  6 files changed, 167 insertions(+), 2 deletions(-)
> > > >
> > > >diff --git gcc/ChangeLog.gomp gcc/ChangeLog.gomp
> > > >index 8a53ad8..d00c5e0 100644
> > > >--- gcc/ChangeLog.gomp
> > > >+++ gcc/ChangeLog.gomp
> > > >@@ -1,5 +1,20 @@
> > > >  2015-04-21  Tom de Vries<tom@codesourcery.com>
> > > >
> > > >+	* omp-low.c (loop_in_oacc_kernels_region_p): New function.
> > > >+	* omp-low.h (loop_in_oacc_kernels_region_p): Declare.
> > > >+	* passes.def: Add pass_ch_oacc_kernels to pass group
> > > pass_oacc_kernels.
> > > >+	* tree-pass.h (make_pass_ch_oacc_kernels): Declare
> > > >+	* tree-ssa-loop-ch.c: Include omp-low.h.
> > > >+	(pass_ch_execute): Declare.
> > > >+	(pass_ch::execute): Factor out ...
> > > >+	(pass_ch_execute): ... this new function.  If handling oacc kernels,
> > > >+	skip loops that are not in oacc kernels region.
> > > >+	(pass_ch_oacc_kernels::execute):
> > > >+	(pass_data_ch_oacc_kernels): New pass_data.
> > > >+	(class pass_ch_oacc_kernels): New pass.
> > > >+	(pass_ch_oacc_kernels::execute, make_pass_ch_oacc_kernels): New
> > > >+	function.
> > > >+
> > > >  	* passes.def: Add pass group pass_oacc_kernels.
> > > >  	* tree-pass.h (make_pass_oacc_kernels): Declare.
> > > >  	* tree-ssa-loop.c (gate_oacc_kernels): New static function.
> > > >diff --git gcc/omp-low.c gcc/omp-low.c
> > > >index 16d9a5e..1b03ae6 100644
> > > >--- gcc/omp-low.c
> > > >+++ gcc/omp-low.c
> > > >@@ -13920,4 +13920,95 @@ gimple_stmt_omp_data_i_init_p (gimple stmt)
> > > >  						   SSA_OP_DEF);
> > > >  }
> > > >
> > > >+/* Return true if LOOP is inside a kernels region.  */
> > > >+
> > > >+bool
> > > >+loop_in_oacc_kernels_region_p (struct loop *loop, basic_block
> > > *region_entry,
> > > >+			       basic_block *region_exit)
> > Ehm.  So why not simply add a flag to struct loop instead and set it
> > during OMP region parsing/lowering?
> 
> Attached patch adds an in_oacc_kernels_region flag to struct loop, and uses
> it. OK for gomp-4_0-branch?

Works for me.

Richard.

-- 
Richard Biener <rguenther@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Dilip Upmanyu, Graham Norton, HRB 21284 (AG Nuernberg)


* Re: [PATCH, 3/8] Add pass_ch_oacc_kernels to pass_oacc_kernels
  2015-06-03 10:05         ` Tom de Vries
@ 2015-06-03 11:22           ` Richard Biener
  0 siblings, 0 replies; 71+ messages in thread
From: Richard Biener @ 2015-06-03 11:22 UTC (permalink / raw)
  To: Tom de Vries; +Cc: Thomas Schwinge, GCC Patches, Jakub Jelinek

On Wed, 3 Jun 2015, Tom de Vries wrote:

> On 22/04/15 09:39, Richard Biener wrote:
> > > Committed to gomp-4_0-branch in r222281:
> > > >
> > > >commit 58c33a7965c379b55b549d50e3b79b2252bcc876
> > > >Author: tschwinge <tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4>
> > > >Date:   Tue Apr 21 19:48:16 2015 +0000
> > > >
> > > >     Add pass_ch_oacc_kernels to pass_oacc_kernels
> > > >
> > > >     	gcc/
> > > >     	* omp-low.c (loop_in_oacc_kernels_region_p): New function.
> > > >     	* omp-low.h (loop_in_oacc_kernels_region_p): Declare.
> > > >     	* passes.def: Add pass_ch_oacc_kernels to pass group
> > > pass_oacc_kernels.
> > > >     	* tree-pass.h (make_pass_ch_oacc_kernels): Declare
> > > >     	* tree-ssa-loop-ch.c: Include omp-low.h.
> > > >     	(pass_ch_execute): Declare.
> > > >     	(pass_ch::execute): Factor out ...
> > > >     	(pass_ch_execute): ... this new function.  If handling oacc
> > > kernels,
> > > >     	skip loops that are not in oacc kernels region.
> > > >     	(pass_ch_oacc_kernels::execute):
> > > >     	(pass_data_ch_oacc_kernels): New pass_data.
> > > >     	(class pass_ch_oacc_kernels): New pass.
> > > >     	(pass_ch_oacc_kernels::execute, make_pass_ch_oacc_kernels):
> > > New
> > > >     	function.
> > > >
> > > >     git-svn-id:
> > > svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@222281
> > > 138bc75d-0d04-0410-961f-82ee72b054a4
> > > >---
> > > >  gcc/ChangeLog.gomp     |   15 ++++++++
> > > >  gcc/omp-low.c          |   91
> > > ++++++++++++++++++++++++++++++++++++++++++++++++
> > > >  gcc/omp-low.h          |    2 ++
> > > >  gcc/passes.def         |    1 +
> > > >  gcc/tree-pass.h        |    1 +
> > > >  gcc/tree-ssa-loop-ch.c |   59 +++++++++++++++++++++++++++++--
> > > >  6 files changed, 167 insertions(+), 2 deletions(-)
> > > >
> > > >diff --git gcc/ChangeLog.gomp gcc/ChangeLog.gomp
> > > >index 8a53ad8..d00c5e0 100644
> > > >--- gcc/ChangeLog.gomp
> > > >+++ gcc/ChangeLog.gomp
> > > >@@ -1,5 +1,20 @@
> > > >  2015-04-21  Tom de Vries<tom@codesourcery.com>
> > > >
> > > >+	* omp-low.c (loop_in_oacc_kernels_region_p): New function.
> > > >+	* omp-low.h (loop_in_oacc_kernels_region_p): Declare.
> > > >+	* passes.def: Add pass_ch_oacc_kernels to pass group
> > > pass_oacc_kernels.
> > > >+	* tree-pass.h (make_pass_ch_oacc_kernels): Declare
> > > >+	* tree-ssa-loop-ch.c: Include omp-low.h.
> > > >+	(pass_ch_execute): Declare.
> > > >+	(pass_ch::execute): Factor out ...
> > > >+	(pass_ch_execute): ... this new function.  If handling oacc kernels,
> > > >+	skip loops that are not in oacc kernels region.
> > > >+	(pass_ch_oacc_kernels::execute):
> > > >+	(pass_data_ch_oacc_kernels): New pass_data.
> > > >+	(class pass_ch_oacc_kernels): New pass.
> > > >+	(pass_ch_oacc_kernels::execute, make_pass_ch_oacc_kernels): New
> > > >+	function.
> > > >+
> > > >  	* passes.def: Add pass group pass_oacc_kernels.
> > > >  	* tree-pass.h (make_pass_oacc_kernels): Declare.
> > > >  	* tree-ssa-loop.c (gate_oacc_kernels): New static function.
> > > >diff --git gcc/omp-low.c gcc/omp-low.c
> > > >index 16d9a5e..1b03ae6 100644
> > > >--- gcc/omp-low.c
> > > >+++ gcc/omp-low.c
> > > >@@ -13920,4 +13920,95 @@ gimple_stmt_omp_data_i_init_p (gimple stmt)
> > > >  						   SSA_OP_DEF);
> > > >  }
> > > >
> > > >+/* Return true if LOOP is inside a kernels region.  */
> > > >+
> > > >+bool
> > > >+loop_in_oacc_kernels_region_p (struct loop *loop, basic_block
> > > *region_entry,
> > > >+			       basic_block *region_exit)
> 
> <SNIP>
> 
> > It's also very odd that you disable transforms on OMP regions but at
> > the same time do all the OMP processing _after_ those transforms.
> > Something feels backward here.
> 
> I'm not sure I understand your remark in the context of this patch. All this
> patch does is make pass_ch_oacc_kernels skip loops that are not part of a
> kernels region.

Ah, I probably failed to realize this.

Richard.

> Thanks,
> - Tom
> 
> 

-- 
Richard Biener <rguenther@suse.de>


* Re: [PATCH, 4/8] Add pass_tree_loop_{init,done} to pass_oacc_kernels
  2015-06-02 15:40             ` Tom de Vries
@ 2015-06-03 11:26               ` Richard Biener
  0 siblings, 0 replies; 71+ messages in thread
From: Richard Biener @ 2015-06-03 11:26 UTC (permalink / raw)
  To: Tom de Vries; +Cc: Thomas Schwinge, GCC Patches, Jakub Jelinek

On Tue, 2 Jun 2015, Tom de Vries wrote:

> On 02-06-15 15:58, Richard Biener wrote:
> > Btw, I wonder why you don't organize the oacc-kernel passes in
> > a new simple-IPA group after pass_local_optimization_passes.
> 
> I've placed the pass group as early as possible (meaning after ealias) and put
> passes in front only when that served a purpose for parallelization
> (pass_fre). The idea there was to minimize the number of passes that have to
> be modified to deal (conservatively) with a kernels region.

I see.

> So AFAICT, there's nothing against placing the pass group after
> pass_local_optimization_passes, other than that it's more work in more passes
> to keep the region intact.
> 
> What would be the benefit of doing so?

Get all the local optimizations done, including pure-const discovery.

Richard.

-- 
Richard Biener <rguenther@suse.de>


* Re: [PATCH, 3/8] Add pass_ch_oacc_kernels to pass_oacc_kernels
  2015-06-03 11:21           ` Richard Biener
@ 2015-06-04 15:59             ` Tom de Vries
  0 siblings, 0 replies; 71+ messages in thread
From: Tom de Vries @ 2015-06-04 15:59 UTC (permalink / raw)
  To: Richard Biener; +Cc: Thomas Schwinge, GCC Patches, Jakub Jelinek

[-- Attachment #1: Type: text/plain, Size: 450 bytes --]

On 03/06/15 13:20, Richard Biener wrote:
> On Wed, 3 Jun 2015, Tom de Vries wrote:
>
>> On 22/04/15 09:39, Richard Biener wrote:
>>> Ehm.  So why not simply add a flag to struct loop instead and set it
>>> during OMP region parsing/lowering?
>>
>> Attached patch adds an in_oacc_kernels_region flag to struct loop, and uses
>> it. OK for gomp-4_0-branch?
>
> Works for me.
>

Committed as attached, with a minor fix to pass bootstrap.

Thanks,
- Tom
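
As a reading aid for the attached patch, here is a stand-alone sketch of the set computation that mark_loops_in_oacc_kernels_region performs: take the blocks dominated by the region entry and drop the blocks dominated by the region exit. It is a toy model, not GCC code. Cfg, compute_dominators and blocks_in_region are hypothetical names, block 0 is assumed to be the function entry, every block is assumed reachable, and -1 stands in for a missing (NULL) region exit.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy CFG: succs[b] lists the successors of block b.  Block 0 is assumed to
// be the function entry and every block is assumed to be reachable from it.
using Cfg = std::vector<std::vector<int>>;

// dom[b][d] is true when block d dominates block b.  Computed with the
// textbook iterative data-flow formulation.
std::vector<std::vector<bool>>
compute_dominators (const Cfg &succs)
{
  const std::size_t n = succs.size ();
  std::vector<std::vector<int>> preds (n);
  for (std::size_t b = 0; b < n; ++b)
    for (int s : succs[b])
      preds[s].push_back (static_cast<int> (b));

  // Start from "everything dominates everything", except for the entry,
  // which is dominated only by itself.
  std::vector<std::vector<bool>> dom (n, std::vector<bool> (n, true));
  dom[0].assign (n, false);
  dom[0][0] = true;

  bool changed = true;
  while (changed)
    {
      changed = false;
      for (std::size_t b = 1; b < n; ++b)
	{
	  // Intersect the dominator sets of all predecessors, then add b.
	  std::vector<bool> next (n, true);
	  for (int p : preds[b])
	    for (std::size_t d = 0; d < n; ++d)
	      next[d] = next[d] && dom[p][d];
	  next[b] = true;
	  if (next != dom[b])
	    {
	      dom[b] = next;
	      changed = true;
	    }
	}
    }
  return dom;
}

// A block is in the region when it is dominated by region_entry and not
// dominated by region_exit.  A negative region_exit means "no exit".
std::vector<bool>
blocks_in_region (const Cfg &succs, int region_entry, int region_exit)
{
  assert (region_entry >= 0
	  && static_cast<std::size_t> (region_entry) < succs.size ());
  const std::vector<std::vector<bool>> dom = compute_dominators (succs);
  std::vector<bool> in_region (succs.size (), false);
  for (std::size_t b = 0; b < succs.size (); ++b)
    {
      bool dominated = dom[b][region_entry];
      bool excluded = region_exit >= 0 && dom[b][region_exit];
      in_region[b] = dominated && !excluded;
    }
  return in_region;
}
```

In the patch the same test is applied to loop->header only, and a hit sets loop->in_oacc_kernels_region. Since every block dominates itself, the exit block falls on the excluded side in this sketch.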



[-- Attachment #2: 0001-Add-in_oacc_kernels_region-field-to-struct-loop.patch --]
[-- Type: text/x-patch, Size: 8467 bytes --]

Add in_oacc_kernels_region field to struct loop

2015-06-03  Tom de Vries  <tom@codesourcery.com>

	* cfgloop.h (struct loop): Add in_oacc_kernels_region field.
	* omp-low.c (mark_loops_in_oacc_kernels_region): New function.
	(loop_get_oacc_kernels_region_entry): New function.
	(expand_omp_target): Call mark_loops_in_oacc_kernels_region.
	(loop_in_oacc_kernels_region_p): Remove function.
	* omp-low.h (loop_in_oacc_kernels_region_p): Remove declaration.
	(loop_get_oacc_kernels_region_entry): Declare.
	* tree-parloops.c (parallelize_loops): Use in_oacc_kernels_region field and
	loop_get_oacc_kernels_region_entry.
	* tree-ssa-loop-ch.c (pass_ch_execute): Use in_oacc_kernels_region field.
---
 gcc/cfgloop.h          |   3 +
 gcc/omp-low.c          | 155 ++++++++++++++++++++-----------------------------
 gcc/omp-low.h          |   3 +-
 gcc/tree-parloops.c    |   7 ++-
 gcc/tree-ssa-loop-ch.c |   2 +-
 5 files changed, 73 insertions(+), 97 deletions(-)

diff --git a/gcc/cfgloop.h b/gcc/cfgloop.h
index 1d84572..a3654d9 100644
--- a/gcc/cfgloop.h
+++ b/gcc/cfgloop.h
@@ -195,6 +195,9 @@ struct GTY ((chain_next ("%h.next"))) loop {
   /* True if we should try harder to vectorize this loop.  */
   bool force_vectorize;
 
+  /* True if the loop is part of an oacc kernels region.  */
+  bool in_oacc_kernels_region;
+
   /* For SIMD loops, this is a unique identifier of the loop, referenced
      by IFN_GOMP_SIMD_VF, IFN_GOMP_SIMD_LANE and IFN_GOMP_SIMD_LAST_LANE
      builtins.  */
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index b1aa603..22a57af 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -9425,6 +9425,68 @@ oacc_alloc_broadcast_storage (omp_context *ctx)
 			   TYPE_SIZE_UNIT (vull_type_node));
 }
 
+/* Mark the loops inside the kernels region starting at REGION_ENTRY and ending
+   at REGION_EXIT.  */
+
+static void
+mark_loops_in_oacc_kernels_region (basic_block region_entry,
+				   basic_block region_exit)
+{
+  bitmap dominated_bitmap = BITMAP_GGC_ALLOC ();
+  bitmap excludes_bitmap = BITMAP_GGC_ALLOC ();
+  unsigned di;
+  basic_block bb;
+
+  bitmap_clear (dominated_bitmap);
+  bitmap_clear (excludes_bitmap);
+
+  /* Get all the blocks dominated by the region entry.  That will include the
+     entire region.  */
+  vec<basic_block> dominated
+    = get_all_dominated_blocks (CDI_DOMINATORS, region_entry);
+  FOR_EACH_VEC_ELT (dominated, di, bb)
+      bitmap_set_bit (dominated_bitmap, bb->index);
+
+  /* Exclude all the blocks which are not in the region: the blocks dominated by
+     the region exit.  */
+  if (region_exit != NULL)
+    {
+      vec<basic_block> excludes
+	= get_all_dominated_blocks (CDI_DOMINATORS, region_exit);
+      FOR_EACH_VEC_ELT (excludes, di, bb)
+	bitmap_set_bit (excludes_bitmap, bb->index);
+    }
+
+  /* Mark the loops in the region.  */
+  struct loop *loop;
+  FOR_EACH_LOOP (loop, 0)
+    if (bitmap_bit_p (dominated_bitmap, loop->header->index)
+	&& !bitmap_bit_p (excludes_bitmap, loop->header->index))
+      loop->in_oacc_kernels_region = true;
+}
+
+/* Return the entry basic block of the oacc kernels region containing LOOP.  */
+
+basic_block
+loop_get_oacc_kernels_region_entry (struct loop *loop)
+{
+  if (!loop->in_oacc_kernels_region)
+    return NULL;
+
+  basic_block bb = loop->header;
+  while (true)
+    {
+      bb = get_immediate_dominator (CDI_DOMINATORS, bb);
+      gcc_assert (bb != NULL);
+
+      gimple last = last_stmt (bb);
+      if (last != NULL
+	  && gimple_code (last) == GIMPLE_OMP_TARGET
+	  && gimple_omp_target_kind (last) == GF_OMP_TARGET_KIND_OACC_KERNELS)
+	return bb;
+    }
+}
+
 /* Expand the GIMPLE_OMP_TARGET starting at REGION.  */
 
 static void
@@ -9495,6 +9557,8 @@ expand_omp_target (struct omp_region *region)
 	     as an optimization barrier.  */
 	  do_splitoff = false;
 	  cfun->curr_properties &= ~PROP_gimple_eomp;
+
+	  mark_loops_in_oacc_kernels_region (region->entry, region->exit);
 	}
       else
 	{
@@ -15331,97 +15395,6 @@ gimple_stmt_omp_data_i_init_p (gimple stmt)
 						   SSA_OP_DEF);
 }
 
-/* Return true if LOOP is inside a kernels region.  */
-
-bool
-loop_in_oacc_kernels_region_p (struct loop *loop, basic_block *region_entry,
-			       basic_block *region_exit)
-{
-  bitmap excludes_bitmap = BITMAP_GGC_ALLOC ();
-  bitmap region_bitmap = BITMAP_GGC_ALLOC ();
-  bitmap_clear (region_bitmap);
-
-  if (region_entry != NULL)
-    *region_entry = NULL;
-  if (region_exit != NULL)
-    *region_exit = NULL;
-
-  basic_block bb;
-  gimple last;
-  FOR_EACH_BB_FN (bb, cfun)
-    {
-      if (bitmap_bit_p (region_bitmap, bb->index))
-	continue;
-
-      last = last_stmt (bb);
-      if (!last)
-	continue;
-
-      if (gimple_code (last) != GIMPLE_OMP_TARGET
-	  || (gimple_omp_target_kind (last) != GF_OMP_TARGET_KIND_OACC_KERNELS))
-	continue;
-
-      bitmap_clear (excludes_bitmap);
-      bitmap_set_bit (excludes_bitmap, bb->index);
-
-      vec<basic_block> dominated
-	= get_all_dominated_blocks (CDI_DOMINATORS, bb);
-
-      unsigned di;
-      basic_block dom;
-
-      basic_block end_region = NULL;
-      FOR_EACH_VEC_ELT (dominated, di, dom)
-	{
-	  if (dom == bb)
-	    continue;
-
-	  last = last_stmt (dom);
-	  if (!last)
-	    continue;
-
-	  if (gimple_code (last) != GIMPLE_OMP_RETURN)
-	    continue;
-
-	  if (end_region == NULL
-	      || dominated_by_p (CDI_DOMINATORS, end_region, dom))
-	    end_region = dom;
-	}
-
-      if (end_region == NULL)
-	{
-	  gimple kernels = last_stmt (bb);
-	  fatal_error (gimple_location (kernels),
-		       "End of kernel region unreachable");
-	}
-
-      vec<basic_block> excludes
-	= get_all_dominated_blocks (CDI_DOMINATORS, end_region);
-
-      unsigned di2;
-      basic_block exclude;
-
-      FOR_EACH_VEC_ELT (excludes, di2, exclude)
-	if (exclude != end_region)
-	  bitmap_set_bit (excludes_bitmap, exclude->index);
-
-      FOR_EACH_VEC_ELT (dominated, di, dom)
-	if (!bitmap_bit_p (excludes_bitmap, dom->index))
-	  bitmap_set_bit (region_bitmap, dom->index);
-
-      if (bitmap_bit_p (region_bitmap, loop->header->index))
-	{
-	  if (region_entry != NULL)
-	    *region_entry = bb;
-	  if (region_exit != NULL)
-	    *region_exit = end_region;
-	  return true;
-	}
-    }
-
-  return false;
-}
-
 namespace {
 
 const pass_data pass_data_late_lower_omp =
diff --git a/gcc/omp-low.h b/gcc/omp-low.h
index ae63c9f..fbc8416 100644
--- a/gcc/omp-low.h
+++ b/gcc/omp-low.h
@@ -29,8 +29,7 @@ extern tree omp_reduction_init (tree, tree);
 extern bool make_gimple_omp_edges (basic_block, struct omp_region **, int *);
 extern void omp_finish_file (void);
 extern bool gimple_stmt_omp_data_i_init_p (gimple);
-extern bool loop_in_oacc_kernels_region_p (struct loop *, basic_block *,
-					   basic_block *);
+extern basic_block loop_get_oacc_kernels_region_entry (struct loop *);
 
 extern GTY(()) vec<tree, va_gc> *offload_funcs;
 extern GTY(()) vec<tree, va_gc> *offload_vars;
diff --git a/gcc/tree-parloops.c b/gcc/tree-parloops.c
index 72877ee..e451704 100644
--- a/gcc/tree-parloops.c
+++ b/gcc/tree-parloops.c
@@ -2629,7 +2629,7 @@ parallelize_loops (bool oacc_kernels_p)
   struct obstack parloop_obstack;
   HOST_WIDE_INT estimated;
   source_location loop_loc;
-  basic_block region_entry, region_exit;
+  basic_block region_entry = NULL;
 
   /* Do not parallelize loops in the functions created by parallelization.  */
   if (parallelized_function_p (cfun->decl))
@@ -2649,8 +2649,7 @@ parallelize_loops (bool oacc_kernels_p)
 
       if (oacc_kernels_p)
 	{
-	  if (!loop_in_oacc_kernels_region_p (loop, &region_entry,
-					      &region_exit))
+	  if (!loop->in_oacc_kernels_region)
 	    continue;
 
 	  /* TODO: Allow nested loops.  */
@@ -2661,6 +2660,8 @@ parallelize_loops (bool oacc_kernels_p)
 	    fprintf (dump_file,
 		     "Trying loop %d with header bb %d in oacc kernels region\n",
 		     loop->num, loop->header->index);
+
+	  region_entry = loop_get_oacc_kernels_region_entry (loop);
 	}
 
       if (dump_file && (dump_flags & TDF_DETAILS))
diff --git a/gcc/tree-ssa-loop-ch.c b/gcc/tree-ssa-loop-ch.c
index 1cd77e6..7527efd 100644
--- a/gcc/tree-ssa-loop-ch.c
+++ b/gcc/tree-ssa-loop-ch.c
@@ -225,7 +225,7 @@ pass_ch_execute (function *fun, bool oacc_kernels_p)
 	continue;
 
       if (oacc_kernels_p
-	  && !loop_in_oacc_kernels_region_p (loop, NULL, NULL))
+	  && !loop->in_oacc_kernels_region)
 	continue;
 
       /* Iterate the header copying up to limit; this takes care of the cases
-- 
1.9.1
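
The lookup done by loop_get_oacc_kernels_region_entry in the patch above can be sketched outside GCC as a walk up the immediate-dominator chain. This is an illustration under stated assumptions: find_region_entry, the idom array (with -1 for "no immediate dominator") and the is_kernels_entry flags are hypothetical stand-ins for get_immediate_dominator and the last_stmt checks. Unlike the patch, which asserts that a dominator always exists, the sketch returns -1 when the chain runs out.

```cpp
#include <cassert>
#include <vector>

// idom[b] is the immediate dominator of block b, or -1 for the function
// entry.  is_kernels_entry[b] is true when block b ends in the statement
// that opens a kernels region.  Both are assumed inputs for this sketch.
//
// Starting from the loop header, climb the dominator chain and return the
// first strict dominator flagged as a kernels entry, or -1 if none is found.
int
find_region_entry (const std::vector<int> &idom,
		   const std::vector<bool> &is_kernels_entry, int header)
{
  int bb = header;
  while (true)
    {
      bb = idom[bb];
      if (bb < 0)
	return -1;
      if (is_kernels_entry[bb])
	return bb;
    }
}
```

The walk starts at the immediate dominator of the header, so a header that is itself flagged is not returned. That matches the patch, where the first step is taken before the first check.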



* Re: Expand oacc kernels after pass_fre
  2015-04-22  7:36         ` Richard Biener
@ 2015-06-04 16:50           ` Tom de Vries
  2015-06-08  7:29             ` Richard Biener
  2015-08-05  7:24             ` [committed, gomp4] Fix release_dangling_ssa_names Tom de Vries
  0 siblings, 2 replies; 71+ messages in thread
From: Tom de Vries @ 2015-06-04 16:50 UTC (permalink / raw)
  To: Richard Biener, Thomas Schwinge; +Cc: GCC Patches, Jakub Jelinek

[-- Attachment #1: Type: text/plain, Size: 26469 bytes --]

On 22/04/15 09:36, Richard Biener wrote:
> On Tue, 21 Apr 2015, Thomas Schwinge wrote:
>
>> Hi!
>>
>> On Tue, 25 Nov 2014 12:22:02 +0100, Tom de Vries <Tom_deVries@mentor.com> wrote:
>>> On 24-11-14 11:56, Tom de Vries wrote:
>>>> On 15-11-14 18:19, Tom de Vries wrote:
>>>>> On 15-11-14 13:14, Tom de Vries wrote:
>>>>>> I'm submitting a patch series with initial support for the oacc kernels
>>>>>> directive.
>>>>>>
>>>>>> The patch series uses pass_parallelize_loops to implement parallelization of
>>>>>> loops in the oacc kernels region.
>>>>>>
>>>>>> The patch series consists of these 8 patches:
>>>>>> ...
>>>>>>       1  Expand oacc kernels after pass_build_ealias
>>>>>>       2  Add pass_oacc_kernels
>>>>>>       3  Add pass_ch_oacc_kernels to pass_oacc_kernels
>>>>>>       4  Add pass_tree_loop_{init,done} to pass_oacc_kernels
>>>>>>       5  Add pass_loop_im to pass_oacc_kernels
>>>>>>       6  Add pass_ccp to pass_oacc_kernels
>>>>>>       7  Add pass_parloops_oacc_kernels to pass_oacc_kernels
>>>>>>       8  Do simple omp lowering for no address taken var
>>>>>> ...
>>>>>
>>>>> This patch moves omp expansion of the oacc kernels directive to after
>>>>> pass_build_ealias.
>>>>>
>>>>> The rationale is that in order to use pass_parallelize_loops for analysis and
>>>>> transformation of an oacc kernels region, we postpone omp expansion of that
>>>>> region until the earliest point in the pass list where enough information is
>>>>> available to run pass_parallelize_loops, in other words, after pass_build_ealias.
>>>>>
>>>>> The patch postpones expansion in expand_omp, and ensures expansion by adding
>>>>> pass_expand_omp_ssa:
>>>>> - after pass_build_ealias, and
>>>>> - after pass_all_early_optimizations for the case we're not optimizing.
>>>>>
>>>>> In order to make sure the oacc kernels region arrives at pass_expand_omp_ssa,
>>>>> the way it left expand_omp, the patch makes pass_ccp and pass_forwprop aware of
>>>>> lowered omp code, to handle it conservatively.
>>>>>
>>>>> The patch contains changes in expand_omp_target to deal with ssa-code, similar
>>>>> to what is already present in expand_omp_taskreg.
>>>>>
>>>>> Furthermore, the patch forces the .omp_data_sizes and .omp_data_kinds to not be
>>>>> static for oacc kernels. It does this to get some references to .omp_data_sizes
>>>>> and .omp_data_kinds in the ssa code.  Without these references, the definitions
>>>>> will be removed. The reference of the variables in GIMPLE_OACC_KERNELS is not
>>>>> enough to have them not removed. [ In vries/oacc-kernels, I used a BUILT_IN_USE
>>>>> kludge for this purpose ].
>>>>>
>>>>> Finally, at the end of pass_expand_omp_ssa we're left with SSA_NAMEs in the
>>>>> original function of which the definition has been removed (as in moved to the
>>>>> split off function). TODO_remove_unused_locals takes care of some of them, but
>>>>> not the anonymous ones. So the patch iterates over all SSA_NAMEs to find these
>>>>> dangling SSA_NAMEs and releases them.
>>>>>
>>>>
>>>> Reposting with small update: I've replaced the use of the rather generic
>>>> gimple_stmt_omp_lowering_p with the more specific gimple_stmt_omp_data_i_init_p.
>>>>
>>>> Bootstrapped and reg-tested in the same way as before.
>>>>
>>>
>>> I've moved pass_expand_omp_ssa one down in the pass list, past pass_fre.
>>>
>>> This allows fre to unify references to the same omp variable before entering
>>> pass_oacc_kernels, which helps pass_lim in pass_oacc_kernels.
>>>
>>> F.i. this reduction fragment:
>>> ...
>>>     # VUSE <.MEM_8>
>>>     # PT = { D.2282 }
>>>     _67 = .omp_data_i_59->sumD.2270;
>>>     # VUSE <.MEM_8>
>>>     _68 = *_67;
>>>
>>>     _70 = _66 + _68;
>>>
>>>     # VUSE <.MEM_8>
>>>     # PT = { D.2282 }
>>>     _69 = .omp_data_i_59->sumD.2270;
>>>     # .MEM_71 = VDEF <.MEM_8>
>>>     *_69 = _70;
>>> ...
>>>
>>> is transformed by fre into:
>>> ...
>>>     # VUSE <.MEM_8>
>>>     # PT = { D.2282 }
>>>     _67 = .omp_data_i_59->sumD.2270;
>>>     # VUSE <.MEM_8>
>>>     _68 = *_67;
>>>
>>>     _70 = _66 + _68;
>>>
>>>     # .MEM_71 = VDEF <.MEM_8>
>>>     *_67 = _70;
>>> ...
>>>
>>> In order for pass_fre to respect the kernels region boundaries, I've added a
>>> change in tree-ssa-sccvn.c:visit_use to handle the .omp_data_i init conservatively.
>>>
>>> Bootstrapped and reg-tested as before.
>>>
>>> OK for trunk?
>>
>> Committed to gomp-4_0-branch in r222279:
>>
>> commit 93557ac5e30c26ee1a3d1255e31265b287171a0d
>> Author: tschwinge <tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4>
>> Date:   Tue Apr 21 19:37:19 2015 +0000
>>
>>      Expand oacc kernels after pass_fre
>>
>>      	gcc/
>>      	* omp-low.c: Include gimple-pretty-print.h.
>>      	(release_first_vuse_in_edge_dest): New function.
>>      	(expand_omp_target): When not in ssa, don't split off oacc kernels
>>      	region, clear PROP_gimple_eomp in cfun->curr_properties to force later
>>      	expansion, and add GOACC_kernels_internal call.
>>      	When in ssa, split off oacc kernels and convert GOACC_kernels_internal
>>      	into GOACC_kernels call.  Handle ssa-code.
>>      	(pass_data_expand_omp): Don't set PROP_gimple_eomp unconditionally in
>>      	properties_provided field.
>>      	(pass_expand_omp::execute): Set PROP_gimple_eomp in
>>      	cfun->curr_properties tentatively.
>>      	(pass_data_expand_omp_ssa): Add TODO_remove_unused_locals to
>>      	todo_flags_finish field.
>>      	(pass_expand_omp_ssa::execute): Release dangling SSA_NAMEs after calling
>>      	execute_expand_omp.
>>      	(gimple_stmt_ssa_operand_references_var_p)
>>      	(gimple_stmt_omp_data_i_init_p): New function.
>>      	* omp-low.h (gimple_stmt_omp_data_i_init_p): Declare.
>>      	* passes.def: Add pass_expand_omp_ssa after pass_fre.  Add
>>      	pass_expand_omp_ssa after pass_all_early_optimizations.
>>      	* tree-ssa-ccp.c: Include omp-low.h.
>>      	(surely_varying_stmt_p, ccp_visit_stmt): Handle .omp_data_i init
>>      	conservatively.
>>      	* tree-ssa-forwprop.c: Include omp-low.h.
>>      	(pass_forwprop::execute): Handle .omp_data_i init conservatively.
>>      	* tree-ssa-sccvn.c: Include omp-low.h.
>>      	(visit_use): Handle .omp_data_i init conservatively.
>>      	* cgraph.c (cgraph_node::release_body): Don't release offloadable
>>      	functions.
>>
>>      git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@222279 138bc75d-0d04-0410-961f-82ee72b054a4
>> ---
>>   gcc/ChangeLog.gomp      |   30 +++++++
>>   gcc/cgraph.c            |    9 ++
>>   gcc/omp-low.c           |  214 ++++++++++++++++++++++++++++++++++++++++++++---
>>   gcc/omp-low.h           |    1 +
>>   gcc/passes.def          |    2 +
>>   gcc/tree-ssa-ccp.c      |    6 ++
>>   gcc/tree-ssa-forwprop.c |    4 +-
>>   gcc/tree-ssa-sccvn.c    |    4 +-
>>   8 files changed, 257 insertions(+), 13 deletions(-)
>>
>> diff --git gcc/ChangeLog.gomp gcc/ChangeLog.gomp
>> index 7885189..1f86160 100644
>> --- gcc/ChangeLog.gomp
>> +++ gcc/ChangeLog.gomp
>> @@ -1,5 +1,35 @@
>>   2015-04-21  Tom de Vries  <tom@codesourcery.com>
>>
>> +	* omp-low.c: Include gimple-pretty-print.h.
>> +	(release_first_vuse_in_edge_dest): New function.
>> +	(expand_omp_target): When not in ssa, don't split off oacc kernels
>> +	region, clear PROP_gimple_eomp in cfun->curr_properties to force later
>> +	expansion, and add GOACC_kernels_internal call.
>> +	When in ssa, split off oacc kernels and convert GOACC_kernels_internal
>> +	into GOACC_kernels call.  Handle ssa-code.
>> +	(pass_data_expand_omp): Don't set PROP_gimple_eomp unconditionally in
>> +	properties_provided field.
>> +	(pass_expand_omp::execute): Set PROP_gimple_eomp in
>> +	cfun->curr_properties tentatively.
>> +	(pass_data_expand_omp_ssa): Add TODO_remove_unused_locals to
>> +	todo_flags_finish field.
>> +	(pass_expand_omp_ssa::execute): Release dangling SSA_NAMEs after calling
>> +	execute_expand_omp.
>> +	(gimple_stmt_ssa_operand_references_var_p)
>> +	(gimple_stmt_omp_data_i_init_p): New function.
>> +	* omp-low.h (gimple_stmt_omp_data_i_init_p): Declare.
>> +	* passes.def: Add pass_expand_omp_ssa after pass_fre.  Add
>> +	pass_expand_omp_ssa after pass_all_early_optimizations.
>> +	* tree-ssa-ccp.c: Include omp-low.h.
>> +	(surely_varying_stmt_p, ccp_visit_stmt): Handle .omp_data_i init
>> +	conservatively.
>> +	* tree-ssa-forwprop.c: Include omp-low.h.
>> +	(pass_forwprop::execute): Handle .omp_data_i init conservatively.
>> +	* tree-ssa-sccvn.c: Include omp-low.h.
>> +	(visit_use): Handle .omp_data_i init conservatively.
>> +	* cgraph.c (cgraph_node::release_body): Don't release offloadable
>> +	functions.
>> +
>>   	* builtin-attrs.def (DOT_DOT_DOT_r_r_r): Add DEF_ATTR_FOR_STRING.
>>   	(ATTR_FNSPEC_DOT_DOT_DOT_r_r_r_NOTHROW_LIST): Add
>>   	DEF_ATTR_TREE_LIST.
>> diff --git gcc/cgraph.c gcc/cgraph.c
>> index e099856..c608d7e 100644
>> --- gcc/cgraph.c
>> +++ gcc/cgraph.c
>> @@ -1706,6 +1706,15 @@ release_function_body (tree decl)
>>   void
>>   cgraph_node::release_body (bool keep_arguments)
>>   {
>> +  /* The omp-expansion of the oacc kernels directive is postponed until after
>> +     all_small_ipa_passes.  That means pass_ipa_free_lang_data, which tries to
>> +     release the body of the offload function, is run before expand_omp_target
>> +     can process the oacc kernels directive, and expand_omp_target would crash
>> +     trying to access the body.  This snippet works around this problem.
>> +     FIXME: This should probably be fixed in a different way.  */
>> +  if (offloadable)
>> +    return;
>> +
>>     ipa_transforms_to_apply.release ();
>>     if (!used_as_abstract_origin && symtab->state != PARSING)
>>       {
>> diff --git gcc/omp-low.c gcc/omp-low.c
>> index 4134f3d..16d9a5e 100644
>> --- gcc/omp-low.c
>> +++ gcc/omp-low.c
>> @@ -108,6 +108,7 @@ along with GCC; see the file COPYING3.  If not see
>>   #include "context.h"
>>   #include "lto-section-names.h"
>>   #include "gomp-constants.h"
>> +#include "gimple-pretty-print.h"
>>
>>
>>   /* Lowering of OMP parallel and workshare constructs proceeds in two
>> @@ -5353,6 +5354,35 @@ expand_omp_build_assign (gimple_stmt_iterator *gsi_p, tree to, tree from)
>>       }
>>   }
>>
>> +static void
>> +release_first_vuse_in_edge_dest (edge e)
>
> All functions need a comment with documentation.
>

Fixed and committed (omitting patch as trivial).

>> +{
>> +  gimple_stmt_iterator i;
>> +  basic_block bb = e->dest;
>> +
>> +  for (i = gsi_start_phis (bb); !gsi_end_p (i); gsi_next (&i))
>> +    {
>> +      gimple phi = gsi_stmt (i);
>> +      tree arg = PHI_ARG_DEF_FROM_EDGE (phi, e);
>> +
>> +      if (!virtual_operand_p (arg))
>> +	continue;
>> +
>> +      mark_virtual_operand_for_renaming (arg);
>> +      return;
>> +    }
>> +
>> +  for (i = gsi_start_bb (bb); !gsi_end_p (i); gsi_next_nondebug (&i))
>> +    {
>> +      gimple stmt = gsi_stmt (i);
>> +      if (gimple_vuse (stmt) == NULL_TREE)
>> +	continue;
>> +
>> +      mark_virtual_operand_for_renaming (gimple_vuse (stmt));
>> +      return;
>> +    }
>> +}
>> +
>>   /* Expand the OpenMP parallel or task directive starting at REGION.  */
>>
>>   static void
>> @@ -8770,8 +8800,11 @@ expand_omp_target (struct omp_region *region)
>>     gimple stmt;
>>     edge e;
>>     bool offloaded, data_region;
>> +  bool do_emit_library_call = true;
>> +  bool do_splitoff = true;
>>
>>     entry_stmt = as_a <gomp_target *> (last_stmt (region->entry));
>> +
>>     new_bb = region->entry;
>>
>>     offloaded = is_gimple_omp_offloaded (entry_stmt);
>> @@ -8804,12 +8837,48 @@ expand_omp_target (struct omp_region *region)
>>     /* Supported by expand_omp_taskreg, but not here.  */
>>     if (child_cfun != NULL)
>>       gcc_checking_assert (!child_cfun->cfg);
>> -  gcc_checking_assert (!gimple_in_ssa_p (cfun));
>>
>>     entry_bb = region->entry;
>>     exit_bb = region->exit;
>>
>> -  if (offloaded)
>> +  if (gimple_omp_target_kind (entry_stmt) == GF_OMP_TARGET_KIND_OACC_KERNELS)
>> +    {
>> +      if (!gimple_in_ssa_p (cfun))
>> +	{
>> +	  /* We need to do analysis and optimizations on the kernels region
>> +	     before splitoff.  Since that's hard to do on low gimple, we
>> +	     postpone the splitoff until we're in SSA.
>> +	     However, we do the emit of the corresponding function call already,
>> +	     in order to keep the arguments of the call alive until the
>> +	     splitoff.
>> +	     Since at this point the function that is called is empty, we can
>> +	     model the function as BUILT_IN_GOACC_KERNELS_INTERNAL, which marks
>> +	     some of its function arguments as non-escaping, so it acts less
>> +	     as an optimization barrier.  */
>> +	  do_splitoff = false;
>> +	  cfun->curr_properties &= ~PROP_gimple_eomp;
>> +	}
>> +      else
>> +	{
>> +	  /* Don't emit the library call.  We've already done that.  */
>> +	  do_emit_library_call = false;
>> +	  /* Transform BUILT_IN_GOACC_KERNELS_INTERNAL into
>> +	     BUILT_IN_GOACC_KERNELS.  Now that the function body will be
>> +	     split off, we can no longer regard the omp_data_array reference as
>> +	     non-escaping.  */
>> +	  gsi = gsi_last_bb (entry_bb);
>> +	  gsi_prev (&gsi);
>> +	  gcall *call = as_a <gcall *> (gsi_stmt (gsi));
>> +	  gcc_assert (gimple_call_builtin_p (call, BUILT_IN_GOACC_KERNELS_INTERNAL));
>> +	  tree fndecl = builtin_decl_explicit (BUILT_IN_GOACC_KERNELS);
>> +	  gimple_call_set_fndecl (call, fndecl);
>> +	  gimple_call_set_fntype (call, TREE_TYPE (fndecl));
>> +	  gimple_call_reset_alias_info (call);
>> +	}
>> +    }
>> +
>> +  if (offloaded
>> +      && do_splitoff)
>>       {
>>         unsigned srcidx, dstidx, num;
>>
>> @@ -8831,7 +8900,7 @@ expand_omp_target (struct omp_region *region)
>>   	{
>>   	  basic_block entry_succ_bb = single_succ (entry_bb);
>>   	  gimple_stmt_iterator gsi;
>> -	  tree arg;
>> +	  tree arg, narg;
>>   	  gimple tgtcopy_stmt = NULL;
>>   	  tree sender = TREE_VEC_ELT (data_arg, 0);
>>
>> @@ -8861,8 +8930,27 @@ expand_omp_target (struct omp_region *region)
>>   	  gcc_assert (tgtcopy_stmt != NULL);
>>   	  arg = DECL_ARGUMENTS (child_fn);
>>
>> -	  gcc_assert (gimple_assign_lhs (tgtcopy_stmt) == arg);
>> -	  gsi_remove (&gsi, true);
>> +	  if (!gimple_in_ssa_p (cfun))
>> +	    {
>> +	      gcc_assert (gimple_assign_lhs (tgtcopy_stmt) == arg);
>> +	      gsi_remove (&gsi, true);
>> +	    }
>> +	  else
>> +	    {
>> +	      gcc_assert (SSA_NAME_VAR (gimple_assign_lhs (tgtcopy_stmt))
>> +			  == arg);
>> +
>> +	      /* If we are in ssa form, we must load the value from the default
>> +		 definition of the argument.  That should not be defined now,
>> +		 since the argument is not used uninitialized.  */
>> +	      gcc_assert (ssa_default_def (cfun, arg) == NULL);
>> +	      narg = make_ssa_name (arg, gimple_build_nop ());
>> +	      set_ssa_default_def (cfun, arg, narg);
>> +	      /* ?? Is setting the subcode really necessary ??  */
>> +	      gimple_omp_set_subcode (tgtcopy_stmt, TREE_CODE (narg));
>> +	      gimple_assign_set_rhs1 (tgtcopy_stmt, narg);
>> +	      update_stmt (tgtcopy_stmt);
>> +	    }
>>   	}
>>
>>         /* Declare local variables needed in CHILD_CFUN.  */
>> @@ -8905,11 +8993,23 @@ expand_omp_target (struct omp_region *region)
>>   	  stmt = gimple_build_return (NULL);
>>   	  gsi_insert_after (&gsi, stmt, GSI_SAME_STMT);
>>   	  gsi_remove (&gsi, true);
>> +
>> +	  /* A vuse in single_succ (exit_bb) may use a vdef from the region
>> +	     which is about to be split off.  Mark the vdef for renaming.  */
>> +	  release_first_vuse_in_edge_dest (single_succ_edge (exit_bb));
>>   	}
>>
>>         /* Move the offloading region into CHILD_CFUN.  */
>>
>> -      block = gimple_block (entry_stmt);
>> +      if (gimple_in_ssa_p (cfun))
>> +	{
>> +	  init_tree_ssa (child_cfun);
>> +	  init_ssa_operands (child_cfun);
>> +	  child_cfun->gimple_df->in_ssa_p = true;
>> +	  block = NULL_TREE;
>> +	}
>> +      else
>> +	block = gimple_block (entry_stmt);
>>
>>         new_bb = move_sese_region_to_fn (child_cfun, entry_bb, exit_bb, block);
>>         if (exit_bb)
>> @@ -8969,9 +9069,18 @@ expand_omp_target (struct omp_region *region)
>>   	  if (changed)
>>   	    cleanup_tree_cfg ();
>>   	}
>> +      if (gimple_in_ssa_p (cfun))
>> +	update_ssa (TODO_update_ssa);
>>         pop_cfun ();
>>       }
>>
>> +  if (!do_emit_library_call)
>> +    {
>> +      if (gimple_in_ssa_p (cfun))
>> +	update_ssa (TODO_update_ssa_only_virtuals);
>> +      return;
>> +    }
>> +
>>     /* Emit a library call to launch the offloading region, or do data
>>        transfers.  */
>>     tree t1, t2, t3, t4, device, cond, c, clauses;
>> @@ -8993,7 +9102,7 @@ expand_omp_target (struct omp_region *region)
>>         start_ix = BUILT_IN_GOACC_PARALLEL;
>>         break;
>>       case GF_OMP_TARGET_KIND_OACC_KERNELS:
>> -      start_ix = BUILT_IN_GOACC_KERNELS;
>> +      start_ix = BUILT_IN_GOACC_KERNELS_INTERNAL;
>>         break;
>>       case GF_OMP_TARGET_KIND_OACC_DATA:
>>         start_ix = BUILT_IN_GOACC_DATA_START;
>> @@ -9128,6 +9237,7 @@ expand_omp_target (struct omp_region *region)
>>       case BUILT_IN_GOACC_DATA_START:
>>       case BUILT_IN_GOACC_ENTER_EXIT_DATA:
>>       case BUILT_IN_GOACC_KERNELS:
>> +    case BUILT_IN_GOACC_KERNELS_INTERNAL:
>>       case BUILT_IN_GOACC_PARALLEL:
>>       case BUILT_IN_GOACC_UPDATE:
>>         break;
>> @@ -9146,6 +9256,7 @@ expand_omp_target (struct omp_region *region)
>>       case BUILT_IN_GOMP_TARGET_UPDATE:
>>         break;
>>       case BUILT_IN_GOACC_KERNELS:
>> +    case BUILT_IN_GOACC_KERNELS_INTERNAL:
>>       case BUILT_IN_GOACC_PARALLEL:
>>         {
>>   	tree t_num_gangs, t_num_workers, t_vector_length;
>> @@ -9249,6 +9360,8 @@ expand_omp_target (struct omp_region *region)
>>         gcc_assert (g && gimple_code (g) == GIMPLE_OMP_RETURN);
>>         gsi_remove (&gsi, true);
>>       }
>> +  if (gimple_in_ssa_p (cfun))
>> +    update_ssa (TODO_update_ssa_only_virtuals);
>>   }
>>
>>
>> @@ -9503,7 +9616,7 @@ const pass_data pass_data_expand_omp =
>>     OPTGROUP_NONE, /* optinfo_flags */
>>     TV_NONE, /* tv_id */
>>     PROP_gimple_any, /* properties_required */
>> -  PROP_gimple_eomp, /* properties_provided */
>> +  0 /* Possibly PROP_gimple_eomp.  */, /* properties_provided */
>>     0, /* properties_destroyed */
>>     0, /* todo_flags_start */
>>     0, /* todo_flags_finish */
>> @@ -9517,12 +9630,14 @@ public:
>>     {}
>>
>>     /* opt_pass methods: */
>> -  virtual unsigned int execute (function *)
>> +  virtual unsigned int execute (function *fun)
>>       {
>>         bool gate = ((flag_cilkplus != 0 || flag_openacc != 0 || flag_openmp != 0
>>   		    || flag_openmp_simd != 0)
>>   		   && !seen_error ());
>>
>> +      fun->curr_properties |= PROP_gimple_eomp;
>> +
>>         /* This pass always runs, to provide PROP_gimple_eomp.
>>   	 But often, there is nothing to do.  */
>>         if (!gate)
>> @@ -9553,7 +9668,8 @@ const pass_data pass_data_expand_omp_ssa =
>>     PROP_gimple_eomp, /* properties_provided */
>>     0, /* properties_destroyed */
>>     0, /* todo_flags_start */
>> -  TODO_cleanup_cfg | TODO_rebuild_alias, /* todo_flags_finish */
>> +  TODO_cleanup_cfg | TODO_rebuild_alias
>> +  | TODO_remove_unused_locals, /* todo_flags_finish */
>>   };
>>
>>   class pass_expand_omp_ssa : public gimple_opt_pass
>> @@ -9568,7 +9684,48 @@ public:
>>       {
>>         return !(fun->curr_properties & PROP_gimple_eomp);
>>       }
>> -  virtual unsigned int execute (function *) { return execute_expand_omp (); }
>> +  virtual unsigned int execute (function *)
>
> Please move this out of the class body.
>

Fixed and committed (omitting patch as trivial).

>> +    {
>> +      unsigned res = execute_expand_omp ();
>> +
>> +      /* After running pass_expand_omp_ssa to expand the oacc kernels
>> +	 directive, we are left in the original function with anonymous
>> +	 SSA_NAMEs, with a defining statement that has been deleted.  This
>> +	 pass finds those SSA_NAMEs and releases them.
>> +	 TODO: Either fix this elsewhere, or make the fix unnecessary.  */
>> +      unsigned int i;
>> +      for (i = 1; i < num_ssa_names; ++i)
>> +	{
>> +	  tree name = ssa_name (i);
>> +	  if (name == NULL_TREE)
>> +	    continue;
>> +
>> +	  gimple stmt = SSA_NAME_DEF_STMT (name);
>> +	  bool found = false;
>> +
>> +	  ssa_op_iter op_iter;
>> +	  def_operand_p def_p;
>> +	  FOR_EACH_PHI_OR_STMT_DEF (def_p, stmt, op_iter, SSA_OP_ALL_DEFS)
>> +	    {
>> +	      tree def = DEF_FROM_PTR (def_p);
>> +	      if (def == name)
>> +		{
>> +		  found = true;
>> +		  break;
>> +		}
>> +	    }
>> +
>> +	  if (!found)
>> +	    {
>> +	      if (dump_file)
>> +		fprintf (dump_file, "Released dangling ssa name %u\n", i);
>> +	      release_ssa_name (name);
>> +	    }
>> +	}
>> +
>> +      return res;
>> +    }
>> +  opt_pass * clone () { return new pass_expand_omp_ssa (m_ctxt); }
>>
>>   }; // class pass_expand_omp_ssa
>>
>> @@ -13728,4 +13885,39 @@ omp_finish_file (void)
>>       }
>>   }
>>
>> +static bool
>> +gimple_stmt_ssa_operand_references_var_p (gimple stmt, const char **varnames,
>> +					  unsigned int nr_varnames,
>> +					  unsigned int flags)
>
> Missing comment.
>
>> +{
>> +  tree use;
>> +  ssa_op_iter iter;
>> +  const char *s;
>> +
>> +  FOR_EACH_SSA_TREE_OPERAND (use, stmt, iter, flags)
>> +    {
>> +      if (SSA_NAME_IDENTIFIER (use) == NULL_TREE)
>> +	continue;
>> +      s = IDENTIFIER_POINTER (SSA_NAME_IDENTIFIER (use));
>> +
>> +      unsigned int i;
>> +      for (i = 0; i < nr_varnames; ++i)
>> +	if (strcmp (varnames[i], s) == 0)
>> +	  return true;
>
> Eh?  This surely is crap - you can't ever have semantics depend on
> identifiers.
>
>> +    }
>> +
>> +  return false;
>> +}
>> +
>> +/* Return true if STMT is .omp_data_i init.  */
>> +
>> +bool
>> +gimple_stmt_omp_data_i_init_p (gimple stmt)
>> +{
>> +  const char *varnames[] = { ".omp_data_i" };
>> +  unsigned int nr_varnames = sizeof (varnames) / sizeof (varnames[0]);
>> +  return gimple_stmt_ssa_operand_references_var_p (stmt, varnames, nr_varnames,
>> +						   SSA_OP_DEF);
>
> So no - this isn't possible this way and I suspect it's not reliable
> anyway.
>

I've rewritten gimple_stmt_omp_data_i_init_p so that it no longer relies
on identifier names.  The patch is attached and committed.
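
As a rough illustration of the objection above (a toy model, not GCC
code: 'toy_decl' and all helper names here are hypothetical), two
distinct declarations can carry the same identifier, so a name
comparison cannot tell them apart, whereas comparing the objects
themselves can:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for a declaration node; this is not GCC's tree.  */
struct toy_decl
{
  const char *name;
};

/* Name-based test: true for any decl that happens to be called NAME.  */
static bool
toy_matches_by_name (const struct toy_decl *d, const char *name)
{
  return d->name != NULL && strcmp (d->name, name) == 0;
}

/* Identity-based test: true only for the one decl we actually mean.  */
static bool
toy_matches_by_identity (const struct toy_decl *d,
			 const struct toy_decl *wanted)
{
  return d == wanted;
}

/* Return 1 if the name-based test confuses two distinct decls that share
   a name while the identity-based test keeps them apart.  */
static int
toy_name_test_is_ambiguous (void)
{
  struct toy_decl region1 = { ".omp_data_i" };
  struct toy_decl region2 = { ".omp_data_i" };

  bool by_name = toy_matches_by_name (&region2, region1.name);
  bool by_identity = toy_matches_by_identity (&region2, &region1);

  return by_name && !by_identity;
}
```

The rewritten predicate follows the second style: it compares the
operand of the ADDR_EXPR against the data argument recorded on the
kernels statement instead of looking at any identifier string.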

>> +}
>> +
>>   #include "gt-omp-low.h"
>> diff --git gcc/omp-low.h gcc/omp-low.h
>> index 8a4052e..3d30c3b 100644
>> --- gcc/omp-low.h
>> +++ gcc/omp-low.h
>> @@ -28,6 +28,7 @@ extern void free_omp_regions (void);
>>   extern tree omp_reduction_init (tree, tree);
>>   extern bool make_gimple_omp_edges (basic_block, struct omp_region **, int *);
>>   extern void omp_finish_file (void);
>> +extern bool gimple_stmt_omp_data_i_init_p (gimple);
>>
>>   extern GTY(()) vec<tree, va_gc> *offload_funcs;
>>   extern GTY(()) vec<tree, va_gc> *offload_vars;
>> diff --git gcc/passes.def gcc/passes.def
>> index 2bc5dcd..db0dd18 100644
>> --- gcc/passes.def
>> +++ gcc/passes.def
>> @@ -86,6 +86,7 @@ along with GCC; see the file COPYING3.  If not see
>>   	     execute TODO_rebuild_alias at this point.  */
>>   	  NEXT_PASS (pass_build_ealias);
>>   	  NEXT_PASS (pass_fre);
>> +	  NEXT_PASS (pass_expand_omp_ssa);
>>   	  NEXT_PASS (pass_merge_phi);
>>   	  NEXT_PASS (pass_cd_dce);
>>   	  NEXT_PASS (pass_early_ipa_sra);
>> @@ -99,6 +100,7 @@ along with GCC; see the file COPYING3.  If not see
>>   	      late.  */
>>   	  NEXT_PASS (pass_split_functions);
>>         POP_INSERT_PASSES ()
>> +      NEXT_PASS (pass_expand_omp_ssa);
>>         NEXT_PASS (pass_release_ssa_names);
>>         NEXT_PASS (pass_rebuild_cgraph_edges);
>>         NEXT_PASS (pass_inline_parameters);
>> diff --git gcc/tree-ssa-ccp.c gcc/tree-ssa-ccp.c
>> index d45a3ff..46fe1c7 100644
>> --- gcc/tree-ssa-ccp.c
>> +++ gcc/tree-ssa-ccp.c
>> @@ -172,6 +172,7 @@ along with GCC; see the file COPYING3.  If not see
>>   #include "wide-int-print.h"
>>   #include "builtins.h"
>>   #include "tree-chkp.h"
>> +#include "omp-low.h"
>>
>>
>>   /* Possible lattice values.  */
>> @@ -796,6 +797,9 @@ surely_varying_stmt_p (gimple stmt)
>>         && gimple_code (stmt) != GIMPLE_CALL)
>>       return true;
>>
>> +  if (gimple_stmt_omp_data_i_init_p (stmt))
>> +    return true;
>> +
>
> No.
>
>>     return false;
>>   }
>>
>> @@ -2329,6 +2333,8 @@ ccp_visit_stmt (gimple stmt, edge *taken_edge_p, tree *output_p)
>>     switch (gimple_code (stmt))
>>       {
>>         case GIMPLE_ASSIGN:
>> +	if (gimple_stmt_omp_data_i_init_p (stmt))
>> +	  break;
>>           /* If the statement is an assignment that produces a single
>>              output value, evaluate its RHS to see if the lattice value of
>>              its output has changed.  */
>> diff --git gcc/tree-ssa-forwprop.c gcc/tree-ssa-forwprop.c
>> index d8db20a..554a5a5 100644
>> --- gcc/tree-ssa-forwprop.c
>> +++ gcc/tree-ssa-forwprop.c
>> @@ -85,6 +85,7 @@ along with GCC; see the file COPYING3.  If not see
>>   #include "tree-cfgcleanup.h"
>>   #include "tree-into-ssa.h"
>>   #include "cfganal.h"
>> +#include "omp-low.h"
>>
>>   /* This pass propagates the RHS of assignment statements into use
>>      sites of the LHS of the assignment.  It's basically a specialized
>> @@ -2155,7 +2156,8 @@ pass_forwprop::execute (function *fun)
>>   	  tree lhs, rhs;
>>   	  enum tree_code code;
>>
>> -	  if (!is_gimple_assign (stmt))
>> +	  if (!is_gimple_assign (stmt)
>> +	      || gimple_stmt_omp_data_i_init_p (stmt))
>
> No.
>
>>   	    {
>>   	      gsi_next (&gsi);
>>   	      continue;
>> diff --git gcc/tree-ssa-sccvn.c gcc/tree-ssa-sccvn.c
>> index e417a15..449a615 100644
>> --- gcc/tree-ssa-sccvn.c
>> +++ gcc/tree-ssa-sccvn.c
>> @@ -85,6 +85,7 @@ along with GCC; see the file COPYING3.  If not see
>>   #include "ipa-ref.h"
>>   #include "plugin-api.h"
>>   #include "cgraph.h"
>> +#include "omp-low.h"
>>
>>   /* This algorithm is based on the SCC algorithm presented by Keith
>>      Cooper and L. Taylor Simpson in "SCC-Based Value numbering"
>> @@ -3542,7 +3543,8 @@ visit_use (tree use)
>>       {
>>         if (gimple_code (stmt) == GIMPLE_PHI)
>>   	changed = visit_phi (stmt);
>> -      else if (gimple_has_volatile_ops (stmt))
>> +      else if (gimple_has_volatile_ops (stmt)
>> +	       || gimple_stmt_omp_data_i_init_p (stmt))
>
> No.
>
> What is the intent of these changes?
>

These are changes to handle the kernels region conservatively, in order 
to not undo the omp-lowering before getting to the oacc-parloops pass.

Thanks,
- Tom


[-- Attachment #2: 0004-Rewrite-gimple_stmt_omp_data_i_init_p.patch --]
[-- Type: text/x-patch, Size: 2585 bytes --]

Rewrite gimple_stmt_omp_data_i_init_p

2015-06-03  Tom de Vries  <tom@codesourcery.com>

	* omp-low.c (gimple_stmt_ssa_operand_references_var_p): Remove function.
	(gimple_stmt_omp_data_i_init_p): Rewrite without
	gimple_stmt_ssa_operand_references_var_p.
---
 gcc/omp-low.c | 59 ++++++++++++++++++++++++++++++-----------------------------
 1 file changed, 30 insertions(+), 29 deletions(-)

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index f847d5c..0b31992 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -15368,39 +15368,40 @@ omp_finish_file (void)
     }
 }
 
-static bool
-gimple_stmt_ssa_operand_references_var_p (gimple stmt, const char **varnames,
-					  unsigned int nr_varnames,
-					  unsigned int flags)
-{
-  tree use;
-  ssa_op_iter iter;
-  const char *s;
-
-  FOR_EACH_SSA_TREE_OPERAND (use, stmt, iter, flags)
-    {
-      if (SSA_NAME_IDENTIFIER (use) == NULL_TREE)
-	continue;
-      s = IDENTIFIER_POINTER (SSA_NAME_IDENTIFIER (use));
-
-      unsigned int i;
-      for (i = 0; i < nr_varnames; ++i)
-	if (strcmp (varnames[i], s) == 0)
-	  return true;
-    }
-
-  return false;
-}
-
-/* Return true if STMT is .omp_data_i init.  */
+/* Return true if STMT is copy assignment .omp_data_i = &.omp_data_arr.  */
 
 bool
 gimple_stmt_omp_data_i_init_p (gimple stmt)
 {
-  const char *varnames[] = { ".omp_data_i" };
-  unsigned int nr_varnames = sizeof (varnames) / sizeof (varnames[0]);
-  return gimple_stmt_ssa_operand_references_var_p (stmt, varnames, nr_varnames,
-						   SSA_OP_DEF);
+  /* Extract obj from stmt 'a = &obj'.  */
+  if (!gimple_assign_cast_p (stmt)
+      && !gimple_assign_single_p (stmt))
+    return false;
+  tree rhs = gimple_assign_rhs1 (stmt);
+  if (TREE_CODE (rhs) != ADDR_EXPR)
+    return false;
+  tree obj = TREE_OPERAND (rhs, 0);
+
+  /* Check that the last statement in the preceding bb is an oacc kernels
+     stmt.  */
+  basic_block bb = gimple_bb (stmt);
+  if (!single_pred_p (bb))
+    return false;
+  gimple last = last_stmt (single_pred (bb));
+  if (last == NULL
+      || gimple_code (last) != GIMPLE_OMP_TARGET)
+    return false;
+  gomp_target *kernels = as_a <gomp_target *> (last);
+  if (gimple_omp_target_kind (kernels)
+      != GF_OMP_TARGET_KIND_OACC_KERNELS)
+    return false;
+
+  /* Get omp_data_arr from the oacc kernels stmt.  */
+  tree data_arg = gimple_omp_target_data_arg (kernels);
+  tree omp_data_arr = TREE_VEC_ELT (data_arg, 0);
+
+  /* If obj is omp_data_arr, we've found the .omp_data_i init statement.  */
+  return operand_equal_p (obj, omp_data_arr, 0);
 }
 
 namespace {
-- 
1.9.1



* Re: Expand oacc kernels after pass_fre
  2015-06-04 16:50           ` Expand oacc kernels after pass_fre Tom de Vries
@ 2015-06-08  7:29             ` Richard Biener
  2015-06-19  9:04               ` Tom de Vries
  2015-08-05  7:24             ` [committed, gomp4] Fix release_dangling_ssa_names Tom de Vries
  1 sibling, 1 reply; 71+ messages in thread
From: Richard Biener @ 2015-06-08  7:29 UTC (permalink / raw)
  To: Tom de Vries; +Cc: Thomas Schwinge, GCC Patches, Jakub Jelinek

On Thu, 4 Jun 2015, Tom de Vries wrote:

> > >   	    {
> > >   	      gsi_next (&gsi);
> > >   	      continue;
> > > diff --git gcc/tree-ssa-sccvn.c gcc/tree-ssa-sccvn.c
> > > index e417a15..449a615 100644
> > > --- gcc/tree-ssa-sccvn.c
> > > +++ gcc/tree-ssa-sccvn.c
> > > @@ -85,6 +85,7 @@ along with GCC; see the file COPYING3.  If not see
> > >   #include "ipa-ref.h"
> > >   #include "plugin-api.h"
> > >   #include "cgraph.h"
> > > +#include "omp-low.h"
> > > 
> > >   /* This algorithm is based on the SCC algorithm presented by Keith
> > >      Cooper and L. Taylor Simpson in "SCC-Based Value numbering"
> > > @@ -3542,7 +3543,8 @@ visit_use (tree use)
> > >       {
> > >         if (gimple_code (stmt) == GIMPLE_PHI)
> > >   	changed = visit_phi (stmt);
> > > -      else if (gimple_has_volatile_ops (stmt))
> > > +      else if (gimple_has_volatile_ops (stmt)
> > > +	       || gimple_stmt_omp_data_i_init_p (stmt))
> > 
> > No.
> > 
> > What is the intent of these changes?
> > 
> 
> These are changes to handle the kernels region conservatively, in order to not
> undo the omp-lowering before getting to the oacc-parloops pass.

Still it feels too much like the MPX mistake (maintenance cost and
compile-time cost).  How can any pass "undo" omp-lowering?

Richard.


* Re: Expand oacc kernels after pass_fre
  2015-06-08  7:29             ` Richard Biener
@ 2015-06-19  9:04               ` Tom de Vries
  0 siblings, 0 replies; 71+ messages in thread
From: Tom de Vries @ 2015-06-19  9:04 UTC (permalink / raw)
  To: Richard Biener; +Cc: Thomas Schwinge, GCC Patches, Jakub Jelinek

On 08/06/15 09:25, Richard Biener wrote:
> On Thu, 4 Jun 2015, Tom de Vries wrote:
>
>>>>    	    {
>>>>    	      gsi_next (&gsi);
>>>>    	      continue;
>>>> diff --git gcc/tree-ssa-sccvn.c gcc/tree-ssa-sccvn.c
>>>> index e417a15..449a615 100644
>>>> --- gcc/tree-ssa-sccvn.c
>>>> +++ gcc/tree-ssa-sccvn.c
>>>> @@ -85,6 +85,7 @@ along with GCC; see the file COPYING3.  If not see
>>>>    #include "ipa-ref.h"
>>>>    #include "plugin-api.h"
>>>>    #include "cgraph.h"
>>>> +#include "omp-low.h"
>>>>
>>>>    /* This algorithm is based on the SCC algorithm presented by Keith
>>>>       Cooper and L. Taylor Simpson in "SCC-Based Value numbering"
>>>> @@ -3542,7 +3543,8 @@ visit_use (tree use)
>>>>        {
>>>>          if (gimple_code (stmt) == GIMPLE_PHI)
>>>>    	changed = visit_phi (stmt);
>>>> -      else if (gimple_has_volatile_ops (stmt))
>>>> +      else if (gimple_has_volatile_ops (stmt)
>>>> +	       || gimple_stmt_omp_data_i_init_p (stmt))
>>>
>>> No.
>>>
>>> What is the intent of these changes?
>>>
>>
>> These are changes to handle the kernels region conservatively, in order to not
>> undo the omp-lowering before getting to the oacc-parloops pass.
>
> Still it feels too much like the MPX mistake (maintenance cost and
> compile-time cost).  How can any pass "undo" omp-lowering?
>

I'm talking about the rewriting of the variables in terms of 
.omp_data_i. Passes like copy_prop and fre undo this rewriting, and 
propagate the variables from outside the kernels region back into the 
kernels region, eliminating .omp_data_i.
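
A minimal sketch of what that rewriting looks like at the source level,
assuming a simple sum reduction (the 'toy_' names and the struct layout
are made up for illustration; the real lowering works on gimple, not on
C source):

```c
#include <assert.h>

/* Hypothetical record of the variables shared with the kernels region;
   it plays the role of .omp_data_arr in the discussion above.  */
struct toy_omp_data
{
  int *sum;
  const int *a;
  int n;
};

/* Lowered region body: every access to an outer variable goes through
   omp_data_i, so the body no longer mentions 'sum', 'a' or 'n'.  */
static void
toy_kernels_body (struct toy_omp_data *omp_data_i)
{
  for (int i = 0; i < omp_data_i->n; i++)
    *omp_data_i->sum += omp_data_i->a[i];
}

/* The original function after lowering: it fills in the record and
   hands its address to the region body.  */
static int
toy_sum (const int *a, int n)
{
  int sum = 0;
  struct toy_omp_data omp_data_arr = { &sum, a, n };

  toy_kernels_body (&omp_data_arr);
  return sum;
}
```

As long as the region body is still inline in the original function, a
propagation pass can see that omp_data_i is just &omp_data_arr and
replace 'omp_data_i->sum' with a direct reference to 'sum' again, which
is the undoing I mean.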

Thanks,
- Tom


* [committed, gomp4] Fix release_dangling_ssa_names
  2015-06-04 16:50           ` Expand oacc kernels after pass_fre Tom de Vries
  2015-06-08  7:29             ` Richard Biener
@ 2015-08-05  7:24             ` Tom de Vries
  2015-08-05  7:29               ` Richard Biener
  1 sibling, 1 reply; 71+ messages in thread
From: Tom de Vries @ 2015-08-05  7:24 UTC (permalink / raw)
  To: Richard Biener, Thomas Schwinge; +Cc: GCC Patches, Jakub Jelinek

[-- Attachment #1: Type: text/plain, Size: 3026 bytes --]

[ was: Re: Expand oacc kernels after pass_fre ]
On 04/06/15 18:02, Tom de Vries wrote:
>> Please move this out of the class body.
>>
>
> Fixed and committed (omitting patch as trivial).
>
>>> +    {
>>> +      unsigned res = execute_expand_omp ();
>>> +
>>> +      /* After running pass_expand_omp_ssa to expand the oacc kernels
>>> +     directive, we are left in the original function with anonymous
>>> +     SSA_NAMEs, with a defining statement that has been deleted.  This
>>> +     pass finds those SSA_NAMEs and releases them.
>>> +     TODO: Either fix this elsewhere, or make the fix unnecessary.  */
>>> +      unsigned int i;
>>> +      for (i = 1; i < num_ssa_names; ++i)
>>> +    {
>>> +      tree name = ssa_name (i);
>>> +      if (name == NULL_TREE)
>>> +        continue;
>>> +
>>> +      gimple stmt = SSA_NAME_DEF_STMT (name);
>>> +      bool found = false;
>>> +
>>> +      ssa_op_iter op_iter;
>>> +      def_operand_p def_p;
>>> +      FOR_EACH_PHI_OR_STMT_DEF (def_p, stmt, op_iter, SSA_OP_ALL_DEFS)
>>> +        {
>>> +          tree def = DEF_FROM_PTR (def_p);
>>> +          if (def == name)
>>> +        {
>>> +          found = true;
>>> +          break;
>>> +        }
>>> +        }
>>> +
>>> +      if (!found)
>>> +        {
>>> +          if (dump_file)
>>> +        fprintf (dump_file, "Released dangling ssa name %u\n", i);
>>> +          release_ssa_name (name);
>>> +        }
>>> +    }
>>> +
>>> +      return res;
>>> +    }

This patch implements the TODO.

The cause of the problems is that in replace_ssa_name, we create a new 
ssa_name with the def stmt of the old ssa_name, but do not reset the def 
stmt of the old ssa_name, leaving the ssa_name in the original function 
having a def stmt in the split-off function.

[ And if we don't do anything about that, at some point another pass
uses 'gimple_bb (SSA_NAME_DEF_STMT (name))->index' (a bb index in the
split-off function) as an index into an array whose length is the number
of bbs in the original function. So the index may be out of bounds. ]

This patch fixes that by making sure we reset the def stmt to NULL. This 
means we can simplify release_dangling_ssa_names to just test for NULL 
def stmts.

Default defs are skipped by release_ssa_name, so setting the def stmt 
for default defs to NULL does not result in the name being released, but 
in an ssa-verification error. So instead, we keep the def stmt nop, and 
create a new nop for the copy in the split-off function.

[ The default def bit seems only to be triggered for the default def 
created by expand_omp_target:
...
   /* If we are in ssa form, we must load the value from the default
      definition of the argument.  That should not be defined now,
      since the argument is not used uninitialized.  */
   gcc_assert (ssa_default_def (cfun, arg) == NULL);
   narg = make_ssa_name (arg, gimple_build_nop ());
   set_ssa_default_def (cfun, arg, narg);
...
]
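
The idea can be sketched with a small stand-alone model (hypothetical
'toy_' types and functions, not the GCC data structures): when a
definition moves to the split-off function, the new name takes over the
defining statement and the old name's def stmt is reset to NULL, so the
cleanup only has to look for NULL def stmts.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a statement; only the bb index matters.  */
struct toy_stmt
{
  int bb_index;
};

/* Hypothetical stand-in for an SSA name.  */
struct toy_name
{
  struct toy_stmt *def_stmt;	/* NULL once the definition has moved.  */
  int released;
};

/* Model of moving a definition: NEW_NAME takes over the defining
   statement, and OLD_NAME no longer points into the other function.  */
static void
toy_replace_name (struct toy_name *old_name, struct toy_name *new_name)
{
  new_name->def_stmt = old_name->def_stmt;
  new_name->released = 0;
  old_name->def_stmt = NULL;
}

/* Release every live name with a NULL def stmt.  Return the number of
   names released by this call.  */
static size_t
toy_release_dangling (struct toy_name *names, size_t n)
{
  size_t count = 0;

  for (size_t i = 0; i < n; i++)
    {
      if (names[i].released || names[i].def_stmt != NULL)
	continue;
      names[i].released = 1;
      count++;
    }
  return count;
}
```

The model leaves out default defs, which keep their nop def stmt as
described above and so are never seen as dangling.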

Bootstrapped and reg-tested on x86_64.

Committed to gomp-4_0-branch.

Thanks,
- Tom


[-- Attachment #2: 0001-Fix-release_dangling_ssa_names.patch --]
[-- Type: text/x-patch, Size: 3615 bytes --]

Fix release_dangling_ssa_names

2015-08-05  Tom de Vries  <tom@codesourcery.com>

	* omp-low.c (release_dangling_ssa_names): Release SSA_NAMEs with NULL
	def stmt.
	* tree-cfg.c (replace_ssa_name): Don't move default def nops.  Set def
	stmt of unused SSA_NAME to NULL.
---
 gcc/omp-low.c  | 35 +++++++++++------------------------
 gcc/tree-cfg.c | 17 ++++++++++++++++-
 2 files changed, 27 insertions(+), 25 deletions(-)

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 0ebbbe1..cd2076f 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -10349,11 +10349,10 @@ make_pass_expand_omp (gcc::context *ctxt)
   return new pass_expand_omp (ctxt);
 }
 
-/* After running pass_expand_omp_ssa to expand the oacc kernels
-   directive, we are left in the original function with anonymous
-   SSA_NAMEs, with a defining statement that has been deleted.  This
-   pass finds those SSA_NAMEs and releases them.
-   TODO: Either fix this elsewhere, or make the fix unnecessary.  */
+/* After running pass_expand_omp_ssa to expand the oacc kernels directive, we
+   are left in the original function with anonymous SSA_NAMEs, with a NULL
+   defining statement.  This function finds those SSA_NAMEs and releases
+   them.  */
 
 static void
 release_dangling_ssa_names (void)
@@ -10366,26 +10365,14 @@ release_dangling_ssa_names (void)
 	continue;
 
       gimple stmt = SSA_NAME_DEF_STMT (name);
-      bool found = false;
-
-      ssa_op_iter op_iter;
-      def_operand_p def_p;
-      FOR_EACH_PHI_OR_STMT_DEF (def_p, stmt, op_iter, SSA_OP_ALL_DEFS)
-	{
-	  tree def = DEF_FROM_PTR (def_p);
-	  if (def == name)
-	    {
-	      found = true;
-	      break;
-	    }
-	}
+      if (stmt != NULL)
+	continue;
 
-      if (!found)
-	{
-	  if (dump_file)
-	    fprintf (dump_file, "Released dangling ssa name %u\n", i);
-	  release_ssa_name (name);
-	}
+      release_ssa_name (name);
+      gcc_assert (SSA_NAME_IN_FREE_LIST (name));
+      if (dump_file
+	  && (dump_flags & TDF_DETAILS))
+	fprintf (dump_file, "Released dangling ssa name %u\n", i);
     }
 }
 
diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
index cb9fe6d..6a00b25 100644
--- a/gcc/tree-cfg.c
+++ b/gcc/tree-cfg.c
@@ -6467,8 +6467,17 @@ replace_ssa_name (tree name, hash_map<tree, tree> *vars_map,
       if (decl)
 	{
 	  replace_by_duplicate_decl (&decl, vars_map, to_context);
+	  /* If name is a default def, then we don't move the defining stmt
+	     (which is a nop).  Because (1) the nop doesn't belong to the sese
+	     region, and (2) while setting the def stmt of name to NULL would
+	     trigger release_ssa_name in release_dangling_ssa_names, it wouldn't
+	     be released since it's a default def, and subsequently cause an
+	     ssa verification failure.  */
+	  gimple def_stmt = (SSA_NAME_IS_DEFAULT_DEF (name)
+			     ? gimple_build_nop ()
+			     : SSA_NAME_DEF_STMT (name));
 	  new_name = make_ssa_name_fn (DECL_STRUCT_FUNCTION (to_context),
-				       decl, SSA_NAME_DEF_STMT (name));
+				       decl, def_stmt);
 	  if (SSA_NAME_IS_DEFAULT_DEF (name))
 	    set_ssa_default_def (DECL_STRUCT_FUNCTION (to_context),
 				 decl, new_name);
@@ -6478,6 +6487,12 @@ replace_ssa_name (tree name, hash_map<tree, tree> *vars_map,
 				     name, SSA_NAME_DEF_STMT (name));
 
       vars_map->put (name, new_name);
+
+      if (!SSA_NAME_IS_DEFAULT_DEF (name))
+	/* The statement has been moved to the child function.  It no longer
+	   defines name in the original function.  Mark the def stmt NULL, and
+	   let release_dangling_ssa_names deal with it.  */
+	SSA_NAME_DEF_STMT (name) = NULL;
     }
   else
     new_name = *loc;
-- 
1.9.1


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [committed, gomp4] Fix release_dangling_ssa_names
  2015-08-05  7:24             ` [committed, gomp4] Fix release_dangling_ssa_names Tom de Vries
@ 2015-08-05  7:29               ` Richard Biener
  2015-08-05  8:48                 ` Tom de Vries
  0 siblings, 1 reply; 71+ messages in thread
From: Richard Biener @ 2015-08-05  7:29 UTC (permalink / raw)
  To: Tom de Vries; +Cc: Thomas Schwinge, GCC Patches, Jakub Jelinek

On Wed, 5 Aug 2015, Tom de Vries wrote:

> [ was: Re: Expand oacc kernels after pass_fre ]
> On 04/06/15 18:02, Tom de Vries wrote:
> > > Please move this out of the class body.
> > > 
> > 
> > Fixed and committed (omitting patch as trivial).
> > 
> > > > +    {
> > > > +      unsigned res = execute_expand_omp ();
> > > > +
> > > > +      /* After running pass_expand_omp_ssa to expand the oacc kernels
> > > > +     directive, we are left in the original function with anonymous
> > > > +     SSA_NAMEs, with a defining statement that has been deleted.  This
> > > > +     pass finds those SSA_NAMEs and releases them.
> > > > +     TODO: Either fix this elsewhere, or make the fix unnecessary.  */
> > > > +      unsigned int i;
> > > > +      for (i = 1; i < num_ssa_names; ++i)
> > > > +    {
> > > > +      tree name = ssa_name (i);
> > > > +      if (name == NULL_TREE)
> > > > +        continue;
> > > > +
> > > > +      gimple stmt = SSA_NAME_DEF_STMT (name);
> > > > +      bool found = false;
> > > > +
> > > > +      ssa_op_iter op_iter;
> > > > +      def_operand_p def_p;
> > > > +      FOR_EACH_PHI_OR_STMT_DEF (def_p, stmt, op_iter, SSA_OP_ALL_DEFS)
> > > > +        {
> > > > +          tree def = DEF_FROM_PTR (def_p);
> > > > +          if (def == name)
> > > > +        {
> > > > +          found = true;
> > > > +          break;
> > > > +        }
> > > > +        }
> > > > +
> > > > +      if (!found)
> > > > +        {
> > > > +          if (dump_file)
> > > > +        fprintf (dump_file, "Released dangling ssa name %u\n", i);
> > > > +          release_ssa_name (name);
> > > > +        }
> > > > +    }
> > > > +
> > > > +      return res;
> > > > +    }
> 
> This patch implements the TODO.
> 
> The cause of the problems is that in replace_ssa_name, we create a new
> ssa_name with the def stmt of the old ssa_name, but do not reset the def stmt
> of the old ssa_name, leaving the ssa_name in the original function having a
> def stmt in the split-off function.
> 
> [ And if we don't do anything about that, at some point in another pass we use
> 'gimple_bb (SSA_NAME_DEF_STMT (name))->index' (a bb index in the split-off
> function) as an index into an array with as length the number of bbs in the
> original function. So the index may be out of bounds. ]
> 
> This patch fixes that by making sure we reset the def stmt to NULL. This means
> we can simplify release_dangling_ssa_names to just test for NULL def stmts.

Not sure if I understand the problem correctly but why are you not simply
releasing the SSA name when you remove its definition?

Richard.

> Default defs are skipped by release_ssa_name, so setting the def stmt for
> default defs to NULL does not result in the name being released, but in an
> ssa-verification error. So instead, we keep the def stmt nop, and create a new
> nop for the copy in the split-off function.
> 
> [ The default def bit seems only to be triggered for the default def created
> by expand_omp_target:
> ...
>   /* If we are in ssa form, we must load the value from the default
>      definition of the argument.  That should not be defined now,
>      since the argument is not used uninitialized.  */
>   gcc_assert (ssa_default_def (cfun, arg) == NULL);
>   narg = make_ssa_name (arg, gimple_build_nop ());
>   set_ssa_default_def (cfun, arg, narg);
> ...
> ]
> 
> Bootstrapped and reg-tested on x86_64.
> 
> Committed to gomp-4_0-branch.
> 
> Thanks,
> - Tom
> 
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Dilip Upmanyu, Graham Norton, HRB 21284 (AG Nuernberg)

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [committed, gomp4] Fix release_dangling_ssa_names
  2015-08-05  7:29               ` Richard Biener
@ 2015-08-05  8:48                 ` Tom de Vries
  2015-08-05  9:30                   ` Richard Biener
  0 siblings, 1 reply; 71+ messages in thread
From: Tom de Vries @ 2015-08-05  8:48 UTC (permalink / raw)
  To: Richard Biener; +Cc: Thomas Schwinge, GCC Patches, Jakub Jelinek

On 05/08/15 09:29, Richard Biener wrote:
>> This patch fixes that by making sure we reset the def stmt to NULL. This means
>> we can simplify release_dangling_ssa_names to just test for NULL def stmts.
> Not sure if I understand the problem correctly but why are you not simply
> releasing the SSA name when you remove its definition?

In move_sese_region_to_fn we move a region of blocks from one function 
to another, bit by bit.

When we encounter an ssa_name as def or use in the region, we:
- generate a new ssa_name,
- set the def stmt of the old name as def stmt of the new name, and
- add a mapping from the old to the new name.
The next time we encounter the same ssa_name in another statement, we 
find it in the map.
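
The three steps above can be sketched with a toy map.  This is only a
model under stated assumptions: names are plain integer ids, the map is a
small fixed-size array rather than the hash_map used by replace_ssa_name,
and toy_map / toy_replace_name are hypothetical names, not GCC APIs.

```c
/* Toy model, not GCC code: names are ints and the map is a flat array.  */
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_NAMES 16

struct toy_map
{
  int old_ids[TOY_MAX_NAMES];
  int new_ids[TOY_MAX_NAMES];
  size_t len;
  int next_new_id;   /* Next free id in the destination function.  */
};

/* Return the destination-function id for OLD_ID, creating a new id and
   recording the mapping the first time OLD_ID is seen.  */
static int
toy_replace_name (struct toy_map *map, int old_id)
{
  for (size_t i = 0; i < map->len; i++)
    if (map->old_ids[i] == old_id)
      return map->new_ids[i];

  assert (map->len < TOY_MAX_NAMES);
  map->old_ids[map->len] = old_id;
  map->new_ids[map->len] = map->next_new_id;
  map->len++;
  return map->next_new_id++;
}
```

Calling toy_replace_name twice with the same old id returns the same new
id, which is the property the later statements in the region rely on.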

If we release the old ssa name, we effectively create statements with 
operands in the free-list. The first point where that causes breakage is 
in walk_gimple_op, which expects the TREE_TYPE of the lhs of an assign 
to be defined, which is not the case if it's in the free-list:
...
case GIMPLE_ASSIGN:
   /* Walk the RHS operands.  If the LHS is of a non-renamable type or
      is a register variable, we may use a COMPONENT_REF on the RHS.*/
   if (wi)
     {
       tree lhs = gimple_assign_lhs (stmt);
       wi->val_only
         = (is_gimple_reg_type (TREE_TYPE (lhs)) && !is_gimple_reg (lhs))
            || gimple_assign_rhs_class (stmt) != GIMPLE_SINGLE_RHS;
     }
...

Thanks,
- Tom

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [committed, gomp4] Fix release_dangling_ssa_names
  2015-08-05  8:48                 ` Tom de Vries
@ 2015-08-05  9:30                   ` Richard Biener
  2015-08-05 10:49                     ` Tom de Vries
  0 siblings, 1 reply; 71+ messages in thread
From: Richard Biener @ 2015-08-05  9:30 UTC (permalink / raw)
  To: Tom de Vries; +Cc: Thomas Schwinge, GCC Patches, Jakub Jelinek

On Wed, 5 Aug 2015, Tom de Vries wrote:

> On 05/08/15 09:29, Richard Biener wrote:
> > > This patch fixes that by making sure we reset the def stmt to NULL. This
> > > means we can simplify release_dangling_ssa_names to just test for NULL def
> > > stmts.
> > Not sure if I understand the problem correctly but why are you not simply
> > releasing the SSA name when you remove its definition?
> 
> In move_sese_region_to_fn we move a region of blocks from one function to
> another, bit by bit.
> 
> When we encounter an ssa_name as def or use in the region, we:
> - generate a new ssa_name,
> - set the def stmt of the old name as def stmt of the new name, and
> - add a mapping from the old to the new name.
> The next time we encounter the same ssa_name in another statement, we find it
> in the map.
> 
> If we release the old ssa name, we effectively create statements with operands
> in the free-list. The first point where that causes breakage is in
> walk_gimple_op, which expects the TREE_TYPE of the lhs of an assign to be
> defined, which is not the case if it's in the free-list:
> ...
> case GIMPLE_ASSIGN:
>   /* Walk the RHS operands.  If the LHS is of a non-renamable type or
>      is a register variable, we may use a COMPONENT_REF on the RHS.*/
>   if (wi)
>     {
>       tree lhs = gimple_assign_lhs (stmt);
>       wi->val_only
>         = (is_gimple_reg_type (TREE_TYPE (lhs)) && !is_gimple_reg (lhs))
>            || gimple_assign_rhs_class (stmt) != GIMPLE_SINGLE_RHS;
>     }
> ...

Hmm, ok, probably because the stmt moving doesn't happen in DOM
order (move defs before uses).  But

+
+      if (!SSA_NAME_IS_DEFAULT_DEF (name))
+       /* The statement has been moved to the child function.  It no longer
+          defines name in the original function.  Mark the def stmt NULL, and
+          let release_dangling_ssa_names deal with it.  */
+       SSA_NAME_DEF_STMT (name) = NULL;

applies also to uses - I don't see why it couldn't happen that you
move a use but not its def (the def would be a parameter to the
split-out function).  You'd wreck the IL of the source function this way.

I think that the whole dance of actually moving things instead of
just copying it isn't worth the extra maintenance (well, if we already
have a machinery duplicating a SESE region to another function - I
suppose gimple_duplicate_sese_region could be trivially changed to
support that).

Trunk doesn't have release_dangling_ssa_names it seems but I think
it belongs to move_sese_region_to_fn and not to omp-low.c and it
could also just walk the d->vars_map replace_ssa_name fills to
iterate over the removal candidates (and if the situation of
moving uses but not defs cannot happen you don't need any
SSA_NAME_DEF_STMT dance either).

Thanks,
Richard.

-- 
Richard Biener <rguenther@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Dilip Upmanyu, Graham Norton, HRB 21284 (AG Nuernberg)

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [committed, gomp4] Fix release_dangling_ssa_names
  2015-08-05  9:30                   ` Richard Biener
@ 2015-08-05 10:49                     ` Tom de Vries
  2015-08-05 11:13                       ` Richard Biener
  0 siblings, 1 reply; 71+ messages in thread
From: Tom de Vries @ 2015-08-05 10:49 UTC (permalink / raw)
  To: Richard Biener; +Cc: Thomas Schwinge, GCC Patches, Jakub Jelinek

On 05/08/15 11:30, Richard Biener wrote:
> On Wed, 5 Aug 2015, Tom de Vries wrote:
>
>> On 05/08/15 09:29, Richard Biener wrote:
>>>> This patch fixes that by making sure we reset the def stmt to NULL. This
>>>> means we can simplify release_dangling_ssa_names to just test for NULL def
>>>> stmts.
>>> Not sure if I understand the problem correctly but why are you not simply
>>> releasing the SSA name when you remove its definition?
>>
>> In move_sese_region_to_fn we move a region of blocks from one function to
>> another, bit by bit.
>>
>> When we encounter an ssa_name as def or use in the region, we:
>> - generate a new ssa_name,
>> - set the def stmt of the old name as def stmt of the new name, and
>> - add a mapping from the old to the new name.
>> The next time we encounter the same ssa_name in another statement, we find it
>> in the map.
>>
>> If we release the old ssa name, we effectively create statements with operands
>> in the free-list. The first point where that causes breakage is in
>> walk_gimple_op, which expects the TREE_TYPE of the lhs of an assign to be
>> defined, which is not the case if it's in the free-list:
>> ...
>> case GIMPLE_ASSIGN:
>>    /* Walk the RHS operands.  If the LHS is of a non-renamable type or
>>       is a register variable, we may use a COMPONENT_REF on the RHS.*/
>>    if (wi)
>>      {
>>        tree lhs = gimple_assign_lhs (stmt);
>>        wi->val_only
>>          = (is_gimple_reg_type (TREE_TYPE (lhs)) && !is_gimple_reg (lhs))
>>             || gimple_assign_rhs_class (stmt) != GIMPLE_SINGLE_RHS;
>>      }
>> ...
>
> Hmm, ok, probably because the stmt moving doesn't happen in DOM
> order (move defs before uses).  But
>

There seems to be similar code for the rhs, so I don't think changing 
the order would fix anything.

> +
> +      if (!SSA_NAME_IS_DEFAULT_DEF (name))
> +       /* The statement has been moved to the child function.  It no longer
> +          defines name in the original function.  Mark the def stmt NULL, and
> +          let release_dangling_ssa_names deal with it.  */
> +       SSA_NAME_DEF_STMT (name) = NULL;
>
> applies also to uses - I don't see why it couldn't happen that you
> move a use but not its def (the def would be a parameter to the
> split-out function).  You'd wreck the IL of the source function this way.
>

If you first move a use, you create a mapping. When you encounter the 
def, you use the mapping. Indeed, if the def is a default def, we don't 
encounter the def. Which is why we create a nop as defining def for 
those cases. The default def in the source function still has a defining 
nop, and has no uses anymore. I don't understand what is broken here.

> I think that the whole dance of actually moving things instead of
> just copying it isn't worth the extra maintenance (well, if we already
> have a machinery duplicating a SESE region to another function - I
> suppose gimple_duplicate_sese_region could be trivially changed to
> support that).
>

I'll mention that as todo. For now, I think the fastest way to get a 
working version is to fix move_sese_region_to_fn.

> Trunk doesn't have release_dangling_ssa_names it seems

Yep, I only ran into this trouble for the kernels region handling. But I 
don't exclude the possibility it could happen for trunk as well.

> but I think
> it belongs to move_sese_region_to_fn and not to omp-low.c

Makes sense indeed.

> and it
> could also just walk the d->vars_map replace_ssa_name fills to
> iterate over the removal candidates

Agreed, I suppose in general that's a win over iterating over all the 
ssa names.

> (and if the situation of
> moving uses but not defs cannot happen you don't need any
> SSA_NAME_DEF_STMT dance either).

I'd prefer to keep the SSA_NAME_DEF_STMT () = NULL bit. It makes sure a 
stmt is the defining stmt of only one ssa-name at all times.

I'll prepare a patch for trunk then.

Thanks,
- Tom

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [committed, gomp4] Fix release_dangling_ssa_names
  2015-08-05 10:49                     ` Tom de Vries
@ 2015-08-05 11:13                       ` Richard Biener
  2015-08-11  9:25                         ` [committed] Add todo comment for move_sese_region_to_fn Tom de Vries
  2015-08-11 18:53                         ` [PATCH] Don't create superfluous parm in expand_omp_taskreg Tom de Vries
  0 siblings, 2 replies; 71+ messages in thread
From: Richard Biener @ 2015-08-05 11:13 UTC (permalink / raw)
  To: Tom de Vries; +Cc: Thomas Schwinge, GCC Patches, Jakub Jelinek

On Wed, 5 Aug 2015, Tom de Vries wrote:

> On 05/08/15 11:30, Richard Biener wrote:
> > On Wed, 5 Aug 2015, Tom de Vries wrote:
> > 
> > > On 05/08/15 09:29, Richard Biener wrote:
> > > > > This patch fixes that by making sure we reset the def stmt to NULL.
> > > > > This means we can simplify release_dangling_ssa_names to just test for
> > > > > NULL def stmts.
> > > > Not sure if I understand the problem correctly but why are you not
> > > > simply
> > > > releasing the SSA name when you remove its definition?
> > > 
> > > In move_sese_region_to_fn we move a region of blocks from one function to
> > > another, bit by bit.
> > > 
> > > When we encounter an ssa_name as def or use in the region, we:
> > > - generate a new ssa_name,
> > > - set the def stmt of the old name as def stmt of the new name, and
> > > - add a mapping from the old to the new name.
> > > The next time we encounter the same ssa_name in another statement, we find
> > > it
> > > in the map.
> > > 
> > > If we release the old ssa name, we effectively create statements with
> > > operands
> > > in the free-list. The first point where that causes breakage is in
> > > walk_gimple_op, which expects the TREE_TYPE of the lhs of an assign to be
> > > defined, which is not the case if it's in the free-list:
> > > ...
> > > case GIMPLE_ASSIGN:
> > >    /* Walk the RHS operands.  If the LHS is of a non-renamable type or
> > >       is a register variable, we may use a COMPONENT_REF on the RHS.*/
> > >    if (wi)
> > >      {
> > >        tree lhs = gimple_assign_lhs (stmt);
> > >        wi->val_only
> > >          = (is_gimple_reg_type (TREE_TYPE (lhs)) && !is_gimple_reg (lhs))
> > >             || gimple_assign_rhs_class (stmt) != GIMPLE_SINGLE_RHS;
> > >      }
> > > ...
> > 
> > Hmm, ok, probably because the stmt moving doesn't happen in DOM
> > order (move defs before uses).  But
> > 
> 
> There seems to be similar code for the rhs, so I don't think changing the
> order would fix anything.
> 
> > +
> > +      if (!SSA_NAME_IS_DEFAULT_DEF (name))
> > +       /* The statement has been moved to the child function.  It no longer
> > +          defines name in the original function.  Mark the def stmt NULL, and
> > +          let release_dangling_ssa_names deal with it.  */
> > +       SSA_NAME_DEF_STMT (name) = NULL;
> > 
> > applies also to uses - I don't see why it couldn't happen that you
> > move a use but not its def (the def would be a parameter to the
> > split-out function).  You'd wreck the IL of the source function this way.
> > 
> 
> If you first move a use, you create a mapping. When you encounter the def, you
> use the mapping. Indeed, if the def is a default def, we don't encounter the
> def. Which is why we create a nop as defining def for those cases. The default
> def in the source function still has a defining nop, and has no uses anymore.
> I don't understand what is broken here.

If you never encounter the DEF then it's broken.  Say, if for

foo(int a)
{
  int b = a;
  if (b)
    {
      < code using b >
    }
}

you move < code using b > to a function.  Then the def is still in 
foo but you create a mapping for its use(s).  Clearly the outlining
process in this case has to pass b as parameter to the outlined
function, something that may not happen currently.

It would probably be cleaner to separate the def and use remapping
to separate functions and record on whether we saw a def or not.

> > I think that the whole dance of actually moving things instead of
> > just copying it isn't worth the extra maintenance (well, if we already
> > have a machinery duplicating a SESE region to another function - I
> > suppose gimple_duplicate_sese_region could be trivially changed to
> > support that).
> > 
> 
> I'll mention that as todo. For now, I think the fastest way to get a working
> version is to fix move_sese_region_to_fn.

Sure.

> > Trunk doesn't have release_dangling_ssa_names it seems
> 
> Yep, I only ran into this trouble for the kernels region handling. But I don't
> exclude the possibility it could happen for trunk as well.
> 
> > but I think
> > it belongs to move_sese_region_to_fn and not to omp-low.c
> 
> Makes sense indeed.
> 
> > and it
> > could also just walk the d->vars_map replace_ssa_name fills to
> > iterate over the removal candidates
> 
> Agreed, I suppose in general that's a win over iterating over all the ssa
> names.
> 
> > (and if the situation of
> > moving uses but not defs cannot happen you don't need any
> > SSA_NAME_DEF_STMT dance either).
> 
> I'd prefer to keep the SSA_NAME_DEF_STMT () = NULL bit. It makes sure a stmt
> is the defining stmt of only one ssa-name at all times.
> 
> I'll prepare a patch for trunk then.

Thanks,
Richard.

^ permalink raw reply	[flat|nested] 71+ messages in thread

* [committed] Add todo comment for move_sese_region_to_fn
  2015-08-05 11:13                       ` Richard Biener
@ 2015-08-11  9:25                         ` Tom de Vries
  2015-08-11 18:53                         ` [PATCH] Don't create superfluous parm in expand_omp_taskreg Tom de Vries
  1 sibling, 0 replies; 71+ messages in thread
From: Tom de Vries @ 2015-08-11  9:25 UTC (permalink / raw)
  To: Richard Biener; +Cc: Thomas Schwinge, GCC Patches, Jakub Jelinek

[-- Attachment #1: Type: text/plain, Size: 712 bytes --]

[ was: Re: [committed, gomp4] Fix release_dangling_ssa_names ]
On 05/08/15 13:13, Richard Biener wrote:
>>> I think that the whole dance of actually moving things instead of
>>> just copying it isn't worth the extra maintenance (well, if we already
>>> have a machinery duplicating a SESE region to another function - I
>>> suppose gimple_duplicate_sese_region could be trivially changed to
>>> support that).
>>>
>>
>> I'll mention that as todo. For now, I think the fastest way to get a working
>> version is to fix move_sese_region_to_fn.
> Sure.
>

This patch adds the todo discussed above in the function header comment 
of move_sese_region_to_fn.

Committed as obvious.

Thanks,
- Tom

[-- Attachment #2: 0001-Add-todo-comment-for-move_sese_region_to_fn.patch --]
[-- Type: text/x-patch, Size: 923 bytes --]

Add todo comment for move_sese_region_to_fn

2015-08-11  Tom de Vries  <tom@codesourcery.com>

	* tree-cfg.c (move_sese_region_to_fn): Add todo comment.
---
 gcc/tree-cfg.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
index e26454a..588ab69 100644
--- a/gcc/tree-cfg.c
+++ b/gcc/tree-cfg.c
@@ -7011,7 +7011,11 @@ verify_sese (basic_block entry, basic_block exit, vec<basic_block> *bbs_p)
 
    All local variables referenced in the region are assumed to be in
    the corresponding BLOCK_VARS and unexpanded variable lists
-   associated with DEST_CFUN.  */
+   associated with DEST_CFUN.
+
+   TODO: investigate whether we can reuse gimple_duplicate_sese_region to
+   reimplement move_sese_region_to_fn by duplicating the region rather than
+   moving it.  */
 
 basic_block
 move_sese_region_to_fn (struct function *dest_cfun, basic_block entry_bb,
-- 
1.9.1


^ permalink raw reply	[flat|nested] 71+ messages in thread

* [PATCH] Don't create superfluous parm in expand_omp_taskreg
  2015-08-05 11:13                       ` Richard Biener
  2015-08-11  9:25                         ` [committed] Add todo comment for move_sese_region_to_fn Tom de Vries
@ 2015-08-11 18:53                         ` Tom de Vries
  2015-08-12 10:51                           ` Richard Biener
  2015-09-24  6:36                           ` Thomas Schwinge
  1 sibling, 2 replies; 71+ messages in thread
From: Tom de Vries @ 2015-08-11 18:53 UTC (permalink / raw)
  To: Richard Biener; +Cc: Thomas Schwinge, GCC Patches, Jakub Jelinek

[-- Attachment #1: Type: text/plain, Size: 5496 bytes --]

[ was: Re: [committed, gomp4] Fix release_dangling_ssa_names ]

On 05/08/15 13:13, Richard Biener wrote:
> On Wed, 5 Aug 2015, Tom de Vries wrote:
>
>> On 05/08/15 11:30, Richard Biener wrote:
>>> On Wed, 5 Aug 2015, Tom de Vries wrote:
>>>
>>>> On 05/08/15 09:29, Richard Biener wrote:
>>>>>> This patch fixes that by making sure we reset the def stmt to NULL.
>>>>>> This means we can simplify release_dangling_ssa_names to just test for
>>>>>> NULL def stmts.
>>>>> Not sure if I understand the problem correctly but why are you not
>>>>> simply
>>>>> releasing the SSA name when you remove its definition?
>>>>
>>>> In move_sese_region_to_fn we move a region of blocks from one function to
>>>> another, bit by bit.
>>>>
>>>> When we encounter an ssa_name as def or use in the region, we:
>>>> - generate a new ssa_name,
>>>> - set the def stmt of the old name as def stmt of the new name, and
>>>> - add a mapping from the old to the new name.
>>>> The next time we encounter the same ssa_name in another statement, we find
>>>> it
>>>> in the map.
>>>>
>>>> If we release the old ssa name, we effectively create statements with
>>>> operands
>>>> in the free-list. The first point where that causes breakage is in
>>>> walk_gimple_op, which expects the TREE_TYPE of the lhs of an assign to be
>>>> defined, which is not the case if it's in the free-list:
>>>> ...
>>>> case GIMPLE_ASSIGN:
>>>>     /* Walk the RHS operands.  If the LHS is of a non-renamable type or
>>>>        is a register variable, we may use a COMPONENT_REF on the RHS.*/
>>>>     if (wi)
>>>>       {
>>>>         tree lhs = gimple_assign_lhs (stmt);
>>>>         wi->val_only
>>>>           = (is_gimple_reg_type (TREE_TYPE (lhs)) && !is_gimple_reg (lhs))
>>>>              || gimple_assign_rhs_class (stmt) != GIMPLE_SINGLE_RHS;
>>>>       }
>>>> ...
>>>
>>> Hmm, ok, probably because the stmt moving doesn't happen in DOM
>>> order (move defs before uses).  But
>>>
>>
>> There seems to be similar code for the rhs, so I don't think changing the
>> order would fix anything.
>>
>>> +
>>> +      if (!SSA_NAME_IS_DEFAULT_DEF (name))
>>> +       /* The statement has been moved to the child function.  It no longer
>>> +          defines name in the original function.  Mark the def stmt NULL, and
>>> +          let release_dangling_ssa_names deal with it.  */
>>> +       SSA_NAME_DEF_STMT (name) = NULL;
>>>
>>> applies also to uses - I don't see why it couldn't happen that you
>>> move a use but not its def (the def would be a parameter to the
>>> split-out function).  You'd wreck the IL of the source function this way.
>>>
>>
>> If you first move a use, you create a mapping. When you encounter the def, you
>> use the mapping. Indeed, if the def is a default def, we don't encounter the
>> def. Which is why we create a nop as defining def for those cases. The default
>> def in the source function still has a defining nop, and has no uses anymore.
>> I don't understand what is broken here.
>
> If you never encounter the DEF then it's broken.  Say, if for
>
> foo(int a)
> {
>    int b = a;
>    if (b)
>      {
>        < code using b >
>      }
> }
>
> you move < code using b > to a function.  Then the def is still in
> foo but you create a mapping for its use(s).  Clearly the outlining
> process in this case has to pass b as parameter to the outlined
> function, something that may not happen currently.
>

Ah, I see. Indeed, this is a situation that is assumed not to happen.

> It would probably be cleaner to separate the def and use remapping
> to separate functions and record on whether we saw a def or not.
>

Right, or some other means to detect this situation, say when copying 
the def stmt in replace_ssa_name, check whether it's part of the sese 
region.
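
A check of that kind could be modelled roughly as below.  This is a toy
sketch, not GCC code: basic blocks are plain ints, the region is a flat
array of block indices, and toy_bb_in_region / toy_count_live_in are
hypothetical helpers.  It only shows the idea of testing whether the block
of a def stmt belongs to the sese region.

```c
/* Toy model, not GCC code: blocks are plain ints and all names here are
   hypothetical.  */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Return true if block BB is one of the N_REGION blocks in REGION.  */
static bool
toy_bb_in_region (int bb, const int *region, size_t n_region)
{
  for (size_t i = 0; i < n_region; i++)
    if (region[i] == bb)
      return true;
  return false;
}

/* Count the names used inside the region whose defining block lies outside
   it.  DEF_BBS[i] is the block index of the def stmt of name i.  Such names
   are live-in and would have to be passed to the outlined function.  */
static size_t
toy_count_live_in (const int *def_bbs, size_t n_names,
                   const int *region, size_t n_region)
{
  assert (def_bbs != NULL && region != NULL);
  size_t count = 0;
  for (size_t i = 0; i < n_names; i++)
    if (!toy_bb_in_region (def_bbs[i], region, n_region))
      count++;
  return count;
}
```

In the foo example, b is defined in a block outside the moved region, so
it would be counted as live-in here; a non-zero count is the situation
that is currently assumed not to happen.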

>>> I think that the whole dance of actually moving things instead of
>>> just copying it isn't worth the extra maintenance (well, if we already
>>> have a machinery duplicating a SESE region to another function - I
>>> suppose gimple_duplicate_sese_region could be trivially changed to
>>> support that).
>>>
>>
>> I'll mention that as todo. For now, I think the fastest way to get a working
>> version is to fix move_sese_region_to_fn.
>
> Sure.
>
>>> Trunk doesn't have release_dangling_ssa_names it seems
>>
>> Yep, I only ran into this trouble for the kernels region handling. But I don't
>> exclude the possibility it could happen for trunk as well.
>>
>>> but I think
>>> it belongs to move_sese_region_to_fn and not to omp-low.c
>>
>> Makes sense indeed.
>>
>>> and it
>>> could also just walk the d->vars_map replace_ssa_name fills to
>>> iterate over the removal candidates
>>
>> Agreed, I suppose in general that's a win over iterating over all the ssa
>> names.
>>
>>> (and if the situation of
>>> moving uses but not defs cannot happen you don't need any
>>> SSA_NAME_DEF_STMT dance either).
>>
>> I'd prefer to keep the SSA_NAME_DEF_STMT () = NULL bit. It makes sure a stmt
>> is the defining stmt of only one ssa-name at all times.
>>
>> I'll prepare a patch for trunk then.
>

This patch fixes two problems with expand_omp_taskreg:
- it makes sure we don't generate a dummy default def in the original
   function (which we cannot get rid of afterwards, given that it's a
   default def).
- it releases ssa-names in the original function that have defining
   statements that have been moved to the split-off function.

Bootstrapped and reg-tested on x86_64.

OK for trunk?

Thanks,
- Tom

[-- Attachment #2: 0001-Don-t-create-superfluous-parm-in-expand_omp_taskreg.patch --]
[-- Type: text/x-patch, Size: 6390 bytes --]

Don't create superfluous parm in expand_omp_taskreg

2015-08-11  Tom de Vries  <tom@codesourcery.com>

	* omp-low.c (expand_omp_taskreg): If in ssa, set rhs of parcopy stmt to
	parm_decl, rather than generating a dummy default def in cfun.
	* tree-cfg.c (replace_ssa_name): Assume no default defs.  Make sure
	ssa_name from cfun and child_fn do not share a stmt as def stmt.
	(move_stmt_op): Handle PARM_DECL.
	(gather_ssa_name_hash_map_from): New function.
	(move_sese_region_to_fn): Add default defs for function params, and add
	them to vars_map.  Release copied ssa names.
	* tree-cfg.h (gather_ssa_name_hash_map_from): Declare.
---
 gcc/omp-low.c  | 20 ++++++++++----------
 gcc/tree-cfg.c | 45 ++++++++++++++++++++++++++++++++++++++++++---
 gcc/tree-cfg.h |  1 +
 3 files changed, 53 insertions(+), 13 deletions(-)

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index c1dc919..6f32a4a 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -5417,7 +5417,7 @@ expand_omp_taskreg (struct omp_region *region)
 	  basic_block entry_succ_bb
 	    = single_succ_p (entry_bb) ? single_succ (entry_bb)
 				       : FALLTHRU_EDGE (entry_bb)->dest;
-	  tree arg, narg;
+	  tree arg;
 	  gimple parcopy_stmt = NULL;
 
 	  for (gsi = gsi_start_bb (entry_succ_bb); ; gsi_next (&gsi))
@@ -5462,15 +5462,15 @@ expand_omp_taskreg (struct omp_region *region)
 	    }
 	  else
 	    {
-	      /* If we are in ssa form, we must load the value from the default
-		 definition of the argument.  That should not be defined now,
-		 since the argument is not used uninitialized.  */
-	      gcc_assert (ssa_default_def (cfun, arg) == NULL);
-	      narg = make_ssa_name (arg, gimple_build_nop ());
-	      set_ssa_default_def (cfun, arg, narg);
-	      /* ?? Is setting the subcode really necessary ??  */
-	      gimple_omp_set_subcode (parcopy_stmt, TREE_CODE (narg));
-	      gimple_assign_set_rhs1 (parcopy_stmt, narg);
+	      tree lhs = gimple_assign_lhs (parcopy_stmt);
+	      gcc_assert (SSA_NAME_VAR (lhs) == arg);
+	      /* We'd like to set the rhs to the default def in the child_fn,
+		 but it's too early to create ssa names in the child_fn.
+		 Instead, we set the rhs to the parm.  In
+		 move_sese_region_to_fn, we introduce a default def for the
+		 parm, map the parm to its default def, and once we encounter
+		 this stmt, replace the parm with the default def.  */
+	      gimple_assign_set_rhs1 (parcopy_stmt, arg);
 	      update_stmt (parcopy_stmt);
 	    }
 	}
diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
index 588ab69..8afa5fb 100644
--- a/gcc/tree-cfg.c
+++ b/gcc/tree-cfg.c
@@ -6422,17 +6422,19 @@ replace_ssa_name (tree name, hash_map<tree, tree> *vars_map,
       tree decl = SSA_NAME_VAR (name);
       if (decl)
 	{
+	  gcc_assert (!SSA_NAME_IS_DEFAULT_DEF (name));
 	  replace_by_duplicate_decl (&decl, vars_map, to_context);
 	  new_name = make_ssa_name_fn (DECL_STRUCT_FUNCTION (to_context),
 				       decl, SSA_NAME_DEF_STMT (name));
-	  if (SSA_NAME_IS_DEFAULT_DEF (name))
-	    set_ssa_default_def (DECL_STRUCT_FUNCTION (to_context),
-				 decl, new_name);
 	}
       else
 	new_name = copy_ssa_name_fn (DECL_STRUCT_FUNCTION (to_context),
 				     name, SSA_NAME_DEF_STMT (name));
 
+      /* Now that we've used the def stmt to define new_name, make sure it
+	 doesn't define name anymore.  */
+      SSA_NAME_DEF_STMT (name) = NULL;
+
       vars_map->put (name, new_name);
     }
   else
@@ -6484,6 +6486,9 @@ move_stmt_op (tree *tp, int *walk_subtrees, void *data)
     {
       if (TREE_CODE (t) == SSA_NAME)
 	*tp = replace_ssa_name (t, p->vars_map, p->to_context);
+      else if (TREE_CODE (t) == PARM_DECL
+	       && gimple_in_ssa_p (cfun))
+	*tp = *(p->vars_map->get (t));
       else if (TREE_CODE (t) == LABEL_DECL)
 	{
 	  if (p->new_label_map)
@@ -6994,6 +6999,19 @@ verify_sese (basic_block entry, basic_block exit, vec<basic_block> *bbs_p)
   BITMAP_FREE (bbs);
 }
 
+/* If FROM is an SSA_NAME, mark the version in bitmap DATA.  */
+
+bool
+gather_ssa_name_hash_map_from (tree const &from, tree const &, void *data)
+{
+  bitmap release_names = (bitmap)data;
+
+  if (TREE_CODE (from) != SSA_NAME)
+    return true;
+
+  bitmap_set_bit (release_names, SSA_NAME_VERSION (from));
+  return true;
+}
 
 /* Move a single-entry, single-exit region delimited by ENTRY_BB and
    EXIT_BB to function DEST_CFUN.  The whole region is replaced by a
@@ -7191,6 +7209,14 @@ move_sese_region_to_fn (struct function *dest_cfun, basic_block entry_bb,
   d.eh_map = eh_map;
   d.remap_decls_p = true;
 
+  if (gimple_in_ssa_p (cfun))
+    for (tree arg = DECL_ARGUMENTS (d.to_context); arg; arg = DECL_CHAIN (arg))
+      {
+	tree narg = make_ssa_name_fn (dest_cfun, arg, gimple_build_nop ());
+	set_ssa_default_def (dest_cfun, arg, narg);
+	vars_map.put (arg, narg);
+      }
+
   FOR_EACH_VEC_ELT (bbs, i, bb)
     {
       /* No need to update edge counts on the last block.  It has
@@ -7248,6 +7274,19 @@ move_sese_region_to_fn (struct function *dest_cfun, basic_block entry_bb,
   if (eh_map)
     delete eh_map;
 
+  if (gimple_in_ssa_p (cfun))
+    {
+      /* We need to release ssa-names in a defined order, so first find them,
+	 and then iterate in ascending version order.  */
+      bitmap release_names = BITMAP_ALLOC (NULL);
+      vars_map.traverse<void *, gather_ssa_name_hash_map_from> (release_names);
+      bitmap_iterator bi;
+      unsigned i;
+      EXECUTE_IF_SET_IN_BITMAP (release_names, 0, i, bi)
+	release_ssa_name (ssa_name (i));
+      BITMAP_FREE (release_names);
+    }
+
   /* Rewire the entry and exit blocks.  The successor to the entry
      block turns into the successor of DEST_FN's ENTRY_BLOCK_PTR in
      the child function.  Similarly, the predecessor of DEST_FN's
diff --git a/gcc/tree-cfg.h b/gcc/tree-cfg.h
index 6c4b1d9..4bd6fcf 100644
--- a/gcc/tree-cfg.h
+++ b/gcc/tree-cfg.h
@@ -75,6 +75,7 @@ extern bool gimple_duplicate_sese_tail (edge, edge, basic_block *, unsigned,
 extern void gather_blocks_in_sese_region (basic_block entry, basic_block exit,
 					  vec<basic_block> *bbs_p);
 extern void verify_sese (basic_block, basic_block, vec<basic_block> *);
+extern bool gather_ssa_name_hash_map_from (tree const &, tree const &, void *);
 extern basic_block move_sese_region_to_fn (struct function *, basic_block,
 				           basic_block, tree);
 extern void dump_function_to_file (tree, FILE *, int);
-- 
1.9.1


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH] Don't create superfluous parm in expand_omp_taskreg
  2015-08-11 18:53                         ` [PATCH] Don't create superfluous parm in expand_omp_taskreg Tom de Vries
@ 2015-08-12 10:51                           ` Richard Biener
  2015-09-24  6:36                           ` Thomas Schwinge
  1 sibling, 0 replies; 71+ messages in thread
From: Richard Biener @ 2015-08-12 10:51 UTC (permalink / raw)
  To: Tom de Vries; +Cc: Thomas Schwinge, GCC Patches, Jakub Jelinek

On Tue, 11 Aug 2015, Tom de Vries wrote:

> [ was: Re: [committed, gomp4] Fix release_dangling_ssa_names ]
> 
> On 05/08/15 13:13, Richard Biener wrote:
> > On Wed, 5 Aug 2015, Tom de Vries wrote:
> > 
> > > On 05/08/15 11:30, Richard Biener wrote:
> > > > On Wed, 5 Aug 2015, Tom de Vries wrote:
> > > > 
> > > > > On 05/08/15 09:29, Richard Biener wrote:
> > > > > > > > This patch fixes that by making sure we reset the def stmt to
> > > > > > > > NULL.  This means we can simplify release_dangling_ssa_names
> > > > > > > > to just test for NULL def stmts.
> > > > > > Not sure if I understand the problem correctly but why are you not
> > > > > > simply
> > > > > > releasing the SSA name when you remove its definition?
> > > > > 
> > > > > In move_sese_region_to_fn we move a region of blocks from one function
> > > > > to
> > > > > another, bit by bit.
> > > > > 
> > > > > When we encounter an ssa_name as def or use in the region, we:
> > > > > - generate a new ssa_name,
> > > > > - set the def stmt of the old name as def stmt of the new name, and
> > > > > - add a mapping from the old to the new name.
> > > > > The next time we encounter the same ssa_name in another statement, we
> > > > > find
> > > > > it
> > > > > in the map.
> > > > > 
> > > > > If we release the old ssa name, we effectively create statements with
> > > > > operands
> > > > > in the free-list. The first point where that cause breakage, is in
> > > > > walk_gimple_op, which expects the TREE_TYPE of the lhs of an assign to
> > > > > be
> > > > > defined, which is not the case if it's in the free-list:
> > > > > ...
> > > > > case GIMPLE_ASSIGN:
> > > > > >     /* Walk the RHS operands.  If the LHS is of a non-renamable type or
> > > > > >        is a register variable, we may use a COMPONENT_REF on the RHS.*/
> > > > >     if (wi)
> > > > >       {
> > > > >         tree lhs = gimple_assign_lhs (stmt);
> > > > >         wi->val_only
> > > > > >           = (is_gimple_reg_type (TREE_TYPE (lhs)) && !is_gimple_reg (lhs))
> > > > >              || gimple_assign_rhs_class (stmt) != GIMPLE_SINGLE_RHS;
> > > > >       }
> > > > > ...
> > > > 
> > > > Hmm, ok, probably because the stmt moving doesn't happen in DOM
> > > > order (move defs before uses).  But
> > > > 
> > > 
> > > There seems to be similar code for the rhs, so I don't think changing the
> > > order would fix anything.
> > > 
> > > > +
> > > > +      if (!SSA_NAME_IS_DEFAULT_DEF (name))
> > > > > +       /* The statement has been moved to the child function.  It no longer
> > > > > +          defines name in the original function.  Mark the def stmt NULL, and
> > > > > +          let release_dangling_ssa_names deal with it.  */
> > > > +       SSA_NAME_DEF_STMT (name) = NULL;
> > > > 
> > > > applies also to uses - I don't see why it couldn't happen that you
> > > > move a use but not its def (the def would be a parameter to the
> > > > split-out function).  You'd wreck the IL of the source function this
> > > > way.
> > > > 
> > > 
> > > If you first move a use, you create a mapping. When you encounter the def,
> > > you
> > > use the mapping. Indeed, if the def is a default def, we don't encounter
> > > the
> > > def. Which is why we create a nop as defining def for those cases. The
> > > default
> > > def in the source function still has a defining nop, and has no uses
> > > anymore.
> > > I don't understand what is broken here.
> > 
> > If you never encounter the DEF then it's broken.  Say, if for
> > 
> > foo(int a)
> > {
> >    int b = a;
> >    if (b)
> >      {
> >        < code using b >
> >      }
> > }
> > 
> > you move < code using b > to a function.  Then the def is still in
> > foo but you create a mapping for its use(s).  Clearly the outlining
> > process in this case has to pass b as parameter to the outlined
> > function, something that may not happen currently.
> > 
> 
> Ah, I see. Indeed, this is a situation that is assumed not to happen.
> 
> > It would probably be cleaner to separate the def and use remapping
> > to separate functions and record on whether we saw a def or not.
> > 
> 
> Right, or some other means to detect this situation, say when copying the def
> stmt in replace_ssa_name, check whether it's part of the sese region.
> 
> > > > I think that the whole dance of actually moving things instead of
> > > > just copying it isn't worth the extra maintainance (well, if we already
> > > > have a machinery duplicating a SESE region to another function - I
> > > > suppose gimple_duplicate_sese_region could be trivially changed to
> > > > support that).
> > > > 
> > > 
> > > I'll mention that as todo. For now, I think the fastest way to get a
> > > working
> > > version is to fix move_sese_region_to_fn.
> > 
> > Sure.
> > 
> > > > Trunk doesn't have release_dangling_ssa_names it seems
> > > 
> > > Yep, I only ran into this trouble for the kernels region handling. But I
> > > don't
> > > exclude the possibility it could happen for trunk as well.
> > > 
> > > > but I think
> > > > it belongs to move_sese_region_to_fn and not to omp-low.c
> > > 
> > > Makes sense indeed.
> > > 
> > > > and it
> > > > could also just walk the d->vars_map replace_ssa_name fills to
> > > > iterate over the removal candidates
> > > 
> > > Agreed, I suppose in general that's a win over iterating over all the ssa
> > > names.
> > > 
> > > > (and if the situation of
> > > > moving uses but not defs cannot happen you don't need any
> > > > SSA_NAME_DEF_STMT dance either).
> > > 
> > > I'd prefer to keep the SSA_NAME_DEF_STMT () = NULL bit. It makes sure a
> > > stmt
> > > is the defining stmt of only one ssa-name at all times.
> > > 
> > > I'll prepare a patch for trunk then.
> > 
> 
> This patch fixes two problems with expand_omp_taskreg:
> - it makes sure we don't generate a dummy default def in the original
>   function (which we cannot get rid of afterwards, given that it's a
>   default def).
> - it releases ssa-names in the original function that have defining
>   statements that have been moved to the split-off function.
> 
> Bootstrapped and reg-tested on x86_64.
> 
> OK for trunk?

Ok.

Thanks,
Richard.

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH] Don't create superfluous parm in expand_omp_taskreg
  2015-08-11 18:53                         ` [PATCH] Don't create superfluous parm in expand_omp_taskreg Tom de Vries
  2015-08-12 10:51                           ` Richard Biener
@ 2015-09-24  6:36                           ` Thomas Schwinge
  2015-09-24  7:21                             ` Tom de Vries
  1 sibling, 1 reply; 71+ messages in thread
From: Thomas Schwinge @ 2015-09-24  6:36 UTC (permalink / raw)
  To: Tom de Vries; +Cc: GCC Patches, Jakub Jelinek, Richard Biener

[-- Attachment #1: Type: text/plain, Size: 10218 bytes --]

Hi Tom!

On Tue, 11 Aug 2015 20:53:39 +0200, Tom de Vries <Tom_deVries@mentor.com> wrote:
> [ was: Re: [committed, gomp4] Fix release_dangling_ssa_names ]
> 
> On 05/08/15 13:13, Richard Biener wrote:
> > On Wed, 5 Aug 2015, Tom de Vries wrote:
> >
> >> On 05/08/15 11:30, Richard Biener wrote:
> >>> On Wed, 5 Aug 2015, Tom de Vries wrote:
> >>>
> >>>> On 05/08/15 09:29, Richard Biener wrote:
> >>>>>> This patch fixes that by making sure we reset the def stmt to NULL.
> >>>>>> This means we can simplify release_dangling_ssa_names to just test
> >>>>>> for NULL def stmts.
> >>>>> Not sure if I understand the problem correctly but why are you not
> >>>>> simply
> >>>>> releasing the SSA name when you remove its definition?
> >>>>
> >>>> In move_sese_region_to_fn we move a region of blocks from one function to
> >>>> another, bit by bit.
> >>>>
> >>>> When we encounter an ssa_name as def or use in the region, we:
> >>>> - generate a new ssa_name,
> >>>> - set the def stmt of the old name as def stmt of the new name, and
> >>>> - add a mapping from the old to the new name.
> >>>> The next time we encounter the same ssa_name in another statement, we find
> >>>> it
> >>>> in the map.
> >>>>
> >>>> If we release the old ssa name, we effectively create statements with
> >>>> operands
> >>>> in the free-list. The first point where that cause breakage, is in
> >>>> walk_gimple_op, which expects the TREE_TYPE of the lhs of an assign to be
> >>>> defined, which is not the case if it's in the free-list:
> >>>> ...
> >>>> case GIMPLE_ASSIGN:
> >>>>     /* Walk the RHS operands.  If the LHS is of a non-renamable type or
> >>>>        is a register variable, we may use a COMPONENT_REF on the RHS.*/
> >>>>     if (wi)
> >>>>       {
> >>>>         tree lhs = gimple_assign_lhs (stmt);
> >>>>         wi->val_only
> >>>>           = (is_gimple_reg_type (TREE_TYPE (lhs)) && !is_gimple_reg (lhs))
> >>>>              || gimple_assign_rhs_class (stmt) != GIMPLE_SINGLE_RHS;
> >>>>       }
> >>>> ...
> >>>
> >>> Hmm, ok, probably because the stmt moving doesn't happen in DOM
> >>> order (move defs before uses).  But
> >>>
> >>
> >> There seems to be similar code for the rhs, so I don't think changing the
> >> order would fix anything.
> >>
> >>> +
> >>> +      if (!SSA_NAME_IS_DEFAULT_DEF (name))
> >>> +       /* The statement has been moved to the child function.  It no longer
> >>> +          defines name in the original function.  Mark the def stmt NULL, and
> >>> +          let release_dangling_ssa_names deal with it.  */
> >>> +       SSA_NAME_DEF_STMT (name) = NULL;
> >>>
> >>> applies also to uses - I don't see why it couldn't happen that you
> >>> move a use but not its def (the def would be a parameter to the
> >>> split-out function).  You'd wreck the IL of the source function this way.
> >>>
> >>
> >> If you first move a use, you create a mapping. When you encounter the def, you
> >> use the mapping. Indeed, if the def is a default def, we don't encounter the
> >> def. Which is why we create a nop as defining def for those cases. The default
> >> def in the source function still has a defining nop, and has no uses anymore.
> >> I don't understand what is broken here.
> >
> > If you never encounter the DEF then it's broken.  Say, if for
> >
> > foo(int a)
> > {
> >    int b = a;
> >    if (b)
> >      {
> >        < code using b >
> >      }
> > }
> >
> > you move < code using b > to a function.  Then the def is still in
> > foo but you create a mapping for its use(s).  Clearly the outlining
> > process in this case has to pass b as parameter to the outlined
> > function, something that may not happen currently.
> >
> 
> Ah, I see. Indeed, this is a situation that is assumed not to happen.
> 
> > It would probably be cleaner to separate the def and use remapping
> > to separate functions and record on whether we saw a def or not.
> >
> 
> Right, or some other means to detect this situation, say when copying 
> the def stmt in replace_ssa_name, check whether it's part of the sese 
> region.
> 
> >>> I think that the whole dance of actually moving things instead of
> >>> just copying it isn't worth the extra maintainance (well, if we already
> >>> have a machinery duplicating a SESE region to another function - I
> >>> suppose gimple_duplicate_sese_region could be trivially changed to
> >>> support that).
> >>>
> >>
> >> I'll mention that as todo. For now, I think the fastest way to get a working
> >> version is to fix move_sese_region_to_fn.
> >
> > Sure.
> >
> >>> Trunk doesn't have release_dangling_ssa_names it seems
> >>
> >> Yep, I only ran into this trouble for the kernels region handling. But I don't
> >> exclude the possibility it could happen for trunk as well.
> >>
> >>> but I think
> >>> it belongs to move_sese_region_to_fn and not to omp-low.c
> >>
> >> Makes sense indeed.
> >>
> >>> and it
> >>> could also just walk the d->vars_map replace_ssa_name fills to
> >>> iterate over the removal candidates
> >>
> >> Agreed, I suppose in general that's a win over iterating over all the ssa
> >> names.
> >>
> >>> (and if the situation of
> >>> moving uses but not defs cannot happen you don't need any
> >>> SSA_NAME_DEF_STMT dance either).
> >>
> >> I'd prefer to keep the SSA_NAME_DEF_STMT () = NULL bit. It makes sure a stmt
> >> is the defining stmt of only one ssa-name at all times.
> >>
> >> I'll prepare a patch for trunk then.
> >
> 
> This patch fixes two problems with expand_omp_taskreg:
> - it makes sure we don't generate a dummy default def in the original
>    function (which we cannot get rid of afterwards, given that it's a
>    default def).
> - it releases ssa-names in the original function that have defining
>    statements that have been moved to the split-off function.
> 
> Bootstrapped and reg-tested on x86_64.
> 
> OK for trunk?
> 
> Thanks,
> - Tom
> Don't create superfluous parm in expand_omp_taskreg
> 
> 2015-08-11  Tom de Vries  <tom@codesourcery.com>
> 
> 	* omp-low.c (expand_omp_taskreg): If in ssa, set rhs of parcopy stmt to
> 	parm_decl, rather than generating a dummy default def in cfun.
> 	* tree-cfg.c (replace_ssa_name): Assume no default defs.  Make sure
> 	ssa_name from cfun and child_fn do not share a stmt as def stmt.
> 	(move_stmt_op): Handle PARM_DECl.
> 	(gather_ssa_name_hash_map_from): New function.
> 	(move_sese_region_to_fn): Add default defs for function params, and add
> 	them to vars_map.  Release copied ssa names.
> 	* tree-cfg.h (gather_ssa_name_hash_map_from): Declare.

Do I understand correctly that with this change present on trunk (which I'm
currently merging into gomp-4_0-branch), the changes you've earlier done
on gomp-4_0-branch to gcc/omp-low.c:release_dangling_ssa_names,
gcc/tree-cfg.c:replace_ssa_name, should now be reverted?  That is, how
much of the following patches can be reverted now (listed backwards in
time)?

commit 6befb84f4c0157a4cdf66cfaf64e457180f9a7fa
Author: vries <vries@138bc75d-0d04-0410-961f-82ee72b054a4>
Date:   Wed Aug 5 06:01:08 2015 +0000

    Fix release_dangling_ssa_names
    
    2015-08-05  Tom de Vries  <tom@codesourcery.com>
    
        * omp-low.c (release_dangling_ssa_names): Release SSA_NAMEs with NULL
        def stmt.
        * tree-cfg.c (replace_ssa_name): Don't move default def nops.  Set def
        stmt of unused SSA_NAME to NULL.
    
    git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@226608 138bc75d-0d04-0410-961f-82ee72b054a4

commit 0cf67438bd87e5a6ec063e90da0ea20801bda54c
Author: vries <vries@138bc75d-0d04-0410-961f-82ee72b054a4>
Date:   Thu Jun 4 15:47:09 2015 +0000

    Add release_dangling_ssa_names
    
    2015-06-04  Tom de Vries  <tom@codesourcery.com>
    
        * omp-low.c (release_dangling_ssa_names): Factor out of ...
        (pass_expand_omp_ssa::execute): ... here.
    
    git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@224130 138bc75d-0d04-0410-961f-82ee72b054a4

commit 93557ac5e30c26ee1a3d1255e31265b287171a0d
Author: tschwinge <tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4>
Date:   Tue Apr 21 19:37:19 2015 +0000

    Expand oacc kernels after pass_fre
    
        gcc/
        * omp-low.c: Include gimple-pretty-print.h.
        (release_first_vuse_in_edge_dest): New function.
        (expand_omp_target): When not in ssa, don't split off oacc kernels
        region, clear PROP_gimple_eomp in cfun->curr_properties to force later
        expanssion, and add GOACC_kernels_internal call.
        When in ssa, split off oacc kernels and convert GOACC_kernels_internal
        into GOACC_kernels call.  Handle ssa-code.
        (pass_data_expand_omp): Don't set PROP_gimple_eomp unconditionally in
        properties_provided field.
        (pass_expand_omp::execute): Set PROP_gimple_eomp in
        cfun->curr_properties tentatively.
        (pass_data_expand_omp_ssa): Add TODO_remove_unused_locals to
        todo_flags_finish field.
        (pass_expand_omp_ssa::execute): Release dangling SSA_NAMEs after calling
        execute_expand_omp.
        (gimple_stmt_ssa_operand_references_var_p)
        (gimple_stmt_omp_data_i_init_p): New function.
        * omp-low.h (gimple_stmt_omp_data_i_init_p): Declare.
        * passes.def: Add pass_expand_omp_ssa after pass_fre.  Add
        pass_expand_omp_ssa after pass_all_early_optimizations.
        * tree-ssa-ccp.c: Include omp-low.h.
        (surely_varying_stmt_p, ccp_visit_stmt): Handle .omp_data_i init
        conservatively.
        * tree-ssa-forwprop.c: Include omp-low.h.
        (pass_forwprop::execute): Handle .omp_data_i init conservatively.
        * tree-ssa-sccvn.c: Include omp-low.h.
        (visit_use): Handle .omp_data_i init conservatively.
        * cgraph.c (cgraph_node::release_body): Don't release offloadable
        functions.
    
    git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@222279 138bc75d-0d04-0410-961f-82ee72b054a4


Grüße,
 Thomas

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 472 bytes --]

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH] Don't create superfluous parm in expand_omp_taskreg
  2015-09-24  6:36                           ` Thomas Schwinge
@ 2015-09-24  7:21                             ` Tom de Vries
  2015-09-24  9:31                               ` Thomas Schwinge
  0 siblings, 1 reply; 71+ messages in thread
From: Tom de Vries @ 2015-09-24  7:21 UTC (permalink / raw)
  To: Thomas Schwinge; +Cc: GCC Patches, Jakub Jelinek, Richard Biener

On 24/09/15 08:23, Thomas Schwinge wrote:
> Hi Tom!
>
> On Tue, 11 Aug 2015 20:53:39 +0200, Tom de Vries <Tom_deVries@mentor.com> wrote:
>> Don't create superfluous parm in expand_omp_taskreg
>>
>> 2015-08-11  Tom de Vries  <tom@codesourcery.com>
>>
>> 	* omp-low.c (expand_omp_taskreg): If in ssa, set rhs of parcopy stmt to
>> 	parm_decl, rather than generating a dummy default def in cfun.
>> 	* tree-cfg.c (replace_ssa_name): Assume no default defs.  Make sure
>> 	ssa_name from cfun and child_fn do not share a stmt as def stmt.
>> 	(move_stmt_op): Handle PARM_DECl.
>> 	(gather_ssa_name_hash_map_from): New function.
>> 	(move_sese_region_to_fn): Add default defs for function params, and add
>> 	them to vars_map.  Release copied ssa names.
>> 	* tree-cfg.h (gather_ssa_name_hash_map_from): Declare.
>
> Do I understand correct that with this change present on trunk (which I'm
> currently merging into gomp-4_0-branch), the changes you've earlier done
> on gomp-4_0-branch to gcc/omp-low.c:release_dangling_ssa_names,
> gcc/tree-cfg.c:replace_ssa_name, should now be reverted?  That is, how
> much of the following patches can be reverted now (listed backwards in
> time)?
>

Hi Thomas,

indeed, in the above commit we release the dangling ssa names in 
move_sese_region_to_fn. So after committing this patch to the 
gomp-4_0-branch, the call to release_dangling_ssa_names is no longer 
necessary, and the function release_dangling_ssa_names can be removed.
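
The release step in move_sese_region_to_fn can be pictured with the
following standalone C++ sketch. It is not the GCC implementation: Node
and release_order are hypothetical names, and a std::set stands in for the
bitmap. The point is only that the keys of the old -> new map are gathered
first and then visited in ascending version order, so the order of release
does not depend on how the hash map happens to be traversed.

```cpp
#include <cassert>
#include <set>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a tree node; only what the sketch needs.
struct Node
{
  bool is_ssa_name;
  unsigned version;
};

// Collect the versions of all SSA-name keys in the old -> new map, then
// return them in ascending order.  Keys that are not SSA names (for
// example decls) are skipped.
std::vector<unsigned>
release_order (const std::unordered_map<const Node *, const Node *> &vars_map)
{
  std::set<unsigned> versions;
  for (const auto &entry : vars_map)
    {
      assert (entry.first != nullptr);
      if (entry.first->is_ssa_name)
        versions.insert (entry.first->version);
    }
  return std::vector<unsigned> (versions.begin (), versions.end ());
}
```

A caller would then release the name for each returned version in turn,
which corresponds to the loop over the bitmap in the patch.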

Thanks,
- Tom

> commit 6befb84f4c0157a4cdf66cfaf64e457180f9a7fa
> Author: vries <vries@138bc75d-0d04-0410-961f-82ee72b054a4>
> Date:   Wed Aug 5 06:01:08 2015 +0000
>
>      Fix release_dangling_ssa_names
>
>      2015-08-05  Tom de Vries  <tom@codesourcery.com>
>
>          * omp-low.c (release_dangling_ssa_names): Release SSA_NAMEs with NULL
>          def stmt.
>          * tree-cfg.c (replace_ssa_name): Don't move default def nops.  Set def
>          stmt of unused SSA_NAME to NULL.
>
>      git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@226608 138bc75d-0d04-0410-961f-82ee72b054a4
>
> commit 0cf67438bd87e5a6ec063e90da0ea20801bda54c
> Author: vries <vries@138bc75d-0d04-0410-961f-82ee72b054a4>
> Date:   Thu Jun 4 15:47:09 2015 +0000
>
>      Add release_dangling_ssa_names
>
>      2015-06-04  Tom de Vries  <tom@codesourcery.com>
>
>          * omp-low.c (release_dangling_ssa_names): Factor out of ...
>          (pass_expand_omp_ssa::execute): ... here.
>
>      git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@224130 138bc75d-0d04-0410-961f-82ee72b054a4
>
> commit 93557ac5e30c26ee1a3d1255e31265b287171a0d
> Author: tschwinge <tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4>
> Date:   Tue Apr 21 19:37:19 2015 +0000
>
>      Expand oacc kernels after pass_fre
>
>          gcc/
>          * omp-low.c: Include gimple-pretty-print.h.
>          (release_first_vuse_in_edge_dest): New function.
>          (expand_omp_target): When not in ssa, don't split off oacc kernels
>          region, clear PROP_gimple_eomp in cfun->curr_properties to force later
>          expanssion, and add GOACC_kernels_internal call.
>          When in ssa, split off oacc kernels and convert GOACC_kernels_internal
>          into GOACC_kernels call.  Handle ssa-code.
>          (pass_data_expand_omp): Don't set PROP_gimple_eomp unconditionally in
>          properties_provided field.
>          (pass_expand_omp::execute): Set PROP_gimple_eomp in
>          cfun->curr_properties tentatively.
>          (pass_data_expand_omp_ssa): Add TODO_remove_unused_locals to
>          todo_flags_finish field.
>          (pass_expand_omp_ssa::execute): Release dangling SSA_NAMEs after calling
>          execute_expand_omp.
>          (gimple_stmt_ssa_operand_references_var_p)
>          (gimple_stmt_omp_data_i_init_p): New function.
>          * omp-low.h (gimple_stmt_omp_data_i_init_p): Declare.
>          * passes.def: Add pass_expand_omp_ssa after pass_fre.  Add
>          pass_expand_omp_ssa after pass_all_early_optimizations.
>          * tree-ssa-ccp.c: Include omp-low.h.
>          (surely_varying_stmt_p, ccp_visit_stmt): Handle .omp_data_i init
>          conservatively.
>          * tree-ssa-forwprop.c: Include omp-low.h.
>          (pass_forwprop::execute): Handle .omp_data_i init conservatively.
>          * tree-ssa-sccvn.c: Include omp-low.h.
>          (visit_use): Handle .omp_data_i init conservatively.
>          * cgraph.c (cgraph_node::release_body): Don't release offloadable
>          functions.
>
>      git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@222279 138bc75d-0d04-0410-961f-82ee72b054a4
>
>
> Grüße,
>   Thomas
>

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH] Don't create superfluous parm in expand_omp_taskreg
  2015-09-24  7:21                             ` Tom de Vries
@ 2015-09-24  9:31                               ` Thomas Schwinge
  2015-09-30  8:05                                 ` [gomp4,committed] Remove release_dangling_ssa_names Tom de Vries
  0 siblings, 1 reply; 71+ messages in thread
From: Thomas Schwinge @ 2015-09-24  9:31 UTC (permalink / raw)
  To: Tom de Vries; +Cc: GCC Patches, Jakub Jelinek, Richard Biener

[-- Attachment #1: Type: text/plain, Size: 11619 bytes --]

Hi Tom!

On Thu, 24 Sep 2015 08:36:27 +0200, Tom de Vries <Tom_deVries@mentor.com> wrote:
> On 24/09/15 08:23, Thomas Schwinge wrote:
> > On Tue, 11 Aug 2015 20:53:39 +0200, Tom de Vries <Tom_deVries@mentor.com> wrote:
> >> Don't create superfluous parm in expand_omp_taskreg
> >>
> >> 2015-08-11  Tom de Vries  <tom@codesourcery.com>
> >>
> >> 	* omp-low.c (expand_omp_taskreg): If in ssa, set rhs of parcopy stmt to
> >> 	parm_decl, rather than generating a dummy default def in cfun.
> >> 	* tree-cfg.c (replace_ssa_name): Assume no default defs.  Make sure
> >> 	ssa_name from cfun and child_fn do not share a stmt as def stmt.
> >> 	(move_stmt_op): Handle PARM_DECl.
> >> 	(gather_ssa_name_hash_map_from): New function.
> >> 	(move_sese_region_to_fn): Add default defs for function params, and add
> >> 	them to vars_map.  Release copied ssa names.
> >> 	* tree-cfg.h (gather_ssa_name_hash_map_from): Declare.
> >
> > Do I understand correct that with this change present on trunk (which I'm
> > currently merging into gomp-4_0-branch), the changes you've earlier done
> > on gomp-4_0-branch to gcc/omp-low.c:release_dangling_ssa_names,
> > gcc/tree-cfg.c:replace_ssa_name, should now be reverted?  That is, how
> > much of the following patches can be reverted now (listed backwards in
> > time)?
> 
> indeed, in the above commit we release the dangling ssa names in 
> move_sese_region_to_fn. So after committing this patch to the 
> gomp-4_0-branch, the call to release_dangling_ssa_names is no longer 
> necessary, and the function release_dangling_ssa_names can be removed.

From IRC:

    <tschwinge> vries: Are you totally busy right now, or could you spend
      an hour on backporting to gomp-4_0-branch your trunk commit that I
      mentioned earlier today?
    <vries> tschwinge: shouldn't be a problem
    <tschwinge> Well, I'm asking because in my merge tree, I'm running
      into an assertion that you added there -- not sure yet whether I've
      done something wrong, though.
    <tschwinge> vries: ^
    <vries> tschwinge: it would be useful for me to know which assertion

So far, I only looked at libgomp test results, and there, none of the
OpenMP but quite a number of the OpenACC tests fail as follows, for
example libgomp.oacc-c-c++-common/kernels-1.c:

    [...]/build-gcc/gcc/xgcc -B[...]/build-gcc/gcc/ [...]/source-gcc/libgomp/testsuite/libgomp.oacc-c/../libgomp.oacc-c-c++-common/kernels-1.c -B[...]/build-gcc/x86_64-pc-linux-gnu/./libgomp/ -B[...]/build-gcc/x86_64-pc-linux-gnu/./libgomp/.libs -I[...]/build-gcc/x86_64-pc-linux-gnu/./libgomp -I[...]/source-gcc/libgomp/testsuite/../../include -I[...]/source-gcc/libgomp/testsuite/.. -I/usr/local/cuda-5.5/targets/x86_64-linux/include -fmessage-length=0 -fno-diagnostics-show-caret -fdiagnostics-color=never -B[...]/install/offload-nvptx-none/libexec/gcc/x86_64-pc-linux-gnu/6.0.0 -B[...]/install/offload-nvptx-none/bin -B[...]/install/offload-x86_64-intelmicemul-linux-gnu/libexec/gcc/x86_64-pc-linux-gnu/6.0.0 -B[...]/install/offload-x86_64-intelmicemul-linux-gnu/bin -fopenacc -I[...]/source-gcc/libgomp/testsuite/libgomp.oacc-c-c++-common -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O2 -L[...]/build-gcc/x86_64-pc-linux-gnu/./libgomp/.libs -lm -o ./kernels-1.exe
    In file included from [...]/source-gcc/libgomp/testsuite/libgomp.oacc-c/../libgomp.oacc-c-c++-common/kernels-1.c:6:0:
    [...]/source-gcc/libgomp/testsuite/libgomp.oacc-c/../libgomp.oacc-c-c++-common/data-clauses.h: In function 'main':
    [...]/source-gcc/libgomp/testsuite/libgomp.oacc-c/../libgomp.oacc-c-c++-common/data-clauses.h:182:9: internal compiler error: in replace_ssa_name, at tree-cfg.c:6423
    0xb0518a replace_ssa_name
            [...]/source-gcc/gcc/tree-cfg.c:6423
    0xb05407 move_stmt_op
            [...]/source-gcc/gcc/tree-cfg.c:6501
    0xd75e43 walk_tree_1(tree_node**, tree_node* (*)(tree_node**, int*, void*), void*, hash_set<tree_node*, default_hash_traits<tree_node*> >*, tree_node* (*)(tree_node**, int*, tree_node* (*)(tree_node**, int*, void*), void*, hash_set<tree_node*, default_hash_traits<tree_node*> >*))
            [...]/source-gcc/gcc/tree.c:11341
    0x89832c walk_gimple_op(gimple*, tree_node* (*)(tree_node**, int*, void*), walk_stmt_info*)
            [...]/source-gcc/gcc/gimple-walk.c:204
    0x898814 walk_gimple_stmt(gimple_stmt_iterator*, tree_node* (*)(gimple_stmt_iterator*, bool*, walk_stmt_info*), tree_node* (*)(tree_node**, int*, void*), walk_stmt_info*)
            [...]/source-gcc/gcc/gimple-walk.c:562
    0xb088e3 move_block_to_fn
            [...]/source-gcc/gcc/tree-cfg.c:6774
    0xb088e3 move_sese_region_to_fn(function*, basic_block_def*, basic_block_def*, tree_node*)
            [...]/source-gcc/gcc/tree-cfg.c:7238
    0x9c1133 expand_omp_target
            [...]/source-gcc/gcc/omp-low.c:9802
    0x9c3a1c expand_omp
            [...]/source-gcc/gcc/omp-low.c:10240
    0x9cc3fe execute_expand_omp
            [...]/source-gcc/gcc/omp-low.c:10486
    0x9cc568 execute
            [...]/source-gcc/gcc/omp-low.c:10609

source-gcc/gcc/tree-cfg.c:

    [...]
      6405  /* Creates an ssa name in TO_CONTEXT equivalent to NAME.
      6406     VARS_MAP maps old ssa names and var_decls to the new ones.  */
      6407  
      6408  static tree
      6409  replace_ssa_name (tree name, hash_map<tree, tree> *vars_map,
      6410                    tree to_context)
      6411  {
      6412    tree new_name;
      6413  
      6414    gcc_assert (!virtual_operand_p (name));
      6415  
      6416    tree *loc = vars_map->get (name);
      6417  
      6418    if (!loc)
      6419      {
      6420        tree decl = SSA_NAME_VAR (name);
      6421        if (decl)
      6422          {
      6423            gcc_assert (!SSA_NAME_IS_DEFAULT_DEF (name));
      6424            replace_by_duplicate_decl (&decl, vars_map, to_context);
      6425            /* If name is a default def, then we don't move the defining stmt
      6426               (which is a nop).  Because (1) the nop doesn't belong to the sese
      6427               region, and (2) while setting the def stmt of name to NULL would
      6428               trigger release_ssa_name in release_dangling_ssa_names, it wouldn't
      6429               be released since it's a default def, and subsequently cause an
      6430               ssa verification failure.  */
      6431            new_name = make_ssa_name_fn (DECL_STRUCT_FUNCTION (to_context),
      6432                                         decl, SSA_NAME_DEF_STMT (name));
      6433            if (SSA_NAME_IS_DEFAULT_DEF (name))
      6434              set_ssa_default_def (DECL_STRUCT_FUNCTION (to_context),
      6435                                   decl, new_name);
      6436          }
      6437        else
      6438          new_name = copy_ssa_name_fn (DECL_STRUCT_FUNCTION (to_context),
      6439                                       name, SSA_NAME_DEF_STMT (name));
      6440  
      6441        /* Now that we've used the def stmt to define new_name, make sure it
      6442           doesn't define name anymore.  */
      6443        SSA_NAME_DEF_STMT (name) = NULL;
      6444  
      6445        vars_map->put (name, new_name);
      6446  
      6447        if (!SSA_NAME_IS_DEFAULT_DEF (name))
      6448          /* The statement has been moved to the child function.  It no longer
      6449             defines name in the original function.  Mark the def stmt NULL, and
      6450             let release_dangling_ssa_names deal with it.  */
      6451          SSA_NAME_DEF_STMT (name) = NULL;
      6452      }
      6453    else
      6454      new_name = *loc;
      6455  
      6456    return new_name;
      6457  }
    [...]
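
As a reading aid for the listing above, here is a minimal standalone sketch
of its lookup-or-create shape: consult the map first, and only on a miss
assert that the name is not a default def, create the replacement, and
record it.  ToySsaName, ToyVarsMap and toy_replace_ssa_name are hypothetical
stand-ins and not GCC API; the real function works on tree nodes and also
transfers the defining statement.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for an SSA name; not GCC's tree type.
struct ToySsaName {
  std::string var;      // name of the underlying variable
  bool is_default_def;  // models SSA_NAME_IS_DEFAULT_DEF (name)
};

// Maps a name in the original function to its copy in the child function.
using ToyVarsMap = std::unordered_map<const ToySsaName *, ToySsaName>;

// Return the child-function copy of NAME, creating and recording it on
// first use.  Default defs are rejected up front, mirroring the gcc_assert
// at tree-cfg.c:6423 in the listing.
inline const ToySsaName &
toy_replace_ssa_name (const ToySsaName &name, ToyVarsMap &vars_map)
{
  auto it = vars_map.find (&name);
  if (it != vars_map.end ())
    return it->second;

  assert (!name.is_default_def);
  auto inserted
    = vars_map.emplace (&name, ToySsaName{name.var + ".child", false});
  return inserted.first->second;
}
```

In this model, a second lookup of the same name returns the entry created by
the first one, which is the property the vars_map lookup provides in the
real code.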

As I said, this is in my WIP tree to merge from trunk into
gomp-4_0-branch.  To rule out any obvious things, would you please
backport to gomp-4_0-branch your trunk commit r227103
(883f001d2c3672e0674bec71f36a2052734a72cf, "Don't create superfluous parm
in expand_omp_taskreg"), and revert the following changes as appropriate:

> > commit 6befb84f4c0157a4cdf66cfaf64e457180f9a7fa
> > Author: vries <vries@138bc75d-0d04-0410-961f-82ee72b054a4>
> > Date:   Wed Aug 5 06:01:08 2015 +0000
> >
> >      Fix release_dangling_ssa_names
> >
> >      2015-08-05  Tom de Vries  <tom@codesourcery.com>
> >
> >          * omp-low.c (release_dangling_ssa_names): Release SSA_NAMEs with NULL
> >          def stmt.
> >          * tree-cfg.c (replace_ssa_name): Don't move default def nops.  Set def
> >          stmt of unused SSA_NAME to NULL.
> >
> >      git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@226608 138bc75d-0d04-0410-961f-82ee72b054a4
> >
> > commit 0cf67438bd87e5a6ec063e90da0ea20801bda54c
> > Author: vries <vries@138bc75d-0d04-0410-961f-82ee72b054a4>
> > Date:   Thu Jun 4 15:47:09 2015 +0000
> >
> >      Add release_dangling_ssa_names
> >
> >      2015-06-04  Tom de Vries  <tom@codesourcery.com>
> >
> >          * omp-low.c (release_dangling_ssa_names): Factor out of ...
> >          (pass_expand_omp_ssa::execute): ... here.
> >
> >      git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@224130 138bc75d-0d04-0410-961f-82ee72b054a4
> >
> > commit 93557ac5e30c26ee1a3d1255e31265b287171a0d
> > Author: tschwinge <tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4>
> > Date:   Tue Apr 21 19:37:19 2015 +0000
> >
> >      Expand oacc kernels after pass_fre
> >
> >          gcc/
> >          * omp-low.c: Include gimple-pretty-print.h.
> >          (release_first_vuse_in_edge_dest): New function.
> >          (expand_omp_target): When not in ssa, don't split off oacc kernels
> >          region, clear PROP_gimple_eomp in cfun->curr_properties to force later
> >          expansion, and add GOACC_kernels_internal call.
> >          When in ssa, split off oacc kernels and convert GOACC_kernels_internal
> >          into GOACC_kernels call.  Handle ssa-code.
> >          (pass_data_expand_omp): Don't set PROP_gimple_eomp unconditionally in
> >          properties_provided field.
> >          (pass_expand_omp::execute): Set PROP_gimple_eomp in
> >          cfun->curr_properties tentatively.
> >          (pass_data_expand_omp_ssa): Add TODO_remove_unused_locals to
> >          todo_flags_finish field.
> >          (pass_expand_omp_ssa::execute): Release dangling SSA_NAMEs after calling
> >          execute_expand_omp.
> >          (gimple_stmt_ssa_operand_references_var_p)
> >          (gimple_stmt_omp_data_i_init_p): New function.
> >          * omp-low.h (gimple_stmt_omp_data_i_init_p): Declare.
> >          * passes.def: Add pass_expand_omp_ssa after pass_fre.  Add
> >          pass_expand_omp_ssa after pass_all_early_optimizations.
> >          * tree-ssa-ccp.c: Include omp-low.h.
> >          (surely_varying_stmt_p, ccp_visit_stmt): Handle .omp_data_i init
> >          conservatively.
> >          * tree-ssa-forwprop.c: Include omp-low.h.
> >          (pass_forwprop::execute): Handle .omp_data_i init conservatively.
> >          * tree-ssa-sccvn.c: Include omp-low.h.
> >          (visit_use): Handle .omp_data_i init conservatively.
> >          * cgraph.c (cgraph_node::release_body): Don't release offloadable
> >          functions.
> >
> >      git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@222279 138bc75d-0d04-0410-961f-82ee72b054a4


Grüße,
 Thomas


* [gomp4,committed] Remove release_dangling_ssa_names
  2015-09-24  9:31                               ` Thomas Schwinge
@ 2015-09-30  8:05                                 ` Tom de Vries
  2015-09-30 10:05                                   ` Thomas Schwinge
  0 siblings, 1 reply; 71+ messages in thread
From: Tom de Vries @ 2015-09-30  8:05 UTC (permalink / raw)
  To: Thomas Schwinge; +Cc: GCC Patches, Jakub Jelinek, Richard Biener

[-- Attachment #1: Type: text/plain, Size: 2614 bytes --]

[ was: Re: [PATCH] Don't create superfluous parm in expand_omp_taskreg ]
On 24/09/15 11:02, Thomas Schwinge wrote:
> Hi Tom!
>
> On Thu, 24 Sep 2015 08:36:27 +0200, Tom de Vries <Tom_deVries@mentor.com> wrote:
>> On 24/09/15 08:23, Thomas Schwinge wrote:
>>> On Tue, 11 Aug 2015 20:53:39 +0200, Tom de Vries <Tom_deVries@mentor.com> wrote:
>>>> Don't create superfluous parm in expand_omp_taskreg
>>>>
>>>> 2015-08-11  Tom de Vries  <tom@codesourcery.com>
>>>>
>>>> 	* omp-low.c (expand_omp_taskreg): If in ssa, set rhs of parcopy stmt to
>>>> 	parm_decl, rather than generating a dummy default def in cfun.
>>>> 	* tree-cfg.c (replace_ssa_name): Assume no default defs.  Make sure
>>>> 	ssa_name from cfun and child_fn do not share a stmt as def stmt.
>>>> 	(move_stmt_op): Handle PARM_DECl.
>>>> 	(gather_ssa_name_hash_map_from): New function.
>>>> 	(move_sese_region_to_fn): Add default defs for function params, and add
>>>> 	them to vars_map.  Release copied ssa names.
>>>> 	* tree-cfg.h (gather_ssa_name_hash_map_from): Declare.
>>>
>>> Do I understand correct that with this change present on trunk (which I'm
>>> currently merging into gomp-4_0-branch), the changes you've earlier done
>>> on gomp-4_0-branch to gcc/omp-low.c:release_dangling_ssa_names,
>>> gcc/tree-cfg.c:replace_ssa_name, should now be reverted?  That is, how
>>> much of the following patches can be reverted now (listed backwards in
>>> time)?
>>
>> indeed, in the above commit we release the dangling ssa names in
>> move_sese_region_to_fn. So after committing this patch to the
>> gomp-4_0-branch, the call to release_dangling_ssa_names is no longer
>> necessary, and the function release_dangling_ssa_names can be removed.

<SNIP>

>      <tschwinge> Well, I'm asking because in my merge tree, I'm running
>        into an assertion that you added there -- not sure yet whether I've
>        done something wrong, though.

The source of the problem was in expand_omp_target, which needed similar 
changes as expand_omp_taskreg got in the "Don't create superfluous parm 
in expand_omp_taskreg" patch.

Now that the merge ( 
https://gcc.gnu.org/viewcvs/gcc/branches/gomp-4_0-branch/gcc/omp-low.c?limit_changes=0&r1=228091&r2=228090&pathrev=228091 
) contains that change, I've committed these two patches to gomp-4_0-branch:
- Revert "Fix release_dangling_ssa_names"
   (Reverting an earlier attempt to handle the
   release_dangling_ssa_names TODO, which was committed to the
   gomp-4_0-branch)
- Remove release_dangling_ssa_names
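
For reference, the behaviour of the 2015-08-05 version of the helper (the
one being reverted and then removed) can be modelled outside GCC roughly as
follows: walk all names and release those whose defining statement is NULL.
ToyName and release_dangling are hypothetical and only mimic
SSA_NAME_DEF_STMT, release_ssa_name and SSA_NAME_IN_FREE_LIST; this is a
sketch, not the committed code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an SSA name; not GCC's tree type.
struct ToyName {
  const void *def_stmt;  // nullptr models SSA_NAME_DEF_STMT (name) == NULL
  bool in_free_list;     // models SSA_NAME_IN_FREE_LIST (name)
};

// Release every live name whose defining statement is null.
// Returns the number of names released by this call.
inline std::size_t
release_dangling (std::vector<ToyName> &names)
{
  std::size_t released = 0;
  for (ToyName &n : names)
    {
      // Skip names that were already released or still have a def stmt.
      if (n.in_free_list || n.def_stmt != nullptr)
	continue;
      n.in_free_list = true;  // models release_ssa_name (name)
      ++released;
    }
  return released;
}
```

With the trunk change in place, move_sese_region_to_fn releases the copied
names itself, so a whole-table scan like this is no longer needed after
expansion.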

Thanks,
- Tom


[-- Attachment #2: 0001-Revert-Fix-release_dangling_ssa_names.patch --]
[-- Type: text/x-patch, Size: 3457 bytes --]

Revert "Fix release_dangling_ssa_names"

2015-09-24  Tom de Vries  <tom@codesourcery.com>

	Revert:
	2015-08-05  Tom de Vries  <tom@codesourcery.com>

	* omp-low.c (release_dangling_ssa_names): Release SSA_NAMEs with NULL
	def stmt.
	* tree-cfg.c (replace_ssa_name): Don't move default def nops.  Set def
	stmt of unused SSA_NAME to NULL.
---
 gcc/omp-low.c  | 33 ++++++++++++++++++++++++---------
 gcc/tree-cfg.c | 12 ------------
 2 files changed, 24 insertions(+), 21 deletions(-)

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index a72db53..04a60ab 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -10541,10 +10541,11 @@ make_pass_expand_omp (gcc::context *ctxt)
   return new pass_expand_omp (ctxt);
 }
 
-/* After running pass_expand_omp_ssa to expand the oacc kernels directive, we
-   are left in the original function with anonymous SSA_NAMEs, with a NULL
-   defining statement.  This function finds those SSA_NAMEs and releases
-   them.  */
+/* After running pass_expand_omp_ssa to expand the oacc kernels
+   directive, we are left in the original function with anonymous
+   SSA_NAMEs, with a defining statement that has been deleted.  This
+   pass finds those SSA_NAMEs and releases them.
+   TODO: Either fix this elsewhere, or make the fix unnecessary.  */
 
 static void
 release_dangling_ssa_names (void)
@@ -10559,12 +10560,26 @@ release_dangling_ssa_names (void)
       gimple *stmt = SSA_NAME_DEF_STMT (name);
       if (stmt != NULL)
 	continue;
+      bool found = false;
 
-      release_ssa_name (name);
-      gcc_assert (SSA_NAME_IN_FREE_LIST (name));
-      if (dump_file
-	  && (dump_flags & TDF_DETAILS))
-	fprintf (dump_file, "Released dangling ssa name %u\n", i);
+      ssa_op_iter op_iter;
+      def_operand_p def_p;
+      FOR_EACH_PHI_OR_STMT_DEF (def_p, stmt, op_iter, SSA_OP_ALL_DEFS)
+	{
+	  tree def = DEF_FROM_PTR (def_p);
+	  if (def == name)
+	    {
+	      found = true;
+	      break;
+	    }
+	}
+
+      if (!found)
+	{
+	  if (dump_file)
+	    fprintf (dump_file, "Released dangling ssa name %u\n", i);
+	  release_ssa_name (name);
+	}
     }
 }
 
diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
index cd7a4b4..a3c3b20 100644
--- a/gcc/tree-cfg.c
+++ b/gcc/tree-cfg.c
@@ -6422,12 +6422,6 @@ replace_ssa_name (tree name, hash_map<tree, tree> *vars_map,
 	{
 	  gcc_assert (!SSA_NAME_IS_DEFAULT_DEF (name));
 	  replace_by_duplicate_decl (&decl, vars_map, to_context);
-	  /* If name is a default def, then we don't move the defining stmt
-	     (which is a nop).  Because (1) the nop doesn't belong to the sese
-	     region, and (2) while setting the def stmt of name to NULL would
-	     trigger release_ssa_name in release_dangling_ssa_names, it wouldn't
-	     be released since it's a default def, and subsequently cause an
-	     ssa verification failure.  */
 	  new_name = make_ssa_name_fn (DECL_STRUCT_FUNCTION (to_context),
 				       decl, SSA_NAME_DEF_STMT (name));
 	  if (SSA_NAME_IS_DEFAULT_DEF (name))
@@ -6443,12 +6437,6 @@ replace_ssa_name (tree name, hash_map<tree, tree> *vars_map,
       SSA_NAME_DEF_STMT (name) = NULL;
 
       vars_map->put (name, new_name);
-
-      if (!SSA_NAME_IS_DEFAULT_DEF (name))
-	/* The statement has been moved to the child function.  It no longer
-	   defines name in the original function.  Mark the def stmt NULL, and
-	   let release_dangling_ssa_names deal with it.  */
-	SSA_NAME_DEF_STMT (name) = NULL;
     }
   else
     new_name = *loc;
-- 
1.9.1


[-- Attachment #3: 0002-Remove-release_dangling_ssa_names.patch --]
[-- Type: text/x-patch, Size: 1966 bytes --]

Remove release_dangling_ssa_names

2015-09-24  Tom de Vries  <tom@codesourcery.com>

	* omp-low.c (release_dangling_ssa_names): Remove.
	(pass_omp_expand_ssa::execute): Remove call to
	release_dangling_ssa_names.
---
 gcc/omp-low.c | 46 +---------------------------------------------
 1 file changed, 1 insertion(+), 45 deletions(-)

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 04a60ab..6bdfaa2 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -10541,48 +10541,6 @@ make_pass_expand_omp (gcc::context *ctxt)
   return new pass_expand_omp (ctxt);
 }
 
-/* After running pass_expand_omp_ssa to expand the oacc kernels
-   directive, we are left in the original function with anonymous
-   SSA_NAMEs, with a defining statement that has been deleted.  This
-   pass finds those SSA_NAMEs and releases them.
-   TODO: Either fix this elsewhere, or make the fix unnecessary.  */
-
-static void
-release_dangling_ssa_names (void)
-{
-  unsigned int i;
-  for (i = 1; i < num_ssa_names; ++i)
-    {
-      tree name = ssa_name (i);
-      if (name == NULL_TREE)
-	continue;
-
-      gimple *stmt = SSA_NAME_DEF_STMT (name);
-      if (stmt != NULL)
-	continue;
-      bool found = false;
-
-      ssa_op_iter op_iter;
-      def_operand_p def_p;
-      FOR_EACH_PHI_OR_STMT_DEF (def_p, stmt, op_iter, SSA_OP_ALL_DEFS)
-	{
-	  tree def = DEF_FROM_PTR (def_p);
-	  if (def == name)
-	    {
-	      found = true;
-	      break;
-	    }
-	}
-
-      if (!found)
-	{
-	  if (dump_file)
-	    fprintf (dump_file, "Released dangling ssa name %u\n", i);
-	  release_ssa_name (name);
-	}
-    }
-}
-
 namespace {
 
 const pass_data pass_data_expand_omp_ssa =
@@ -10613,9 +10571,7 @@ public:
     }
   virtual unsigned int execute (function *)
     {
-      unsigned res = execute_expand_omp ();
-      release_dangling_ssa_names ();
-      return res;
+      return execute_expand_omp ();
     }
   opt_pass * clone () { return new pass_expand_omp_ssa (m_ctxt); }
 
-- 
1.9.1



* Re: [gomp4,committed] Remove release_dangling_ssa_names
  2015-09-30  8:05                                 ` [gomp4,committed] Remove release_dangling_ssa_names Tom de Vries
@ 2015-09-30 10:05                                   ` Thomas Schwinge
  2015-09-30 10:25                                     ` Tom de Vries
  0 siblings, 1 reply; 71+ messages in thread
From: Thomas Schwinge @ 2015-09-30 10:05 UTC (permalink / raw)
  To: Tom de Vries; +Cc: GCC Patches, Jakub Jelinek, Richard Biener

[-- Attachment #1: Type: text/plain, Size: 3777 bytes --]

Hi Tom!

On Wed, 30 Sep 2015 08:17:04 +0200, Tom de Vries <Tom_deVries@mentor.com> wrote:
> [ was: Re: [PATCH] Don't create superfluous parm in expand_omp_taskreg ]
> On 24/09/15 11:02, Thomas Schwinge wrote:
> > On Thu, 24 Sep 2015 08:36:27 +0200, Tom de Vries <Tom_deVries@mentor.com> wrote:
> >> On 24/09/15 08:23, Thomas Schwinge wrote:
> >>> On Tue, 11 Aug 2015 20:53:39 +0200, Tom de Vries <Tom_deVries@mentor.com> wrote:
> >>>> Don't create superfluous parm in expand_omp_taskreg
> >>>>
> >>>> 2015-08-11  Tom de Vries  <tom@codesourcery.com>
> >>>>
> >>>> 	* omp-low.c (expand_omp_taskreg): If in ssa, set rhs of parcopy stmt to
> >>>> 	parm_decl, rather than generating a dummy default def in cfun.
> >>>> 	* tree-cfg.c (replace_ssa_name): Assume no default defs.  Make sure
> >>>> 	ssa_name from cfun and child_fn do not share a stmt as def stmt.
> >>>> 	(move_stmt_op): Handle PARM_DECl.
> >>>> 	(gather_ssa_name_hash_map_from): New function.
> >>>> 	(move_sese_region_to_fn): Add default defs for function params, and add
> >>>> 	them to vars_map.  Release copied ssa names.
> >>>> 	* tree-cfg.h (gather_ssa_name_hash_map_from): Declare.
> >>>
> >>> Do I understand correct that with this change present on trunk (which I'm
> >>> currently merging into gomp-4_0-branch), the changes you've earlier done
> >>> on gomp-4_0-branch to gcc/omp-low.c:release_dangling_ssa_names,
> >>> gcc/tree-cfg.c:replace_ssa_name, should now be reverted?  That is, how
> >>> much of the following patches can be reverted now (listed backwards in
> >>> time)?
> >>
> >> indeed, in the above commit we release the dangling ssa names in
> >> move_sese_region_to_fn. So after committing this patch to the
> >> gomp-4_0-branch, the call to release_dangling_ssa_names is no longer
> >> necessary, and the function release_dangling_ssa_names can be removed.
> 
> <SNIP>
> 
> >      <tschwinge> Well, I'm asking because in my merge tree, I'm running
> >        into an assertion that you added there -- not sure yet whether I've
> >        done something wrong, though.
> 
> The source of the problem was

Thanks for quickly having provided me with a patch!

> in expand_omp_target, which needed similar 
> changes as expand_omp_taskreg got in the "Don't create superfluous parm 
> in expand_omp_taskreg" patch.

(For the curious, such a patch is not yet needed on trunk, where
expand_omp_target does not yet need to support the "gimple_in_ssa_p"
case.)

> Now that the merge ( 
> https://gcc.gnu.org/viewcvs/gcc/branches/gomp-4_0-branch/gcc/omp-low.c?limit_changes=0&r1=228091&r2=228090&pathrev=228091 
> ) contains that change, I've committed these two patches to gomp-4_0-branch:
> - Revert "Fix release_dangling_ssa_names"
>    (Reverting an earlier attempt to handle the
>    release_dangling_ssa_names TODO, which was committed to the
>    gomp-4_0-branch)
> - Remove release_dangling_ssa_names

Don't we also want to commit the following change, which was part of your
trunk r227103 (883f001d2c3672e0674bec71f36a2052734a72cf) commit (and now
shows up as a delta between trunk and gomp-4_0-branch)?

--- gcc/tree-cfg.c
+++ gcc/tree-cfg.c
@@ -6424,9 +6424,6 @@ replace_ssa_name (tree name, hash_map<tree, tree> *vars_map,
 	  replace_by_duplicate_decl (&decl, vars_map, to_context);
 	  new_name = make_ssa_name_fn (DECL_STRUCT_FUNCTION (to_context),
 				       decl, SSA_NAME_DEF_STMT (name));
-	  if (SSA_NAME_IS_DEFAULT_DEF (name))
-	    set_ssa_default_def (DECL_STRUCT_FUNCTION (to_context),
-				 decl, new_name);
 	}
       else
 	new_name = copy_ssa_name_fn (DECL_STRUCT_FUNCTION (to_context),


Grüße,
 Thomas


* Re: [gomp4,committed] Remove release_dangling_ssa_names
  2015-09-30 10:05                                   ` Thomas Schwinge
@ 2015-09-30 10:25                                     ` Tom de Vries
  2015-09-30 10:43                                       ` Thomas Schwinge
  0 siblings, 1 reply; 71+ messages in thread
From: Tom de Vries @ 2015-09-30 10:25 UTC (permalink / raw)
  To: Thomas Schwinge; +Cc: GCC Patches, Jakub Jelinek, Richard Biener

On 30/09/15 11:25, Thomas Schwinge wrote:
> Don't we also want to commit the following change, which was part of your
> trunk r227103 (883f001d2c3672e0674bec71f36a2052734a72cf) commit (and now
> shows up as a delta between trunk and gomp-4_0-branch)?
>
> --- gcc/tree-cfg.c
> +++ gcc/tree-cfg.c
> @@ -6424,9 +6424,6 @@ replace_ssa_name (tree name, hash_map<tree, tree> *vars_map,
>   	  replace_by_duplicate_decl (&decl, vars_map, to_context);
>   	  new_name = make_ssa_name_fn (DECL_STRUCT_FUNCTION (to_context),
>   				       decl, SSA_NAME_DEF_STMT (name));
> -	  if (SSA_NAME_IS_DEFAULT_DEF (name))
> -	    set_ssa_default_def (DECL_STRUCT_FUNCTION (to_context),
> -				 decl, new_name);
>   	}
>         else
>   	new_name = copy_ssa_name_fn (DECL_STRUCT_FUNCTION (to_context),

Indeed, that bit is part of the patch "Don't create superfluous parm in 
expand_omp_taskreg", but was dropped in the merge (probably because it 
conflicted with the "Fix release_dangling_ssa_names" patch that I just 
reverted).

So we need to apply it.

Thanks,
- Tom


* Re: [gomp4,committed] Remove release_dangling_ssa_names
  2015-09-30 10:25                                     ` Tom de Vries
@ 2015-09-30 10:43                                       ` Thomas Schwinge
  0 siblings, 0 replies; 71+ messages in thread
From: Thomas Schwinge @ 2015-09-30 10:43 UTC (permalink / raw)
  To: Tom de Vries; +Cc: GCC Patches, Jakub Jelinek, Richard Biener

[-- Attachment #1: Type: text/plain, Size: 2800 bytes --]

Hi Tom!

On Wed, 30 Sep 2015 11:46:56 +0200, Tom de Vries <Tom_deVries@mentor.com> wrote:
> On 30/09/15 11:25, Thomas Schwinge wrote:
> > Don't we also want to commit the following change, which was part of your
> > trunk r227103 (883f001d2c3672e0674bec71f36a2052734a72cf) commit (and now
> > shows up as a delta between trunk and gomp-4_0-branch)?
> >
> > --- gcc/tree-cfg.c
> > +++ gcc/tree-cfg.c
> > @@ -6424,9 +6424,6 @@ replace_ssa_name (tree name, hash_map<tree, tree> *vars_map,
> >   	  replace_by_duplicate_decl (&decl, vars_map, to_context);
> >   	  new_name = make_ssa_name_fn (DECL_STRUCT_FUNCTION (to_context),
> >   				       decl, SSA_NAME_DEF_STMT (name));
> > -	  if (SSA_NAME_IS_DEFAULT_DEF (name))
> > -	    set_ssa_default_def (DECL_STRUCT_FUNCTION (to_context),
> > -				 decl, new_name);
> >   	}
> >         else
> >   	new_name = copy_ssa_name_fn (DECL_STRUCT_FUNCTION (to_context),
> 
> Indeed, that bit is part of the patch "Don't create superfluous parm in 
> expand_omp_taskreg", but was dropped in the merge (probably because it 
> conflicted with the "Fix release_dangling_ssa_names" patch that I just 
> reverted).

Aha, so my fault after all.  ;-)

> So we need to apply it.

Committed to gomp-4_0-branch in r228285:

commit 4d7d168dc9aab3fa1ebf55bb3cb94d7b0477d639
Author: tschwinge <tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4>
Date:   Wed Sep 30 10:03:37 2015 +0000

    More gcc/tree-cfg.c:replace_ssa_name cleanup
    
    	gcc/
    	* tree-cfg.c (replace_ssa_name): Revert obsolete changes.
    
    git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@228285 138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/ChangeLog.gomp | 4 ++++
 gcc/tree-cfg.c     | 3 ---
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git gcc/ChangeLog.gomp gcc/ChangeLog.gomp
index 06440bf..c4033e0 100644
--- gcc/ChangeLog.gomp
+++ gcc/ChangeLog.gomp
@@ -1,3 +1,7 @@
+2015-09-30  Thomas Schwinge  <thomas@codesourcery.com>
+
+	* tree-cfg.c (replace_ssa_name): Revert obsolete changes.
+
 2015-09-30  Tom de Vries  <tom@codesourcery.com>
 
 	* omp-low.c (release_dangling_ssa_names): Remove.
diff --git gcc/tree-cfg.c gcc/tree-cfg.c
index a3c3b20..500fc43 100644
--- gcc/tree-cfg.c
+++ gcc/tree-cfg.c
@@ -6424,9 +6424,6 @@ replace_ssa_name (tree name, hash_map<tree, tree> *vars_map,
 	  replace_by_duplicate_decl (&decl, vars_map, to_context);
 	  new_name = make_ssa_name_fn (DECL_STRUCT_FUNCTION (to_context),
 				       decl, SSA_NAME_DEF_STMT (name));
-	  if (SSA_NAME_IS_DEFAULT_DEF (name))
-	    set_ssa_default_def (DECL_STRUCT_FUNCTION (to_context),
-				 decl, new_name);
 	}
       else
 	new_name = copy_ssa_name_fn (DECL_STRUCT_FUNCTION (to_context),


Grüße,
 Thomas


Thread overview: 71+ messages
2014-11-15 14:08 openacc kernels directive -- initial support Tom de Vries
2014-11-15 17:21 ` [PATCH, 1/8] Expand oacc kernels after pass_build_ealias Tom de Vries
2014-11-24 11:29   ` Tom de Vries
2014-11-25 11:30     ` Tom de Vries
2015-04-21 19:40       ` Expand oacc kernels after pass_fre (was: [PATCH, 1/8] Expand oacc kernels after pass_build_ealias) Thomas Schwinge
2015-04-22  7:36         ` Richard Biener
2015-06-04 16:50           ` Expand oacc kernels after pass_fre Tom de Vries
2015-06-08  7:29             ` Richard Biener
2015-06-19  9:04               ` Tom de Vries
2015-08-05  7:24             ` [committed, gomp4] Fix release_dangling_ssa_names Tom de Vries
2015-08-05  7:29               ` Richard Biener
2015-08-05  8:48                 ` Tom de Vries
2015-08-05  9:30                   ` Richard Biener
2015-08-05 10:49                     ` Tom de Vries
2015-08-05 11:13                       ` Richard Biener
2015-08-11  9:25                         ` [committed] Add todo comment for move_sese_region_to_fn Tom de Vries
2015-08-11 18:53                         ` [PATCH] Don't create superfluous parm in expand_omp_taskreg Tom de Vries
2015-08-12 10:51                           ` Richard Biener
2015-09-24  6:36                           ` Thomas Schwinge
2015-09-24  7:21                             ` Tom de Vries
2015-09-24  9:31                               ` Thomas Schwinge
2015-09-30  8:05                                 ` [gomp4,committed] Remove release_dangling_ssa_names Tom de Vries
2015-09-30 10:05                                   ` Thomas Schwinge
2015-09-30 10:25                                     ` Tom de Vries
2015-09-30 10:43                                       ` Thomas Schwinge
2014-11-15 17:22 ` [PATCH, 2/8] Add pass_oacc_kernels Tom de Vries
2014-11-25 11:31   ` Tom de Vries
2015-04-21 19:46     ` Thomas Schwinge
2014-11-15 17:23 ` [PATCH, 3/8] Add pass_ch_oacc_kernels to pass_oacc_kernels Tom de Vries
2014-11-25 11:39   ` Tom de Vries
2015-04-21 19:49     ` Thomas Schwinge
2015-04-22  7:39       ` Richard Biener
2015-06-03  9:22         ` Tom de Vries
2015-06-03 11:21           ` Richard Biener
2015-06-04 15:59             ` Tom de Vries
2015-06-03 10:05         ` Tom de Vries
2015-06-03 11:22           ` Richard Biener
2014-11-15 17:23 ` [PATCH, 4/8] Add pass_tree_loop_{init,done} " Tom de Vries
2014-11-25 11:42   ` Tom de Vries
2015-04-21 19:52     ` Thomas Schwinge
2015-04-22  7:40       ` Richard Biener
2015-06-02 13:52         ` Tom de Vries
2015-06-02 13:58           ` Richard Biener
2015-06-02 15:40             ` Tom de Vries
2015-06-03 11:26               ` Richard Biener
2014-11-15 17:24 ` [PATCH, 5/8] Add pass_loop_im " Tom de Vries
2014-11-25 12:00   ` Tom de Vries
2015-04-21 19:57     ` [PATCH, 5/8] Add pass_lim " Thomas Schwinge
2014-11-15 18:32 ` [PATCH, 6/8] Add pass_ccp " Tom de Vries
2014-11-25 12:03   ` Tom de Vries
2015-04-21 20:01     ` [PATCH, 6/8] Add pass_copy_prop in pass_oacc_kernels Thomas Schwinge
2015-04-22  7:42       ` Richard Biener
2015-06-02 13:04         ` Tom de Vries
2014-11-15 18:52 ` [PATCH, 7/8] Add pass_parloops_oacc_kernels to pass_oacc_kernels Tom de Vries
2014-11-25 12:15   ` Tom de Vries
2015-04-21 20:09     ` [PATCH, 7/8] Add pass_parallelize_loops_oacc_kernels " Thomas Schwinge
2014-11-15 19:04 ` [PATCH, 8/8] Do simple omp lowering for no address taken var Tom de Vries
2014-11-17 10:29   ` Richard Biener
2014-11-18  9:13     ` Eric Botcazou
2014-11-18  9:53       ` Richard Biener
2014-11-18 12:20         ` Richard Biener
2014-11-24 11:53     ` Tom de Vries
2014-11-24 11:55       ` Tom de Vries
2014-11-24 12:42         ` Richard Biener
2014-11-24 18:49           ` Tom de Vries
2014-11-24 12:40       ` Richard Biener
2014-11-19 20:34 ` openacc kernels directive -- initial support Tom de Vries
2015-04-21 19:27 ` Add BUILT_IN_GOACC_KERNELS_INTERNAL (was: openacc kernels directive -- initial support) Thomas Schwinge
2015-04-21 20:24 ` Handle global loop counters in fortran oacc kernels " Thomas Schwinge
2015-04-21 20:29 ` Handle global loop counters in c/c++ " Thomas Schwinge
2015-04-21 20:33 ` Handle oacc kernels with other directives " Thomas Schwinge
