public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [patch,gomp-4_0-branch] openacc parallel reduction part 1
@ 2014-07-06 23:11 Cesar Philippidis
  2014-07-07  9:55 ` Thomas Schwinge
  2014-07-28 17:06 ` Thomas Schwinge
  0 siblings, 2 replies; 12+ messages in thread
From: Cesar Philippidis @ 2014-07-06 23:11 UTC (permalink / raw)
  To: Thomas Schwinge, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 1806 bytes --]

This patch is the first step to enabling parallel reductions in openacc.
The approach I'm using here is a bit different the to the existing
method in openmp. For clarity, consider the following example:

for (i = 0, i < n; i++)
  sum += 1;

Currently, gcc breaks up the for loop into m partitions, one per thread.
Each thread has it's own local sum, say sum_0. So the transformed loop
becomes:

sum_0 = 0;

for (i = lower; i < upper; i++)
  sum_0 += 1;

where, lower and upper are the loop boundaries for the current thread.

After the intermediate sums are finished, the openmp reduction uses an
atomic add to add sum_0 to sum. However, that's not very efficient on
massively parallel accelerators. Among other reasons, this sum is
sequential and there a lot of lock contention writing to the final sum
variable.

For accelerators using openacc, the new reduction stores the
intermediate values in an array. Once the loop nest has completed, a
parallel sum (or other operation) can be used to speedup that portion of
the reduction.

As mentioned earlier, this patch isn't complete yet. For starters, parts
of it depends on our internal ptx backend. I've temporarily remapped the
ptx dependencies to their openmp equivalent, but without a proper
openacc runtime this infrastructure won't do much. It also does not
preform the final reduction in parallel just yet; currently it only sets
up an array, and sequentially reduces the final result on the host.
Another limitation of our current implementation is that it does not
support private reduction variables, because we haven't got around to
implementing the private clause yet. Finally, parts of the test cases
are commented out because support for those reduction operators isn't
complete.

Thomas, is this patch OK for gomp-4_0-branch?

Thanks,
Cesar

[-- Attachment #2: reduction-gomp4.diff --]
[-- Type: text/x-patch, Size: 37826 bytes --]

2014-07-06  Cesar Philippidis  <cesar@codesourcery.com>
	    Thomas Schwinge  <thomas@codesourcery.com>

	gcc/
	* omp-low.c (omp_get_id): New function.
	(lookup_reduction): New function.
	(maybe_lookup_reduction): New function.
	(build_outer_var_ref): Remove openacc assert.
	(new_omp_context): Preserve ctx->reduction_map.
	(scan_sharing_clauses): Handle OMP_CLAUSE_REDUCTION.
	(scan_oacc_offload): Initialize ctx->reduction_map.
	(lower_reduction_clauses): Handle OpenACC reductions.
	(omp_gimple_assign_with_ops): New function.
	(initialize_reduction_data): New function.
	(finalize_reduction_data): New function.
	(process_reduction_data): New function.
	(lower_oacc_offload): Handle reductions.
	* gcc/omp-builtins.def (BUILT_IN_OMP_SET_NUM_THREADS): New.

	gcc/c/
	* c-parser.c (c_parser_oacc_all_clauses): Handle
	PRAGMA_OMP_CLAUSE_REDUCTION.
	(OACC_LOOP_CLAUSE_MASK, OACC_PARALLEL_CLAUSE_MASK): Add
	PRAGMA_OMP_CLAUSE_REDUCTION.

	gcc/fortran/
	* types.def (BT_FN_INT_INT): New.

	gcc/testsuite/
	* gcc/testsuite/c-c++-common/goacc/reduction-1.c: New test.
	* gcc/testsuite/c-c++-common/goacc/reduction-2.c: New test.
	* gcc/testsuite/c-c++-common/goacc/reduction-3.c: New test.
	* gcc/testsuite/c-c++-common/goacc/reduction-4.c: New test.


diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index 03852b4..97cb866 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -11332,6 +11332,10 @@ c_parser_oacc_all_clauses (c_parser *parser, omp_clause_mask mask,
 	  clauses = c_parser_oacc_data_clause (parser, c_kind, clauses);
 	  c_name = "present_or_create";
 	  break;
+	case PRAGMA_OMP_CLAUSE_REDUCTION:
+	  clauses = c_parser_omp_clause_reduction (parser, clauses);
+	  c_name = "reduction";
+	  break;
 	case PRAGMA_OMP_CLAUSE_SELF:
 	  clauses = c_parser_oacc_data_clause (parser, c_kind, clauses);
 	  c_name = "self";
@@ -11706,7 +11710,8 @@ c_parser_oacc_kernels (location_t loc, c_parser *parser, char *p_name)
 */
 
 #define OACC_LOOP_CLAUSE_MASK						\
-	(OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_NONE)
+	( (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_COLLAPSE)		\
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_REDUCTION))
 
 static tree
 c_parser_oacc_loop (location_t loc, c_parser *parser, char *p_name)
@@ -11746,6 +11751,7 @@ c_parser_oacc_loop (location_t loc, c_parser *parser, char *p_name)
 	| (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_PRESENT_OR_COPYIN)	\
 	| (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_PRESENT_OR_COPYOUT)	\
 	| (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_PRESENT_OR_CREATE)	\
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_REDUCTION)		\
 	| (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_VECTOR_LENGTH) )
 
 static tree
diff --git a/gcc/fortran/types.def b/gcc/fortran/types.def
index 59ac4c3..f733d9d 100644
--- a/gcc/fortran/types.def
+++ b/gcc/fortran/types.def
@@ -86,6 +86,7 @@ DEF_FUNCTION_TYPE_1 (BT_FN_UINT_UINT, BT_UINT, BT_UINT)
 DEF_FUNCTION_TYPE_1 (BT_FN_PTR_PTR, BT_PTR, BT_PTR)
 DEF_FUNCTION_TYPE_1 (BT_FN_VOID_INT, BT_VOID, BT_INT)
 DEF_FUNCTION_TYPE_1 (BT_FN_BOOL_INT, BT_BOOL, BT_INT)
+DEF_FUNCTION_TYPE_1 (BT_FN_INT_INT, BT_INT, BT_INT)
 
 DEF_POINTER_TYPE (BT_PTR_FN_VOID_PTR, BT_FN_VOID_PTR)
 
diff --git a/gcc/omp-builtins.def b/gcc/omp-builtins.def
index 08b825c..419ec3a 100644
--- a/gcc/omp-builtins.def
+++ b/gcc/omp-builtins.def
@@ -236,3 +236,6 @@ DEF_GOMP_BUILTIN (BUILT_IN_GOMP_TARGET_UPDATE, "GOMP_target_update",
 		  BT_FN_VOID_INT_PTR_SIZE_PTR_PTR_PTR, ATTR_NOTHROW_LIST)
 DEF_GOMP_BUILTIN (BUILT_IN_GOMP_TEAMS, "GOMP_teams",
 		  BT_FN_VOID_UINT_UINT, ATTR_NOTHROW_LIST)
+
+DEF_GOMP_BUILTIN (BUILT_IN_OMP_SET_NUM_THREADS, "omp_set_num_threads",
+		  BT_FN_INT_INT, ATTR_CONST_NOTHROW_LEAF_LIST)
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index cd27b76..5b36f25 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -158,6 +158,11 @@ typedef struct omp_context
      construct.  In the case of a parallel, this is in the child function.  */
   tree block_vars;
 
+  /* A map of reduction pointer variables.  For accelerators, each
+     reduction variable is replaced with an array.  Each thread, in turn,
+     is assigned to a slot on that array.  */
+  splay_tree reduction_map;
+
   /* Label to which GOMP_cancel{,llation_point} and explicit and implicit
      barriers should jump to during omplower pass.  */
   tree cancel_label;
@@ -221,6 +226,17 @@ static tree scan_omp_1_op (tree *, int *, void *);
       *handled_ops_p = false; \
       break;
 
+/* Helper function to get the reduction array name */
+static const char *
+omp_get_id (tree node)
+{
+  const char *id = IDENTIFIER_POINTER (DECL_NAME (node));
+  int len = strlen ("omp$") + strlen (id);
+  char *temp_name = (char *)alloca (len+1);
+  snprintf (temp_name, len+1, "gfc$%s", id);
+  return IDENTIFIER_POINTER(get_identifier (temp_name));
+}
+
 /* Holds a decl for __OPENMP_TARGET__.  */
 static GTY(()) tree offload_symbol_decl;
 
@@ -873,6 +889,17 @@ lookup_sfield (tree var, omp_context *ctx)
 }
 
 static inline tree
+lookup_reduction (const char *id, omp_context *ctx)
+{
+  gcc_assert (is_gimple_omp_oacc_specifically (ctx->stmt));
+
+  splay_tree_node n;
+  n = splay_tree_lookup (ctx->reduction_map,
+			 (splay_tree_key) id);
+  return (tree) n->value;
+}
+
+static inline tree
 maybe_lookup_field (tree var, omp_context *ctx)
 {
   splay_tree_node n;
@@ -880,6 +907,17 @@ maybe_lookup_field (tree var, omp_context *ctx)
   return n ? (tree) n->value : NULL_TREE;
 }
 
+static inline tree
+maybe_lookup_reduction (tree var, omp_context *ctx)
+{
+  gcc_assert (is_gimple_omp_oacc_specifically (ctx->stmt));
+
+  splay_tree_node n;
+  n = splay_tree_lookup (ctx->reduction_map,
+			 (splay_tree_key) var);
+  return n ?(tree) n->value : NULL_TREE;
+}
+
 /* Return true if DECL should be copied by pointer.  SHARED_CTX is
    the parallel context if DECL is to be shared.  */
 
@@ -1036,8 +1074,6 @@ build_receiver_ref (tree var, bool by_ref, omp_context *ctx)
 static tree
 build_outer_var_ref (tree var, omp_context *ctx)
 {
-  gcc_assert (!is_gimple_omp_oacc_specifically (ctx->stmt));
-
   tree x;
 
   if (is_global_var (maybe_lookup_decl_in_outer_ctx (var, ctx)))
@@ -1379,6 +1415,8 @@ new_omp_context (gimple stmt, omp_context *outer_ctx)
       ctx->cb = outer_ctx->cb;
       ctx->cb.block = NULL;
       ctx->depth = outer_ctx->depth + 1;
+      /* FIXME: handle reductions recursively.  */
+      ctx->reduction_map = outer_ctx->reduction_map;
     }
   else
     {
@@ -1392,6 +1430,7 @@ new_omp_context (gimple stmt, omp_context *outer_ctx)
       ctx->cb.eh_lp_nr = 0;
       ctx->cb.transform_call_graph_edges = CB_CGE_MOVE;
       ctx->depth = 1;
+      //TODO ctx->reduction_map = TODO;
     }
 
   ctx->cb.decl_map = pointer_map_create ();
@@ -1588,7 +1627,6 @@ scan_sharing_clauses (tree clauses, omp_context *ctx)
 	  /* FALLTHRU */
 
 	case OMP_CLAUSE_FIRSTPRIVATE:
-	case OMP_CLAUSE_REDUCTION:
 	  if (is_gimple_omp_oacc_specifically (ctx->stmt))
 	    {
 	      sorry ("clause not supported yet");
@@ -1596,6 +1634,7 @@ scan_sharing_clauses (tree clauses, omp_context *ctx)
 	    }
 	case OMP_CLAUSE_LINEAR:
 	  gcc_assert (!is_gimple_omp_oacc_specifically (ctx->stmt));
+	case OMP_CLAUSE_REDUCTION:
 	  decl = OMP_CLAUSE_DECL (c);
 	do_private:
 	  if (is_variable_sized (decl))
@@ -1621,6 +1660,28 @@ scan_sharing_clauses (tree clauses, omp_context *ctx)
 		install_var_field (decl, by_ref, 3, ctx);
 	    }
 	  install_var_local (decl, ctx);
+	  //TODO
+	  if (is_gimple_omp_oacc_specifically (ctx->stmt))
+	    {
+	  if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_REDUCTION)
+	    {
+	      /* Create a decl for the reduction array.  */
+	      tree var = OMP_CLAUSE_DECL (c);
+	      tree ptype = build_pointer_type (TREE_TYPE (var));
+	      tree array = create_tmp_var (ptype, omp_get_id (var));
+	      omp_context *c = (ctx->field_map ? ctx : ctx->outer);
+	      install_var_field (array, true, 3, c);
+	      install_var_local (array, c);
+
+	      /* Insert it into the current context.  */
+	      splay_tree_insert (ctx->reduction_map,
+				 (splay_tree_key) omp_get_id(var),
+				 (splay_tree_value) array);
+	      splay_tree_insert (ctx->reduction_map,
+				 (splay_tree_key) array,
+				 (splay_tree_value) array);
+	    }
+	    }
 	  break;
 
 	case OMP_CLAUSE__LOOPTEMP_:
@@ -1658,10 +1719,7 @@ scan_sharing_clauses (tree clauses, omp_context *ctx)
 	case OMP_CLAUSE_NUM_WORKERS:
 	case OMP_CLAUSE_VECTOR_LENGTH:
 	  if (ctx->outer)
-	    {
-	      gcc_assert (!is_gimple_omp_oacc_specifically (ctx->stmt));
 	    scan_omp_op (&OMP_CLAUSE_OPERAND (c, 0), ctx->outer);
-	    }
 	  break;
 
 	case OMP_CLAUSE_TO:
@@ -1750,7 +1808,16 @@ scan_sharing_clauses (tree clauses, omp_context *ctx)
 		      && TREE_CODE (TREE_TYPE (decl)) == ARRAY_TYPE)
 		    install_var_field (decl, true, 7, ctx);
 		  else
-		    install_var_field (decl, true, 3, ctx);
+		    {
+		      if (!is_gimple_omp_oacc_specifically (ctx->stmt))
+			install_var_field (decl, true, 3, ctx);
+		      else
+		    {
+		      /* decl goes heres.  */
+		      omp_context *c = (ctx->field_map ? ctx : ctx->outer);
+		      install_var_field (decl, true, 3, c);
+		    }
+		    }
 		  if (is_gimple_omp_offloaded (ctx->stmt))
 		    install_var_local (decl, ctx);
 		}
@@ -1844,7 +1911,6 @@ scan_sharing_clauses (tree clauses, omp_context *ctx)
 	  /* FALLTHRU */
 
 	case OMP_CLAUSE_FIRSTPRIVATE:
-	case OMP_CLAUSE_REDUCTION:
 	  if (is_gimple_omp_oacc_specifically (ctx->stmt))
 	    {
 	      sorry ("clause not supported yet");
@@ -1852,6 +1918,7 @@ scan_sharing_clauses (tree clauses, omp_context *ctx)
 	    }
 	case OMP_CLAUSE_LINEAR:
 	  gcc_assert (!is_gimple_omp_oacc_specifically (ctx->stmt));
+	case OMP_CLAUSE_REDUCTION:
 	case OMP_CLAUSE_PRIVATE:
 	  decl = OMP_CLAUSE_DECL (c);
 	  if (is_variable_sized (decl))
@@ -2161,6 +2228,7 @@ scan_oacc_offload (gimple stmt, omp_context *outer_ctx)
   DECL_NAMELESS (name) = 1;
   TYPE_NAME (ctx->record_type) = name;
   create_omp_child_function (ctx, false);
+  ctx->reduction_map = splay_tree_new (splay_tree_compare_pointers, 0, 0);
 
   gimple_omp_set_child_fn (stmt, ctx->cb.dst_fn);
 
@@ -4211,6 +4279,8 @@ lower_reduction_clauses (tree clauses, gimple_seq *stmt_seqp, omp_context *ctx)
 
       if (count == 1)
 	{
+	  if (!is_gimple_omp_oacc_specifically (ctx->stmt))
+	    {
 	  tree addr = build_fold_addr_expr_loc (clause_loc, ref);
 
 	  addr = save_expr (addr);
@@ -4219,6 +4289,117 @@ lower_reduction_clauses (tree clauses, gimple_seq *stmt_seqp, omp_context *ctx)
 	  x = build2 (OMP_ATOMIC, void_type_node, addr, x);
 	  gimplify_and_add (x, stmt_seqp);
 	  return;
+	    }
+	  else
+	    {
+	  /* The atomic add at the end of the sum creates unnecessary
+	     write contention on accelerators.  To work around that,
+	     create an array or vector_length and assign an element to
+	     each thread.  Later, in lower_omp_for (for openacc), the
+	     values of array will be combined.  */
+
+	  tree t = NULL_TREE, array, nthreads;
+
+	  /* First ensure that the current tid is less than vector_length.  */
+	  tree exit_label = create_artificial_label (UNKNOWN_LOCATION);
+	  tree reduction_label = create_artificial_label (UNKNOWN_LOCATION);
+
+	  /* Get the current thread id.  */
+	  tree call = builtin_decl_explicit (BUILT_IN_OMP_GET_THREAD_NUM);
+	  gimple stmt = gimple_build_call (call, 1, integer_zero_node);
+	  tree fntype = gimple_call_fntype (stmt);
+	  tree tid = create_tmp_var (TREE_TYPE (fntype), NULL);
+	  gimple_call_set_lhs (stmt, tid);
+	  gimple_seq_add_stmt (stmt_seqp, stmt);
+
+	  /* Find the total number of threads.  A reduction clause
+	     only appears inside a loop construction or a combined
+	     parallel and loop construct.  */
+	  tree c;
+
+	  if (gimple_code (ctx->stmt) == GIMPLE_OMP_FOR)
+	    c = gimple_oacc_parallel_clauses (ctx->outer->stmt);
+	  else
+	    c = gimple_oacc_parallel_clauses (ctx->stmt);
+
+	  t = find_omp_clause (c, OMP_CLAUSE_VECTOR_LENGTH);
+
+	  if (t)
+	    {
+	      t = fold_convert_loc (OMP_CLAUSE_LOCATION (t),
+				    integer_type_node,
+				    OMP_CLAUSE_VECTOR_LENGTH_EXPR (t));
+	    }
+
+	  if (!t)
+	    t = integer_one_node;
+
+	  /* Extract the number of threads.  */
+	  nthreads = create_tmp_var (sizetype, NULL);
+	  gimplify_assign (nthreads, fold_build1 (NOP_EXPR, sizetype, t),
+			   stmt_seqp);
+	  stmt = gimple_build_assign_with_ops  (MINUS_EXPR, nthreads, nthreads,
+				 fold_build1 (NOP_EXPR, sizetype,
+					      integer_one_node));
+	  gimple_seq_add_stmt (stmt_seqp, stmt);
+
+	  /* If tid >= nthreads, goto exit_label.  */
+	  t = create_tmp_var (sizetype, NULL);
+	  gimplify_assign (t, fold_build1 (NOP_EXPR, sizetype, tid),
+			   stmt_seqp);
+	  stmt = gimple_build_cond (GT_EXPR, t, nthreads, exit_label,
+				    reduction_label);
+	  gimple_seq_add_stmt (stmt_seqp, stmt);
+
+	  /* Place the reduction_label here.  */
+
+	  gimple_seq_add_stmt (stmt_seqp,
+			       gimple_build_label (reduction_label));
+
+	  /* Now insert the partial reductions into the array.  */
+
+	  /* Create an array for the reduction variable and install it
+	     in the parent scope.  */
+	  tree ptype = build_pointer_type (TREE_TYPE (var));
+
+	  t = lookup_reduction (omp_get_id (var), ctx);
+	  t = build_receiver_ref (t, false, ctx->outer);
+
+	  array = create_tmp_var (ptype, NULL);
+	  gimplify_assign (array, t, stmt_seqp);
+
+	  tree ptr = create_tmp_var (TREE_TYPE (array), NULL);
+
+	  /* Find the reduction array.  */
+
+	  /* testing a unary conversion.  */
+	  tree offset = create_tmp_var (sizetype, NULL);
+	  gimplify_assign (offset, TYPE_SIZE_UNIT (TREE_TYPE (var)),
+			   stmt_seqp);
+	  t = create_tmp_var (sizetype, NULL);
+	  gimplify_assign (t, unshare_expr (fold_build1 (NOP_EXPR, sizetype,
+							 tid)),
+			   stmt_seqp);
+	  stmt = gimple_build_assign_with_ops (MULT_EXPR, offset, offset, t);
+	  gimple_seq_add_stmt (stmt_seqp, stmt);
+
+	  /* Offset expression.  Does the POINTER_PLUS_EXPR take care
+	     of adding sizeof(var) to the array?  */
+	  ptr = create_tmp_var (ptype, NULL);
+	  stmt = gimple_build_assign_with_ops (POINTER_PLUS_EXPR,
+					       unshare_expr(ptr),
+					       array, offset);
+	  gimple_seq_add_stmt (stmt_seqp, stmt);
+
+	  /* Move the local sum to gfc$sum[i].  */
+	  x = unshare_expr (build_simple_mem_ref (ptr));
+	  stmt = gimplify_assign (x, new_var, stmt_seqp);
+
+	  /* Place exit label here.  */
+	  gimple_seq_add_stmt (stmt_seqp, gimple_build_label (exit_label));
+
+	  return;
+	    }
 	}
 
       if (OMP_CLAUSE_REDUCTION_PLACEHOLDER (c))
@@ -9138,6 +9319,409 @@ make_pass_expand_omp (gcc::context *ctxt)
   return new pass_expand_omp (ctxt);
 }
 \f
+/* Helper function to preform, potentially COMPLEX_TYPE, operation and
+   convert it to gimple.  */
+static void
+omp_gimple_assign_with_ops (tree_code op, tree dest, tree src, gimple_seq *seq)
+{
+  gimple stmt;
+
+  if (TREE_CODE (TREE_TYPE (dest)) != COMPLEX_TYPE)
+    {
+      stmt = gimple_build_assign_with_ops (op, dest, dest, src);
+      gimple_seq_add_stmt (seq, stmt);
+      return;
+    }
+
+  tree t = create_tmp_var (TREE_TYPE (TREE_TYPE (dest)), NULL);
+  tree rdest = fold_build1 (REALPART_EXPR, TREE_TYPE (TREE_TYPE (dest)), dest);
+  gimplify_assign (t, rdest, seq);
+  rdest = t;
+
+  t = create_tmp_var (TREE_TYPE (TREE_TYPE (dest)), NULL);
+  tree idest = fold_build1 (IMAGPART_EXPR, TREE_TYPE (TREE_TYPE (dest)), dest);
+  gimplify_assign (t, idest, seq);
+  idest = t;
+
+  t = create_tmp_var (TREE_TYPE (TREE_TYPE (src)), NULL);
+  tree rsrc = fold_build1 (REALPART_EXPR, TREE_TYPE (TREE_TYPE (src)), src);
+  gimplify_assign (t, rsrc, seq);
+  rsrc = t;
+
+  t = create_tmp_var (TREE_TYPE (TREE_TYPE (src)), NULL);
+  tree isrc = fold_build1 (IMAGPART_EXPR, TREE_TYPE (TREE_TYPE (src)), src);
+  gimplify_assign (t, isrc, seq);
+  isrc = t;
+
+  tree r = create_tmp_var (TREE_TYPE (TREE_TYPE (dest)), NULL);
+  tree i = create_tmp_var (TREE_TYPE (TREE_TYPE (dest)), NULL);
+  tree result;
+
+  gcc_assert (op == PLUS_EXPR || op == MULT_EXPR);
+
+  if (op == PLUS_EXPR)
+    {
+      stmt = gimple_build_assign_with_ops (op, r, rdest, rsrc);
+      gimple_seq_add_stmt (seq, stmt);
+
+      stmt = gimple_build_assign_with_ops (op, i, idest, isrc);
+      gimple_seq_add_stmt (seq, stmt);
+    }
+  else if (op == MULT_EXPR)
+    {
+      /* Let x = a + ib = dest, y = c + id = src.
+	 x * y = (ac - bd) + i(ad + bc)  */
+      tree ac = create_tmp_var (TREE_TYPE (TREE_TYPE (dest)), NULL);
+      tree bd = create_tmp_var (TREE_TYPE (TREE_TYPE (dest)), NULL);
+      tree ad = create_tmp_var (TREE_TYPE (TREE_TYPE (dest)), NULL);
+      tree bc = create_tmp_var (TREE_TYPE (TREE_TYPE (dest)), NULL);
+
+      stmt = gimple_build_assign_with_ops (MULT_EXPR, ac, rdest, rsrc);
+      gimple_seq_add_stmt (seq, stmt);
+
+      stmt = gimple_build_assign_with_ops (MULT_EXPR, bd, idest, isrc);
+      gimple_seq_add_stmt (seq, stmt);
+
+      stmt = gimple_build_assign_with_ops (MINUS_EXPR, r, ac, bd);
+      gimple_seq_add_stmt (seq, stmt);
+
+      stmt = gimple_build_assign_with_ops (MULT_EXPR, ad, rdest, isrc);
+      gimple_seq_add_stmt (seq, stmt);
+
+      stmt = gimple_build_assign_with_ops (MULT_EXPR, bd, idest, rsrc);
+      gimple_seq_add_stmt (seq, stmt);
+
+      stmt = gimple_build_assign_with_ops (PLUS_EXPR, i, ad, bc);
+      gimple_seq_add_stmt (seq, stmt);
+    }
+
+  result = build2 (COMPLEX_EXPR, TREE_TYPE (dest), r, i);
+  gimplify_assign (dest, result, seq);
+}
+
+/* Helper function to initialize local data for the reduction arrays.
+   The reduction arrays need to be placed inside the calling function
+   for accelerators, or else the host won't be able to preform the final
+   reduction.  FIXME: This function assumes that there are
+   vector_length threads in total.  */
+
+static void
+initialize_reduction_data (tree clauses, tree nthreads, gimple_seq *stmt_seqp,
+			   omp_context *ctx)
+{
+  gcc_assert (is_gimple_omp_oacc_specifically (ctx->stmt));
+
+  tree c, t, oc;
+  gimple stmt;
+  omp_context *octx;
+  tree (*gimple_omp_clauses) (const_gimple);
+  void (*gimple_omp_set_clauses) (gimple, tree);
+
+  /* Find the innermost PARALLEL openmp context.  FIXME: OpenACC kernels
+     may require extra care unless they are converted to openmp for loops.  */
+
+  if (gimple_code (ctx->stmt) == GIMPLE_OACC_PARALLEL)
+    octx = ctx;
+  else
+    octx = ctx->outer;
+
+  gimple_omp_clauses = gimple_oacc_parallel_clauses;
+  gimple_omp_set_clauses = gimple_oacc_parallel_set_clauses;
+
+  /* Extract the clauses.  */
+  oc = gimple_omp_clauses (octx->stmt);
+
+  /* Find the last outer clause.  */
+  for (; oc && OMP_CLAUSE_CHAIN (oc); oc = OMP_CLAUSE_CHAIN (oc))
+    ;
+
+  /* Allocate arrays for each reduction variable.  */
+  for (c = clauses; c; c = OMP_CLAUSE_CHAIN (c))
+    {
+      if (OMP_CLAUSE_CODE (c) != OMP_CLAUSE_REDUCTION)
+	continue;
+
+      tree var = OMP_CLAUSE_DECL (c);
+      tree array = lookup_reduction (omp_get_id (var), ctx);
+      tree size, call;
+
+      /* Calculate size of the reduction array.  */
+      t = create_tmp_var (TREE_TYPE (nthreads), NULL);
+      stmt = gimple_build_assign_with_ops (MULT_EXPR, t, nthreads,
+			 fold_convert (TREE_TYPE (nthreads),
+				       TYPE_SIZE_UNIT (TREE_TYPE (var))));
+      gimple_seq_add_stmt (stmt_seqp, stmt);
+
+      size = create_tmp_var (sizetype, NULL);
+      gimplify_assign (size, fold_build1 (NOP_EXPR, sizetype, t), stmt_seqp);
+
+      /* Now allocate memory for it.  FIXME: Allocating memory for the
+	 reduction array may be unnecessary once the final reduction is able
+	 to be preformed on the accelerator.  Instead of allocating memory on
+	 the host side, it could just be allocated on the accelerator.  */
+      call = unshare_expr (builtin_decl_explicit (BUILT_IN_ALLOCA));
+      stmt = gimple_build_call (call, 1, size);
+      gimple_call_set_lhs (stmt, array);
+      gimple_seq_add_stmt (stmt_seqp, stmt);
+
+      /* Map this array into the accelerator.  */
+
+      /* Add the reduction array to the list of clauses.  */
+      /* FIXME: Currently, these variables must be placed in the outer
+	 most clause so that copy-out works.  */
+      tree x = array;
+      t = build_omp_clause (gimple_location (ctx->stmt), OMP_CLAUSE_MAP);
+      OMP_CLAUSE_MAP_KIND (t) = OMP_CLAUSE_MAP_FORCE_FROM;
+      OMP_CLAUSE_DECL (t) = x;
+      OMP_CLAUSE_CHAIN (t) = NULL;
+      if (oc)
+	OMP_CLAUSE_CHAIN (oc) = t;
+      else
+	gimple_omp_set_clauses (octx->stmt, t);
+      OMP_CLAUSE_SIZE (t) = size;
+      oc = t;
+    }
+}
+
+/* Helper function to finalize local data for the reduction arrays. The
+   reduction array needs to be reduced to the original reduction variable.
+   FIXME: This function assumes that there are vector_length threads in
+   total.  Also, it assumes that there are at least vector_length iterations
+   in the for loop.  */
+
+static void
+finalize_reduction_data (tree clauses, tree nthreads, gimple_seq *stmt_seqp,
+			 omp_context *ctx)
+{
+  gcc_assert (is_gimple_omp_oacc_specifically (ctx->stmt));
+
+  tree c, var, array, loop_header, loop_body, loop_exit;
+  gimple stmt;
+
+  /* Create for loop.
+
+     let var = the original reduction variable
+     let array = reduction variable array
+
+     var = array[0]
+     for (i = 1; i < nthreads; i++)
+       var op= array[i]
+ */
+
+  loop_header = create_artificial_label (UNKNOWN_LOCATION);
+  loop_body = create_artificial_label (UNKNOWN_LOCATION);
+  loop_exit = create_artificial_label (UNKNOWN_LOCATION);
+
+  /* Initialize the reduction variables to be value of the first array
+     element.  */
+  for (c = clauses; c; c = OMP_CLAUSE_CHAIN (c))
+    {
+      if (OMP_CLAUSE_CODE (c) != OMP_CLAUSE_REDUCTION)
+	continue;
+
+      tree_code reduction_code = OMP_CLAUSE_REDUCTION_CODE (c);
+
+      /* reduction(-:var) sums up the partial results, so it acts
+	 identically to reduction(+:var).  */
+      if (reduction_code == MINUS_EXPR)
+        reduction_code = PLUS_EXPR;
+
+      /* Set up reduction variable, var.  Becuase it's not gimple register,
+         it needs to be treated as a reference.  */
+      var = OMP_CLAUSE_DECL (c);
+
+      tree ptr = lookup_reduction (omp_get_id (OMP_CLAUSE_DECL (c)), ctx);
+
+      /* Extract array[ix] into mem.  */
+      tree mem = create_tmp_var (TREE_TYPE (var), NULL);
+      gimplify_assign (mem, build_simple_mem_ref (ptr), stmt_seqp);
+
+      /* Find the original reduction variable.  */
+      tree new_var = lookup_decl (var, ctx);
+      tree x = build_outer_var_ref (var, ctx);
+      if (is_reference (var))
+	new_var = build_simple_mem_ref (new_var);
+
+      x = lang_hooks.decls.omp_clause_assign_op (c, var, mem);
+      gimplify_and_add (unshare_expr(x), stmt_seqp);
+    }
+
+  /* Create an index variable and set it to one.  */
+  tree ix = create_tmp_var (sizetype, NULL);
+  gimplify_assign (ix, fold_build1 (NOP_EXPR, sizetype, integer_one_node),
+		   stmt_seqp);
+
+  /* Insert the loop header label here.  */
+  gimple_seq_add_stmt (stmt_seqp, gimple_build_label (loop_header));
+
+  /* Loop if ix >= nthreads.  */
+  tree x = create_tmp_var (sizetype, NULL);
+  gimplify_assign (x, fold_build1 (NOP_EXPR, sizetype, nthreads), stmt_seqp);
+  stmt = gimple_build_cond (GE_EXPR, ix, x, loop_exit, loop_body);
+  gimple_seq_add_stmt (stmt_seqp, stmt);
+
+  /* Insert the loop body label here.  */
+  gimple_seq_add_stmt (stmt_seqp, gimple_build_label (loop_body));
+
+  /* Collapse each reduction array, one element at a time.  */
+  for (c = clauses; c; c = OMP_CLAUSE_CHAIN (c))
+    {
+      if (OMP_CLAUSE_CODE (c) != OMP_CLAUSE_REDUCTION)
+	continue;
+
+      tree_code reduction_code = OMP_CLAUSE_REDUCTION_CODE (c);
+
+      /* reduction(-:var) sums up the partial results, so it acts
+	 identically to reduction(+:var).  */
+      if (reduction_code == MINUS_EXPR)
+        reduction_code = PLUS_EXPR;
+
+      /* Set up reduction variable var.  */
+      var = OMP_CLAUSE_DECL (c);
+
+      array = lookup_reduction (omp_get_id (OMP_CLAUSE_DECL (c)), ctx);
+
+      /* Calculate the array offset.  */
+      tree offset = create_tmp_var (sizetype, NULL);
+      gimplify_assign (offset, TYPE_SIZE_UNIT (TREE_TYPE (var)), stmt_seqp);
+      stmt = gimple_build_assign_with_ops (MULT_EXPR, offset, offset, ix);
+      gimple_seq_add_stmt (stmt_seqp, stmt);
+
+      tree ptr = create_tmp_var (TREE_TYPE (array), NULL);
+      stmt = gimple_build_assign_with_ops (POINTER_PLUS_EXPR, ptr, array,
+					   offset);
+      gimple_seq_add_stmt (stmt_seqp, stmt);
+
+      /* Extract array[ix] into mem.  */
+      tree mem = create_tmp_var (TREE_TYPE (var), NULL);
+      gimplify_assign (mem, build_simple_mem_ref (ptr), stmt_seqp);
+
+      /* Find the original reduction variable.  */
+      tree new_var = lookup_decl (var, ctx);
+      tree x = build_outer_var_ref (var, ctx);
+      if (is_reference (var))
+	new_var = build_simple_mem_ref (new_var);
+
+      tree t = create_tmp_var (TREE_TYPE (var), NULL);
+
+      x = lang_hooks.decls.omp_clause_assign_op (c, t, var);
+      gimplify_and_add (unshare_expr(x), stmt_seqp);
+
+      /* var = var op mem */
+      switch (OMP_CLAUSE_REDUCTION_CODE (c))
+	{
+	case TRUTH_ANDIF_EXPR:
+	case TRUTH_ORIF_EXPR:
+	  t = fold_build2 (OMP_CLAUSE_REDUCTION_CODE (c), integer_type_node,
+			   t, mem);
+	  gimplify_and_add (t, stmt_seqp);
+	  break;
+	default:
+	  /* The lhs isn't a gimple_reg when var is COMPLEX_TYPE.  */
+	  omp_gimple_assign_with_ops (OMP_CLAUSE_REDUCTION_CODE (c),
+				      t, mem, stmt_seqp);
+	}
+
+      t = fold_build1 (NOP_EXPR, TREE_TYPE (var), t);
+      x = lang_hooks.decls.omp_clause_assign_op (c, var, t);
+      gimplify_and_add (unshare_expr(x), stmt_seqp);
+    }
+
+  /* Increment the induction variable.  */
+  tree one = fold_build1 (NOP_EXPR, sizetype, integer_one_node);
+  stmt = gimple_build_assign_with_ops (PLUS_EXPR, ix, ix, one);
+  gimple_seq_add_stmt (stmt_seqp, stmt);
+
+  /* Go back to the top of the loop.  */
+  gimple_seq_add_stmt (stmt_seqp, gimple_build_goto (loop_header));
+
+  /* Place the loop exit label here.  */
+  gimple_seq_add_stmt (stmt_seqp, gimple_build_label (loop_exit));
+}
+
+/* Scan through all of the gimple stmts searching for an OMP_FOR_EXPR, and
+   scan that for reductions.  */
+
+static void
+process_reduction_data (gimple_seq *body, gimple_seq *in_stmt_seqp,
+			gimple_seq *out_stmt_seqp, omp_context *ctx)
+{
+  gcc_assert (is_gimple_omp_oacc_specifically (ctx->stmt));
+
+  gimple_stmt_iterator gsi;
+
+  for (gsi = gsi_start (*body); !gsi_end_p (gsi); gsi_next (&gsi))
+    {
+      gimple stmt = gsi_stmt (gsi);
+      tree call;
+
+      switch (gimple_code (stmt))
+	{
+	case GIMPLE_OMP_FOR:
+	  tree clauses, nthreads, t;
+
+	  clauses = gimple_omp_for_clauses (stmt);
+	  ctx = maybe_lookup_ctx (stmt);
+	  t = NULL_TREE;
+
+	  /* The reduction clause may be nested inside a loop directive.
+	     Scan for the innermost vector_length clause.  */
+	  for (omp_context *oc = ctx; oc; oc = oc->outer)
+	    {
+	      tree c;
+
+	      switch (gimple_code (oc->stmt))
+		{
+		case GIMPLE_OACC_PARALLEL:
+		  c = gimple_oacc_parallel_clauses (oc->stmt);
+		  break;
+		case GIMPLE_OMP_FOR:
+		  c = gimple_omp_for_clauses (oc->stmt);
+		  break;
+		default:
+		  c = NULL_TREE;
+		  break;
+		}
+
+	      if (c && gimple_code (oc->stmt) == GIMPLE_OACC_PARALLEL)
+		{
+		  t = find_omp_clause (c, OMP_CLAUSE_VECTOR_LENGTH);
+		  if (t)
+		    t = fold_convert_loc (OMP_CLAUSE_LOCATION (t),
+					  integer_type_node,
+					  OMP_CLAUSE_VECTOR_LENGTH_EXPR (t));
+		  break;
+		}
+	    }
+
+	  if (!t)
+	    t = integer_one_node;
+
+	  /* Extract the number of threads.  */
+	  nthreads = create_tmp_var (TREE_TYPE (t), NULL);
+	  gimplify_assign (nthreads, t, in_stmt_seqp);
+
+	  /* Ensure nthreads >= 1.  */
+	  stmt = gimple_build_assign_with_ops (MAX_EXPR, nthreads, nthreads,
+				          fold_convert(TREE_TYPE (nthreads),
+						       integer_one_node));
+	  gimple_seq_add_stmt (in_stmt_seqp, stmt);
+
+	  /* Set the number of threads.  */
+	  call = builtin_decl_explicit (BUILT_IN_OMP_SET_NUM_THREADS);
+	  stmt = gimple_build_call (call, 1, nthreads);
+	  gimple_seq_add_stmt (in_stmt_seqp, stmt);
+
+	  initialize_reduction_data (clauses, nthreads, in_stmt_seqp, ctx);
+	  finalize_reduction_data (clauses, nthreads, out_stmt_seqp, ctx);
+	  break;
+	default:
+	  // Scan for other directives which support reduction here.
+	  break;
+	}
+    }
+}
+
 /* Routines to lower OpenMP directives into OMP-GIMPLE.  */
 
 /* Lower the OpenACC offload directive in the current statement
@@ -9150,7 +9734,7 @@ lower_oacc_offload (gimple_stmt_iterator *gsi_p, omp_context *ctx)
   tree child_fn, t, c;
   gimple stmt = gsi_stmt (*gsi_p);
   gimple par_bind, bind;
-  gimple_seq par_body, olist, ilist, new_body;
+  gimple_seq par_body, olist, ilist, orlist, irlist, new_body;
   location_t loc = gimple_location (stmt);
   unsigned int map_cnt = 0;
   tree (*gimple_omp_clauses) (const_gimple);
@@ -9176,6 +9760,10 @@ lower_oacc_offload (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 
   push_gimplify_context ();
 
+  irlist = NULL;
+  orlist = NULL;
+  process_reduction_data (&par_body, &irlist, &orlist, ctx);
+
   for (c = clauses; c ; c = OMP_CLAUSE_CHAIN (c))
     switch (OMP_CLAUSE_CODE (c))
       {
@@ -9330,7 +9918,8 @@ lower_oacc_offload (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 		    avar = build_fold_addr_expr (avar);
 		    gimplify_assign (x, avar, &ilist);
 		  }
-		else if (is_gimple_reg (var))
+		else if (is_gimple_reg (var)
+			 && !maybe_lookup_reduction (var, ctx))
 		  {
 		    tree avar = create_tmp_var (TREE_TYPE (var), NULL);
 		    mark_addressable (avar);
@@ -9355,7 +9944,8 @@ lower_oacc_offload (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 		  }
 		else
 		  {
-		    var = build_fold_addr_expr (var);
+		    if (!maybe_lookup_reduction (var, ctx))
+		      var = build_fold_addr_expr (var);
 		    gimplify_assign (x, var, &ilist);
 		  }
 	      }
@@ -9439,9 +10029,11 @@ lower_oacc_offload (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 
   bind = gimple_build_bind (NULL, NULL, gimple_bind_block (par_bind));
   gsi_replace (gsi_p, bind, true);
+  gimple_bind_add_seq (bind, irlist);
   gimple_bind_add_seq (bind, ilist);
   gimple_bind_add_stmt (bind, stmt);
   gimple_bind_add_seq (bind, olist);
+  gimple_bind_add_seq (bind, orlist);
 
   pop_gimplify_context (NULL);
 }
diff --git a/gcc/testsuite/c-c++-common/goacc/reduction-1.c b/gcc/testsuite/c-c++-common/goacc/reduction-1.c
new file mode 100644
index 0000000..cff7d2d
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/reduction-1.c
@@ -0,0 +1,80 @@
+/* Integer reductions.  */
+
+#define vl 32
+
+int
+main(void)
+{
+  const int n = 1000;
+  int i;
+  int result, array[n];
+  int lresult;
+
+  /* '+' reductions.  */
+#pragma acc parallel vector_length (vl)
+#pragma acc loop reduction (+:result)
+  for (i = 0; i < n; i++)
+    result += array[i];
+#pragma acc end parallel
+
+  /* '*' reductions.  */
+#pragma acc parallel vector_length (vl)
+#pragma acc loop reduction (*:result)
+  for (i = 0; i < n; i++)
+    result *= array[i];
+#pragma acc end parallel
+
+//   result = 0;
+//   vresult = 0;
+// 
+//   /* 'max' reductions.  */
+// #pragma acc parallel vector_length (vl)
+// #pragma acc loop reduction (+:result)
+//   for (i = 0; i < n; i++)
+//       result = result > array[i] ? result : array[i];
+// #pragma acc end parallel
+//
+//   /* 'min' reductions.  */
+// #pragma acc parallel vector_length (vl)
+// #pragma acc loop reduction (+:result)
+//   for (i = 0; i < n; i++)
+//       result = result < array[i] ? result : array[i];
+// #pragma acc end parallel
+
+  /* '&' reductions.  */
+#pragma acc parallel vector_length (vl)
+#pragma acc loop reduction (&:result)
+  for (i = 0; i < n; i++)
+    result &= array[i];
+#pragma acc end parallel
+
+  /* '|' reductions.  */
+#pragma acc parallel vector_length (vl)
+#pragma acc loop reduction (|:result)
+  for (i = 0; i < n; i++)
+    result |= array[i];
+#pragma acc end parallel
+
+  /* '^' reductions.  */
+#pragma acc parallel vector_length (vl)
+#pragma acc loop reduction (^:result)
+  for (i = 0; i < n; i++)
+    result ^= array[i];
+#pragma acc end parallel
+
+  /* '&&' reductions.  */
+#pragma acc parallel vector_length (vl)
+#pragma acc loop reduction (&&:lresult)
+  for (i = 0; i < n; i++)
+    lresult = lresult && (result > array[i]);
+#pragma acc end parallel
+
+  /* '||' reductions.  */
+#pragma acc parallel vector_length (vl)
+#pragma acc loop reduction (||:lresult)
+  for (i = 0; i < n; i++)
+    lresult = lresult || (result > array[i]);
+#pragma acc end parallel
+
+  return 0;
+}
diff --git a/gcc/testsuite/c-c++-common/goacc/reduction-2.c b/gcc/testsuite/c-c++-common/goacc/reduction-2.c
new file mode 100644
index 0000000..9686b37
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/reduction-2.c
@@ -0,0 +1,56 @@
+/* float reductions.  */
+
+#define vl 32
+
+int
+main(void)
+{
+  const int n = 1000;
+  int i;
+  float result, array[n];
+  int lresult;
+
+  /* '+' reductions.  */
+#pragma acc parallel vector_length (vl)
+#pragma acc loop reduction (+:result)
+  for (i = 0; i < n; i++)
+    result += array[i];
+#pragma acc end parallel
+
+  /* '*' reductions.  */
+#pragma acc parallel vector_length (vl)
+#pragma acc loop reduction (*:result)
+  for (i = 0; i < n; i++)
+    result *= array[i];
+#pragma acc end parallel
+
+//   /* 'max' reductions.  */
+// #pragma acc parallel vector_length (vl)
+// #pragma acc loop reduction (+:result)
+//   for (i = 0; i < n; i++)
+//       result = result > array[i] ? result : array[i];
+// #pragma acc end parallel
+// 
+//   /* 'min' reductions.  */
+// #pragma acc parallel vector_length (vl)
+// #pragma acc loop reduction (+:result)
+//   for (i = 0; i < n; i++)
+//       result = result < array[i] ? result : array[i];
+// #pragma acc end parallel
+
+  /* '&&' reductions.  */
+#pragma acc parallel vector_length (vl)
+#pragma acc loop reduction (&&:lresult)
+  for (i = 0; i < n; i++)
+    lresult = lresult && (result > array[i]);
+#pragma acc end parallel
+
+  /* '||' reductions.  */
+#pragma acc parallel vector_length (vl)
+#pragma acc loop reduction (||:lresult)
+  for (i = 0; i < n; i++)
+    lresult = lresult || (result > array[i]);
+#pragma acc end parallel
+
+  return 0;
+}
diff --git a/gcc/testsuite/c-c++-common/goacc/reduction-3.c b/gcc/testsuite/c-c++-common/goacc/reduction-3.c
new file mode 100644
index 0000000..c618c4e
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/reduction-3.c
@@ -0,0 +1,56 @@
+/* double reductions.  */
+
+#define vl 32
+
+int
+main(void)
+{
+  const int n = 1000;
+  int i;
+  double result, array[n];
+  int lresult;
+
+  /* '+' reductions.  */
+#pragma acc parallel vector_length (vl)
+#pragma acc loop reduction (+:result)
+  for (i = 0; i < n; i++)
+    result += array[i];
+#pragma acc end parallel
+
+  /* '*' reductions.  */
+#pragma acc parallel vector_length (vl)
+#pragma acc loop reduction (*:result)
+  for (i = 0; i < n; i++)
+    result *= array[i];
+#pragma acc end parallel
+
+//   /* 'max' reductions.  */
+// #pragma acc parallel vector_length (vl)
+// #pragma acc loop reduction (+:result)
+//   for (i = 0; i < n; i++)
+//       result = result > array[i] ? result : array[i];
+// #pragma acc end parallel
+// 
+//   /* 'min' reductions.  */
+// #pragma acc parallel vector_length (vl)
+// #pragma acc loop reduction (+:result)
+//   for (i = 0; i < n; i++)
+//       result = result < array[i] ? result : array[i];
+// #pragma acc end parallel
+
+  /* '&&' reductions.  */
+#pragma acc parallel vector_length (vl)
+#pragma acc loop reduction (&&:lresult)
+  for (i = 0; i < n; i++)
+    lresult = lresult && (result > array[i]);
+#pragma acc end parallel
+
+  /* '||' reductions.  */
+#pragma acc parallel vector_length (vl)
+#pragma acc loop reduction (||:lresult)
+  for (i = 0; i < n; i++)
+    lresult = lresult || (result > array[i]);
+#pragma acc end parallel
+
+  return 0;
+}
diff --git a/gcc/testsuite/c-c++-common/goacc/reduction-4.c b/gcc/testsuite/c-c++-common/goacc/reduction-4.c
new file mode 100644
index 0000000..1e032a1
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/reduction-4.c
@@ -0,0 +1,58 @@
+/* complex reductions.  */
+
+#define vl 32
+
+int
+main(void)
+{
+  const int n = 1000;
+  int i;
+  __complex__ double result, array[n];
+  int lresult;
+
+  /* '+' reductions.  */
+#pragma acc parallel vector_length (vl)
+#pragma acc loop reduction (+:result)
+  for (i = 0; i < n; i++)
+    result += array[i];
+#pragma acc end parallel
+
+  /* Needs support for complex multiplication.  */
+
+//   /* '*' reductions.  */
+// #pragma acc parallel vector_length (vl)
+// #pragma acc loop reduction (*:result)
+//   for (i = 0; i < n; i++)
+//     result *= array[i];
+// #pragma acc end parallel
+//
+//   /* 'max' reductions.  */
+// #pragma acc parallel vector_length (vl)
+// #pragma acc loop reduction (+:result)
+//   for (i = 0; i < n; i++)
+//       result = result > array[i] ? result : array[i];
+// #pragma acc end parallel
+// 
+//   /* 'min' reductions.  */
+// #pragma acc parallel vector_length (vl)
+// #pragma acc loop reduction (+:result)
+//   for (i = 0; i < n; i++)
+//       result = result < array[i] ? result : array[i];
+// #pragma acc end parallel
+
+  /* '&&' reductions.  */
+#pragma acc parallel vector_length (vl)
+#pragma acc loop reduction (&&:lresult)
+  for (i = 0; i < n; i++)
+    lresult = lresult && (__real__(result) > __real__(array[i]));
+#pragma acc end parallel
+
+  /* '||' reductions.  */
+#pragma acc parallel vector_length (vl)
+#pragma acc loop reduction (||:lresult)
+  for (i = 0; i < n; i++)
+    lresult = lresult || (__real__(result) > __real__(array[i]));
+#pragma acc end parallel
+
+  return 0;
+}

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [patch,gomp-4_0-branch] openacc parallel reduction part 1
  2014-07-06 23:11 [patch,gomp-4_0-branch] openacc parallel reduction part 1 Cesar Philippidis
@ 2014-07-07  9:55 ` Thomas Schwinge
  2014-07-08 14:28   ` Cesar Philippidis
  2014-07-28 17:06 ` Thomas Schwinge
  1 sibling, 1 reply; 12+ messages in thread
From: Thomas Schwinge @ 2014-07-07  9:55 UTC (permalink / raw)
  To: Cesar Philippidis; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 2664 bytes --]

Hi Cesar!

On Sun, 6 Jul 2014 16:10:56 -0700, Cesar Philippidis <cesar_philippidis@mentor.com> wrote:
> This patch is the first step to enabling parallel reductions in openacc.

Thanks!

> As mentioned earlier, this patch isn't complete yet. For starters, parts
> of it depends on our internal ptx backend. I've temporarily remapped the
> ptx dependencies to their openmp equivalent, but without a proper
> openacc runtime this infrastructure won't do much.

For the curious: we're working on preparing our implementation of the
OpenACC Runtime Library for upstream submission; if only the weeks had
more days...

> Thomas, is this patch OK for gomp-4_0-branch?

I still :-( haven't managed to allocate the time for a proper review, but
given this doesn't regress any existing test cases, it's fine to commit,
and then we can take it from there.

A few minor comments:

> 2014-07-06  Cesar Philippidis  <cesar@codesourcery.com>
> 	    Thomas Schwinge  <thomas@codesourcery.com>

By the way, on gomp-4_0-branch, ChangeLog snippets go into the respective
ChangeLog.gomp files.

> --- a/gcc/c/c-parser.c
> +++ b/gcc/c/c-parser.c
> @@ -11706,7 +11710,8 @@ c_parser_oacc_kernels (location_t loc, c_parser *parser, char *p_name)
>  */
>  
>  #define OACC_LOOP_CLAUSE_MASK						\
> -	(OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_NONE)
> +	( (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_COLLAPSE)		\

Not yet.  ;-)

> +	| (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_REDUCTION))

> --- a/gcc/fortran/types.def
> +++ b/gcc/fortran/types.def
> @@ -86,6 +86,7 @@ DEF_FUNCTION_TYPE_1 (BT_FN_UINT_UINT, BT_UINT, BT_UINT)
>  DEF_FUNCTION_TYPE_1 (BT_FN_PTR_PTR, BT_PTR, BT_PTR)
>  DEF_FUNCTION_TYPE_1 (BT_FN_VOID_INT, BT_VOID, BT_INT)
>  DEF_FUNCTION_TYPE_1 (BT_FN_BOOL_INT, BT_BOOL, BT_INT)
> +DEF_FUNCTION_TYPE_1 (BT_FN_INT_INT, BT_INT, BT_INT)

That one's not actually needed, because...

> --- a/gcc/omp-builtins.def
> +++ b/gcc/omp-builtins.def
> @@ -236,3 +236,6 @@ DEF_GOMP_BUILTIN (BUILT_IN_GOMP_TARGET_UPDATE, "GOMP_target_update",
>  		  BT_FN_VOID_INT_PTR_SIZE_PTR_PTR_PTR, ATTR_NOTHROW_LIST)
>  DEF_GOMP_BUILTIN (BUILT_IN_GOMP_TEAMS, "GOMP_teams",
>  		  BT_FN_VOID_UINT_UINT, ATTR_NOTHROW_LIST)
> +
> +DEF_GOMP_BUILTIN (BUILT_IN_OMP_SET_NUM_THREADS, "omp_set_num_threads",
> +		  BT_FN_INT_INT, ATTR_CONST_NOTHROW_LEAF_LIST)

... it's actually »void omp_set_num_threads (int)«, so BT_FN_VOID_INT.
As this is only temporary code, please add a FIXME comment here.  Hmm,
and I wonder, given this is using DEF_*GOMP*_BUILTIN, does this actually
do the right thing if -openmp is not specified?


Grüße,
 Thomas

[-- Attachment #2: Type: application/pgp-signature, Size: 472 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [patch,gomp-4_0-branch] openacc parallel reduction part 1
  2014-07-07  9:55 ` Thomas Schwinge
@ 2014-07-08 14:28   ` Cesar Philippidis
  2014-07-08 17:02     ` Cesar Philippidis
                       ` (3 more replies)
  0 siblings, 4 replies; 12+ messages in thread
From: Cesar Philippidis @ 2014-07-08 14:28 UTC (permalink / raw)
  To: Thomas Schwinge; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 2807 bytes --]

On 07/07/2014 02:55 AM, Thomas Schwinge wrote:

> On Sun, 6 Jul 2014 16:10:56 -0700, Cesar Philippidis <cesar_philippidis@mentor.com> wrote:
>> This patch is the first step to enabling parallel reductions in openacc.
> 
> Thanks!
> 
>> As mentioned earlier, this patch isn't complete yet. For starters, parts
>> of it depends on our internal ptx backend. I've temporarily remapped the
>> ptx dependencies to their openmp equivalent, but without a proper
>> openacc runtime this infrastructure won't do much.
> 
> For the curious: we're working on preparing our implementation of the
> OpenACC Runtime Library for upstream submission; if only the weeks had
> more days...
> 
>> Thomas, is this patch OK for gomp-4_0-branch?
> 
> I still :-( haven't managed to allocate the time for a proper review, but
> given this doesn't regress any existing test cases, it's fine to commit,
> and then we can take it from there.
> 
> A few minor comments:
> 
>> 2014-07-06  Cesar Philippidis  <cesar@codesourcery.com>
>> 	    Thomas Schwinge  <thomas@codesourcery.com>
> 
> By the way, on gomp-4_0-branch, ChangeLog snippets go into the respective
> ChangeLog.gomp files.
> 
>> --- a/gcc/c/c-parser.c
>> +++ b/gcc/c/c-parser.c
>> @@ -11706,7 +11710,8 @@ c_parser_oacc_kernels (location_t loc, c_parser *parser, char *p_name)
>>  */
>>  
>>  #define OACC_LOOP_CLAUSE_MASK						\
>> -	(OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_NONE)
>> +	( (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_COLLAPSE)		\
> 
> Not yet.  ;-)
> 
>> +	| (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_REDUCTION))
> 
>> --- a/gcc/fortran/types.def
>> +++ b/gcc/fortran/types.def
>> @@ -86,6 +86,7 @@ DEF_FUNCTION_TYPE_1 (BT_FN_UINT_UINT, BT_UINT, BT_UINT)
>>  DEF_FUNCTION_TYPE_1 (BT_FN_PTR_PTR, BT_PTR, BT_PTR)
>>  DEF_FUNCTION_TYPE_1 (BT_FN_VOID_INT, BT_VOID, BT_INT)
>>  DEF_FUNCTION_TYPE_1 (BT_FN_BOOL_INT, BT_BOOL, BT_INT)
>> +DEF_FUNCTION_TYPE_1 (BT_FN_INT_INT, BT_INT, BT_INT)
> 
> That one's not actually needed, because...
> 
>> --- a/gcc/omp-builtins.def
>> +++ b/gcc/omp-builtins.def
>> @@ -236,3 +236,6 @@ DEF_GOMP_BUILTIN (BUILT_IN_GOMP_TARGET_UPDATE, "GOMP_target_update",
>>  		  BT_FN_VOID_INT_PTR_SIZE_PTR_PTR_PTR, ATTR_NOTHROW_LIST)
>>  DEF_GOMP_BUILTIN (BUILT_IN_GOMP_TEAMS, "GOMP_teams",
>>  		  BT_FN_VOID_UINT_UINT, ATTR_NOTHROW_LIST)
>> +
>> +DEF_GOMP_BUILTIN (BUILT_IN_OMP_SET_NUM_THREADS, "omp_set_num_threads",
>> +		  BT_FN_INT_INT, ATTR_CONST_NOTHROW_LEAF_LIST)
> 
> ... it's actually »void omp_set_num_threads (int)«, so BT_FN_VOID_INT.
> As this is only temporary code, please add a FIXME comment here.  Hmm,
> and I wonder, given this is using DEF_*GOMP*_BUILTIN, does this actually
> do the right thing if -openmp is not specified?

Thanks for catching those problems! I've committed this updated version
of the patch.

Cesar


[-- Attachment #2: gomp4-reductions-20140708.diff --]
[-- Type: text/x-patch, Size: 37274 bytes --]

2014-07-08  Cesar Philippidis  <cesar@codesourcery.com>
	    Thomas Schwinge  <thomas@codesourcery.com>

	gcc/
	* omp-low.c (omp_get_id): New function.
	(lookup_reduction): New function.
	(maybe_lookup_reduction): New function.
	(build_outer_var_ref): Remove openacc assert.
	(new_omp_context): Preserve ctx->reduction_map.
	(scan_sharing_clauses): Handle OMP_CLAUSE_REDUCTION.
	(scan_oacc_offload): Initialize ctx->reduction_map.
	(lower_reduction_clauses): Handle OpenACC reductions.
	(omp_gimple_assign_with_ops): New function.
	(initialize_reduction_data): New function.
	(finalize_reduction_data): New function.
	(process_reduction_data): New function.
	(lower_oacc_offload): Handle reductions.
	* gcc/omp-builtins.def (BUILT_IN_OMP_SET_NUM_THREADS): New.

	gcc/c/
	* c-parser.c (c_parser_oacc_all_clauses): Handle
	PRAGMA_OMP_CLAUSE_REDUCTION.
	(OACC_LOOP_CLAUSE_MASK, OACC_PARALLEL_CLAUSE_MASK): Add
	PRAGMA_OMP_CLAUSE_REDUCTION.

	gcc/testsuite/
	* gcc/testsuite/c-c++-common/goacc/reduction-1.c: New test.
	* gcc/testsuite/c-c++-common/goacc/reduction-2.c: New test.
	* gcc/testsuite/c-c++-common/goacc/reduction-3.c: New test.
	* gcc/testsuite/c-c++-common/goacc/reduction-4.c: New test.

diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index 03852b4..6a9271f 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -11332,6 +11332,10 @@ c_parser_oacc_all_clauses (c_parser *parser, omp_clause_mask mask,
 	  clauses = c_parser_oacc_data_clause (parser, c_kind, clauses);
 	  c_name = "present_or_create";
 	  break;
+	case PRAGMA_OMP_CLAUSE_REDUCTION:
+	  clauses = c_parser_omp_clause_reduction (parser, clauses);
+	  c_name = "reduction";
+	  break;
 	case PRAGMA_OMP_CLAUSE_SELF:
 	  clauses = c_parser_oacc_data_clause (parser, c_kind, clauses);
 	  c_name = "self";
@@ -11706,7 +11710,7 @@ c_parser_oacc_kernels (location_t loc, c_parser *parser, char *p_name)
 */
 
 #define OACC_LOOP_CLAUSE_MASK						\
-	(OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_NONE)
+	(OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_REDUCTION)
 
 static tree
 c_parser_oacc_loop (location_t loc, c_parser *parser, char *p_name)
@@ -11746,6 +11750,7 @@ c_parser_oacc_loop (location_t loc, c_parser *parser, char *p_name)
 	| (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_PRESENT_OR_COPYIN)	\
 	| (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_PRESENT_OR_COPYOUT)	\
 	| (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_PRESENT_OR_CREATE)	\
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_REDUCTION)		\
 	| (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_VECTOR_LENGTH) )
 
 static tree
diff --git a/gcc/omp-builtins.def b/gcc/omp-builtins.def
index 08b825c..698dc79 100644
--- a/gcc/omp-builtins.def
+++ b/gcc/omp-builtins.def
@@ -236,3 +236,6 @@ DEF_GOMP_BUILTIN (BUILT_IN_GOMP_TARGET_UPDATE, "GOMP_target_update",
 		  BT_FN_VOID_INT_PTR_SIZE_PTR_PTR_PTR, ATTR_NOTHROW_LIST)
 DEF_GOMP_BUILTIN (BUILT_IN_GOMP_TEAMS, "GOMP_teams",
 		  BT_FN_VOID_UINT_UINT, ATTR_NOTHROW_LIST)
+
+DEF_GOMP_BUILTIN (BUILT_IN_OMP_SET_NUM_THREADS, "omp_set_num_threads",
+		  BT_FN_VOID_INT, ATTR_CONST_NOTHROW_LEAF_LIST)
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index cd27b76..219d5fe 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -158,6 +158,11 @@ typedef struct omp_context
      construct.  In the case of a parallel, this is in the child function.  */
   tree block_vars;
 
+  /* A map of reduction pointer variables.  For accelerators, each
+     reduction variable is replaced with an array.  Each thread, in turn,
+     is assigned to a slot on that array.  */
+  splay_tree reduction_map;
+
   /* Label to which GOMP_cancel{,llation_point} and explicit and implicit
      barriers should jump to during omplower pass.  */
   tree cancel_label;
@@ -221,6 +226,17 @@ static tree scan_omp_1_op (tree *, int *, void *);
       *handled_ops_p = false; \
       break;
 
+/* Helper function to get the reduction array name */
+static const char *
+omp_get_id (tree node)
+{
+  const char *id = IDENTIFIER_POINTER (DECL_NAME (node));
+  int len = strlen ("omp$") + strlen (id);
+  char *temp_name = (char *)alloca (len+1);
+  snprintf (temp_name, len+1, "gfc$%s", id);
+  return IDENTIFIER_POINTER(get_identifier (temp_name));
+}
+
 /* Holds a decl for __OPENMP_TARGET__.  */
 static GTY(()) tree offload_symbol_decl;
 
@@ -873,6 +889,17 @@ lookup_sfield (tree var, omp_context *ctx)
 }
 
 static inline tree
+lookup_reduction (const char *id, omp_context *ctx)
+{
+  gcc_assert (is_gimple_omp_oacc_specifically (ctx->stmt));
+
+  splay_tree_node n;
+  n = splay_tree_lookup (ctx->reduction_map,
+			 (splay_tree_key) id);
+  return (tree) n->value;
+}
+
+static inline tree
 maybe_lookup_field (tree var, omp_context *ctx)
 {
   splay_tree_node n;
@@ -880,6 +907,17 @@ maybe_lookup_field (tree var, omp_context *ctx)
   return n ? (tree) n->value : NULL_TREE;
 }
 
+static inline tree
+maybe_lookup_reduction (tree var, omp_context *ctx)
+{
+  gcc_assert (is_gimple_omp_oacc_specifically (ctx->stmt));
+
+  splay_tree_node n;
+  n = splay_tree_lookup (ctx->reduction_map,
+			 (splay_tree_key) var);
+  return n ?(tree) n->value : NULL_TREE;
+}
+
 /* Return true if DECL should be copied by pointer.  SHARED_CTX is
    the parallel context if DECL is to be shared.  */
 
@@ -1036,8 +1074,6 @@ build_receiver_ref (tree var, bool by_ref, omp_context *ctx)
 static tree
 build_outer_var_ref (tree var, omp_context *ctx)
 {
-  gcc_assert (!is_gimple_omp_oacc_specifically (ctx->stmt));
-
   tree x;
 
   if (is_global_var (maybe_lookup_decl_in_outer_ctx (var, ctx)))
@@ -1379,6 +1415,8 @@ new_omp_context (gimple stmt, omp_context *outer_ctx)
       ctx->cb = outer_ctx->cb;
       ctx->cb.block = NULL;
       ctx->depth = outer_ctx->depth + 1;
+      /* FIXME: handle reductions recursively.  */
+      ctx->reduction_map = outer_ctx->reduction_map;
     }
   else
     {
@@ -1392,6 +1430,7 @@ new_omp_context (gimple stmt, omp_context *outer_ctx)
       ctx->cb.eh_lp_nr = 0;
       ctx->cb.transform_call_graph_edges = CB_CGE_MOVE;
       ctx->depth = 1;
+      //TODO ctx->reduction_map = TODO;
     }
 
   ctx->cb.decl_map = pointer_map_create ();
@@ -1588,7 +1627,6 @@ scan_sharing_clauses (tree clauses, omp_context *ctx)
 	  /* FALLTHRU */
 
 	case OMP_CLAUSE_FIRSTPRIVATE:
-	case OMP_CLAUSE_REDUCTION:
 	  if (is_gimple_omp_oacc_specifically (ctx->stmt))
 	    {
 	      sorry ("clause not supported yet");
@@ -1596,6 +1634,7 @@ scan_sharing_clauses (tree clauses, omp_context *ctx)
 	    }
 	case OMP_CLAUSE_LINEAR:
 	  gcc_assert (!is_gimple_omp_oacc_specifically (ctx->stmt));
+	case OMP_CLAUSE_REDUCTION:
 	  decl = OMP_CLAUSE_DECL (c);
 	do_private:
 	  if (is_variable_sized (decl))
@@ -1621,6 +1660,28 @@ scan_sharing_clauses (tree clauses, omp_context *ctx)
 		install_var_field (decl, by_ref, 3, ctx);
 	    }
 	  install_var_local (decl, ctx);
+	  //TODO
+	  if (is_gimple_omp_oacc_specifically (ctx->stmt))
+	    {
+	  if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_REDUCTION)
+	    {
+	      /* Create a decl for the reduction array.  */
+	      tree var = OMP_CLAUSE_DECL (c);
+	      tree ptype = build_pointer_type (TREE_TYPE (var));
+	      tree array = create_tmp_var (ptype, omp_get_id (var));
+	      omp_context *c = (ctx->field_map ? ctx : ctx->outer);
+	      install_var_field (array, true, 3, c);
+	      install_var_local (array, c);
+
+	      /* Insert it into the current context.  */
+	      splay_tree_insert (ctx->reduction_map,
+				 (splay_tree_key) omp_get_id(var),
+				 (splay_tree_value) array);
+	      splay_tree_insert (ctx->reduction_map,
+				 (splay_tree_key) array,
+				 (splay_tree_value) array);
+	    }
+	    }
 	  break;
 
 	case OMP_CLAUSE__LOOPTEMP_:
@@ -1658,10 +1719,7 @@ scan_sharing_clauses (tree clauses, omp_context *ctx)
 	case OMP_CLAUSE_NUM_WORKERS:
 	case OMP_CLAUSE_VECTOR_LENGTH:
 	  if (ctx->outer)
-	    {
-	      gcc_assert (!is_gimple_omp_oacc_specifically (ctx->stmt));
 	    scan_omp_op (&OMP_CLAUSE_OPERAND (c, 0), ctx->outer);
-	    }
 	  break;
 
 	case OMP_CLAUSE_TO:
@@ -1750,7 +1808,16 @@ scan_sharing_clauses (tree clauses, omp_context *ctx)
 		      && TREE_CODE (TREE_TYPE (decl)) == ARRAY_TYPE)
 		    install_var_field (decl, true, 7, ctx);
 		  else
-		    install_var_field (decl, true, 3, ctx);
+		    {
+		      if (!is_gimple_omp_oacc_specifically (ctx->stmt))
+			install_var_field (decl, true, 3, ctx);
+		      else
+		    {
+		      /* decl goes heres.  */
+		      omp_context *c = (ctx->field_map ? ctx : ctx->outer);
+		      install_var_field (decl, true, 3, c);
+		    }
+		    }
 		  if (is_gimple_omp_offloaded (ctx->stmt))
 		    install_var_local (decl, ctx);
 		}
@@ -1844,7 +1911,6 @@ scan_sharing_clauses (tree clauses, omp_context *ctx)
 	  /* FALLTHRU */
 
 	case OMP_CLAUSE_FIRSTPRIVATE:
-	case OMP_CLAUSE_REDUCTION:
 	  if (is_gimple_omp_oacc_specifically (ctx->stmt))
 	    {
 	      sorry ("clause not supported yet");
@@ -1852,6 +1918,7 @@ scan_sharing_clauses (tree clauses, omp_context *ctx)
 	    }
 	case OMP_CLAUSE_LINEAR:
 	  gcc_assert (!is_gimple_omp_oacc_specifically (ctx->stmt));
+	case OMP_CLAUSE_REDUCTION:
 	case OMP_CLAUSE_PRIVATE:
 	  decl = OMP_CLAUSE_DECL (c);
 	  if (is_variable_sized (decl))
@@ -2161,6 +2228,7 @@ scan_oacc_offload (gimple stmt, omp_context *outer_ctx)
   DECL_NAMELESS (name) = 1;
   TYPE_NAME (ctx->record_type) = name;
   create_omp_child_function (ctx, false);
+  ctx->reduction_map = splay_tree_new (splay_tree_compare_pointers, 0, 0);
 
   gimple_omp_set_child_fn (stmt, ctx->cb.dst_fn);
 
@@ -4211,6 +4279,8 @@ lower_reduction_clauses (tree clauses, gimple_seq *stmt_seqp, omp_context *ctx)
 
       if (count == 1)
 	{
+	  if (!is_gimple_omp_oacc_specifically (ctx->stmt))
+	    {
 	  tree addr = build_fold_addr_expr_loc (clause_loc, ref);
 
 	  addr = save_expr (addr);
@@ -4219,6 +4289,117 @@ lower_reduction_clauses (tree clauses, gimple_seq *stmt_seqp, omp_context *ctx)
 	  x = build2 (OMP_ATOMIC, void_type_node, addr, x);
 	  gimplify_and_add (x, stmt_seqp);
 	  return;
+	    }
+	  else
+	    {
+	  /* The atomic add at the end of the sum creates unnecessary
+	     write contention on accelerators.  To work around that,
+	     create an array or vector_length and assign an element to
+	     each thread.  Later, in lower_omp_for (for openacc), the
+	     values of array will be combined.  */
+
+	  tree t = NULL_TREE, array, nthreads;
+
+	  /* First ensure that the current tid is less than vector_length.  */
+	  tree exit_label = create_artificial_label (UNKNOWN_LOCATION);
+	  tree reduction_label = create_artificial_label (UNKNOWN_LOCATION);
+
+	  /* Get the current thread id.  */
+	  tree call = builtin_decl_explicit (BUILT_IN_OMP_GET_THREAD_NUM);
+	  gimple stmt = gimple_build_call (call, 1, integer_zero_node);
+	  tree fntype = gimple_call_fntype (stmt);
+	  tree tid = create_tmp_var (TREE_TYPE (fntype), NULL);
+	  gimple_call_set_lhs (stmt, tid);
+	  gimple_seq_add_stmt (stmt_seqp, stmt);
+
+	  /* Find the total number of threads.  A reduction clause
+	     only appears inside a loop construction or a combined
+	     parallel and loop construct.  */
+	  tree c;
+
+	  if (gimple_code (ctx->stmt) == GIMPLE_OMP_FOR)
+	    c = gimple_oacc_parallel_clauses (ctx->outer->stmt);
+	  else
+	    c = gimple_oacc_parallel_clauses (ctx->stmt);
+
+	  t = find_omp_clause (c, OMP_CLAUSE_VECTOR_LENGTH);
+
+	  if (t)
+	    {
+	      t = fold_convert_loc (OMP_CLAUSE_LOCATION (t),
+				    integer_type_node,
+				    OMP_CLAUSE_VECTOR_LENGTH_EXPR (t));
+	    }
+
+	  if (!t)
+	    t = integer_one_node;
+
+	  /* Extract the number of threads.  */
+	  nthreads = create_tmp_var (sizetype, NULL);
+	  gimplify_assign (nthreads, fold_build1 (NOP_EXPR, sizetype, t),
+			   stmt_seqp);
+	  stmt = gimple_build_assign_with_ops  (MINUS_EXPR, nthreads, nthreads,
+				 fold_build1 (NOP_EXPR, sizetype,
+					      integer_one_node));
+	  gimple_seq_add_stmt (stmt_seqp, stmt);
+
+	  /* If tid >= nthreads, goto exit_label.  */
+	  t = create_tmp_var (sizetype, NULL);
+	  gimplify_assign (t, fold_build1 (NOP_EXPR, sizetype, tid),
+			   stmt_seqp);
+	  stmt = gimple_build_cond (GT_EXPR, t, nthreads, exit_label,
+				    reduction_label);
+	  gimple_seq_add_stmt (stmt_seqp, stmt);
+
+	  /* Place the reduction_label here.  */
+
+	  gimple_seq_add_stmt (stmt_seqp,
+			       gimple_build_label (reduction_label));
+
+	  /* Now insert the partial reductions into the array.  */
+
+	  /* Create an array for the reduction variable and install it
+	     in the parent scope.  */
+	  tree ptype = build_pointer_type (TREE_TYPE (var));
+
+	  t = lookup_reduction (omp_get_id (var), ctx);
+	  t = build_receiver_ref (t, false, ctx->outer);
+
+	  array = create_tmp_var (ptype, NULL);
+	  gimplify_assign (array, t, stmt_seqp);
+
+	  tree ptr = create_tmp_var (TREE_TYPE (array), NULL);
+
+	  /* Find the reduction array.  */
+
+	  /* testing a unary conversion.  */
+	  tree offset = create_tmp_var (sizetype, NULL);
+	  gimplify_assign (offset, TYPE_SIZE_UNIT (TREE_TYPE (var)),
+			   stmt_seqp);
+	  t = create_tmp_var (sizetype, NULL);
+	  gimplify_assign (t, unshare_expr (fold_build1 (NOP_EXPR, sizetype,
+							 tid)),
+			   stmt_seqp);
+	  stmt = gimple_build_assign_with_ops (MULT_EXPR, offset, offset, t);
+	  gimple_seq_add_stmt (stmt_seqp, stmt);
+
+	  /* Offset expression.  Does the POINTER_PLUS_EXPR take care
+	     of adding sizeof(var) to the array?  */
+	  ptr = create_tmp_var (ptype, NULL);
+	  stmt = gimple_build_assign_with_ops (POINTER_PLUS_EXPR,
+					       unshare_expr(ptr),
+					       array, offset);
+	  gimple_seq_add_stmt (stmt_seqp, stmt);
+
+	  /* Move the local sum to gfc$sum[i].  */
+	  x = unshare_expr (build_simple_mem_ref (ptr));
+	  stmt = gimplify_assign (x, new_var, stmt_seqp);
+
+	  /* Place exit label here.  */
+	  gimple_seq_add_stmt (stmt_seqp, gimple_build_label (exit_label));
+
+	  return;
+	    }
 	}
 
       if (OMP_CLAUSE_REDUCTION_PLACEHOLDER (c))
@@ -9138,6 +9319,410 @@ make_pass_expand_omp (gcc::context *ctxt)
   return new pass_expand_omp (ctxt);
 }
 \f
+/* Helper function to preform, potentially COMPLEX_TYPE, operation and
+   convert it to gimple.  */
+static void
+omp_gimple_assign_with_ops (tree_code op, tree dest, tree src, gimple_seq *seq)
+{
+  gimple stmt;
+
+  if (TREE_CODE (TREE_TYPE (dest)) != COMPLEX_TYPE)
+    {
+      stmt = gimple_build_assign_with_ops (op, dest, dest, src);
+      gimple_seq_add_stmt (seq, stmt);
+      return;
+    }
+
+  tree t = create_tmp_var (TREE_TYPE (TREE_TYPE (dest)), NULL);
+  tree rdest = fold_build1 (REALPART_EXPR, TREE_TYPE (TREE_TYPE (dest)), dest);
+  gimplify_assign (t, rdest, seq);
+  rdest = t;
+
+  t = create_tmp_var (TREE_TYPE (TREE_TYPE (dest)), NULL);
+  tree idest = fold_build1 (IMAGPART_EXPR, TREE_TYPE (TREE_TYPE (dest)), dest);
+  gimplify_assign (t, idest, seq);
+  idest = t;
+
+  t = create_tmp_var (TREE_TYPE (TREE_TYPE (src)), NULL);
+  tree rsrc = fold_build1 (REALPART_EXPR, TREE_TYPE (TREE_TYPE (src)), src);
+  gimplify_assign (t, rsrc, seq);
+  rsrc = t;
+
+  t = create_tmp_var (TREE_TYPE (TREE_TYPE (src)), NULL);
+  tree isrc = fold_build1 (IMAGPART_EXPR, TREE_TYPE (TREE_TYPE (src)), src);
+  gimplify_assign (t, isrc, seq);
+  isrc = t;
+
+  tree r = create_tmp_var (TREE_TYPE (TREE_TYPE (dest)), NULL);
+  tree i = create_tmp_var (TREE_TYPE (TREE_TYPE (dest)), NULL);
+  tree result;
+
+  gcc_assert (op == PLUS_EXPR || op == MULT_EXPR);
+
+  if (op == PLUS_EXPR)
+    {
+      stmt = gimple_build_assign_with_ops (op, r, rdest, rsrc);
+      gimple_seq_add_stmt (seq, stmt);
+
+      stmt = gimple_build_assign_with_ops (op, i, idest, isrc);
+      gimple_seq_add_stmt (seq, stmt);
+    }
+  else if (op == MULT_EXPR)
+    {
+      /* Let x = a + ib = dest, y = c + id = src.
+	 x * y = (ac - bd) + i(ad + bc)  */
+      tree ac = create_tmp_var (TREE_TYPE (TREE_TYPE (dest)), NULL);
+      tree bd = create_tmp_var (TREE_TYPE (TREE_TYPE (dest)), NULL);
+      tree ad = create_tmp_var (TREE_TYPE (TREE_TYPE (dest)), NULL);
+      tree bc = create_tmp_var (TREE_TYPE (TREE_TYPE (dest)), NULL);
+
+      stmt = gimple_build_assign_with_ops (MULT_EXPR, ac, rdest, rsrc);
+      gimple_seq_add_stmt (seq, stmt);
+
+      stmt = gimple_build_assign_with_ops (MULT_EXPR, bd, idest, isrc);
+      gimple_seq_add_stmt (seq, stmt);
+
+      stmt = gimple_build_assign_with_ops (MINUS_EXPR, r, ac, bd);
+      gimple_seq_add_stmt (seq, stmt);
+
+      stmt = gimple_build_assign_with_ops (MULT_EXPR, ad, rdest, isrc);
+      gimple_seq_add_stmt (seq, stmt);
+
+      stmt = gimple_build_assign_with_ops (MULT_EXPR, bd, idest, rsrc);
+      gimple_seq_add_stmt (seq, stmt);
+
+      stmt = gimple_build_assign_with_ops (PLUS_EXPR, i, ad, bc);
+      gimple_seq_add_stmt (seq, stmt);
+    }
+
+  result = build2 (COMPLEX_EXPR, TREE_TYPE (dest), r, i);
+  gimplify_assign (dest, result, seq);
+}
+
+/* Helper function to initialize local data for the reduction arrays.
+   The reduction arrays need to be placed inside the calling function
+   for accelerators, or else the host won't be able to preform the final
+   reduction.  FIXME: This function assumes that there are
+   vector_length threads in total.  */
+
+static void
+initialize_reduction_data (tree clauses, tree nthreads, gimple_seq *stmt_seqp,
+			   omp_context *ctx)
+{
+  gcc_assert (is_gimple_omp_oacc_specifically (ctx->stmt));
+
+  tree c, t, oc;
+  gimple stmt;
+  omp_context *octx;
+  tree (*gimple_omp_clauses) (const_gimple);
+  void (*gimple_omp_set_clauses) (gimple, tree);
+
+  /* Find the innermost PARALLEL openmp context.  FIXME: OpenACC kernels
+     may require extra care unless they are converted to openmp for loops.  */
+
+  if (gimple_code (ctx->stmt) == GIMPLE_OACC_PARALLEL)
+    octx = ctx;
+  else
+    octx = ctx->outer;
+
+  gimple_omp_clauses = gimple_oacc_parallel_clauses;
+  gimple_omp_set_clauses = gimple_oacc_parallel_set_clauses;
+
+  /* Extract the clauses.  */
+  oc = gimple_omp_clauses (octx->stmt);
+
+  /* Find the last outer clause.  */
+  for (; oc && OMP_CLAUSE_CHAIN (oc); oc = OMP_CLAUSE_CHAIN (oc))
+    ;
+
+  /* Allocate arrays for each reduction variable.  */
+  for (c = clauses; c; c = OMP_CLAUSE_CHAIN (c))
+    {
+      if (OMP_CLAUSE_CODE (c) != OMP_CLAUSE_REDUCTION)
+	continue;
+
+      tree var = OMP_CLAUSE_DECL (c);
+      tree array = lookup_reduction (omp_get_id (var), ctx);
+      tree size, call;
+
+      /* Calculate size of the reduction array.  */
+      t = create_tmp_var (TREE_TYPE (nthreads), NULL);
+      stmt = gimple_build_assign_with_ops (MULT_EXPR, t, nthreads,
+			 fold_convert (TREE_TYPE (nthreads),
+				       TYPE_SIZE_UNIT (TREE_TYPE (var))));
+      gimple_seq_add_stmt (stmt_seqp, stmt);
+
+      size = create_tmp_var (sizetype, NULL);
+      gimplify_assign (size, fold_build1 (NOP_EXPR, sizetype, t), stmt_seqp);
+
+      /* Now allocate memory for it.  FIXME: Allocating memory for the
+	 reduction array may be unnecessary once the final reduction is able
+	 to be preformed on the accelerator.  Instead of allocating memory on
+	 the host side, it could just be allocated on the accelerator.  */
+      call = unshare_expr (builtin_decl_explicit (BUILT_IN_ALLOCA));
+      stmt = gimple_build_call (call, 1, size);
+      gimple_call_set_lhs (stmt, array);
+      gimple_seq_add_stmt (stmt_seqp, stmt);
+
+      /* Map this array into the accelerator.  */
+
+      /* Add the reduction array to the list of clauses.  */
+      /* FIXME: Currently, these variables must be placed in the outer
+	 most clause so that copy-out works.  */
+      tree x = array;
+      t = build_omp_clause (gimple_location (ctx->stmt), OMP_CLAUSE_MAP);
+      OMP_CLAUSE_MAP_KIND (t) = OMP_CLAUSE_MAP_FORCE_FROM;
+      OMP_CLAUSE_DECL (t) = x;
+      OMP_CLAUSE_CHAIN (t) = NULL;
+      if (oc)
+	OMP_CLAUSE_CHAIN (oc) = t;
+      else
+	gimple_omp_set_clauses (octx->stmt, t);
+      OMP_CLAUSE_SIZE (t) = size;
+      oc = t;
+    }
+}
+
+/* Helper function to finalize local data for the reduction arrays. The
+   reduction array needs to be reduced to the original reduction variable.
+   FIXME: This function assumes that there are vector_length threads in
+   total.  Also, it assumes that there are at least vector_length iterations
+   in the for loop.  */
+
+static void
+finalize_reduction_data (tree clauses, tree nthreads, gimple_seq *stmt_seqp,
+			 omp_context *ctx)
+{
+  gcc_assert (is_gimple_omp_oacc_specifically (ctx->stmt));
+
+  tree c, var, array, loop_header, loop_body, loop_exit;
+  gimple stmt;
+
+  /* Create for loop.
+
+     let var = the original reduction variable
+     let array = reduction variable array
+
+     var = array[0]
+     for (i = 1; i < nthreads; i++)
+       var op= array[i]
+ */
+
+  loop_header = create_artificial_label (UNKNOWN_LOCATION);
+  loop_body = create_artificial_label (UNKNOWN_LOCATION);
+  loop_exit = create_artificial_label (UNKNOWN_LOCATION);
+
+  /* Initialize the reduction variables to be value of the first array
+     element.  */
+  for (c = clauses; c; c = OMP_CLAUSE_CHAIN (c))
+    {
+      if (OMP_CLAUSE_CODE (c) != OMP_CLAUSE_REDUCTION)
+	continue;
+
+      tree_code reduction_code = OMP_CLAUSE_REDUCTION_CODE (c);
+
+      /* reduction(-:var) sums up the partial results, so it acts
+	 identically to reduction(+:var).  */
+      if (reduction_code == MINUS_EXPR)
+        reduction_code = PLUS_EXPR;
+
+      /* Set up reduction variable, var.  Becuase it's not gimple register,
+         it needs to be treated as a reference.  */
+      var = OMP_CLAUSE_DECL (c);
+
+      tree ptr = lookup_reduction (omp_get_id (OMP_CLAUSE_DECL (c)), ctx);
+
+      /* Extract array[ix] into mem.  */
+      tree mem = create_tmp_var (TREE_TYPE (var), NULL);
+      gimplify_assign (mem, build_simple_mem_ref (ptr), stmt_seqp);
+
+      /* Find the original reduction variable.  */
+      tree new_var = lookup_decl (var, ctx);
+      tree x = build_outer_var_ref (var, ctx);
+      if (is_reference (var))
+	new_var = build_simple_mem_ref (new_var);
+
+      x = lang_hooks.decls.omp_clause_assign_op (c, var, mem);
+      gimplify_and_add (unshare_expr(x), stmt_seqp);
+    }
+
+  /* Create an index variable and set it to one.  */
+  tree ix = create_tmp_var (sizetype, NULL);
+  gimplify_assign (ix, fold_build1 (NOP_EXPR, sizetype, integer_one_node),
+		   stmt_seqp);
+
+  /* Insert the loop header label here.  */
+  gimple_seq_add_stmt (stmt_seqp, gimple_build_label (loop_header));
+
+  /* Loop if ix >= nthreads.  */
+  tree x = create_tmp_var (sizetype, NULL);
+  gimplify_assign (x, fold_build1 (NOP_EXPR, sizetype, nthreads), stmt_seqp);
+  stmt = gimple_build_cond (GE_EXPR, ix, x, loop_exit, loop_body);
+  gimple_seq_add_stmt (stmt_seqp, stmt);
+
+  /* Insert the loop body label here.  */
+  gimple_seq_add_stmt (stmt_seqp, gimple_build_label (loop_body));
+
+  /* Collapse each reduction array, one element at a time.  */
+  for (c = clauses; c; c = OMP_CLAUSE_CHAIN (c))
+    {
+      if (OMP_CLAUSE_CODE (c) != OMP_CLAUSE_REDUCTION)
+	continue;
+
+      tree_code reduction_code = OMP_CLAUSE_REDUCTION_CODE (c);
+
+      /* reduction(-:var) sums up the partial results, so it acts
+	 identically to reduction(+:var).  */
+      if (reduction_code == MINUS_EXPR)
+        reduction_code = PLUS_EXPR;
+
+      /* Set up reduction variable var.  */
+      var = OMP_CLAUSE_DECL (c);
+
+      array = lookup_reduction (omp_get_id (OMP_CLAUSE_DECL (c)), ctx);
+
+      /* Calculate the array offset.  */
+      tree offset = create_tmp_var (sizetype, NULL);
+      gimplify_assign (offset, TYPE_SIZE_UNIT (TREE_TYPE (var)), stmt_seqp);
+      stmt = gimple_build_assign_with_ops (MULT_EXPR, offset, offset, ix);
+      gimple_seq_add_stmt (stmt_seqp, stmt);
+
+      tree ptr = create_tmp_var (TREE_TYPE (array), NULL);
+      stmt = gimple_build_assign_with_ops (POINTER_PLUS_EXPR, ptr, array,
+					   offset);
+      gimple_seq_add_stmt (stmt_seqp, stmt);
+
+      /* Extract array[ix] into mem.  */
+      tree mem = create_tmp_var (TREE_TYPE (var), NULL);
+      gimplify_assign (mem, build_simple_mem_ref (ptr), stmt_seqp);
+
+      /* Find the original reduction variable.  */
+      tree new_var = lookup_decl (var, ctx);
+      tree x = build_outer_var_ref (var, ctx);
+      if (is_reference (var))
+	new_var = build_simple_mem_ref (new_var);
+
+      tree t = create_tmp_var (TREE_TYPE (var), NULL);
+
+      x = lang_hooks.decls.omp_clause_assign_op (c, t, var);
+      gimplify_and_add (unshare_expr(x), stmt_seqp);
+
+      /* var = var op mem */
+      switch (OMP_CLAUSE_REDUCTION_CODE (c))
+	{
+	case TRUTH_ANDIF_EXPR:
+	case TRUTH_ORIF_EXPR:
+	  t = fold_build2 (OMP_CLAUSE_REDUCTION_CODE (c), integer_type_node,
+			   t, mem);
+	  gimplify_and_add (t, stmt_seqp);
+	  break;
+	default:
+	  /* The lhs isn't a gimple_reg when var is COMPLEX_TYPE.  */
+	  omp_gimple_assign_with_ops (OMP_CLAUSE_REDUCTION_CODE (c),
+				      t, mem, stmt_seqp);
+	}
+
+      t = fold_build1 (NOP_EXPR, TREE_TYPE (var), t);
+      x = lang_hooks.decls.omp_clause_assign_op (c, var, t);
+      gimplify_and_add (unshare_expr(x), stmt_seqp);
+    }
+
+  /* Increment the induction variable.  */
+  tree one = fold_build1 (NOP_EXPR, sizetype, integer_one_node);
+  stmt = gimple_build_assign_with_ops (PLUS_EXPR, ix, ix, one);
+  gimple_seq_add_stmt (stmt_seqp, stmt);
+
+  /* Go back to the top of the loop.  */
+  gimple_seq_add_stmt (stmt_seqp, gimple_build_goto (loop_header));
+
+  /* Place the loop exit label here.  */
+  gimple_seq_add_stmt (stmt_seqp, gimple_build_label (loop_exit));
+}
+
+/* Scan through all of the gimple stmts searching for an OMP_FOR_EXPR, and
+   scan that for reductions.  */
+
+static void
+process_reduction_data (gimple_seq *body, gimple_seq *in_stmt_seqp,
+			gimple_seq *out_stmt_seqp, omp_context *ctx)
+{
+  gcc_assert (is_gimple_omp_oacc_specifically (ctx->stmt));
+
+  gimple_stmt_iterator gsi;
+
+  for (gsi = gsi_start (*body); !gsi_end_p (gsi); gsi_next (&gsi))
+    {
+      gimple stmt = gsi_stmt (gsi);
+      tree call;
+
+      switch (gimple_code (stmt))
+	{
+	case GIMPLE_OMP_FOR:
+	  tree clauses, nthreads, t;
+
+	  clauses = gimple_omp_for_clauses (stmt);
+	  ctx = maybe_lookup_ctx (stmt);
+	  t = NULL_TREE;
+
+	  /* The reduction clause may be nested inside a loop directive.
+	     Scan for the innermost vector_length clause.  */
+	  for (omp_context *oc = ctx; oc; oc = oc->outer)
+	    {
+	      tree c;
+
+	      switch (gimple_code (oc->stmt))
+		{
+		case GIMPLE_OACC_PARALLEL:
+		  c = gimple_oacc_parallel_clauses (oc->stmt);
+		  break;
+		case GIMPLE_OMP_FOR:
+		  c = gimple_omp_for_clauses (oc->stmt);
+		  break;
+		default:
+		  c = NULL_TREE;
+		  break;
+		}
+
+	      if (c && gimple_code (oc->stmt) == GIMPLE_OACC_PARALLEL)
+		{
+		  t = find_omp_clause (c, OMP_CLAUSE_VECTOR_LENGTH);
+		  if (t)
+		    t = fold_convert_loc (OMP_CLAUSE_LOCATION (t),
+					  integer_type_node,
+					  OMP_CLAUSE_VECTOR_LENGTH_EXPR (t));
+		  break;
+		}
+	    }
+
+	  if (!t)
+	    t = integer_one_node;
+
+	  /* Extract the number of threads.  */
+	  nthreads = create_tmp_var (TREE_TYPE (t), NULL);
+	  gimplify_assign (nthreads, t, in_stmt_seqp);
+
+	  /* Ensure nthreads >= 1.  */
+	  stmt = gimple_build_assign_with_ops (MAX_EXPR, nthreads, nthreads,
+				          fold_convert(TREE_TYPE (nthreads),
+						       integer_one_node));
+	  gimple_seq_add_stmt (in_stmt_seqp, stmt);
+
+	  /* Set the number of threads.  */
+	  /* FIXME: This needs to handle accelerators  */
+	  call = builtin_decl_explicit (BUILT_IN_OMP_SET_NUM_THREADS);
+	  stmt = gimple_build_call (call, 1, nthreads);
+	  gimple_seq_add_stmt (in_stmt_seqp, stmt);
+
+	  initialize_reduction_data (clauses, nthreads, in_stmt_seqp, ctx);
+	  finalize_reduction_data (clauses, nthreads, out_stmt_seqp, ctx);
+	  break;
+	default:
+	  // Scan for other directives which support reduction here.
+	  break;
+	}
+    }
+}
+
 /* Routines to lower OpenMP directives into OMP-GIMPLE.  */
 
 /* Lower the OpenACC offload directive in the current statement
@@ -9150,7 +9735,7 @@ lower_oacc_offload (gimple_stmt_iterator *gsi_p, omp_context *ctx)
   tree child_fn, t, c;
   gimple stmt = gsi_stmt (*gsi_p);
   gimple par_bind, bind;
-  gimple_seq par_body, olist, ilist, new_body;
+  gimple_seq par_body, olist, ilist, orlist, irlist, new_body;
   location_t loc = gimple_location (stmt);
   unsigned int map_cnt = 0;
   tree (*gimple_omp_clauses) (const_gimple);
@@ -9176,6 +9761,10 @@ lower_oacc_offload (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 
   push_gimplify_context ();
 
+  irlist = NULL;
+  orlist = NULL;
+  process_reduction_data (&par_body, &irlist, &orlist, ctx);
+
   for (c = clauses; c ; c = OMP_CLAUSE_CHAIN (c))
     switch (OMP_CLAUSE_CODE (c))
       {
@@ -9330,7 +9919,8 @@ lower_oacc_offload (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 		    avar = build_fold_addr_expr (avar);
 		    gimplify_assign (x, avar, &ilist);
 		  }
-		else if (is_gimple_reg (var))
+		else if (is_gimple_reg (var)
+			 && !maybe_lookup_reduction (var, ctx))
 		  {
 		    tree avar = create_tmp_var (TREE_TYPE (var), NULL);
 		    mark_addressable (avar);
@@ -9355,7 +9945,8 @@ lower_oacc_offload (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 		  }
 		else
 		  {
-		    var = build_fold_addr_expr (var);
+		    if (!maybe_lookup_reduction (var, ctx))
+		      var = build_fold_addr_expr (var);
 		    gimplify_assign (x, var, &ilist);
 		  }
 	      }
@@ -9439,9 +10030,11 @@ lower_oacc_offload (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 
   bind = gimple_build_bind (NULL, NULL, gimple_bind_block (par_bind));
   gsi_replace (gsi_p, bind, true);
+  gimple_bind_add_seq (bind, irlist);
   gimple_bind_add_seq (bind, ilist);
   gimple_bind_add_stmt (bind, stmt);
   gimple_bind_add_seq (bind, olist);
+  gimple_bind_add_seq (bind, orlist);
 
   pop_gimplify_context (NULL);
 }
diff --git a/gcc/testsuite/c-c++-common/goacc/reduction-1.c b/gcc/testsuite/c-c++-common/goacc/reduction-1.c
new file mode 100644
index 0000000..cff7d2d
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/reduction-1.c
@@ -0,0 +1,80 @@
+/* Integer reductions.  */
+
+#define vl 32
+
+int
+main(void)
+{
+  const int n = 1000;
+  int i;
+  int result, array[n];
+  int lresult;
+
+  /* '+' reductions.  */
+#pragma acc parallel vector_length (vl)
+#pragma acc loop reduction (+:result)
+  for (i = 0; i < n; i++)
+    result += array[i];
+#pragma acc end parallel
+
+  /* '*' reductions.  */
+#pragma acc parallel vector_length (vl)
+#pragma acc loop reduction (*:result)
+  for (i = 0; i < n; i++)
+    result *= array[i];
+#pragma acc end parallel
+
+//   result = 0;
+//   vresult = 0;
+// 
+//   /* 'max' reductions.  */
+// #pragma acc parallel vector_length (vl)
+// #pragma acc loop reduction (+:result)
+//   for (i = 0; i < n; i++)
+//       result = result > array[i] ? result : array[i];
+// #pragma acc end parallel
+//
+//   /* 'min' reductions.  */
+// #pragma acc parallel vector_length (vl)
+// #pragma acc loop reduction (+:result)
+//   for (i = 0; i < n; i++)
+//       result = result < array[i] ? result : array[i];
+// #pragma acc end parallel
+
+  /* '&' reductions.  */
+#pragma acc parallel vector_length (vl)
+#pragma acc loop reduction (&:result)
+  for (i = 0; i < n; i++)
+    result &= array[i];
+#pragma acc end parallel
+
+  /* '|' reductions.  */
+#pragma acc parallel vector_length (vl)
+#pragma acc loop reduction (|:result)
+  for (i = 0; i < n; i++)
+    result |= array[i];
+#pragma acc end parallel
+
+  /* '^' reductions.  */
+#pragma acc parallel vector_length (vl)
+#pragma acc loop reduction (^:result)
+  for (i = 0; i < n; i++)
+    result ^= array[i];
+#pragma acc end parallel
+
+  /* '&&' reductions.  */
+#pragma acc parallel vector_length (vl)
+#pragma acc loop reduction (&&:lresult)
+  for (i = 0; i < n; i++)
+    lresult = lresult && (result > array[i]);
+#pragma acc end parallel
+
+  /* '||' reductions.  */
+#pragma acc parallel vector_length (vl)
+#pragma acc loop reduction (||:lresult)
+  for (i = 0; i < n; i++)
+    lresult = lresult || (result > array[i]);
+#pragma acc end parallel
+
+  return 0;
+}
diff --git a/gcc/testsuite/c-c++-common/goacc/reduction-2.c b/gcc/testsuite/c-c++-common/goacc/reduction-2.c
new file mode 100644
index 0000000..9686b37
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/reduction-2.c
@@ -0,0 +1,56 @@
+/* float reductions.  */
+
+#define vl 32
+
+int
+main(void)
+{
+  const int n = 1000;
+  int i;
+  float result, array[n];
+  int lresult;
+
+  /* '+' reductions.  */
+#pragma acc parallel vector_length (vl)
+#pragma acc loop reduction (+:result)
+  for (i = 0; i < n; i++)
+    result += array[i];
+#pragma acc end parallel
+
+  /* '*' reductions.  */
+#pragma acc parallel vector_length (vl)
+#pragma acc loop reduction (*:result)
+  for (i = 0; i < n; i++)
+    result *= array[i];
+#pragma acc end parallel
+
+//   /* 'max' reductions.  */
+// #pragma acc parallel vector_length (vl)
+// #pragma acc loop reduction (+:result)
+//   for (i = 0; i < n; i++)
+//       result = result > array[i] ? result : array[i];
+// #pragma acc end parallel
+// 
+//   /* 'min' reductions.  */
+// #pragma acc parallel vector_length (vl)
+// #pragma acc loop reduction (+:result)
+//   for (i = 0; i < n; i++)
+//       result = result < array[i] ? result : array[i];
+// #pragma acc end parallel
+
+  /* '&&' reductions.  */
+#pragma acc parallel vector_length (vl)
+#pragma acc loop reduction (&&:lresult)
+  for (i = 0; i < n; i++)
+    lresult = lresult && (result > array[i]);
+#pragma acc end parallel
+
+  /* '||' reductions.  */
+#pragma acc parallel vector_length (vl)
+#pragma acc loop reduction (||:lresult)
+  for (i = 0; i < n; i++)
+    lresult = lresult || (result > array[i]);
+#pragma acc end parallel
+
+  return 0;
+}
diff --git a/gcc/testsuite/c-c++-common/goacc/reduction-3.c b/gcc/testsuite/c-c++-common/goacc/reduction-3.c
new file mode 100644
index 0000000..c618c4e
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/reduction-3.c
@@ -0,0 +1,56 @@
+/* double reductions.  */
+
+#define vl 32
+
+int
+main(void)
+{
+  const int n = 1000;
+  int i;
+  double result, array[n];
+  int lresult;
+
+  /* '+' reductions.  */
+#pragma acc parallel vector_length (vl)
+#pragma acc loop reduction (+:result)
+  for (i = 0; i < n; i++)
+    result += array[i];
+#pragma acc end parallel
+
+  /* '*' reductions.  */
+#pragma acc parallel vector_length (vl)
+#pragma acc loop reduction (*:result)
+  for (i = 0; i < n; i++)
+    result *= array[i];
+#pragma acc end parallel
+
+//   /* 'max' reductions.  */
+// #pragma acc parallel vector_length (vl)
+// #pragma acc loop reduction (+:result)
+//   for (i = 0; i < n; i++)
+//       result = result > array[i] ? result : array[i];
+// #pragma acc end parallel
+// 
+//   /* 'min' reductions.  */
+// #pragma acc parallel vector_length (vl)
+// #pragma acc loop reduction (+:result)
+//   for (i = 0; i < n; i++)
+//       result = result < array[i] ? result : array[i];
+// #pragma acc end parallel
+
+  /* '&&' reductions.  */
+#pragma acc parallel vector_length (vl)
+#pragma acc loop reduction (&&:lresult)
+  for (i = 0; i < n; i++)
+    lresult = lresult && (result > array[i]);
+#pragma acc end parallel
+
+  /* '||' reductions.  */
+#pragma acc parallel vector_length (vl)
+#pragma acc loop reduction (||:lresult)
+  for (i = 0; i < n; i++)
+    lresult = lresult || (result > array[i]);
+#pragma acc end parallel
+
+  return 0;
+}
diff --git a/gcc/testsuite/c-c++-common/goacc/reduction-4.c b/gcc/testsuite/c-c++-common/goacc/reduction-4.c
new file mode 100644
index 0000000..1e032a1
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/reduction-4.c
@@ -0,0 +1,58 @@
+/* complex reductions.  */
+
+#define vl 32
+
+int
+main(void)
+{
+  const int n = 1000;
+  int i;
+  __complex__ double result, array[n];
+  int lresult;
+
+  /* '+' reductions.  */
+#pragma acc parallel vector_length (vl)
+#pragma acc loop reduction (+:result)
+  for (i = 0; i < n; i++)
+    result += array[i];
+#pragma acc end parallel
+
+  /* Needs support for complex multiplication.  */
+
+//   /* '*' reductions.  */
+// #pragma acc parallel vector_length (vl)
+// #pragma acc loop reduction (*:result)
+//   for (i = 0; i < n; i++)
+//     result *= array[i];
+// #pragma acc end parallel
+//
+//   /* 'max' reductions.  */
+// #pragma acc parallel vector_length (vl)
+// #pragma acc loop reduction (+:result)
+//   for (i = 0; i < n; i++)
+//       result = result > array[i] ? result : array[i];
+// #pragma acc end parallel
+// 
+//   /* 'min' reductions.  */
+// #pragma acc parallel vector_length (vl)
+// #pragma acc loop reduction (+:result)
+//   for (i = 0; i < n; i++)
+//       result = result < array[i] ? result : array[i];
+// #pragma acc end parallel
+
+  /* '&&' reductions.  */
+#pragma acc parallel vector_length (vl)
+#pragma acc loop reduction (&&:lresult)
+  for (i = 0; i < n; i++)
+    lresult = lresult && (__real__(result) > __real__(array[i]));
+#pragma acc end parallel
+
+  /* '||' reductions.  */
+#pragma acc parallel vector_length (vl)
+#pragma acc loop reduction (||:lresult)
+  for (i = 0; i < n; i++)
+    lresult = lresult || (__real__(result) > __real__(array[i]));
+#pragma acc end parallel
+
+  return 0;
+}

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [patch,gomp-4_0-branch] openacc parallel reduction part 1
  2014-07-08 14:28   ` Cesar Philippidis
@ 2014-07-08 17:02     ` Cesar Philippidis
  2014-09-25 20:06     ` Thomas Schwinge
                       ` (2 subsequent siblings)
  3 siblings, 0 replies; 12+ messages in thread
From: Cesar Philippidis @ 2014-07-08 17:02 UTC (permalink / raw)
  To: Thomas Schwinge; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 398 bytes --]

On 07/08/2014 07:28 AM, Cesar Philippidis wrote:

> Thanks for catching those problems! I've committed this updated version
> of the patch.

I forgot to remove the support for the collapse clause in from the loop
construct in the c frontend. I did so upstream, but not internally. I've
checked in this patch which fixes that.

Thomas, I don't know how you manage so many different branches.

Cesar

[-- Attachment #2: reduction-collpase-removal.diff --]
[-- Type: text/x-patch, Size: 669 bytes --]

2014-07-08  Cesar Philippidis  <cesar@codesourcery.com>

	gcc/c/
	*c-parser.c (OACC_LOOP_CLAUSE_MASK): Remove
	PRAGMA_OMP_CLAUSE_COLLAPSE from theh mask.

diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index 66d5444..fb7e12d 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -11906,8 +11906,7 @@ c_parser_oacc_kernels (location_t loc, c_parser *parser, char *p_name)
 */
 
 #define OACC_LOOP_CLAUSE_MASK						\
-	( (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_COLLAPSE)		\
-	| (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_REDUCTION))
+	(OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_REDUCTION)
 
 static tree
 c_parser_oacc_loop (location_t loc, c_parser *parser, char *p_name)

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [patch,gomp-4_0-branch] openacc parallel reduction part 1
  2014-07-06 23:11 [patch,gomp-4_0-branch] openacc parallel reduction part 1 Cesar Philippidis
  2014-07-07  9:55 ` Thomas Schwinge
@ 2014-07-28 17:06 ` Thomas Schwinge
  2014-07-28 17:54   ` Cesar Philippidis
  1 sibling, 1 reply; 12+ messages in thread
From: Thomas Schwinge @ 2014-07-28 17:06 UTC (permalink / raw)
  To: Cesar Philippidis; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 4030 bytes --]

Hi Cesar!

On Sun, 6 Jul 2014 16:10:56 -0700, Cesar Philippidis <cesar_philippidis@mentor.com> wrote:
> This patch is the first step to enabling parallel reductions in openacc.

I think I have found one issue in this code -- but please verify that my
understanding of reductions is correct.  Namely:

> --- a/gcc/omp-low.c
> +++ b/gcc/omp-low.c
> +/* Helper function to finalize local data for the reduction arrays. The
> +   reduction array needs to be reduced to the original reduction variable.
> +   FIXME: This function assumes that there are vector_length threads in
> +   total.  Also, it assumes that there are at least vector_length iterations
> +   in the for loop.  */
> +
> +static void
> +finalize_reduction_data (tree clauses, tree nthreads, gimple_seq *stmt_seqp,
> +			 omp_context *ctx)
> +{
> +  gcc_assert (is_gimple_omp_oacc_specifically (ctx->stmt));
> +
> +  tree c, var, array, loop_header, loop_body, loop_exit;
> +  gimple stmt;
> +
> +  /* Create for loop.
> +
> +     let var = the original reduction variable
> +     let array = reduction variable array
> +
> +     var = array[0]
> +     for (i = 1; i < nthreads; i++)
> +       var op= array[i]
> + */

This should also consider the reduction variable's original value.  Test
case (which does the expected thing if modified for OpenMP):

    #include <stdlib.h>
    
    int
    main(void)
    {
    #define I 5
    #define N 11
    #define A 8
    
      int a = A;
      int s = I;
    
    #pragma acc parallel vector_length(N)
      {
        int i;
    #pragma acc loop reduction(+:s)
        for (i = 0; i < N; ++i)
          s += a;
      }
    
      if (s != I + N * A)
        abort ();
    
      return 0;
    }

OK to check in the following?

--- gcc/omp-low.c
+++ gcc/omp-low.c
@@ -9547,8 +9547,7 @@ finalize_reduction_data (tree clauses, tree nthreads, gimple_seq *stmt_seqp,
      let var = the original reduction variable
      let array = reduction variable array
 
-     var = array[0]
-     for (i = 1; i < nthreads; i++)
+     for (i = 0; i < nthreads; i++)
        var op= array[i]
  */
 
@@ -9556,42 +9555,9 @@ finalize_reduction_data (tree clauses, tree nthreads, gimple_seq *stmt_seqp,
   loop_body = create_artificial_label (UNKNOWN_LOCATION);
   loop_exit = create_artificial_label (UNKNOWN_LOCATION);
 
-  /* Initialize the reduction variables to be value of the first array
-     element.  */
-  for (c = clauses; c; c = OMP_CLAUSE_CHAIN (c))
-    {
-      if (OMP_CLAUSE_CODE (c) != OMP_CLAUSE_REDUCTION)
-	continue;
-
-      tree_code reduction_code = OMP_CLAUSE_REDUCTION_CODE (c);
-
-      /* reduction(-:var) sums up the partial results, so it acts
-	 identically to reduction(+:var).  */
-      if (reduction_code == MINUS_EXPR)
-        reduction_code = PLUS_EXPR;
-
-      /* Set up reduction variable, var.  Becuase it's not gimple register,
-         it needs to be treated as a reference.  */
-      var = OMP_CLAUSE_DECL (c);
-      type = get_base_type (var);
-      tree ptr = lookup_reduction (omp_get_id (OMP_CLAUSE_DECL (c)), ctx);
-
-      /* Extract array[0] into mem.  */
-      tree mem = create_tmp_var (type, NULL);
-      gimplify_assign (mem, build_simple_mem_ref (ptr), stmt_seqp);
-
-      /* Find the original reduction variable.  */
-      tree x = build_outer_var_ref (var, ctx);
-      if (is_reference (var))
-	var = build_simple_mem_ref (var);
-
-      x = lang_hooks.decls.omp_clause_assign_op (c, var, mem);
-      gimplify_and_add (unshare_expr(x), stmt_seqp);
-    }
-
-  /* Create an index variable and set it to one.  */
+  /* Create and initialize an index variable.  */
   tree ix = create_tmp_var (sizetype, NULL);
-  gimplify_assign (ix, fold_build1 (NOP_EXPR, sizetype, integer_one_node),
+  gimplify_assign (ix, fold_build1 (NOP_EXPR, sizetype, integer_zero_node),
 		   stmt_seqp);
 
   /* Insert the loop header label here.  */


Grüße,
 Thomas

[-- Attachment #2: Type: application/pgp-signature, Size: 472 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [patch,gomp-4_0-branch] openacc parallel reduction part 1
  2014-07-28 17:06 ` Thomas Schwinge
@ 2014-07-28 17:54   ` Cesar Philippidis
  2014-10-23  8:26     ` Thomas Schwinge
  0 siblings, 1 reply; 12+ messages in thread
From: Cesar Philippidis @ 2014-07-28 17:54 UTC (permalink / raw)
  To: Thomas Schwinge; +Cc: gcc-patches

On 07/28/2014 10:02 AM, Thomas Schwinge wrote:
> Hi Cesar!
> 
> On Sun, 6 Jul 2014 16:10:56 -0700, Cesar Philippidis <cesar_philippidis@mentor.com> wrote:
>> This patch is the first step to enabling parallel reductions in openacc.
> 
> I think I have found one issue in this code -- but please verify that my
> understanding of reductions is correct.  Namely:
> 
>> --- a/gcc/omp-low.c
>> +++ b/gcc/omp-low.c
>> +/* Helper function to finalize local data for the reduction arrays. The
>> +   reduction array needs to be reduced to the original reduction variable.
>> +   FIXME: This function assumes that there are vector_length threads in
>> +   total.  Also, it assumes that there are at least vector_length iterations
>> +   in the for loop.  */
>> +
>> +static void
>> +finalize_reduction_data (tree clauses, tree nthreads, gimple_seq *stmt_seqp,
>> +			 omp_context *ctx)
>> +{
>> +  gcc_assert (is_gimple_omp_oacc_specifically (ctx->stmt));
>> +
>> +  tree c, var, array, loop_header, loop_body, loop_exit;
>> +  gimple stmt;
>> +
>> +  /* Create for loop.
>> +
>> +     let var = the original reduction variable
>> +     let array = reduction variable array
>> +
>> +     var = array[0]
>> +     for (i = 1; i < nthreads; i++)
>> +       var op= array[i]
>> + */
> 
> This should also consider the reduction variable's original value.  Test
> case (which does the expected thing if modified for OpenMP):
> 
>     #include <stdlib.h>
>     
>     int
>     main(void)
>     {
>     #define I 5
>     #define N 11
>     #define A 8
>     
>       int a = A;
>       int s = I;
>     
>     #pragma acc parallel vector_length(N)
>       {
>         int i;
>     #pragma acc loop reduction(+:s)
>         for (i = 0; i < N; ++i)
>           s += a;
>       }
>     
>       if (s != I + N * A)
>         abort ();
>     
>       return 0;
>     }
> 
> OK to check in the following?

Reductions can be specified with both the parallel and loop constructs.
According to section 2.5.11 in the opacc spec, a reduction in a parallel
construct should behave as you described:

	At the end of the region, the values for each gang are combined
	using the reduction operator, and the result combined with the
	value of the original variable and stored in the original
	variable.

However,in section 2.7.11, a reduction in a loop construct behaves as
follows:

	At the end of the loop, the values for each thread are combined
	using the specified reduction operator, and the result stored
	in the original variable at the end of the parallel or kernels 	
	region.

The parallel reduction behavior does make more sense though. I'll ask
the openacc gurus if there's a typo in section 2.7.11. It does refer to
parallel reduction.

Thanks,
Cesar

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [patch,gomp-4_0-branch] openacc parallel reduction part 1
  2014-07-08 14:28   ` Cesar Philippidis
  2014-07-08 17:02     ` Cesar Philippidis
@ 2014-09-25 20:06     ` Thomas Schwinge
  2014-11-11 15:03     ` Thomas Schwinge
  2014-11-12 15:59     ` Thomas Schwinge
  3 siblings, 0 replies; 12+ messages in thread
From: Thomas Schwinge @ 2014-09-25 20:06 UTC (permalink / raw)
  To: gcc-patches; +Cc: Cesar Philippidis

[-- Attachment #1: Type: text/plain, Size: 9000 bytes --]

Hi!

On Tue, 8 Jul 2014 07:28:24 -0700, Cesar Philippidis <cesar_philippidis@mentor.com> wrote:
> On 07/07/2014 02:55 AM, Thomas Schwinge wrote:
> > On Sun, 6 Jul 2014 16:10:56 -0700, Cesar Philippidis <cesar_philippidis@mentor.com> wrote:
> >> This patch is the first step to enabling parallel reductions in openacc.

> --- /dev/null
> +++ b/gcc/testsuite/c-c++-common/goacc/reduction-1.c

> +#pragma acc parallel vector_length (vl)
> +#pragma acc loop reduction (+:result)
> +  for (i = 0; i < n; i++)
> +    result += array[i];
> +#pragma acc end parallel

> [...]

Committed to gomp-4_0-branch in r215617:

commit a6b46623f7543f07c9b2ebcd080d3216c6b30d69
Author: tschwinge <tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4>
Date:   Thu Sep 25 20:04:16 2014 +0000

    Remove erroneous "#pragma acc end parallel"s.
    
    	gcc/testsuite/
    	* c-c++-common/goacc/reduction-1.c: Remove erroneous "#pragma acc
    	end parallel"s.
    	* c-c++-common/goacc/reduction-2.c: Likewise.
    	* c-c++-common/goacc/reduction-3.c: Likewise.
    	* c-c++-common/goacc/reduction-4.c: Likewise.
    
    git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@215617 138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/testsuite/ChangeLog.gomp                   | 8 ++++++++
 gcc/testsuite/c-c++-common/goacc/reduction-1.c | 9 ---------
 gcc/testsuite/c-c++-common/goacc/reduction-2.c | 6 ------
 gcc/testsuite/c-c++-common/goacc/reduction-3.c | 6 ------
 gcc/testsuite/c-c++-common/goacc/reduction-4.c | 6 ------
 5 files changed, 8 insertions(+), 27 deletions(-)

diff --git gcc/testsuite/ChangeLog.gomp gcc/testsuite/ChangeLog.gomp
index f350c35..786f434 100644
--- gcc/testsuite/ChangeLog.gomp
+++ gcc/testsuite/ChangeLog.gomp
@@ -1,3 +1,11 @@
+2014-09-25  Thomas Schwinge  <thomas@codesourcery.com>
+
+	* c-c++-common/goacc/reduction-1.c: Remove erroneous "#pragma acc
+	end parallel"s.
+	* c-c++-common/goacc/reduction-2.c: Likewise.
+	* c-c++-common/goacc/reduction-3.c: Likewise.
+	* c-c++-common/goacc/reduction-4.c: Likewise.
+
 2014-09-23  Thomas Schwinge  <thomas@codesourcery.com>
 
 	* c-c++-common/goacc/acc_on_device-1.c: New file.
diff --git gcc/testsuite/c-c++-common/goacc/reduction-1.c gcc/testsuite/c-c++-common/goacc/reduction-1.c
index cff7d2d..0f50082 100644
--- gcc/testsuite/c-c++-common/goacc/reduction-1.c
+++ gcc/testsuite/c-c++-common/goacc/reduction-1.c
@@ -15,14 +15,12 @@ main(void)
 #pragma acc loop reduction (+:result)
   for (i = 0; i < n; i++)
     result += array[i];
-#pragma acc end parallel
 
   /* '*' reductions.  */
 #pragma acc parallel vector_length (vl)
 #pragma acc loop reduction (*:result)
   for (i = 0; i < n; i++)
     result *= array[i];
-#pragma acc end parallel
 
 //   result = 0;
 //   vresult = 0;
@@ -32,49 +30,42 @@ main(void)
 // #pragma acc loop reduction (+:result)
 //   for (i = 0; i < n; i++)
 //       result = result > array[i] ? result : array[i];
-// #pragma acc end parallel
 //
 //   /* 'min' reductions.  */
 // #pragma acc parallel vector_length (vl)
 // #pragma acc loop reduction (+:result)
 //   for (i = 0; i < n; i++)
 //       result = result < array[i] ? result : array[i];
-// #pragma acc end parallel
 
   /* '&' reductions.  */
 #pragma acc parallel vector_length (vl)
 #pragma acc loop reduction (&:result)
   for (i = 0; i < n; i++)
     result &= array[i];
-#pragma acc end parallel
 
   /* '|' reductions.  */
 #pragma acc parallel vector_length (vl)
 #pragma acc loop reduction (|:result)
   for (i = 0; i < n; i++)
     result |= array[i];
-#pragma acc end parallel
 
   /* '^' reductions.  */
 #pragma acc parallel vector_length (vl)
 #pragma acc loop reduction (^:result)
   for (i = 0; i < n; i++)
     result ^= array[i];
-#pragma acc end parallel
 
   /* '&&' reductions.  */
 #pragma acc parallel vector_length (vl)
 #pragma acc loop reduction (&&:lresult)
   for (i = 0; i < n; i++)
     lresult = lresult && (result > array[i]);
-#pragma acc end parallel
 
   /* '||' reductions.  */
 #pragma acc parallel vector_length (vl)
 #pragma acc loop reduction (||:lresult)
   for (i = 0; i < n; i++)
     lresult = lresult || (result > array[i]);
-#pragma acc end parallel
 
   return 0;
 }
diff --git gcc/testsuite/c-c++-common/goacc/reduction-2.c gcc/testsuite/c-c++-common/goacc/reduction-2.c
index 9686b37..1f95138 100644
--- gcc/testsuite/c-c++-common/goacc/reduction-2.c
+++ gcc/testsuite/c-c++-common/goacc/reduction-2.c
@@ -15,42 +15,36 @@ main(void)
 #pragma acc loop reduction (+:result)
   for (i = 0; i < n; i++)
     result += array[i];
-#pragma acc end parallel
 
   /* '*' reductions.  */
 #pragma acc parallel vector_length (vl)
 #pragma acc loop reduction (*:result)
   for (i = 0; i < n; i++)
     result *= array[i];
-#pragma acc end parallel
 
 //   /* 'max' reductions.  */
 // #pragma acc parallel vector_length (vl)
 // #pragma acc loop reduction (+:result)
 //   for (i = 0; i < n; i++)
 //       result = result > array[i] ? result : array[i];
-// #pragma acc end parallel
 // 
 //   /* 'min' reductions.  */
 // #pragma acc parallel vector_length (vl)
 // #pragma acc loop reduction (+:result)
 //   for (i = 0; i < n; i++)
 //       result = result < array[i] ? result : array[i];
-// #pragma acc end parallel
 
   /* '&&' reductions.  */
 #pragma acc parallel vector_length (vl)
 #pragma acc loop reduction (&&:lresult)
   for (i = 0; i < n; i++)
     lresult = lresult && (result > array[i]);
-#pragma acc end parallel
 
   /* '||' reductions.  */
 #pragma acc parallel vector_length (vl)
 #pragma acc loop reduction (||:lresult)
   for (i = 0; i < n; i++)
     lresult = lresult || (result > array[i]);
-#pragma acc end parallel
 
   return 0;
 }
diff --git gcc/testsuite/c-c++-common/goacc/reduction-3.c gcc/testsuite/c-c++-common/goacc/reduction-3.c
index c618c4e..476e375 100644
--- gcc/testsuite/c-c++-common/goacc/reduction-3.c
+++ gcc/testsuite/c-c++-common/goacc/reduction-3.c
@@ -15,42 +15,36 @@ main(void)
 #pragma acc loop reduction (+:result)
   for (i = 0; i < n; i++)
     result += array[i];
-#pragma acc end parallel
 
   /* '*' reductions.  */
 #pragma acc parallel vector_length (vl)
 #pragma acc loop reduction (*:result)
   for (i = 0; i < n; i++)
     result *= array[i];
-#pragma acc end parallel
 
 //   /* 'max' reductions.  */
 // #pragma acc parallel vector_length (vl)
 // #pragma acc loop reduction (+:result)
 //   for (i = 0; i < n; i++)
 //       result = result > array[i] ? result : array[i];
-// #pragma acc end parallel
 // 
 //   /* 'min' reductions.  */
 // #pragma acc parallel vector_length (vl)
 // #pragma acc loop reduction (+:result)
 //   for (i = 0; i < n; i++)
 //       result = result < array[i] ? result : array[i];
-// #pragma acc end parallel
 
   /* '&&' reductions.  */
 #pragma acc parallel vector_length (vl)
 #pragma acc loop reduction (&&:lresult)
   for (i = 0; i < n; i++)
     lresult = lresult && (result > array[i]);
-#pragma acc end parallel
 
   /* '||' reductions.  */
 #pragma acc parallel vector_length (vl)
 #pragma acc loop reduction (||:lresult)
   for (i = 0; i < n; i++)
     lresult = lresult || (result > array[i]);
-#pragma acc end parallel
 
   return 0;
 }
diff --git gcc/testsuite/c-c++-common/goacc/reduction-4.c gcc/testsuite/c-c++-common/goacc/reduction-4.c
index 1e032a1..73dde86 100644
--- gcc/testsuite/c-c++-common/goacc/reduction-4.c
+++ gcc/testsuite/c-c++-common/goacc/reduction-4.c
@@ -15,7 +15,6 @@ main(void)
 #pragma acc loop reduction (+:result)
   for (i = 0; i < n; i++)
     result += array[i];
-#pragma acc end parallel
 
   /* Needs support for complex multiplication.  */
 
@@ -24,35 +23,30 @@ main(void)
 // #pragma acc loop reduction (*:result)
 //   for (i = 0; i < n; i++)
 //     result *= array[i];
-// #pragma acc end parallel
 //
 //   /* 'max' reductions.  */
 // #pragma acc parallel vector_length (vl)
 // #pragma acc loop reduction (+:result)
 //   for (i = 0; i < n; i++)
 //       result = result > array[i] ? result : array[i];
-// #pragma acc end parallel
 // 
 //   /* 'min' reductions.  */
 // #pragma acc parallel vector_length (vl)
 // #pragma acc loop reduction (+:result)
 //   for (i = 0; i < n; i++)
 //       result = result < array[i] ? result : array[i];
-// #pragma acc end parallel
 
   /* '&&' reductions.  */
 #pragma acc parallel vector_length (vl)
 #pragma acc loop reduction (&&:lresult)
   for (i = 0; i < n; i++)
     lresult = lresult && (__real__(result) > __real__(array[i]));
-#pragma acc end parallel
 
   /* '||' reductions.  */
 #pragma acc parallel vector_length (vl)
 #pragma acc loop reduction (||:lresult)
   for (i = 0; i < n; i++)
     lresult = lresult || (__real__(result) > __real__(array[i]));
-#pragma acc end parallel
 
   return 0;
 }


Grüße,
 Thomas

[-- Attachment #2: Type: application/pgp-signature, Size: 472 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [patch,gomp-4_0-branch] openacc parallel reduction part 1
  2014-07-28 17:54   ` Cesar Philippidis
@ 2014-10-23  8:26     ` Thomas Schwinge
  0 siblings, 0 replies; 12+ messages in thread
From: Thomas Schwinge @ 2014-10-23  8:26 UTC (permalink / raw)
  To: gcc-patches; +Cc: Cesar Philippidis, Tom de Vries

[-- Attachment #1: Type: text/plain, Size: 7936 bytes --]

Hi!

On Mon, 28 Jul 2014 10:36:03 -0700, Cesar Philippidis <cesar_philippidis@mentor.com> wrote:
> On 07/28/2014 10:02 AM, Thomas Schwinge wrote:
> > On Sun, 6 Jul 2014 16:10:56 -0700, Cesar Philippidis <cesar_philippidis@mentor.com> wrote:
> >> This patch is the first step to enabling parallel reductions in openacc.
> > 
> > I think I have found one issue in this code -- but please verify that my
> > understanding of reductions is correct.  Namely:
> > 
> >> --- a/gcc/omp-low.c
> >> +++ b/gcc/omp-low.c
> >> +/* Helper function to finalize local data for the reduction arrays. The
> >> +   reduction array needs to be reduced to the original reduction variable.
> >> +   FIXME: This function assumes that there are vector_length threads in
> >> +   total.  Also, it assumes that there are at least vector_length iterations
> >> +   in the for loop.  */
> >> +
> >> +static void
> >> +finalize_reduction_data (tree clauses, tree nthreads, gimple_seq *stmt_seqp,
> >> +			 omp_context *ctx)
> >> +{
> >> +  gcc_assert (is_gimple_omp_oacc_specifically (ctx->stmt));
> >> +
> >> +  tree c, var, array, loop_header, loop_body, loop_exit;
> >> +  gimple stmt;
> >> +
> >> +  /* Create for loop.
> >> +
> >> +     let var = the original reduction variable
> >> +     let array = reduction variable array
> >> +
> >> +     var = array[0]
> >> +     for (i = 1; i < nthreads; i++)
> >> +       var op= array[i]
> >> + */
> > 
> > This should also consider the reduction variable's original value.  Test
> > case (which does the expected thing if modified for OpenMP):
> > 
> >     #include <stdlib.h>
> >     
> >     int
> >     main(void)
> >     {
> >     #define I 5
> >     #define N 11
> >     #define A 8
> >     
> >       int a = A;
> >       int s = I;
> >     
> >     #pragma acc parallel vector_length(N)
> >       {
> >         int i;
> >     #pragma acc loop reduction(+:s)
> >         for (i = 0; i < N; ++i)
> >           s += a;
> >       }
> >     
> >       if (s != I + N * A)
> >         abort ();
> >     
> >       return 0;
> >     }
> > 
> > OK to check in the following?
> 
> Reductions can be specified with both the parallel and loop constructs.
> According to section 2.5.11 in the opacc spec, a reduction in a parallel
> construct should behave as you described:
> 
> 	At the end of the region, the values for each gang are combined
> 	using the reduction operator, and the result combined with the
> 	value of the original variable and stored in the original
> 	variable.
> 
> However,in section 2.7.11, a reduction in a loop construct behaves as
> follows:
> 
> 	At the end of the loop, the values for each thread are combined
> 	using the specified reduction operator, and the result stored
> 	in the original variable at the end of the parallel or kernels 	
> 	region.
> 
> The parallel reduction behavior does make more sense though. I'll ask
> the openacc gurus if there's a typo in section 2.7.11. It does refer to
> parallel reduction.

I proceeded by checking in the following patch to gomp-4_0-branch,
r216574:

commit 75e2a58b8ef7d20be2239ff029493986542ee7e3
Author: tschwinge <tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4>
Date:   Thu Oct 23 07:26:40 2014 +0000

    OpenACC reductions: Don't skip the reduction variable's original value.
    
    	gcc/
    	* omp-low.c (finalize_reduction_data): Don't skip the reduction
    	variable's original value.
    	libgomp/
    	* testsuite/libgomp.oacc-c/reduction-initial-1.c: New file.
    
    git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@216574 138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/ChangeLog.gomp                                 |  5 +++
 gcc/omp-low.c                                      | 40 ++--------------------
 libgomp/ChangeLog.gomp                             |  4 +++
 .../testsuite/libgomp.oacc-c/reduction-initial-1.c | 32 +++++++++++++++++
 4 files changed, 44 insertions(+), 37 deletions(-)

diff --git gcc/ChangeLog.gomp gcc/ChangeLog.gomp
index 6d107d2..28e7252 100644
--- gcc/ChangeLog.gomp
+++ gcc/ChangeLog.gomp
@@ -1,3 +1,8 @@
+2014-10-23  Thomas Schwinge  <thomas@codesourcery.com>
+
+	* omp-low.c (finalize_reduction_data): Don't skip the reduction
+	variable's original value.
+
 2014-10-20  Cesar Philippidis  <cesar@codesourcery.com>
 
 	* gimplify.c (gimplify_scan_omp_clauses): Remove switch stmt which
diff --git gcc/omp-low.c gcc/omp-low.c
index b8022c2..b21235f 100644
--- gcc/omp-low.c
+++ gcc/omp-low.c
@@ -9869,8 +9869,7 @@ finalize_reduction_data (tree clauses, tree nthreads, gimple_seq *stmt_seqp,
      let var = the original reduction variable
      let array = reduction variable array
 
-     var = array[0]
-     for (i = 1; i < nthreads; i++)
+     for (i = 0; i < nthreads; i++)
        var op= array[i]
  */
 
@@ -9878,42 +9877,9 @@ finalize_reduction_data (tree clauses, tree nthreads, gimple_seq *stmt_seqp,
   loop_body = create_artificial_label (UNKNOWN_LOCATION);
   loop_exit = create_artificial_label (UNKNOWN_LOCATION);
 
-  /* Initialize the reduction variables to be value of the first array
-     element.  */
-  for (c = clauses; c; c = OMP_CLAUSE_CHAIN (c))
-    {
-      if (OMP_CLAUSE_CODE (c) != OMP_CLAUSE_REDUCTION)
-	continue;
-
-      tree_code reduction_code = OMP_CLAUSE_REDUCTION_CODE (c);
-
-      /* reduction(-:var) sums up the partial results, so it acts
-	 identically to reduction(+:var).  */
-      if (reduction_code == MINUS_EXPR)
-        reduction_code = PLUS_EXPR;
-
-      /* Set up reduction variable, var.  Becuase it's not gimple register,
-         it needs to be treated as a reference.  */
-      var = OMP_CLAUSE_DECL (c);
-      type = get_base_type (var);
-      tree ptr = lookup_reduction (omp_get_id (OMP_CLAUSE_DECL (c)), ctx);
-
-      /* Extract array[0] into mem.  */
-      tree mem = create_tmp_var (type, NULL);
-      gimplify_assign (mem, build_simple_mem_ref (ptr), stmt_seqp);
-
-      /* Find the original reduction variable.  */
-      tree x = build_outer_var_ref (var, ctx);
-      if (is_reference (var))
-	var = build_simple_mem_ref (var);
-
-      x = lang_hooks.decls.omp_clause_assign_op (c, var, mem);
-      gimplify_and_add (unshare_expr(x), stmt_seqp);
-    }
-
-  /* Create an index variable and set it to one.  */
+  /* Create and initialize an index variable.  */
   tree ix = create_tmp_var (sizetype, NULL);
-  gimplify_assign (ix, fold_build1 (NOP_EXPR, sizetype, integer_one_node),
+  gimplify_assign (ix, fold_build1 (NOP_EXPR, sizetype, integer_zero_node),
 		   stmt_seqp);
 
   /* Insert the loop header label here.  */
diff --git libgomp/ChangeLog.gomp libgomp/ChangeLog.gomp
index 065dfb6..5363068 100644
--- libgomp/ChangeLog.gomp
+++ libgomp/ChangeLog.gomp
@@ -1,3 +1,7 @@
+2014-10-23  Thomas Schwinge  <thomas@codesourcery.com>
+
+	* testsuite/libgomp.oacc-c/reduction-initial-1.c: New file.
+
 2014-10-20  Cesar Philippidis  <cesar@codesourcery.com>
 
 	* (GOACC_update): Declare.
diff --git libgomp/testsuite/libgomp.oacc-c/reduction-initial-1.c libgomp/testsuite/libgomp.oacc-c/reduction-initial-1.c
new file mode 100644
index 0000000..e763cf2
--- /dev/null
+++ libgomp/testsuite/libgomp.oacc-c/reduction-initial-1.c
@@ -0,0 +1,32 @@
+/* { dg-do run } */
+/* TODO:
+   { dg-xfail-run-if "" { *-*-* } { "-DACC_DEVICE_TYPE_host=1" } { "" } } */
+
+int
+main(void)
+{
+#define I 5
+/* TODO */
+#ifdef ACC_DEVICE_TYPE_host_nonshm
+# define N 1
+#else
+# define N 11
+#endif
+#define A 8
+
+  int a = A;
+  int s = I;
+
+#pragma acc parallel vector_length(N)
+  {
+    int i;
+#pragma acc loop reduction(+:s)
+    for (i = 0; i < N; ++i)
+      s += a;
+  }
+
+  if (s != I + N * A)
+    __builtin_abort();
+
+  return 0;
+}


Grüße,
 Thomas

[-- Attachment #2: Type: application/pgp-signature, Size: 472 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [patch,gomp-4_0-branch] openacc parallel reduction part 1
  2014-07-08 14:28   ` Cesar Philippidis
  2014-07-08 17:02     ` Cesar Philippidis
  2014-09-25 20:06     ` Thomas Schwinge
@ 2014-11-11 15:03     ` Thomas Schwinge
  2014-11-11 15:15       ` Thomas Schwinge
  2014-11-12 15:59     ` Thomas Schwinge
  3 siblings, 1 reply; 12+ messages in thread
From: Thomas Schwinge @ 2014-11-11 15:03 UTC (permalink / raw)
  To: gcc-patches; +Cc: Cesar Philippidis

[-- Attachment #1: Type: text/plain, Size: 5609 bytes --]

Hi!

On Tue, 8 Jul 2014 07:28:24 -0700, Cesar Philippidis <cesar_philippidis@mentor.com> wrote:
> On 07/07/2014 02:55 AM, Thomas Schwinge wrote:
> 
> > On Sun, 6 Jul 2014 16:10:56 -0700, Cesar Philippidis <cesar_philippidis@mentor.com> wrote:
> >> This patch is the first step to enabling parallel reductions in openacc.

> I've committed this updated version
> of the patch.

In r217354, I just applied the following cleanup to gomp-4_0-branch:

commit 4fe8b3620b258ac904d9eade5f76dede69a80c98
Author: tschwinge <tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4>
Date:   Tue Nov 11 14:52:26 2014 +0000

    OpenACC reductions maintenance.
    
    	gcc/
    	* omp-low.c (maybe_lookup_reduction): Don't require an OpenACC
    	context.
    	(lower_oacc_offload): Simplify use of maybe_lookup_reduction.
    
    	gcc/
    	* omp-low.c (delete_omp_context): Dispose of reduction_map.
    
    git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@217354 138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/ChangeLog.gomp |  6 ++++++
 gcc/omp-low.c      | 56 +++++++++++++++++++++++++++++-------------------------
 2 files changed, 36 insertions(+), 26 deletions(-)

diff --git gcc/ChangeLog.gomp gcc/ChangeLog.gomp
index dacfad8..94a7f8c 100644
--- gcc/ChangeLog.gomp
+++ gcc/ChangeLog.gomp
@@ -1,5 +1,11 @@
 2014-11-11  Thomas Schwinge  <thomas@codesourcery.com>
 
+	* omp-low.c (delete_omp_context): Dispose of reduction_map.
+
+	* omp-low.c (maybe_lookup_reduction): Don't require an OpenACC
+	context.
+	(lower_oacc_offload): Simplify use of maybe_lookup_reduction.
+
 	* omp-low.c (lower_omp_target): Restore two gcc_asserts.
 
 2014-11-06  Thomas Schwinge  <thomas@codesourcery.com>
diff --git gcc/omp-low.c gcc/omp-low.c
index c63ec4e..5695ec3 100644
--- gcc/omp-low.c
+++ gcc/omp-low.c
@@ -938,7 +938,7 @@ get_base_type (tree decl)
   return type;
 }
 
-/* Lookup variables in the decl or field splay trees.  The "maybe" form
+/* Lookup variables.  The "maybe" form
    allows for the variable form to not have been entered, otherwise we
    assert that the variable must have been entered.  */
 
@@ -975,17 +975,6 @@ lookup_sfield (tree var, omp_context *ctx)
 }
 
 static inline tree
-lookup_reduction (const char *id, omp_context *ctx)
-{
-  gcc_assert (is_gimple_omp_oacc_specifically (ctx->stmt));
-
-  splay_tree_node n;
-  n = splay_tree_lookup (ctx->reduction_map,
-			 (splay_tree_key) id);
-  return (tree) n->value;
-}
-
-static inline tree
 maybe_lookup_field (tree var, omp_context *ctx)
 {
   splay_tree_node n;
@@ -994,14 +983,22 @@ maybe_lookup_field (tree var, omp_context *ctx)
 }
 
 static inline tree
+lookup_reduction (const char *id, omp_context *ctx)
+{
+  gcc_assert (is_gimple_omp_oacc_specifically (ctx->stmt));
+
+  splay_tree_node n;
+  n = splay_tree_lookup (ctx->reduction_map, (splay_tree_key) id);
+  return (tree) n->value;
+}
+
+static inline tree
 maybe_lookup_reduction (tree var, omp_context *ctx)
 {
-  gcc_assert (is_gimple_omp_oacc_specifically (ctx->stmt));
-
-  splay_tree_node n;
-  n = splay_tree_lookup (ctx->reduction_map,
-			 (splay_tree_key) var);
-  return n ?(tree) n->value : NULL_TREE;
+  splay_tree_node n = NULL;
+  if (ctx->reduction_map)
+    n = splay_tree_lookup (ctx->reduction_map, (splay_tree_key) var);
+  return n ? (tree) n->value : NULL_TREE;
 }
 
 /* Return true if DECL should be copied by pointer.  SHARED_CTX is
@@ -1574,6 +1571,11 @@ delete_omp_context (splay_tree_value value)
     splay_tree_delete (ctx->field_map);
   if (ctx->sfield_map)
     splay_tree_delete (ctx->sfield_map);
+  if (ctx->reduction_map
+      /* Shared over several omp_contexts.  */
+      && (ctx->outer == NULL
+	  || ctx->reduction_map != ctx->outer->reduction_map))
+    splay_tree_delete (ctx->reduction_map);
 
   /* We hijacked DECL_ABSTRACT_ORIGIN earlier.  We need to clear it before
      it produces corrupt debug information.  */
@@ -10481,10 +10483,14 @@ lower_oacc_offload (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 			    || (OMP_CLAUSE_MAP_KIND (c)
 				!= OMP_CLAUSE_MAP_FORCE_DEVICEPTR)
 			    || TREE_CODE (TREE_TYPE (ovar)) != ARRAY_TYPE);
-		if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP
-		    && OMP_CLAUSE_MAP_KIND (c) == OMP_CLAUSE_MAP_POINTER
-		    && !OMP_CLAUSE_MAP_ZERO_BIAS_ARRAY_SECTION (c)
-		    && TREE_CODE (TREE_TYPE (ovar)) == ARRAY_TYPE)
+		if (maybe_lookup_reduction (var, ctx))
+		  {
+		    gimplify_assign (x, var, &ilist);
+		  }
+		else if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP
+			 && OMP_CLAUSE_MAP_KIND (c) == OMP_CLAUSE_MAP_POINTER
+			 && !OMP_CLAUSE_MAP_ZERO_BIAS_ARRAY_SECTION (c)
+			 && TREE_CODE (TREE_TYPE (ovar)) == ARRAY_TYPE)
 		  {
 		    tree avar
 		      = create_tmp_var (TREE_TYPE (TREE_TYPE (x)), NULL);
@@ -10494,8 +10500,7 @@ lower_oacc_offload (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 		    avar = build_fold_addr_expr (avar);
 		    gimplify_assign (x, avar, &ilist);
 		  }
-		else if (is_gimple_reg (var)
-			 && !maybe_lookup_reduction (var, ctx))
+		else if (is_gimple_reg (var))
 		  {
 		    tree avar = create_tmp_var (TREE_TYPE (var), NULL);
 		    mark_addressable (avar);
@@ -10521,8 +10526,7 @@ lower_oacc_offload (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 		  }
 		else
 		  {
-		    if (!maybe_lookup_reduction (var, ctx))
-		      var = build_fold_addr_expr (var);
+		    var = build_fold_addr_expr (var);
 		    gimplify_assign (x, var, &ilist);
 		  }
 	      }


Grüße,
 Thomas

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 472 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [patch,gomp-4_0-branch] openacc parallel reduction part 1
  2014-11-11 15:03     ` Thomas Schwinge
@ 2014-11-11 15:15       ` Thomas Schwinge
  0 siblings, 0 replies; 12+ messages in thread
From: Thomas Schwinge @ 2014-11-11 15:15 UTC (permalink / raw)
  To: gcc-patches, Cesar Philippidis

[-- Attachment #1: Type: text/plain, Size: 5477 bytes --]

Hi!

On Tue, 11 Nov 2014 16:03:05 +0100, I wrote:
> On Tue, 8 Jul 2014 07:28:24 -0700, Cesar Philippidis <cesar_philippidis@mentor.com> wrote:
> > On 07/07/2014 02:55 AM, Thomas Schwinge wrote:
> > 
> > > On Sun, 6 Jul 2014 16:10:56 -0700, Cesar Philippidis <cesar_philippidis@mentor.com> wrote:
> > >> This patch is the first step to enabling parallel reductions in openacc.
> 
> > I've committed this updated version
> > of the patch.
> 
> In r217354, I just applied the following cleanup to gomp-4_0-branch:
> 
> commit 4fe8b3620b258ac904d9eade5f76dede69a80c98
> Author: tschwinge <tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4>
> Date:   Tue Nov 11 14:52:26 2014 +0000
> 
>     OpenACC reductions maintenance.
>     
>     	gcc/
>     	* omp-low.c (maybe_lookup_reduction): Don't require an OpenACC
>     	context.
>     	(lower_oacc_offload): Simplify use of maybe_lookup_reduction.
>     
>     	gcc/
>     	* omp-low.c (delete_omp_context): Dispose of reduction_map.
>     
>     git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@217354 138bc75d-0d04-0410-961f-82ee72b054a4

I further tried to tidy this up as follows -- but that is causing the
reduction execution tests to fail; indeed -fdump-tree-all already shows
unexpected changes during gimplification.  (I first suspected that
variables are added to a "GIMPLE_OMP_FOR" reduction_map, and then not
found when reading from "GIMPLE_OACC_PARALLEL" one, but now I'm not at
all sure about this theory.)  Cesar, is cleanup like that useful at all,
and if yes, could you look into that, later on?  (Definitely not urgent.)

commit 3ef04b65c1b5d3db5aa4b903a1ec0f693bb75ca8
Author: Thomas Schwinge <thomas@codesourcery.com>
Date:   Tue Nov 11 13:04:00 2014 +0100

    [WIP] Make reduction_map per context.
---
 gcc/omp-low.c | 41 +++++++++++++++++++++++++++++------------
 1 file changed, 29 insertions(+), 12 deletions(-)

diff --git gcc/omp-low.c gcc/omp-low.c
index 5695ec3..44ed9a0 100644
--- gcc/omp-low.c
+++ gcc/omp-low.c
@@ -987,8 +987,19 @@ lookup_reduction (const char *id, omp_context *ctx)
 {
   gcc_assert (is_gimple_omp_oacc_specifically (ctx->stmt));
 
-  splay_tree_node n;
-  n = splay_tree_lookup (ctx->reduction_map, (splay_tree_key) id);
+  splay_tree_node n = NULL;
+  do
+    {
+      if (ctx->reduction_map != NULL)
+	n = splay_tree_lookup (ctx->reduction_map, (splay_tree_key) id);
+      if (n != NULL)
+	break;
+      /* If not found, recurse into outer context.  */
+      ctx = ctx->outer;
+    }
+  while (ctx != NULL
+	 /* && ctx->reduction_map != NULL */);
+  gcc_assert (n != NULL);
   return (tree) n->value;
 }
 
@@ -996,8 +1007,17 @@ static inline tree
 maybe_lookup_reduction (tree var, omp_context *ctx)
 {
   splay_tree_node n = NULL;
-  if (ctx->reduction_map)
-    n = splay_tree_lookup (ctx->reduction_map, (splay_tree_key) var);
+  do
+    {
+      if (ctx->reduction_map != NULL)
+	n = splay_tree_lookup (ctx->reduction_map, (splay_tree_key) var);
+      if (n != NULL)
+	break;
+      /* If not found, recurse into outer context.  */
+      ctx = ctx->outer;
+    }
+  while (ctx != NULL
+	 /* && ctx->reduction_map != NULL */);
   return n ? (tree) n->value : NULL_TREE;
 }
 
@@ -1498,8 +1518,6 @@ new_omp_context (gimple stmt, omp_context *outer_ctx)
       ctx->cb = outer_ctx->cb;
       ctx->cb.block = NULL;
       ctx->depth = outer_ctx->depth + 1;
-      /* FIXME: handle reductions recursively.  */
-      ctx->reduction_map = outer_ctx->reduction_map;
     }
   else
     {
@@ -1513,7 +1531,6 @@ new_omp_context (gimple stmt, omp_context *outer_ctx)
       ctx->cb.eh_lp_nr = 0;
       ctx->cb.transform_call_graph_edges = CB_CGE_MOVE;
       ctx->depth = 1;
-      //TODO ctx->reduction_map = TODO;
     }
 
   ctx->cb.decl_map = new hash_map<tree, tree>;
@@ -1571,10 +1588,7 @@ delete_omp_context (splay_tree_value value)
     splay_tree_delete (ctx->field_map);
   if (ctx->sfield_map)
     splay_tree_delete (ctx->sfield_map);
-  if (ctx->reduction_map
-      /* Shared over several omp_contexts.  */
-      && (ctx->outer == NULL
-	  || ctx->reduction_map != ctx->outer->reduction_map))
+  if (ctx->reduction_map)
     splay_tree_delete (ctx->reduction_map);
 
   /* We hijacked DECL_ABSTRACT_ORIGIN earlier.  We need to clear it before
@@ -1765,6 +1779,7 @@ scan_sharing_clauses (tree clauses, omp_context *ctx)
 	      install_var_local (array, c);
 
 	      /* Insert it into the current context.  */
+	      //TODO
 	      splay_tree_insert (ctx->reduction_map,
 				 (splay_tree_key) omp_get_id(var),
 				 (splay_tree_value) array);
@@ -2394,8 +2409,8 @@ scan_oacc_offload (gimple stmt, omp_context *outer_ctx)
   DECL_ARTIFICIAL (name) = 1;
   DECL_NAMELESS (name) = 1;
   TYPE_NAME (ctx->record_type) = name;
-  create_omp_child_function (ctx, false);
   ctx->reduction_map = splay_tree_new (splay_tree_compare_pointers, 0, 0);
+  create_omp_child_function (ctx, false);
 
   gimple_omp_set_child_fn (stmt, ctx->cb.dst_fn);
 
@@ -2658,6 +2673,8 @@ scan_omp_for (gimple stmt, omp_context *outer_ctx)
   size_t i;
 
   ctx = new_omp_context (stmt, outer_ctx);
+  if (gimple_omp_for_kind (stmt) == GF_OMP_FOR_KIND_OACC_LOOP)
+    ctx->reduction_map = splay_tree_new (splay_tree_compare_pointers, 0, 0);
 
   scan_sharing_clauses (gimple_omp_for_clauses (stmt), ctx);
 


Grüße,
 Thomas

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 472 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [patch,gomp-4_0-branch] openacc parallel reduction part 1
  2014-07-08 14:28   ` Cesar Philippidis
                       ` (2 preceding siblings ...)
  2014-11-11 15:03     ` Thomas Schwinge
@ 2014-11-12 15:59     ` Thomas Schwinge
  2014-11-13  8:24       ` Thomas Schwinge
  3 siblings, 1 reply; 12+ messages in thread
From: Thomas Schwinge @ 2014-11-12 15:59 UTC (permalink / raw)
  To: Cesar Philippidis; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 2260 bytes --]

Hi Casar!

Can you please help me trying to understand the following change, that
you've done a long time ago:

On Tue, 8 Jul 2014 07:28:24 -0700, Cesar Philippidis <cesar_philippidis@mentor.com> wrote:
> On 07/07/2014 02:55 AM, Thomas Schwinge wrote:
> 
> > On Sun, 6 Jul 2014 16:10:56 -0700, Cesar Philippidis <cesar_philippidis@mentor.com> wrote:
> >> This patch is the first step to enabling parallel reductions in openacc.

> --- a/gcc/omp-low.c
> +++ b/gcc/omp-low.c

> @@ -1750,7 +1808,16 @@ scan_sharing_clauses (tree clauses, omp_context *ctx)
>  		      && TREE_CODE (TREE_TYPE (decl)) == ARRAY_TYPE)
>  		    install_var_field (decl, true, 7, ctx);
>  		  else
> -		    install_var_field (decl, true, 3, ctx);
> +		    {
> +		      if (!is_gimple_omp_oacc_specifically (ctx->stmt))
> +			install_var_field (decl, true, 3, ctx);
> +		      else
> +		    {
> +		      /* decl goes heres.  */
> +		      omp_context *c = (ctx->field_map ? ctx : ctx->outer);
> +		      install_var_field (decl, true, 3, c);
> +		    }
> +		    }
>  		  if (is_gimple_omp_offloaded (ctx->stmt))
>  		    install_var_local (decl, ctx);
>  		}

Writing your change differently, easier to read:

                      if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP
                          && OMP_CLAUSE_MAP_KIND (c) == OMP_CLAUSE_MAP_POINTER
                          && !OMP_CLAUSE_MAP_ZERO_BIAS_ARRAY_SECTION (c)
                          && TREE_CODE (TREE_TYPE (decl)) == ARRAY_TYPE)
                        install_var_field (decl, true, 7, ctx);
    +                 else if (is_gimple_omp_oacc_specifically (ctx->stmt))
    +                   {
    +                     /* decl goes heres.  */
    +                     omp_context *c = (ctx->field_map ? ctx : ctx->outer);
    +                     install_var_field (decl, true, 3, c);
    +                   }
                      else
                        install_var_field (decl, true, 3, ctx);

It is a generic OpenACC (that is, not only OpenACC reductions) code path
that you're changing here.  Can you still come up with a rationale for
that change, or should this possibly be restricted to OpenACC reductions
processing only?


Grüße,
 Thomas

[-- Attachment #2: Type: application/pgp-signature, Size: 472 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [patch,gomp-4_0-branch] openacc parallel reduction part 1
  2014-11-12 15:59     ` Thomas Schwinge
@ 2014-11-13  8:24       ` Thomas Schwinge
  0 siblings, 0 replies; 12+ messages in thread
From: Thomas Schwinge @ 2014-11-13  8:24 UTC (permalink / raw)
  To: gcc-patches; +Cc: Cesar Philippidis

[-- Attachment #1: Type: text/plain, Size: 4451 bytes --]

Hi!

On Wed, 12 Nov 2014 16:58:54 +0100, I wrote:
> Hi Casar!
> 
> Can you please help me trying to understand the following change, that
> you've done a long time ago:
> 
> On Tue, 8 Jul 2014 07:28:24 -0700, Cesar Philippidis <cesar_philippidis@mentor.com> wrote:
> > On 07/07/2014 02:55 AM, Thomas Schwinge wrote:
> > 
> > > On Sun, 6 Jul 2014 16:10:56 -0700, Cesar Philippidis <cesar_philippidis@mentor.com> wrote:
> > >> This patch is the first step to enabling parallel reductions in openacc.
> 
> > --- a/gcc/omp-low.c
> > +++ b/gcc/omp-low.c
> 
> > @@ -1750,7 +1808,16 @@ scan_sharing_clauses (tree clauses, omp_context *ctx)
> >  		      && TREE_CODE (TREE_TYPE (decl)) == ARRAY_TYPE)
> >  		    install_var_field (decl, true, 7, ctx);
> >  		  else
> > -		    install_var_field (decl, true, 3, ctx);
> > +		    {
> > +		      if (!is_gimple_omp_oacc_specifically (ctx->stmt))
> > +			install_var_field (decl, true, 3, ctx);
> > +		      else
> > +		    {
> > +		      /* decl goes heres.  */
> > +		      omp_context *c = (ctx->field_map ? ctx : ctx->outer);
> > +		      install_var_field (decl, true, 3, c);
> > +		    }
> > +		    }
> >  		  if (is_gimple_omp_offloaded (ctx->stmt))
> >  		    install_var_local (decl, ctx);
> >  		}
> 
> Writing your change differently, easier to read:
> 
>                       if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP
>                           && OMP_CLAUSE_MAP_KIND (c) == OMP_CLAUSE_MAP_POINTER
>                           && !OMP_CLAUSE_MAP_ZERO_BIAS_ARRAY_SECTION (c)
>                           && TREE_CODE (TREE_TYPE (decl)) == ARRAY_TYPE)
>                         install_var_field (decl, true, 7, ctx);
>     +                 else if (is_gimple_omp_oacc_specifically (ctx->stmt))
>     +                   {
>     +                     /* decl goes heres.  */
>     +                     omp_context *c = (ctx->field_map ? ctx : ctx->outer);
>     +                     install_var_field (decl, true, 3, c);
>     +                   }
>                       else
>                         install_var_field (decl, true, 3, ctx);
> 
> It is a generic OpenACC (that is, not only OpenACC reductions) code path
> that you're changing here.  Can you still come up with a rationale for
> that change, or should this possibly be restricted to OpenACC reductions
> processing only?

Cesar could neither remember, nor make up a new rationale on the spot;
trusting in regression testing, and that showing no issues, I have now
reverted this in r217462 on gomp-4_0-branch:

commit 389a8054007e619a917c16234cc697cae711de02
Author: tschwinge <tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4>
Date:   Thu Nov 13 08:13:33 2014 +0000

    Middle end: Revert earlier change.
    
    This change in OMP_CLAUSE_MAP handling had originally been applied for
    reductions support, but is now obsolete.
    
    	gcc/
    	* omp-low.c (scan_sharing_clauses) <OMP_CLAUSE_MAP>: Revert
    	earlier change.
    
    git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@217462 138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/ChangeLog.gomp |  5 +++++
 gcc/omp-low.c      | 11 +----------
 2 files changed, 6 insertions(+), 10 deletions(-)

diff --git gcc/ChangeLog.gomp gcc/ChangeLog.gomp
index 7501679..174235d 100644
--- gcc/ChangeLog.gomp
+++ gcc/ChangeLog.gomp
@@ -1,3 +1,8 @@
+2014-11-13  Thomas Schwinge  <thomas@codesourcery.com>
+
+	* omp-low.c (scan_sharing_clauses) <OMP_CLAUSE_MAP>: Revert
+	earlier change.
+
 2014-11-12  Cesar Philippidis  <cesar@codesourcery.com>
 
 	* tree.c (omp_clause_code_name): Add missing comma
diff --git gcc/omp-low.c gcc/omp-low.c
index 44e14b4..e511846 100644
--- gcc/omp-low.c
+++ gcc/omp-low.c
@@ -1902,16 +1902,7 @@ scan_sharing_clauses (tree clauses, omp_context *ctx)
 		      && TREE_CODE (TREE_TYPE (decl)) == ARRAY_TYPE)
 		    install_var_field (decl, true, 7, ctx);
 		  else
-		    {
-		      if (!is_gimple_omp_oacc_specifically (ctx->stmt))
-			install_var_field (decl, true, 3, ctx);
-		      else
-		    {
-		      /* decl goes heres.  */
-		      omp_context *c = (ctx->field_map ? ctx : ctx->outer);
-		      install_var_field (decl, true, 3, c);
-		    }
-		    }
+		    install_var_field (decl, true, 3, ctx);
 		  if (is_gimple_omp_offloaded (ctx->stmt))
 		    install_var_local (decl, ctx);
 		}


Grüße,
 Thomas

[-- Attachment #2: Type: application/pgp-signature, Size: 472 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2014-11-13  8:18 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-07-06 23:11 [patch,gomp-4_0-branch] openacc parallel reduction part 1 Cesar Philippidis
2014-07-07  9:55 ` Thomas Schwinge
2014-07-08 14:28   ` Cesar Philippidis
2014-07-08 17:02     ` Cesar Philippidis
2014-09-25 20:06     ` Thomas Schwinge
2014-11-11 15:03     ` Thomas Schwinge
2014-11-11 15:15       ` Thomas Schwinge
2014-11-12 15:59     ` Thomas Schwinge
2014-11-13  8:24       ` Thomas Schwinge
2014-07-28 17:06 ` Thomas Schwinge
2014-07-28 17:54   ` Cesar Philippidis
2014-10-23  8:26     ` Thomas Schwinge

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).