public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [patch] various OpenACC reduction enhancements
@ 2018-06-29 18:20 Cesar Philippidis
  2018-06-29 18:22 ` [patch] various OpenACC reduction enhancements - ME and nvptx changes Cesar Philippidis
                   ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Cesar Philippidis @ 2018-06-29 18:20 UTC (permalink / raw)
  To: gcc-patches, Jakub Jelinek, Tom de Vries

The following patch set includes various OpenACC reduction enhancements
present in og8. These include the following individual og8 commits:

  * (4469fc4) [Fortran] Permit reductions in gfc_omp_clause_copy_ctor
  * (704f1a2) [nxptx, OpenACC] vector reductions
  * (8a35c89) [OpenACC] Fix a reduction bug involving
              GOMP_MAP_FIRSTPRIVATE_POINTER variables
  * (16ead33) [OpenACC] Update error messages for c and c++ reductions
  * (65dd9cf) Make OpenACC orphan gang reductions errors
  * (5d60102) [PR80547] Handle parallel reductions explicitly
              initialized by the user

The nvptx vector reduction enhancement is a prerequisite for the
forthcoming variable-length patches.

This patch as a whole is somewhat large, so I've split it into three
pieces, 1) ME and nvptx changes, FE changes, and test cases. I'll reply
to this message with the individual patches.

Thanks,
Cesar

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [patch] various OpenACC reduction enhancements - ME and nvptx changes
  2018-06-29 18:20 [patch] various OpenACC reduction enhancements Cesar Philippidis
@ 2018-06-29 18:22 ` Cesar Philippidis
  2018-10-05 14:09   ` Tom de Vries
  2018-12-04 12:29   ` Jakub Jelinek
  2018-06-29 18:23 ` [patch] various OpenACC reduction enhancements - FE changes Cesar Philippidis
  2018-06-29 18:38 ` [patch] various OpenACC reduction enhancements - test cases Cesar Philippidis
  2 siblings, 2 replies; 14+ messages in thread
From: Cesar Philippidis @ 2018-06-29 18:22 UTC (permalink / raw)
  To: gcc-patches, Jakub Jelinek, Tom de Vries

[-- Attachment #1: Type: text/plain, Size: 196 bytes --]

The attached patch includes the nvptx and GCC ME reductions enhancements.

Is this patch OK for trunk? It bootstrapped / regression tested cleanly
for x86_64 with nvptx offloading.

Thanks,
Cesar

[-- Attachment #2: trunk-reductions-gcc.diff --]
[-- Type: text/x-patch, Size: 16524 bytes --]

2018-06-29  Cesar Philippidis  <cesar@codesourcery.com>
	    Nathan Sidwell  <nathan@acm.org>

	gcc/
	* config/nvptx/nvptx.c (nvptx_propagate_unified): New.
	(nvptx_split_blocks): Call it for cond_uni insn.
	(nvptx_expand_cond_uni): New.
	(enum nvptx_builtins): Add NVPTX_BUILTIN_COND_UNI.
	(nvptx_init_builtins): Initialize it.
	(nvptx_expand_builtin):
	(nvptx_generate_vector_shuffle): Change integral SHIFT operand to
	tree BITS operand.
	(nvptx_vector_reduction): New.
	(nvptx_adjust_reduction_type): New.
	(nvptx_goacc_reduction_setup): Use it to adjust the type of ref_to_res.
	(nvptx_goacc_reduction_init): Don't update LHS if it doesn't exist.
	(nvptx_goacc_reduction_fini): Call nvptx_vector_reduction for vector.
	Use it to adjust the type of ref_to_res.
	(nvptx_goacc_reduction_teardown):
	* config/nvptx/nvptx.md (cond_uni): New pattern.
	* omp-general.h (enum oacc_loop_flags): Add OLF_REDUCTION enum.
	* omp-low.c (lower_oacc_reductions): Handle reduction decls mapped
	with GOMP_MAP_FIRSTPRIVATE_POINTER.
	(lower_oacc_head_mark): Use OLF_REDUCTION to mark OpenACC reductions.
	* omp-offload.c (oacc_loop_auto_partitions): Don't assign gang
	level parallelism to orphan reductions.
	(default_goacc_reduction): Retype ref_to_res as necessary.

---
diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index 5608bee8a8d..33ec3db1153 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -2863,6 +2863,52 @@ nvptx_reorg_uniform_simt ()
     }
 }
 
+/* UNIFIED is a cond_uni insn.  Find the branch insn it affects, and
+   mark that as unified.  We expect to be in a single block.  */
+
+static void
+nvptx_propagate_unified (rtx_insn *unified)
+{
+  rtx_insn *probe = unified;
+  rtx cond_reg = SET_DEST (PATTERN (unified));
+  rtx pat = NULL_RTX;
+
+  /* Find the comparison.  (We could skip this and simply scan to he
+     blocks' terminating branch, if we didn't care for self
+     checking.)  */
+  for (;;)
+    {
+      probe = next_real_insn (probe);
+      if (!probe)
+	break;
+      pat = PATTERN (probe);
+
+      if (GET_CODE (pat) == SET
+	  && GET_RTX_CLASS (GET_CODE (SET_SRC (pat))) == RTX_COMPARE
+	  && XEXP (SET_SRC (pat), 0) == cond_reg)
+	break;
+      gcc_assert (NONJUMP_INSN_P (probe));
+    }
+  gcc_assert (pat);
+  rtx pred_reg = SET_DEST (pat);
+
+  /* Find the branch.  */
+  do
+    probe = NEXT_INSN (probe);
+  while (!JUMP_P (probe));
+
+  pat = PATTERN (probe);
+  rtx itec = XEXP (SET_SRC (pat), 0);
+  gcc_assert (XEXP (itec, 0) == pred_reg);
+
+  /* Mark the branch's condition as unified.  */
+  rtx unspec = gen_rtx_UNSPEC (BImode, gen_rtvec (1, pred_reg),
+			       UNSPEC_BR_UNIFIED);
+  bool ok = validate_change (probe, &XEXP (itec, 0), unspec, false);
+
+  gcc_assert (ok);
+}
+
 /* Loop structure of the function.  The entire function is described as
    a NULL loop.  */
 
@@ -2964,6 +3010,9 @@ nvptx_split_blocks (bb_insn_map_t *map)
 	    continue;
 	  switch (recog_memoized (insn))
 	    {
+	    case CODE_FOR_cond_uni:
+	      nvptx_propagate_unified (insn);
+	      /* FALLTHROUGH */
 	    default:
 	      seen_insn = true;
 	      continue;
@@ -5080,6 +5129,21 @@ nvptx_expand_cmp_swap (tree exp, rtx target,
   return target;
 }
 
+/* Expander for the compare unified builtin.  */
+
+static rtx
+nvptx_expand_cond_uni (tree exp, rtx target, machine_mode mode, int ignore)
+{
+  if (ignore)
+    return target;
+  
+  rtx src = expand_expr (CALL_EXPR_ARG (exp, 0),
+			 NULL_RTX, mode, EXPAND_NORMAL);
+
+  emit_insn (gen_cond_uni (target, src));
+
+  return target;
+}
 
 /* Codes for all the NVPTX builtins.  */
 enum nvptx_builtins
@@ -5089,6 +5153,7 @@ enum nvptx_builtins
   NVPTX_BUILTIN_WORKER_ADDR,
   NVPTX_BUILTIN_CMP_SWAP,
   NVPTX_BUILTIN_CMP_SWAPLL,
+  NVPTX_BUILTIN_COND_UNI,
   NVPTX_BUILTIN_MAX
 };
 
@@ -5126,6 +5191,7 @@ nvptx_init_builtins (void)
        (PTRVOID, ST, UINT, UINT, NULL_TREE));
   DEF (CMP_SWAP, "cmp_swap", (UINT, PTRVOID, UINT, UINT, NULL_TREE));
   DEF (CMP_SWAPLL, "cmp_swapll", (LLUINT, PTRVOID, LLUINT, LLUINT, NULL_TREE));
+  DEF (COND_UNI, "cond_uni", (integer_type_node, integer_type_node, NULL_TREE));
 
 #undef DEF
 #undef ST
@@ -5158,6 +5224,9 @@ nvptx_expand_builtin (tree exp, rtx target, rtx ARG_UNUSED (subtarget),
     case NVPTX_BUILTIN_CMP_SWAPLL:
       return nvptx_expand_cmp_swap (exp, target, mode, ignore);
 
+    case NVPTX_BUILTIN_COND_UNI:
+      return nvptx_expand_cond_uni (exp, target, mode, ignore);
+
     default: gcc_unreachable ();
     }
 }
@@ -5284,7 +5353,7 @@ nvptx_get_worker_red_addr (tree type, tree offset)
 
 static void
 nvptx_generate_vector_shuffle (location_t loc,
-			       tree dest_var, tree var, unsigned shift,
+			       tree dest_var, tree var, tree bits,
 			       gimple_seq *seq)
 {
   unsigned fn = NVPTX_BUILTIN_SHUFFLE;
@@ -5307,7 +5376,6 @@ nvptx_generate_vector_shuffle (location_t loc,
     }
   
   tree call = nvptx_builtin_decl (fn, true);
-  tree bits = build_int_cst (unsigned_type_node, shift);
   tree kind = build_int_cst (unsigned_type_node, SHUFFLE_DOWN);
   tree expr;
 
@@ -5583,6 +5651,126 @@ nvptx_reduction_update (location_t loc, gimple_stmt_iterator *gsi,
     return nvptx_lockfull_update (loc, gsi, ptr, var, op);
 }
 
+/* Emit a vector-level reduction loop.  OLD_VAR is the incoming
+   variable to reduce (valid in each vector), OP is the reduction
+   operator.  Return the reduced value (an SSA var).
+
+   The code we generate looks like:
+      unsigned old_shift = DIM_SIZE(VECTOR);
+      do 
+	{
+	  shift = PHI (old_shift, new_shift);
+	  var = PHI (old_var, new_var);
+	  new_shift = shift >> 1;
+	  other_var = VSHUFFLE (var, new_shift);
+	  new_var = var OP other_var;
+	  cond_var = builtin_cond_uni (new_shift);
+	}
+	while (cond_var > 1);
+
+  The builtin_cond_ini expands to a cond_uni instruction, which is
+  processed in nvpts_split_blocks to mark the loop's terminating
+  branch instruction.  */
+
+static tree
+nvptx_vector_reduction (location_t loc, gimple_stmt_iterator *gsi,
+			tree old_var, tree_code op)
+{
+  tree var_type = TREE_TYPE (old_var);
+
+  /*  Emit old_shift = DIM_SIZE(VECTOR) */
+  tree old_shift = make_ssa_name (integer_type_node);
+  tree dim = build_int_cst (integer_type_node, GOMP_DIM_VECTOR);
+  gcall *call = gimple_build_call_internal (IFN_GOACC_DIM_SIZE, 1, dim);
+  gimple_set_lhs (call, old_shift);
+  gimple_set_location (call, loc);
+  gsi_insert_before (gsi, call, GSI_SAME_STMT);
+
+  /* Split the block just after the init stmts.  */
+  basic_block pre_bb = gsi_bb (*gsi);
+  edge pre_edge = split_block (pre_bb, call);
+  basic_block loop_bb = pre_edge->dest;
+  pre_bb = pre_edge->src;
+  /* Reset the iterator.  */
+  *gsi = gsi_for_stmt (gsi_stmt (*gsi));
+
+  tree shift = make_ssa_name (integer_type_node);
+  tree new_shift = make_ssa_name (integer_type_node);
+  tree var = make_ssa_name (var_type);
+  tree other_var = make_ssa_name (var_type);
+  tree new_var = make_ssa_name (var_type);
+  
+  /* Build and insert the loop body.  */
+  gimple_seq loop_seq = NULL;
+
+  /* new_shift = shift >> 1 */
+  tree shift_expr = fold_build2 (RSHIFT_EXPR, integer_type_node,
+				 shift, integer_one_node);
+  gimplify_assign (new_shift, shift_expr, &loop_seq);
+
+  /* other_var = shuffle (var, shift) */
+  nvptx_generate_vector_shuffle (loc, other_var, var, new_shift, &loop_seq);
+  /* new_var = var  OP other_var */
+  tree red_expr = fold_build2 (op, var_type, var, other_var);
+  gimplify_assign (new_var, red_expr, &loop_seq);
+
+  /* Mark the iterator variable as unified.  */
+  tree cond_var = make_ssa_name (integer_type_node);
+  tree uni_fn = nvptx_builtin_decl (NVPTX_BUILTIN_COND_UNI, true);
+  tree uni_expr = build_call_expr_loc (loc, uni_fn, 1, new_shift);
+  gimplify_assign (cond_var,  uni_expr, &loop_seq);
+
+  gcond *cond = gimple_build_cond (LE_EXPR, cond_var, integer_one_node,
+				   NULL_TREE, NULL_TREE);
+  gimple_seq_add_stmt (&loop_seq, cond);
+  
+  gsi_insert_seq_before (gsi, loop_seq, GSI_SAME_STMT);
+
+  /* Split the block just after the loop stmts.  */
+  edge post_edge = split_block (loop_bb, cond);
+  basic_block post_bb = post_edge->dest;
+  loop_bb = post_edge->src;
+  *gsi = gsi_for_stmt (gsi_stmt (*gsi));
+
+  /* Create the loop.  */
+  post_edge->flags ^= EDGE_TRUE_VALUE | EDGE_FALLTHRU;
+  edge loop_edge = make_edge (loop_bb, loop_bb, EDGE_FALSE_VALUE);
+  set_immediate_dominator (CDI_DOMINATORS, loop_bb, pre_bb);
+  set_immediate_dominator (CDI_DOMINATORS, post_bb, loop_bb);
+
+  gphi *shift_phi = create_phi_node (shift, loop_bb);
+  add_phi_arg (shift_phi, old_shift, pre_edge, loc);
+  add_phi_arg (shift_phi, new_shift, loop_edge, loc);
+
+  gphi *var_phi = create_phi_node (var, loop_bb);
+  add_phi_arg (var_phi, old_var, pre_edge, loc);
+  add_phi_arg (var_phi, new_var, loop_edge, loc);
+
+  loop *loop = alloc_loop ();
+  loop->header = loop_bb;
+  loop->latch = loop_bb;
+  add_loop (loop, loop_bb->loop_father);
+
+  return new_var;
+}
+
+/* Dummy reduction vars that have GOMP_MAP_FIRSTPRIVATE_POINTER data
+   mappings gets retyped to (void *).  Adjust the type of VAR to TYPE
+   as appropriate.  */
+
+static tree
+nvptx_adjust_reduction_type (tree var, tree type, gimple_seq *seq)
+{
+  if (TREE_TYPE (TREE_TYPE (var)) == type)
+    return var;
+
+  tree ptype = build_pointer_type (type);
+  tree t = make_ssa_name (ptype);
+  tree expr = fold_build1 (NOP_EXPR, ptype, var);
+  gimple_seq_add_stmt (seq, gimple_build_assign (t, expr));
+  return t;
+}
+
 /* NVPTX implementation of GOACC_REDUCTION_SETUP.  */
 
 static void
@@ -5602,7 +5790,11 @@ nvptx_goacc_reduction_setup (gcall *call)
       tree ref_to_res = gimple_call_arg (call, 1);
 
       if (!integer_zerop (ref_to_res))
-	var = build_simple_mem_ref (ref_to_res);
+	{
+	  ref_to_res = nvptx_adjust_reduction_type (ref_to_res, TREE_TYPE (var),
+						    &seq);
+	  var = build_simple_mem_ref (ref_to_res);
+	}
     }
   
   if (level == GOMP_DIM_WORKER)
@@ -5702,7 +5894,11 @@ nvptx_goacc_reduction_init (gcall *call)
 	    init = var;
 	}
 
-      gimplify_assign (lhs, init, &seq);
+      /* The LHS may be NULL if a reduction variable on a parallel
+	 construct is initialized to some constant inside the parallel
+	 region.  */
+      if (lhs)
+	gimplify_assign (lhs, init, &seq);
     }
 
   pop_gimplify_context (NULL);
@@ -5727,22 +5923,7 @@ nvptx_goacc_reduction_fini (gcall *call)
   push_gimplify_context (true);
 
   if (level == GOMP_DIM_VECTOR)
-    {
-      /* Emit binary shuffle tree.  TODO. Emit this as an actual loop,
-	 but that requires a method of emitting a unified jump at the
-	 gimple level.  */
-      for (int shfl = PTX_VECTOR_LENGTH / 2; shfl > 0; shfl = shfl >> 1)
-	{
-	  tree other_var = make_ssa_name (TREE_TYPE (var));
-	  nvptx_generate_vector_shuffle (gimple_location (call),
-					 other_var, var, shfl, &seq);
-
-	  r = make_ssa_name (TREE_TYPE (var));
-	  gimplify_assign (r, fold_build2 (op, TREE_TYPE (var),
-					   var, other_var), &seq);
-	  var = r;
-	}
-    }
+    r = nvptx_vector_reduction (gimple_location (call), &gsi, var, op);
   else
     {
       tree accum = NULL_TREE;
@@ -5760,7 +5941,11 @@ nvptx_goacc_reduction_fini (gcall *call)
       else if (integer_zerop (ref_to_res))
 	r = var;
       else
-	accum = ref_to_res;
+	{
+	  ref_to_res = nvptx_adjust_reduction_type (ref_to_res, TREE_TYPE (var),
+						    &seq);
+	  accum = ref_to_res;
+	}
 
       if (accum)
 	{
@@ -5809,7 +5994,11 @@ nvptx_goacc_reduction_teardown (gcall *call)
       tree ref_to_res = gimple_call_arg (call, 1);
 
       if (!integer_zerop (ref_to_res))
-	gimplify_assign (build_simple_mem_ref (ref_to_res), var, &seq);
+	{
+	  ref_to_res = nvptx_adjust_reduction_type (ref_to_res, TREE_TYPE (var),
+						    &seq);
+	  gimplify_assign (build_simple_mem_ref (ref_to_res), var, &seq);
+	}
     }
 
   if (lhs)
diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md
index 2988f5dfa91..79c4c061841 100644
--- a/gcc/config/nvptx/nvptx.md
+++ b/gcc/config/nvptx/nvptx.md
@@ -600,6 +600,13 @@
   "%J0\\tbra.uni\\t%l1;"
   [(set_attr "predicable" "false")])
 
+(define_insn "cond_uni"
+  [(set (match_operand:SI 0 "nvptx_register_operand" "=R")
+          (unspec:SI [(match_operand:SI 1 "nvptx_nonmemory_operand" "R")]
+  		     UNSPEC_BR_UNIFIED))]
+  ""
+  "%.\\tmov%t0\\t%0, %1; // unified")
+
 (define_expand "cbranch<mode>4"
   [(set (pc)
 	(if_then_else (match_operator 0 "nvptx_comparison_operator"
diff --git a/gcc/omp-general.h b/gcc/omp-general.h
index 66f0a33c2e2..1a3a7cc728c 100644
--- a/gcc/omp-general.h
+++ b/gcc/omp-general.h
@@ -32,9 +32,10 @@ enum oacc_loop_flags {
   OLF_INDEPENDENT = 1u << 2,	/* Iterations are known independent.  */
   OLF_GANG_STATIC = 1u << 3,	/* Gang partitioning is static (has op). */
   OLF_TILE	= 1u << 4,	/* Tiled loop. */
-  
+  OLF_REDUCTION = 1u << 5,	/* Reduction loop.  */
+
   /* Explicitly specified loop axes.  */
-  OLF_DIM_BASE = 5,
+  OLF_DIM_BASE = 6,
   OLF_DIM_GANG   = 1u << (OLF_DIM_BASE + GOMP_DIM_GANG),
   OLF_DIM_WORKER = 1u << (OLF_DIM_BASE + GOMP_DIM_WORKER),
   OLF_DIM_VECTOR = 1u << (OLF_DIM_BASE + GOMP_DIM_VECTOR),
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index c591231d8f1..792d338f0fd 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -4855,6 +4855,7 @@ lower_oacc_reductions (location_t loc, tree clauses, tree level, bool inner,
 	tree ref_to_res = NULL_TREE;
 	tree incoming, outgoing, v1, v2, v3;
 	bool is_private = false;
+	bool is_fpp = false;
 
 	enum tree_code rcode = OMP_CLAUSE_REDUCTION_CODE (c);
 	if (rcode == MINUS_EXPR)
@@ -4913,19 +4914,37 @@ lower_oacc_reductions (location_t loc, tree clauses, tree level, bool inner,
 		      is_private = true;
 		      goto do_lookup;
 		    }
+		  else if (OMP_CLAUSE_CODE (cls) == OMP_CLAUSE_MAP
+			   && (OMP_CLAUSE_MAP_KIND (cls)
+			       == GOMP_MAP_FIRSTPRIVATE_POINTER)
+			   && orig == OMP_CLAUSE_DECL (cls))
+		    {
+		      is_fpp = true;
+		      goto do_lookup;
+		    }
 	      }
 
 	  do_lookup:
 	    /* This is the outermost construct with this reduction,
 	       see if there's a mapping for it.  */
 	    if (gimple_code (outer->stmt) == GIMPLE_OMP_TARGET
-		&& maybe_lookup_field (orig, outer) && !is_private)
+		&& (maybe_lookup_field (orig, outer) || is_fpp) && !is_private)
 	      {
-		ref_to_res = build_receiver_ref (orig, false, outer);
-		if (omp_is_reference (orig))
-		  ref_to_res = build_simple_mem_ref (ref_to_res);
-
 		tree type = TREE_TYPE (var);
+
+		if (is_fpp)
+		  {
+		    tree x = create_tmp_var (type);
+		    gimplify_assign (x, lookup_decl (orig, outer), fork_seq);
+		    ref_to_res = x;
+		  }
+		else
+		  {
+		    ref_to_res = build_receiver_ref (orig, false, outer);
+		    if (omp_is_reference (orig))
+		      ref_to_res = build_simple_mem_ref (ref_to_res);
+		  }
+
 		if (POINTER_TYPE_P (type))
 		  type = TREE_TYPE (type);
 
@@ -5633,6 +5652,10 @@ lower_oacc_head_mark (location_t loc, tree ddvar, tree clauses,
 	  tag |= OLF_TILE;
 	  break;
 
+	case OMP_CLAUSE_REDUCTION:
+	  tag |= OLF_REDUCTION;
+	  break;
+
 	default:
 	  continue;
 	}
diff --git a/gcc/omp-offload.c b/gcc/omp-offload.c
index 0abf0283c9e..07ca759a7d8 100644
--- a/gcc/omp-offload.c
+++ b/gcc/omp-offload.c
@@ -1280,6 +1280,13 @@ oacc_loop_auto_partitions (oacc_loop *loop, unsigned outer_mask,
 	 non-innermost available level.  */
       unsigned this_mask = GOMP_DIM_MASK (GOMP_DIM_GANG);
 
+      /* Orphan reductions cannot have gang partitioning.  */
+      if ((loop->flags & OLF_REDUCTION)
+	  && oacc_get_fn_attrib (current_function_decl)
+	  && !lookup_attribute ("omp target entrypoint",
+				DECL_ATTRIBUTES (current_function_decl)))
+	this_mask = GOMP_DIM_MASK (GOMP_DIM_WORKER);
+
       /* Find the first outermost available partition. */
       while (this_mask <= outer_mask)
 	this_mask <<= 1;
@@ -1431,6 +1438,17 @@ default_goacc_reduction (gcall *call)
 
       if (!integer_zerop (ref_to_res))
 	{
+	  /* Dummy reduction vars that have GOMP_MAP_FIRSTPRIVATE_POINTER data
+	     mappings gets retyped to (void *).  Adjust the type of ref_to_res
+	     as appropriate.  */
+	  if (TREE_TYPE (TREE_TYPE (ref_to_res)) != TREE_TYPE (var))
+	    {
+	      tree ptype = build_pointer_type (TREE_TYPE (var));
+	      tree t = make_ssa_name (ptype);
+	      tree expr = fold_build1 (NOP_EXPR, ptype, ref_to_res);
+	      gimple_seq_add_stmt (&seq, gimple_build_assign (t, expr));
+	      ref_to_res = t;
+	    }
 	  tree dst = build_simple_mem_ref (ref_to_res);
 	  tree src = var;
 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [patch] various OpenACC reduction enhancements - FE changes
  2018-06-29 18:20 [patch] various OpenACC reduction enhancements Cesar Philippidis
  2018-06-29 18:22 ` [patch] various OpenACC reduction enhancements - ME and nvptx changes Cesar Philippidis
@ 2018-06-29 18:23 ` Cesar Philippidis
  2018-12-04 12:57   ` Jakub Jelinek
  2018-06-29 18:38 ` [patch] various OpenACC reduction enhancements - test cases Cesar Philippidis
  2 siblings, 1 reply; 14+ messages in thread
From: Cesar Philippidis @ 2018-06-29 18:23 UTC (permalink / raw)
  To: gcc-patches, Jakub Jelinek, Tom de Vries, Fortran List

[-- Attachment #1: Type: text/plain, Size: 218 bytes --]

Attaches are the FE changes for the OpenACC reduction enhancements. It
depends on the ME patch.

Is this patch OK for trunk? It bootstrapped / regression tested cleanly
for x86_64 with nvptx offloading.

Thanks,
Cesar

[-- Attachment #2: trunk-reductions-fe.diff --]
[-- Type: text/x-patch, Size: 10465 bytes --]

2018-06-29  Cesar Philippidis  <cesar@codesourcery.com>
	    Nathan Sidwell  <nathan@acm.org>

	gcc/c/
	* c-parser.c (c_parser_omp_variable_list): New c_omp_region_type
	argument.  Use it to specialize handling of OMP_CLAUSE_REDUCTION for
	OpenACC.
	(c_parser_omp_clause_reduction): Update call to
	c_parser_omp_variable_list.  Propage OpenACC errors as necessary.
	(c_parser_oacc_all_clauses): Update call to
	p_parser_omp_clause_reduction.
	(c_parser_omp_all_clauses): Likewise.
	* c-typeck.c (c_finish_omp_clauses): Emit an error on orphan OpenACC
	gang reductions.

	gcc/cp/
	* parser.c (cp_parser_omp_var_list_no_open):  New c_omp_region_type
	argument.  Use it to specialize handling of OMP_CLAUSE_REDUCTION for
	OpenACC.
	(cp_parser_omp_clause_reduction): Update call to
	cp_parser_omp_variable_list.  Propage OpenACC errors as necessary.
	(cp_parser_oacc_all_clauses): Update call to
	cp_parser_omp_clause_reduction.
	(cp_parser_omp_all_clauses): Likewise.
	* semantics.c (finish_omp_clauses): Emit an error on orphan OpenACC
	gang reductions.

	gcc/fortran/
	* openmp.c (resolve_oacc_loop_blocks): Emit an error on orphan OpenACC
	gang reductions.
	* trans-openmp.c (gfc_omp_clause_copy_ctor): Permit reductions.

---
diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index 7a926285f3a..a6f453dae54 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -965,12 +965,13 @@ class token_pair
 
   /* Like token_pair::require_close, except that tokens will be skipped
      until the desired token is found.  An error message is still produced
-     if the next token is not as expected.  */
+     if the next token is not as expected, unless QUIET is set.  */
 
-  void skip_until_found_close (c_parser *parser) const
+  void skip_until_found_close (c_parser *parser, bool quiet = false) const
   {
     c_parser_skip_until_found (parser, traits_t::close_token_type,
-			       traits_t::close_gmsgid, m_open_loc);
+			       quiet ? NULL : traits_t::close_gmsgid,
+			       m_open_loc);
   }
 
  private:
@@ -11498,7 +11499,8 @@ c_parser_oacc_wait_list (c_parser *parser, location_t clause_loc, tree list)
 static tree
 c_parser_omp_variable_list (c_parser *parser,
 			    location_t clause_loc,
-			    enum omp_clause_code kind, tree list)
+			    enum omp_clause_code kind, tree list,
+			    enum c_omp_region_type ort = C_ORT_OMP)
 {
   if (c_parser_next_token_is_not (parser, CPP_NAME)
       || c_parser_peek_token (parser)->id_kind != C_ID_ID)
@@ -11557,6 +11559,22 @@ c_parser_omp_variable_list (c_parser *parser,
 	      /* FALLTHROUGH  */
 	    case OMP_CLAUSE_DEPEND:
 	    case OMP_CLAUSE_REDUCTION:
+	      if (kind == OMP_CLAUSE_REDUCTION && ort == C_ORT_ACC)
+		{
+		  switch (c_parser_peek_token (parser)->type)
+		    {
+		    case CPP_OPEN_PAREN:
+		    case CPP_OPEN_SQUARE:
+		    case CPP_DOT:
+		    case CPP_DEREF:
+		      error ("invalid reduction variable");
+		      t = error_mark_node;
+		    default:;
+		      break;
+		    }
+		  if (t == error_mark_node)
+		    break;
+		}
 	      while (c_parser_next_token_is (parser, CPP_OPEN_SQUARE))
 		{
 		  tree low_bound = NULL_TREE, length = NULL_TREE;
@@ -12789,9 +12807,12 @@ c_parser_omp_clause_private (c_parser *parser, tree list)
      identifier  */
 
 static tree
-c_parser_omp_clause_reduction (c_parser *parser, tree list)
+c_parser_omp_clause_reduction (c_parser *parser, tree list,
+			       enum c_omp_region_type ort)
 {
   location_t clause_loc = c_parser_peek_token (parser)->location;
+  bool seen_error = false;
+
   matching_parens parens;
   if (parens.require_open (parser))
     {
@@ -12855,7 +12876,13 @@ c_parser_omp_clause_reduction (c_parser *parser, tree list)
 	  tree nl, c;
 
 	  nl = c_parser_omp_variable_list (parser, clause_loc,
-					   OMP_CLAUSE_REDUCTION, list);
+					   OMP_CLAUSE_REDUCTION, list, ort);
+	  if (c_parser_peek_token (parser)->type != CPP_CLOSE_PAREN)
+	    {
+	      seen_error = true;
+	      goto cleanup;
+	    }
+
 	  for (c = nl; c != list; c = OMP_CLAUSE_CHAIN (c))
 	    {
 	      tree d = OMP_CLAUSE_DECL (c), type;
@@ -12891,7 +12918,8 @@ c_parser_omp_clause_reduction (c_parser *parser, tree list)
 
 	  list = nl;
 	}
-      parens.skip_until_found_close (parser);
+    cleanup:
+      parens.skip_until_found_close (parser, seen_error);
     }
   return list;
 }
@@ -13998,7 +14026,7 @@ c_parser_oacc_all_clauses (c_parser *parser, omp_clause_mask mask,
 	  c_name = "private";
 	  break;
 	case PRAGMA_OACC_CLAUSE_REDUCTION:
-	  clauses = c_parser_omp_clause_reduction (parser, clauses);
+	  clauses = c_parser_omp_clause_reduction (parser, clauses, C_ORT_ACC);
 	  c_name = "reduction";
 	  break;
 	case PRAGMA_OACC_CLAUSE_SEQ:
@@ -14157,7 +14185,7 @@ c_parser_omp_all_clauses (c_parser *parser, omp_clause_mask mask,
 	  c_name = "private";
 	  break;
 	case PRAGMA_OMP_CLAUSE_REDUCTION:
-	  clauses = c_parser_omp_clause_reduction (parser, clauses);
+	  clauses = c_parser_omp_clause_reduction (parser, clauses, C_ORT_OMP);
 	  c_name = "reduction";
 	  break;
 	case PRAGMA_OMP_CLAUSE_SCHEDULE:
diff --git a/gcc/c/c-typeck.c b/gcc/c/c-typeck.c
index 90ae306c99a..944db3fa8be 100644
--- a/gcc/c/c-typeck.c
+++ b/gcc/c/c-typeck.c
@@ -13087,6 +13087,14 @@ c_finish_omp_clauses (tree clauses, enum c_omp_region_type ort)
 	  goto check_dup_generic;
 
 	case OMP_CLAUSE_REDUCTION:
+	  if (ort == C_ORT_ACC && oacc_get_fn_attrib (current_function_decl)
+	      && omp_find_clause (clauses, OMP_CLAUSE_GANG))
+	    {
+	      error_at (OMP_CLAUSE_LOCATION (c),
+			"gang reduction on an orphan loop");
+	      remove = true;
+	      break;
+	    }
 	  need_implicitly_determined = true;
 	  t = OMP_CLAUSE_DECL (c);
 	  if (TREE_CODE (t) == TREE_LIST)
diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index a076de146e6..6444293bb82 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -31592,7 +31592,8 @@ check_no_duplicate_clause (tree clauses, enum omp_clause_code code,
 
 static tree
 cp_parser_omp_var_list_no_open (cp_parser *parser, enum omp_clause_code kind,
-				tree list, bool *colon)
+				tree list, bool *colon,
+				enum c_omp_region_type ort = C_ORT_OMP)
 {
   cp_token *token;
   bool saved_colon_corrects_to_scope_p = parser->colon_corrects_to_scope_p;
@@ -31668,6 +31669,21 @@ cp_parser_omp_var_list_no_open (cp_parser *parser, enum omp_clause_code kind,
 	      /* FALLTHROUGH.  */
 	    case OMP_CLAUSE_DEPEND:
 	    case OMP_CLAUSE_REDUCTION:
+	      if (kind == OMP_CLAUSE_REDUCTION && ort == C_ORT_ACC)
+		{
+		  switch (cp_lexer_peek_token (parser->lexer)->type)
+		    {
+		    case CPP_OPEN_PAREN:
+		    case CPP_OPEN_SQUARE:
+		    case CPP_DOT:
+		    case CPP_DEREF:
+		      error ("invalid reduction variable");
+		      decl = error_mark_node;
+		      goto skip_comma;
+		    default:;
+		      break;
+		    }
+		}
 	      while (cp_lexer_next_token_is (parser->lexer, CPP_OPEN_SQUARE))
 		{
 		  tree low_bound = NULL_TREE, length = NULL_TREE;
@@ -32746,7 +32762,8 @@ cp_parser_omp_clause_ordered (cp_parser *parser,
      id-expression  */
 
 static tree
-cp_parser_omp_clause_reduction (cp_parser *parser, tree list)
+cp_parser_omp_clause_reduction (cp_parser *parser, tree list,
+				enum c_omp_region_type ort)
 {
   enum tree_code code = ERROR_MARK;
   tree nlist, c, id = NULL_TREE;
@@ -32827,7 +32844,7 @@ cp_parser_omp_clause_reduction (cp_parser *parser, tree list)
     goto resync_fail;
 
   nlist = cp_parser_omp_var_list_no_open (parser, OMP_CLAUSE_REDUCTION, list,
-					  NULL);
+					  NULL, ort);
   for (c = nlist; c != list; c = OMP_CLAUSE_CHAIN (c))
     {
       OMP_CLAUSE_REDUCTION_CODE (c) = code;
@@ -33868,7 +33885,7 @@ cp_parser_oacc_all_clauses (cp_parser *parser, omp_clause_mask mask,
 	  c_name = "private";
 	  break;
 	case PRAGMA_OACC_CLAUSE_REDUCTION:
-	  clauses = cp_parser_omp_clause_reduction (parser, clauses);
+	  clauses = cp_parser_omp_clause_reduction (parser, clauses, C_ORT_ACC);
 	  c_name = "reduction";
 	  break;
 	case PRAGMA_OACC_CLAUSE_SEQ:
@@ -34055,7 +34072,7 @@ cp_parser_omp_all_clauses (cp_parser *parser, omp_clause_mask mask,
 	  c_name = "private";
 	  break;
 	case PRAGMA_OMP_CLAUSE_REDUCTION:
-	  clauses = cp_parser_omp_clause_reduction (parser, clauses);
+	  clauses = cp_parser_omp_clause_reduction (parser, clauses, C_ORT_OMP);
 	  c_name = "reduction";
 	  break;
 	case PRAGMA_OMP_CLAUSE_SCHEDULE:
diff --git a/gcc/cp/semantics.c b/gcc/cp/semantics.c
index c779137da45..177acdd9cc4 100644
--- a/gcc/cp/semantics.c
+++ b/gcc/cp/semantics.c
@@ -5875,6 +5875,14 @@ finish_omp_clauses (tree clauses, enum c_omp_region_type ort)
 	  field_ok = ((ort & C_ORT_OMP_DECLARE_SIMD) == C_ORT_OMP);
 	  goto check_dup_generic;
 	case OMP_CLAUSE_REDUCTION:
+	  if (ort == C_ORT_ACC && oacc_get_fn_attrib (current_function_decl)
+	      && omp_find_clause (clauses, OMP_CLAUSE_GANG))
+	    {
+	      error_at (OMP_CLAUSE_LOCATION (c),
+			"gang reduction on an orphan loop");
+	      remove = true;
+	      break;
+	    }
 	  field_ok = ((ort & C_ORT_OMP_DECLARE_SIMD) == C_ORT_OMP);
 	  t = OMP_CLAUSE_DECL (c);
 	  if (TREE_CODE (t) == TREE_LIST)
diff --git a/gcc/fortran/openmp.c b/gcc/fortran/openmp.c
index 94a7f7eaa50..38d857c14e5 100644
--- a/gcc/fortran/openmp.c
+++ b/gcc/fortran/openmp.c
@@ -5878,6 +5878,18 @@ resolve_oacc_loop_blocks (gfc_code *code)
 	  break;
       }
 
+  if (code->op == EXEC_OACC_LOOP
+      && code->ext.omp_clauses->lists[OMP_LIST_REDUCTION]
+      && code->ext.omp_clauses->gang)
+    {
+      for (c = omp_current_ctx; c; c = c->previous)
+	if (!oacc_is_loop (c->code))
+	  break;
+      if (c == NULL || !(oacc_is_parallel (c->code)
+			 || oacc_is_kernels (c->code)))
+      gfc_error ("gang reduction on an orphan loop at %L", &code->loc);
+    }
+
   if (code->ext.omp_clauses->seq)
     {
       if (code->ext.omp_clauses->independent)
diff --git a/gcc/fortran/trans-openmp.c b/gcc/fortran/trans-openmp.c
index f038f4c5bf8..c6484d62916 100644
--- a/gcc/fortran/trans-openmp.c
+++ b/gcc/fortran/trans-openmp.c
@@ -563,7 +563,8 @@ gfc_omp_clause_copy_ctor (tree clause, tree dest, tree src)
   stmtblock_t block, cond_block;
 
   gcc_assert (OMP_CLAUSE_CODE (clause) == OMP_CLAUSE_FIRSTPRIVATE
-	      || OMP_CLAUSE_CODE (clause) == OMP_CLAUSE_LINEAR);
+	      || OMP_CLAUSE_CODE (clause) == OMP_CLAUSE_LINEAR
+	      || OMP_CLAUSE_CODE (clause) == OMP_CLAUSE_REDUCTION);
 
   if ((! GFC_DESCRIPTOR_TYPE_P (type)
        || GFC_TYPE_ARRAY_AKIND (type) != GFC_ARRAY_ALLOCATABLE)

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [patch] various OpenACC reduction enhancements - test cases
  2018-06-29 18:20 [patch] various OpenACC reduction enhancements Cesar Philippidis
  2018-06-29 18:22 ` [patch] various OpenACC reduction enhancements - ME and nvptx changes Cesar Philippidis
  2018-06-29 18:23 ` [patch] various OpenACC reduction enhancements - FE changes Cesar Philippidis
@ 2018-06-29 18:38 ` Cesar Philippidis
  2018-12-04 12:59   ` Jakub Jelinek
  2 siblings, 1 reply; 14+ messages in thread
From: Cesar Philippidis @ 2018-06-29 18:38 UTC (permalink / raw)
  To: gcc-patches, Jakub Jelinek, Tom de Vries

[-- Attachment #1: Type: text/plain, Size: 184 bytes --]

Attached are the updated reductions tests cases. Again, these have been
bootstrapped and regression tested cleanly for x86_64 with nvptx
offloading. Is it OK for trunk?

Thanks,
Cesar

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: trunk-reductions-tests.diff --]
[-- Type: text/x-patch; name="trunk-reductions-tests.diff", Size: 36490 bytes --]

2018-06-29  Cesar Philippidis  <cesar@codesourcery.com>
	    Nathan Sidwell  <nathan@acm.org>

	gcc/testsuite/
	* c-c++-common/goacc/orphan-reductions-1.c: New test.
	* c-c++-common/goacc/reduction-7.c: New test.
	* c-c++-common/goacc/routine-4.c: Update.
	* g++.dg/goacc/reductions-1.C: New test.
	* gcc.dg/goacc/loop-processing-1.c: Update.
	* gfortran.dg/goacc/orphan-reductions-1.f90: New test.

	libgomp/
	* libgomp.oacc-c-c++-common/par-reduction-3.c: New test.
	* libgomp.oacc-c-c++-common/reduction-cplx-flt-2.c: New test.
	* libgomp.oacc-fortran/reduction-9.f90: New test.


From b128e80be7cd2c81171fbd9c8b23e786bb832633 Mon Sep 17 00:00:00 2001
From: Cesar Philippidis <cesar@codesourcery.com>
Date: Thu, 21 Jun 2018 11:37:56 -0700
Subject: [PATCH] Trunk reductions patches

OG8 Reduction patches

4469fc4 [Fortran] Permit reductions in gfc_omp_clause_copy_ctor
704f1a2 [nxptx, OpenACC] vector reductions
8a35c89 [OpenACC] Fix a reduction bug involving GOMP_MAP_FIRSTPRIVATE_POINTER variables
16ead33 [OpenACC] Update error messages for c and c++ reductions
65dd9cf Make OpenACC orphan gang reductions errors
5d60102 [PR80547] Handle parallel reductions explicitly initialized by the user
---
 gcc/c/c-parser.c                              |  46 +-
 gcc/c/c-typeck.c                              |   8 +
 gcc/config/nvptx/nvptx.c                      | 233 +++++++-
 gcc/config/nvptx/nvptx.md                     |   7 +
 gcc/cp/parser.c                               |  27 +-
 gcc/cp/semantics.c                            |   8 +
 gcc/fortran/openmp.c                          |  12 +
 gcc/fortran/trans-openmp.c                    |   3 +-
 gcc/omp-general.h                             |   5 +-
 gcc/omp-low.c                                 |  33 +-
 gcc/omp-offload.c                             |  18 +
 .../c-c++-common/goacc/orphan-reductions-1.c  |  56 ++
 .../c-c++-common/goacc/reduction-7.c          | 111 ++++
 gcc/testsuite/c-c++-common/goacc/routine-4.c  |   8 +-
 gcc/testsuite/g++.dg/goacc/reductions-1.C     | 548 ++++++++++++++++++
 .../gcc.dg/goacc/loop-processing-1.c          |   3 +-
 .../gfortran.dg/goacc/orphan-reductions-1.f90 | 204 +++++++
 .../par-reduction-3.c                         |  29 +
 .../reduction-cplx-flt-2.c                    |  32 +
 .../libgomp.oacc-fortran/reduction-9.f90      |  54 ++
 20 files changed, 1396 insertions(+), 49 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/goacc/orphan-reductions-1.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/reduction-7.c
 create mode 100644 gcc/testsuite/g++.dg/goacc/reductions-1.C
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/orphan-reductions-1.f90
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/par-reduction-3.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-cplx-flt-2.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-fortran/reduction-9.f90

diff --git a/gcc/testsuite/c-c++-common/goacc/orphan-reductions-1.c b/gcc/testsuite/c-c++-common/goacc/orphan-reductions-1.c
new file mode 100644
index 00000000000..b0bd4a7de05
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/orphan-reductions-1.c
@@ -0,0 +1,56 @@
+/* Test orphan reductions.  */
+
+#include <assert.h>
+
+#pragma acc routine seq
+int
+seq_reduction (int n)
+{
+  int i, sum = 0;
+#pragma acc loop seq reduction(+:sum)
+  for (i = 0; i < n; i++)
+    sum = sum + 1;
+
+  return sum;
+}
+
+#pragma acc routine gang
+int
+gang_reduction (int n)
+{
+  int i, s1 = 0, s2 = 0;
+#pragma acc loop gang reduction(+:s1) /* { dg-error "gang reduction on an orphan loop" } */
+  for (i = 0; i < n; i++)
+    s1 = s1 + 2;
+
+#pragma acc loop gang reduction(+:s2) /* { dg-error "gang reduction on an orphan loop" } */
+  for (i = 0; i < n; i++)
+    s2 = s2 + 2;
+
+
+  return s1 + s2;
+}
+
+#pragma acc routine worker
+int
+worker_reduction (int n)
+{
+  int i, sum = 0;
+#pragma acc loop worker reduction(+:sum)
+  for (i = 0; i < n; i++)
+    sum = sum + 3;
+
+  return sum;
+}
+
+#pragma acc routine vector
+int
+vector_reduction (int n)
+{
+  int i, sum = 0;
+#pragma acc loop vector reduction(+:sum)
+  for (i = 0; i < n; i++)
+    sum = sum + 4;
+
+  return sum;
+}
diff --git a/gcc/testsuite/c-c++-common/goacc/reduction-7.c b/gcc/testsuite/c-c++-common/goacc/reduction-7.c
new file mode 100644
index 00000000000..245c848d509
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/reduction-7.c
@@ -0,0 +1,111 @@
+/* Exercise invalid reductions on array and struct members.  */
+
+void
+test_parallel ()
+{
+  struct {
+    int a;
+    float b[5];
+  } s1, s2[10];
+
+  int i;
+  double z[100];
+
+#pragma acc parallel reduction(+:s1.a) /* { dg-error "invalid reduction variable" } */
+  for (i = 0; i < 10; i++)
+    s1.a += 1;
+
+#pragma acc parallel reduction(+:s1.b[3]) /* { dg-error "invalid reduction variable" } */
+  for (i = 0; i < 10; i++)
+    s1.b[3] += 1;
+
+#pragma acc parallel reduction(+:s2[2].a) /* { dg-error "invalid reduction variable" } */
+  for (i = 0; i < 10; i++)
+    s2[2].a += 1;
+
+#pragma acc parallel reduction(+:s2[3].b[4]) /* { dg-error "invalid reduction variable" } */
+  for (i = 0; i < 10; i++)
+    s2[3].b[4] += 1;
+
+#pragma acc parallel reduction(+:z[5]) /* { dg-error "invalid reduction variable" } */
+  for (i = 0; i < 10; i++)
+    z[5] += 1;
+}
+
+void
+test_combined ()
+{
+  struct {
+    int a;
+    float b[5];
+  } s1, s2[10];
+
+  int i;
+  double z[100];
+
+#pragma acc parallel loop reduction(+:s1.a) /* { dg-error "invalid reduction variable" } */
+  for (i = 0; i < 10; i++)
+    s1.a += 1;
+
+#pragma acc parallel loop reduction(+:s1.b[3]) /* { dg-error "invalid reduction variable" } */
+  for (i = 0; i < 10; i++)
+    s1.b[3] += 1;
+
+#pragma acc parallel loop reduction(+:s2[2].a) /* { dg-error "invalid reduction variable" } */
+  for (i = 0; i < 10; i++)
+    s2[2].a += 1;
+
+#pragma acc parallel loop reduction(+:s2[3].b[4]) /* { dg-error "invalid reduction variable" } */
+  for (i = 0; i < 10; i++)
+    s2[3].b[4] += 1;
+
+#pragma acc parallel loop reduction(+:z[5]) /* { dg-error "invalid reduction variable" } */
+  for (i = 0; i < 10; i++)
+    z[5] += 1;
+
+}
+
+void
+test_loops ()
+{
+  struct {
+    int a;
+    float b[5];
+  } s1, s2[10];
+
+  int i;
+  double z[100];
+
+#pragma acc parallel
+  {
+#pragma acc loop reduction(+:s1.a) /* { dg-error "invalid reduction variable" } */
+  for (i = 0; i < 10; i++)
+    s1.a += 1;
+
+#pragma acc loop reduction(+:s1.b[3]) /* { dg-error "invalid reduction variable" } */
+  for (i = 0; i < 10; i++)
+    s1.b[3] += 1;
+
+#pragma acc loop reduction(+:s2[2].a) /* { dg-error "invalid reduction variable" } */
+  for (i = 0; i < 10; i++)
+    s2[2].a += 1;
+
+#pragma acc loop reduction(+:s2[3].b[4]) /* { dg-error "invalid reduction variable" } */
+  for (i = 0; i < 10; i++)
+    s2[3].b[4] += 1;
+
+#pragma acc loop reduction(+:z[5]) /* { dg-error "invalid reduction variable" } */
+  for (i = 0; i < 10; i++)
+    z[5] += 1;
+  }
+}
+
+int
+main ()
+{
+  test_parallel ();
+  test_combined ();
+  test_loops ();
+
+  return 0;
+}
diff --git a/gcc/testsuite/c-c++-common/goacc/routine-4.c b/gcc/testsuite/c-c++-common/goacc/routine-4.c
index efc4a0b95e5..91abfb5a91a 100644
--- a/gcc/testsuite/c-c++-common/goacc/routine-4.c
+++ b/gcc/testsuite/c-c++-common/goacc/routine-4.c
@@ -22,7 +22,7 @@ void seq (void)
   for (int i = 0; i < 10; i++)
     red ++;
 
-#pragma acc loop gang reduction (+:red) // { dg-error "disallowed by containing routine" }
+#pragma acc loop seq reduction (+:red)
   for (int i = 0; i < 10; i++)
     red ++;
 
@@ -48,7 +48,7 @@ void vector (void) /* { dg-message "declared here" "1" } */
   for (int i = 0; i < 10; i++)
     red ++;
 
-#pragma acc loop gang reduction (+:red) // { dg-error "disallowed by containing routine" }
+#pragma acc loop seq reduction (+:red)
   for (int i = 0; i < 10; i++)
     red ++;
 
@@ -74,7 +74,7 @@ void worker (void) /* { dg-message "declared here" "2" } */
   for (int i = 0; i < 10; i++)
     red ++;
 
-#pragma acc loop gang reduction (+:red) // { dg-error "disallowed by containing routine" }
+#pragma acc loop seq reduction (+:red)
   for (int i = 0; i < 10; i++)
     red ++;
 
@@ -100,7 +100,7 @@ void gang (void) /* { dg-message "declared here" "3" } */
   for (int i = 0; i < 10; i++)
     red ++;
 
-#pragma acc loop gang reduction (+:red)
+#pragma acc loop seq reduction (+:red)
   for (int i = 0; i < 10; i++)
     red ++;
 
diff --git a/gcc/testsuite/g++.dg/goacc/reductions-1.C b/gcc/testsuite/g++.dg/goacc/reductions-1.C
new file mode 100644
index 00000000000..6ff426a70fd
--- /dev/null
+++ b/gcc/testsuite/g++.dg/goacc/reductions-1.C
@@ -0,0 +1,548 @@
+// Test for invalid reduction variables.
+
+class C1
+{
+  int b, d[10];
+
+public:
+  int a, c[10];
+
+  C1 () { a = 0; b = 0; }
+  int& get_b () { return b; }
+  int* get_d () { return d; }
+};
+
+template <typename T>
+class C2
+{
+  T b, d[10];
+
+public:
+  T a, c[10];
+
+  C2 () { a = 0; b = 0; }
+  T& get_b () { return b; }
+  T* get_d () { return d; }
+};
+
+struct S1
+{
+  int a, b, c[10], d[10];
+
+  S1 () { a = 0; b = 0; }
+  int& get_b () { return b; }
+  int* get_d () { return d; }
+};
+
+template <typename T>
+struct S2
+{
+  T a, b, c[10], d[10];
+
+  S2 () { a = 0; b = 0; }
+  T& get_b () { return b; }
+  T* get_d () { return d; }
+};
+
+template <typename T>
+void
+test_parallel ()
+{
+  int i, a[10];
+  T b[10];
+  C1 c1, c1a[10];
+  C2<T> c2, c2a[10];
+  S1 s1, s1a[10];
+  S2<float> s2, s2a[10];
+
+  // Reductions on class members.
+
+#pragma acc parallel reduction(+:c1.a) // { dg-error "invalid reduction variable" }
+  for (i = 0; i < 100; i++)
+    c1.a += 1;
+
+#pragma acc parallel reduction(+:c1.get_b ()) // { dg-error "invalid reduction variable" }
+  for (i = 0; i < 100; i++)
+    c1.get_b () += 1;
+
+#pragma acc parallel reduction(+:c1.c[1]) // { dg-error "invalid reduction variable" }
+  for (i = 0; i < 100; i++)
+    c1.c[1] += 1;
+
+#pragma acc parallel reduction(+:c1.get_d ()[1]) // { dg-error "invalid reduction variable" }
+  for (i = 0; i < 100; i++)
+    c1.get_d ()[1] += 1;
+
+#pragma acc parallel reduction(+:c1a[1].a) // { dg-error "invalid reduction variable" }
+  for (i = 0; i < 100; i++)
+    c1a[1].a += 1;
+
+#pragma acc parallel reduction(+:c1a[1].get_b ()) // { dg-error "invalid reduction variable" }
+  for (i = 0; i < 100; i++)
+    c1a[1].get_b () += 1;
+
+#pragma acc parallel reduction(+:c1a[1].c[1]) // { dg-error "invalid reduction variable" }
+  for (i = 0; i < 100; i++)
+    c1a[1].c[1] += 1;
+
+#pragma acc parallel reduction(+:c1a[1].get_d ()[1]) // { dg-error "invalid reduction variable" }
+  for (i = 0; i < 100; i++)
+    c1a[1].get_d ()[1] += 1;
+
+
+  // Reductions on a template class member.
+
+#pragma acc parallel reduction(+:c2.a) // { dg-error "invalid reduction variable" }
+  for (i = 0; i < 100; i++)
+    c2.a += 1;
+
+#pragma acc parallel reduction(+:c2.get_b ()) // { dg-error "invalid reduction variable" }
+  for (i = 0; i < 100; i++)
+    c2.get_b () += 1;
+
+#pragma acc parallel reduction(+:c2.c[1]) // { dg-error "invalid reduction variable" }
+  for (i = 0; i < 100; i++)
+    c2.c[1] += 1;
+
+#pragma acc parallel reduction(+:c2.get_d ()[1]) // { dg-error "invalid reduction variable" }
+  for (i = 0; i < 100; i++)
+    c2.get_d ()[1] += 1;
+
+
+#pragma acc parallel reduction(+:c2a[1].a) // { dg-error "invalid reduction variable" }
+  for (i = 0; i < 100; i++)
+    c2a[1].a += 1;
+
+#pragma acc parallel reduction(+:c2a[1].get_b ()[1]) // { dg-error "invalid reduction variable" }
+  for (i = 0; i < 100; i++)
+    c2a[1].get_b () += 1;
+
+#pragma acc parallel reduction(+:c2a[1].c[1]) // { dg-error "invalid reduction variable" }
+  for (i = 0; i < 100; i++)
+    c2a[1].c[1] += 1;
+
+#pragma acc parallel reduction(+:c2a[1].get_d ()[1]) // { dg-error "invalid reduction variable" }
+  for (i = 0; i < 100; i++)
+    c2a[1].get_d ()[1] += 1;
+
+
+  // Reductions on struct element.
+
+#pragma acc parallel reduction(+:s1.a) // { dg-error "invalid reduction variable" }
+  for (i = 0; i < 100; i++)
+    s1.a += 1;
+
+#pragma acc parallel reduction(+:s1.get_b ()) // { dg-error "invalid reduction variable" }
+  for (i = 0; i < 100; i++)
+    s1.get_b () += 1;
+
+#pragma acc parallel reduction(+:s1.c[1]) // { dg-error "invalid reduction variable" }
+  for (i = 0; i < 100; i++)
+    s1.c[1] += 1;
+
+#pragma acc parallel reduction(+:s1.get_d ()[1]) // { dg-error "invalid reduction variable" }
+  for (i = 0; i < 100; i++)
+    s1.get_d ()[1] += 1;
+
+#pragma acc parallel reduction(+:s1a[1].a) // { dg-error "invalid reduction variable" }
+  for (i = 0; i < 100; i++)
+    s1a[1].a += 1;
+
+#pragma acc parallel reduction(+:s1a[1].get_b ()) // { dg-error "invalid reduction variable" }
+  for (i = 0; i < 100; i++)
+    s1a[1].get_b () += 1;
+
+#pragma acc parallel reduction(+:s1a[1].c[1]) // { dg-error "invalid reduction variable" }
+  for (i = 0; i < 100; i++)
+    s1a[1].c[1] += 1;
+
+#pragma acc parallel reduction(+:s1a[1].get_d ()[1]) // { dg-error "invalid reduction variable" }
+  for (i = 0; i < 100; i++)
+    s1a[1].get_d ()[1] += 1;
+
+
+  // Reductions on a template struct element.
+
+#pragma acc parallel reduction(+:s2.a) // { dg-error "invalid reduction variable" }
+  for (i = 0; i < 100; i++)
+    s2.a += 1;
+
+#pragma acc parallel reduction(+:s2.get_b ()) // { dg-error "invalid reduction variable" }
+  for (i = 0; i < 100; i++)
+    s2.get_b () += 1;
+
+#pragma acc parallel reduction(+:s2.c[1]) // { dg-error "invalid reduction variable" }
+  for (i = 0; i < 100; i++)
+    s2.c[1] += 1;
+
+#pragma acc parallel reduction(+:s2.get_d ()[1]) // { dg-error "invalid reduction variable" }
+  for (i = 0; i < 100; i++)
+    s2.get_d ()[1] += 1;
+
+#pragma acc parallel reduction(+:s2a[1].a) // { dg-error "invalid reduction variable" }
+  for (i = 0; i < 100; i++)
+    s2a[1].a += 1;
+
+#pragma acc parallel reduction(+:s2a[1].get_b ()) // { dg-error "invalid reduction variable" }
+  for (i = 0; i < 100; i++)
+    s2a[1].get_b () += 1;
+
+#pragma acc parallel reduction(+:s2a[1].c[1]) // { dg-error "invalid reduction variable" }
+  for (i = 0; i < 100; i++)
+    s2a[1].c[1] += 1;
+
+#pragma acc parallel reduction(+:s2a[1].get_d ()[1]) // { dg-error "invalid reduction variable" }
+  for (i = 0; i < 100; i++)
+    s2a[1].get_d ()[1] += 1;
+
+
+  // Reductions on arrays.
+
+#pragma acc parallel reduction(+:a[10]) // { dg-error "invalid reduction variable" }
+  for (i = 0; i < 100; i++)
+    a[10] += 1;
+
+#pragma acc parallel reduction(+:b[10]) // { dg-error "invalid reduction variable" }
+  for (i = 0; i < 100; i++)
+    b[10] += 1;
+}
+
+template <typename T>
+void
+test_combined ()
+{
+  int i, a[10];
+  T b[10];
+  C1 c1, c1a[10];
+  C2<T> c2, c2a[10];
+  S1 s1, s1a[10];
+  S2<float> s2, s2a[10];
+
+  // Reductions on class members.
+
+#pragma acc parallel loop reduction(+:c1.a) // { dg-error "invalid reduction variable" }
+  for (i = 0; i < 100; i++)
+    c1.a += 1;
+
+#pragma acc parallel loop reduction(+:c1.get_b ()) // { dg-error "invalid reduction variable" }
+  for (i = 0; i < 100; i++)
+    c1.get_b () += 1;
+
+#pragma acc parallel loop reduction(+:c1.c[1]) // { dg-error "invalid reduction variable" }
+  for (i = 0; i < 100; i++)
+    c1.c[1] += 1;
+
+#pragma acc parallel loop reduction(+:c1.get_d ()[1]) // { dg-error "invalid reduction variable" }
+  for (i = 0; i < 100; i++)
+    c1.get_d ()[1] += 1;
+
+#pragma acc parallel loop reduction(+:c1a[1].a) // { dg-error "invalid reduction variable" }
+  for (i = 0; i < 100; i++)
+    c1a[1].a += 1;
+
+#pragma acc parallel loop reduction(+:c1a[1].get_b ()) // { dg-error "invalid reduction variable" }
+  for (i = 0; i < 100; i++)
+    c1a[1].get_b () += 1;
+
+#pragma acc parallel loop reduction(+:c1a[1].c[1]) // { dg-error "invalid reduction variable" }
+  for (i = 0; i < 100; i++)
+    c1a[1].c[1] += 1;
+
+#pragma acc parallel loop reduction(+:c1a[1].get_d ()[1]) // { dg-error "invalid reduction variable" }
+  for (i = 0; i < 100; i++)
+    c1a[1].get_d ()[1] += 1;
+
+
+  // Reductions on a template class member.
+
+#pragma acc parallel loop reduction(+:c2.a) // { dg-error "invalid reduction variable" }
+  for (i = 0; i < 100; i++)
+    c2.a += 1;
+
+#pragma acc parallel loop reduction(+:c2.get_b ()) // { dg-error "invalid reduction variable" }
+  for (i = 0; i < 100; i++)
+    c2.get_b () += 1;
+
+#pragma acc parallel loop reduction(+:c2.c[1]) // { dg-error "invalid reduction variable" }
+  for (i = 0; i < 100; i++)
+    c2.c[1] += 1;
+
+#pragma acc parallel loop reduction(+:c2.get_d ()[1]) // { dg-error "invalid reduction variable" }
+  for (i = 0; i < 100; i++)
+    c2.get_d ()[1] += 1;
+
+
+#pragma acc parallel loop reduction(+:c2a[1].a) // { dg-error "invalid reduction variable" }
+  for (i = 0; i < 100; i++)
+    c2a[1].a += 1;
+
+#pragma acc parallel loop reduction(+:c2a[1].get_b ()[1]) // { dg-error "invalid reduction variable" }
+  for (i = 0; i < 100; i++)
+    c2a[1].get_b () += 1;
+
+#pragma acc parallel loop reduction(+:c2a[1].c[1]) // { dg-error "invalid reduction variable" }
+  for (i = 0; i < 100; i++)
+    c2a[1].c[1] += 1;
+
+#pragma acc parallel loop reduction(+:c2a[1].get_d ()[1]) // { dg-error "invalid reduction variable" }
+  for (i = 0; i < 100; i++)
+    c2a[1].get_d ()[1] += 1;
+
+
+  // Reductions on struct element.
+
+#pragma acc parallel loop reduction(+:s1.a) // { dg-error "invalid reduction variable" }
+  for (i = 0; i < 100; i++)
+    s1.a += 1;
+
+#pragma acc parallel loop reduction(+:s1.get_b ()) // { dg-error "invalid reduction variable" }
+  for (i = 0; i < 100; i++)
+    s1.get_b () += 1;
+
+#pragma acc parallel loop reduction(+:s1.c[1]) // { dg-error "invalid reduction variable" }
+  for (i = 0; i < 100; i++)
+    s1.c[1] += 1;
+
+#pragma acc parallel loop reduction(+:s1.get_d ()[1]) // { dg-error "invalid reduction variable" }
+  for (i = 0; i < 100; i++)
+    s1.get_d ()[1] += 1;
+
+#pragma acc parallel loop reduction(+:s1a[1].a) // { dg-error "invalid reduction variable" }
+  for (i = 0; i < 100; i++)
+    s1a[1].a += 1;
+
+#pragma acc parallel loop reduction(+:s1a[1].get_b ()) // { dg-error "invalid reduction variable" }
+  for (i = 0; i < 100; i++)
+    s1a[1].get_b () += 1;
+
+#pragma acc parallel loop reduction(+:s1a[1].c[1]) // { dg-error "invalid reduction variable" }
+  for (i = 0; i < 100; i++)
+    s1a[1].c[1] += 1;
+
+#pragma acc parallel loop reduction(+:s1a[1].get_d ()[1]) // { dg-error "invalid reduction variable" }
+  for (i = 0; i < 100; i++)
+    s1a[1].get_d ()[1] += 1;
+
+
+  // Reductions on a template struct element.
+
+#pragma acc parallel loop reduction(+:s2.a) // { dg-error "invalid reduction variable" }
+  for (i = 0; i < 100; i++)
+    s2.a += 1;
+
+#pragma acc parallel loop reduction(+:s2.get_b ()) // { dg-error "invalid reduction variable" }
+  for (i = 0; i < 100; i++)
+    s2.get_b () += 1;
+
+#pragma acc parallel loop reduction(+:s2.c[1]) // { dg-error "invalid reduction variable" }
+  for (i = 0; i < 100; i++)
+    s2.c[1] += 1;
+
+#pragma acc parallel loop reduction(+:s2.get_d ()[1]) // { dg-error "invalid reduction variable" }
+  for (i = 0; i < 100; i++)
+    s2.get_d ()[1] += 1;
+
+#pragma acc parallel loop reduction(+:s2a[1].a) // { dg-error "invalid reduction variable" }
+  for (i = 0; i < 100; i++)
+    s2a[1].a += 1;
+
+#pragma acc parallel loop reduction(+:s2a[1].get_b ()) // { dg-error "invalid reduction variable" }
+  for (i = 0; i < 100; i++)
+    s2a[1].get_b () += 1;
+
+#pragma acc parallel loop reduction(+:s2a[1].c[1]) // { dg-error "invalid reduction variable" }
+  for (i = 0; i < 100; i++)
+    s2a[1].c[1] += 1;
+
+#pragma acc parallel loop reduction(+:s2a[1].get_d ()[1]) // { dg-error "invalid reduction variable" }
+  for (i = 0; i < 100; i++)
+    s2a[1].get_d ()[1] += 1;
+
+
+  // Reductions on arrays.
+
+#pragma acc parallel loop reduction(+:a[10]) // { dg-error "invalid reduction variable" }
+  for (i = 0; i < 100; i++)
+    a[10] += 1;
+
+#pragma acc parallel loop reduction(+:b[10]) // { dg-error "invalid reduction variable" }
+  for (i = 0; i < 100; i++)
+    b[10] += 1;
+}
+
+template <typename T>
+void
+test_loop ()
+{
+  int i, a[10];
+  T b[10];
+  C1 c1, c1a[10];
+  C2<T> c2, c2a[10];
+  S1 s1, s1a[10];
+  S2<float> s2, s2a[10];
+
+  // Reductions on class members.
+
+  #pragma acc parallel
+  {
+
+#pragma acc loop reduction(+:c1.a) // { dg-error "invalid reduction variable" }
+    for (i = 0; i < 100; i++)
+      c1.a += 1;
+
+#pragma acc loop reduction(+:c1.get_b ()) // { dg-error "invalid reduction variable" }
+    for (i = 0; i < 100; i++)
+      c1.get_b () += 1;
+
+#pragma acc loop reduction(+:c1.c[1]) // { dg-error "invalid reduction variable" }
+    for (i = 0; i < 100; i++)
+      c1.c[1] += 1;
+
+#pragma acc loop reduction(+:c1.get_d ()[1]) // { dg-error "invalid reduction variable" }
+    for (i = 0; i < 100; i++)
+      c1.get_d ()[1] += 1;
+
+#pragma acc loop reduction(+:c1a[1].a) // { dg-error "invalid reduction variable" }
+    for (i = 0; i < 100; i++)
+      c1a[1].a += 1;
+
+#pragma acc loop reduction(+:c1a[1].get_b ()) // { dg-error "invalid reduction variable" }
+    for (i = 0; i < 100; i++)
+      c1a[1].get_b () += 1;
+
+#pragma acc loop reduction(+:c1a[1].c[1]) // { dg-error "invalid reduction variable" }
+    for (i = 0; i < 100; i++)
+      c1a[1].c[1] += 1;
+
+#pragma acc loop reduction(+:c1a[1].get_d ()[1]) // { dg-error "invalid reduction variable" }
+    for (i = 0; i < 100; i++)
+      c1a[1].get_d ()[1] += 1;
+
+
+    // Reductions on a template class member.
+
+#pragma acc loop reduction(+:c2.a) // { dg-error "invalid reduction variable" }
+    for (i = 0; i < 100; i++)
+      c2.a += 1;
+
+#pragma acc loop reduction(+:c2.get_b ()) // { dg-error "invalid reduction variable" }
+    for (i = 0; i < 100; i++)
+      c2.get_b () += 1;
+
+#pragma acc loop reduction(+:c2.c[1]) // { dg-error "invalid reduction variable" }
+    for (i = 0; i < 100; i++)
+      c2.c[1] += 1;
+
+#pragma acc loop reduction(+:c2.get_d ()[1]) // { dg-error "invalid reduction variable" }
+    for (i = 0; i < 100; i++)
+      c2.get_d ()[1] += 1;
+
+
+#pragma acc loop reduction(+:c2a[1].a) // { dg-error "invalid reduction variable" }
+    for (i = 0; i < 100; i++)
+      c2a[1].a += 1;
+
+#pragma acc loop reduction(+:c2a[1].get_b ()[1]) // { dg-error "invalid reduction variable" }
+    for (i = 0; i < 100; i++)
+      c2a[1].get_b () += 1;
+
+#pragma acc loop reduction(+:c2a[1].c[1]) // { dg-error "invalid reduction variable" }
+    for (i = 0; i < 100; i++)
+      c2a[1].c[1] += 1;
+
+#pragma acc loop reduction(+:c2a[1].get_d ()[1]) // { dg-error "invalid reduction variable" }
+    for (i = 0; i < 100; i++)
+      c2a[1].get_d ()[1] += 1;
+
+
+    // Reductions on struct element.
+
+#pragma acc loop reduction(+:s1.a) // { dg-error "invalid reduction variable" }
+    for (i = 0; i < 100; i++)
+      s1.a += 1;
+
+#pragma acc loop reduction(+:s1.get_b ()) // { dg-error "invalid reduction variable" }
+    for (i = 0; i < 100; i++)
+      s1.get_b () += 1;
+
+#pragma acc loop reduction(+:s1.c[1]) // { dg-error "invalid reduction variable" }
+    for (i = 0; i < 100; i++)
+      s1.c[1] += 1;
+
+#pragma acc loop reduction(+:s1.get_d ()[1]) // { dg-error "invalid reduction variable" }
+    for (i = 0; i < 100; i++)
+      s1.get_d ()[1] += 1;
+
+#pragma acc loop reduction(+:s1a[1].a) // { dg-error "invalid reduction variable" }
+    for (i = 0; i < 100; i++)
+      s1a[1].a += 1;
+
+#pragma acc loop reduction(+:s1a[1].get_b ()) // { dg-error "invalid reduction variable" }
+    for (i = 0; i < 100; i++)
+      s1a[1].get_b () += 1;
+
+#pragma acc loop reduction(+:s1a[1].c[1]) // { dg-error "invalid reduction variable" }
+    for (i = 0; i < 100; i++)
+      s1a[1].c[1] += 1;
+
+#pragma acc loop reduction(+:s1a[1].get_d ()[1]) // { dg-error "invalid reduction variable" }
+    for (i = 0; i < 100; i++)
+      s1a[1].get_d ()[1] += 1;
+
+
+    // Reductions on a template struct element.
+
+#pragma acc loop reduction(+:s2.a) // { dg-error "invalid reduction variable" }
+    for (i = 0; i < 100; i++)
+      s2.a += 1;
+
+#pragma acc loop reduction(+:s2.get_b ()) // { dg-error "invalid reduction variable" }
+    for (i = 0; i < 100; i++)
+      s2.get_b () += 1;
+
+#pragma acc loop reduction(+:s2.c[1]) // { dg-error "invalid reduction variable" }
+    for (i = 0; i < 100; i++)
+      s2.c[1] += 1;
+
+#pragma acc loop reduction(+:s2.get_d ()[1]) // { dg-error "invalid reduction variable" }
+    for (i = 0; i < 100; i++)
+      s2.get_d ()[1] += 1;
+
+#pragma acc loop reduction(+:s2a[1].a) // { dg-error "invalid reduction variable" }
+    for (i = 0; i < 100; i++)
+      s2a[1].a += 1;
+
+#pragma acc loop reduction(+:s2a[1].get_b ()) // { dg-error "invalid reduction variable" }
+    for (i = 0; i < 100; i++)
+      s2a[1].get_b () += 1;
+
+#pragma acc loop reduction(+:s2a[1].c[1]) // { dg-error "invalid reduction variable" }
+    for (i = 0; i < 100; i++)
+      s2a[1].c[1] += 1;
+
+#pragma acc loop reduction(+:s2a[1].get_d ()[1]) // { dg-error "invalid reduction variable" }
+    for (i = 0; i < 100; i++)
+      s2a[1].get_d ()[1] += 1;
+
+
+    // Reductions on arrays.
+
+#pragma acc loop reduction(+:a[10]) // { dg-error "invalid reduction variable" }
+    for (i = 0; i < 100; i++)
+      a[10] += 1;
+
+#pragma acc loop reduction(+:b[10]) // { dg-error "invalid reduction variable" }
+    for (i = 0; i < 100; i++)
+      b[10] += 1;
+  }
+}
+
+int
+main ()
+{
+  test_parallel<double> ();
+  test_combined<long> ();
+  test_loop<short> ();
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.dg/goacc/loop-processing-1.c b/gcc/testsuite/gcc.dg/goacc/loop-processing-1.c
index bd4c07e7d81..1d222ab3291 100644
--- a/gcc/testsuite/gcc.dg/goacc/loop-processing-1.c
+++ b/gcc/testsuite/gcc.dg/goacc/loop-processing-1.c
@@ -15,4 +15,5 @@ void vector_1 (int *ary, int size)
   }
 }
 
-/* { dg-final { scan-tree-dump {OpenACC loops.*Loop 0\(0\).*Loop 24\(1\).*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_HEAD_MARK, 0, 1, 36\);.*Head-0:.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_HEAD_MARK, 0, 1, 36\);.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_FORK, \.data_dep\.[0-9_]+, 0\);.*Tail-0:.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_TAIL_MARK, \.data_dep\.[0-9_]+, 1\);.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_JOIN, \.data_dep\.[0-9_]+, 0\);.*Loop 6\(6\).*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_HEAD_MARK, 0, 2, 6\);.*Head-0:.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_HEAD_MARK, 0, 2, 6\);.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_FORK, \.data_dep\.[0-9_]+, 1\);.*Head-1:.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_HEAD_MARK, \.data_dep\.[0-9_]+, 1\);.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_FORK, \.data_dep\.[0-9_]+, 2\);.*Tail-1:.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_TAIL_MARK, \.data_dep\.[0-9_]+, 2\);.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_JOIN, \.data_dep\.[0-9_]+, 2\);.*Tail-0:.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_TAIL_MARK, \.data_dep\.[0-9_]+, 1\);.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_JOIN, \.data_dep\.[0-9_]+, 1\);} "oaccdevlow" } } */
+/* { dg-final { scan-tree-dump {
+OpenACC loops.*Loop 0\(0\).*Loop [0-9]{2}\(1\).*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_HEAD_MARK, 0, 1, 36\);.*Head-0:.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_HEAD_MARK, 0, 1, 36\);.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_FORK, \.data_dep\.[0-9_]+, 0\);.*Tail-0:.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_TAIL_MARK, \.data_dep\.[0-9_]+, 1\);.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_JOIN, \.data_dep\.[0-9_]+, 0\);.*Loop 6\(6\).*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_HEAD_MARK, 0, 2, 6\);.*Head-0:.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_HEAD_MARK, 0, 2, 6\);.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_FORK, \.data_dep\.[0-9_]+, 1\);.*Head-1:.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_HEAD_MARK, \.data_dep\.[0-9_]+, 1\);.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_FORK, \.data_dep\.[0-9_]+, 2\);.*Tail-1:.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_TAIL_MARK, \.data_dep\.[0-9_]+, 2\);.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_JOIN, \.data_dep\.[0-9_]+, 2\);.*Tail-0:.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_TAIL_MARK, \.data_dep\.[0-9_]+, 1\);.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_JOIN, \.data_dep\.[0-9_]+, 1\);} "oaccdevlow" } } */
diff --git a/gcc/testsuite/gfortran.dg/goacc/orphan-reductions-1.f90 b/gcc/testsuite/gfortran.dg/goacc/orphan-reductions-1.f90
new file mode 100644
index 00000000000..7f363d5a5ec
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/goacc/orphan-reductions-1.f90
@@ -0,0 +1,204 @@
+! Verify that gang reduction on orphan OpenACC loops reported as errors.
+
+subroutine s1
+  implicit none
+
+  integer, parameter :: n = 100
+  integer :: i, sum
+  sum = 0
+
+  !$acc parallel reduction(+:sum)
+  do i = 1, n
+     sum = sum + 1
+  end do
+  !$acc end parallel
+
+  !$acc parallel loop gang reduction(+:sum)
+  do i = 1, n
+     sum = sum + 1
+  end do
+
+  !$acc parallel
+  !$acc loop gang reduction(+:sum)
+  do i = 1, n
+     sum = sum + 1
+  end do
+  !$acc end parallel
+end subroutine s1
+
+subroutine s2
+  implicit none
+  !$acc routine worker
+
+  integer, parameter :: n = 100
+  integer :: i, j, sum
+  sum = 0
+
+  !$acc loop gang reduction(+:sum) ! { dg-error "gang reduction on an orphan loop" }
+  do i = 1, n
+     sum = sum + 1
+  end do
+
+  !$acc loop reduction(+:sum)
+  do i = 1, n
+     !$acc loop gang reduction(+:sum) ! { dg-error "gang reduction on an orphan loop" }
+     do j = 1, n
+        sum = sum + 1
+     end do
+  end do
+end subroutine s2
+
+integer function f1 ()
+  implicit none
+
+  integer, parameter :: n = 100
+  integer :: i, sum
+  sum = 0
+
+  !$acc parallel reduction(+:sum)
+  do i = 1, n
+     sum = sum + 1
+  end do
+  !$acc end parallel
+
+  !$acc parallel loop gang reduction(+:sum)
+  do i = 1, n
+     sum = sum + 1
+  end do
+
+  !$acc parallel
+  !$acc loop gang reduction(+:sum)
+  do i = 1, n
+     sum = sum + 1
+  end do
+  !$acc end parallel
+
+  f1 = sum
+end function f1
+
+integer function f2 ()
+  implicit none
+  !$acc routine worker
+
+  integer, parameter :: n = 100
+  integer :: i, j, sum
+  sum = 0
+
+  !$acc loop gang reduction(+:sum) ! { dg-error "gang reduction on an orphan loop" }
+  do i = 1, n
+     sum = sum + 1
+  end do
+
+  !$acc loop reduction(+:sum)
+  do i = 1, n
+     !$acc loop gang reduction(+:sum) ! { dg-error "gang reduction on an orphan loop" }
+     do j = 1, n
+        sum = sum + 1
+     end do
+  end do
+
+  f2 = sum
+end function f2
+
+module m
+contains
+  subroutine s3
+    implicit none
+
+    integer, parameter :: n = 100
+    integer :: i, sum
+    sum = 0
+
+    !$acc parallel reduction(+:sum)
+    do i = 1, n
+       sum = sum + 1
+    end do
+    !$acc end parallel
+
+    !$acc parallel loop gang reduction(+:sum)
+    do i = 1, n
+       sum = sum + 1
+    end do
+
+    !$acc parallel
+    !$acc loop gang reduction(+:sum)
+    do i = 1, n
+       sum = sum + 1
+    end do
+    !$acc end parallel
+  end subroutine s3
+
+  subroutine s4
+    implicit none
+    !$acc routine worker
+
+    integer, parameter :: n = 100
+    integer :: i, j, sum
+    sum = 0
+
+    !$acc loop gang reduction(+:sum) ! { dg-error "gang reduction on an orphan loop" }
+    do i = 1, n
+       sum = sum + 1
+    end do
+
+    !$acc loop reduction(+:sum)
+    do i = 1, n
+       !$acc loop gang reduction(+:sum) ! { dg-error "gang reduction on an orphan loop" }
+       do j = 1, n
+          sum = sum + 1
+       end do
+    end do
+  end subroutine s4
+
+  integer function f3 ()
+    implicit none
+
+    integer, parameter :: n = 100
+    integer :: i, sum
+    sum = 0
+
+    !$acc parallel reduction(+:sum)
+    do i = 1, n
+       sum = sum + 1
+    end do
+    !$acc end parallel
+
+    !$acc parallel loop gang reduction(+:sum)
+    do i = 1, n
+       sum = sum + 1
+    end do
+
+    !$acc parallel
+    !$acc loop gang reduction(+:sum)
+    do i = 1, n
+       sum = sum + 1
+    end do
+    !$acc end parallel
+
+    f3 = sum
+  end function f3
+
+  integer function f4 ()
+    implicit none
+    !$acc routine worker
+
+    integer, parameter :: n = 100
+    integer :: i, j, sum
+    sum = 0
+
+    !$acc loop gang reduction(+:sum) ! { dg-error "gang reduction on an orphan loop" }
+    do i = 1, n
+       sum = sum + 1
+    end do
+
+    !$acc loop reduction(+:sum)
+    do i = 1, n
+       !$acc loop gang reduction(+:sum) ! { dg-error "gang reduction on an orphan loop" }
+       do j = 1, n
+          sum = sum + 1
+       end do
+    end do
+
+    f4 = sum
+  end function f4
+end module m
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/par-reduction-3.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/par-reduction-3.c
new file mode 100644
index 00000000000..856ef0e0d89
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/par-reduction-3.c
@@ -0,0 +1,29 @@
+/* Check a parallel reduction which is are explicitly initialized by
+   the user.  */
+
+#include <assert.h>
+
+int
+main ()
+{
+  int n = 10;
+  float accel = 1.0, host = 1.0;
+  int i;
+
+#pragma acc parallel copyin(n) reduction(*:accel)
+  {
+    accel = 1.0;
+#pragma acc loop gang reduction(*:accel)
+    for( i = 1; i <= n; i++)
+      {
+	accel *= 2.0;
+      }
+  }
+
+  for (i = 1; i <= n; i++)
+    host *= 2.0;
+
+  assert (accel == host);
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-cplx-flt-2.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-cplx-flt-2.c
new file mode 100644
index 00000000000..350174a1031
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-cplx-flt-2.c
@@ -0,0 +1,32 @@
+#include <complex.h>
+#include <stdio.h>
+#include <stdlib.h>
+
+typedef float _Complex Type;
+
+#define N 32
+
+int
+main (void)
+{
+  Type ary[N];
+
+  for (int ix = 0; ix < N;  ix++)
+    ary[ix] = 1.0 + 1.0j;
+
+  Type tprod = 1.0;
+
+#pragma acc parallel vector_length(32)
+  {
+#pragma acc loop vector reduction (*:tprod)
+    for (int ix = 0; ix < N; ix++)
+      tprod *= ary[ix];
+  }
+
+  Type expected = 65536.0;
+
+  if (tprod != expected)
+    abort ();
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/reduction-9.f90 b/libgomp/testsuite/libgomp.oacc-fortran/reduction-9.f90
new file mode 100644
index 00000000000..fd64d88def4
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-fortran/reduction-9.f90
@@ -0,0 +1,54 @@
+! Test gang reductions on dummy variables.
+
+! { dg-do run }
+
+program main
+  implicit none
+
+  integer g, w, v, c
+
+  g = 0
+  w = 0
+  v = 0
+  c = 0
+
+  call reduction (g, w, v, c)
+
+  if (g /= 10) call abort
+  if (w /= 10) call abort
+  if (v /= 10) call abort
+  if (c /= 100) call abort
+end program main
+
+subroutine reduction (g, w, v, c)
+  implicit none
+
+  integer g, w, v, c, i
+
+  !$acc parallel
+  !$acc loop reduction(+:g) gang
+  do i = 1, 10
+     g = g + 1
+  end do
+  !$acc end parallel
+
+  !$acc parallel
+  !$acc loop reduction(+:w) worker
+  do i = 1, 10
+     w = w + 1
+  end do
+  !$acc end parallel
+
+  !$acc parallel
+  !$acc loop reduction(+:v) vector
+  do i = 1, 10
+     v = v + 1
+  end do
+  !$acc end parallel
+
+  !$acc parallel loop reduction(+:c) gang worker vector
+  do i = 1, 100
+     c = c + 1
+  end do
+  !$acc end parallel loop
+end subroutine reduction
-- 
2.17.1


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [patch] various OpenACC reduction enhancements - ME and nvptx changes
  2018-06-29 18:22 ` [patch] various OpenACC reduction enhancements - ME and nvptx changes Cesar Philippidis
@ 2018-10-05 14:09   ` Tom de Vries
  2018-10-30 20:09     ` Cesar Philippidis
  2018-12-04 12:29   ` Jakub Jelinek
  1 sibling, 1 reply; 14+ messages in thread
From: Tom de Vries @ 2018-10-05 14:09 UTC (permalink / raw)
  To: Cesar Philippidis, gcc-patches, Jakub Jelinek

On 6/29/18 8:19 PM, Cesar Philippidis wrote:
> The attached patch includes the nvptx and GCC ME reductions enhancements.
> 
> Is this patch OK for trunk? It bootstrapped / regression tested cleanly
> for x86_64 with nvptx offloading.
> 

These need fixing:
...
=== ERROR type #5: trailing whitespace (4 error(s)) ===
gcc/config/nvptx/nvptx.c:5139:0:██
gcc/config/nvptx/nvptx.c:5660:8:      doâ–ˆ
gcc/config/nvptx/nvptx.c:5702:0:██
gcc/config/nvptx/nvptx.c:5726:0:██
...


> 	gcc/
> 	* config/nvptx/nvptx.c (nvptx_propagate_unified): New.
> 	(nvptx_split_blocks): Call it for cond_uni insn.
> 	(nvptx_expand_cond_uni): New.
> 	(enum nvptx_builtins): Add NVPTX_BUILTIN_COND_UNI.
> 	(nvptx_init_builtins): Initialize it.
> 	(nvptx_expand_builtin):
> 	(nvptx_generate_vector_shuffle): Change integral SHIFT operand to
> 	tree BITS operand.
> 	(nvptx_vector_reduction): New.
> 	(nvptx_adjust_reduction_type): New.
> 	(nvptx_goacc_reduction_setup): Use it to adjust the type of ref_to_res.
> 	(nvptx_goacc_reduction_init): Don't update LHS if it doesn't exist.
> 	(nvptx_goacc_reduction_fini): Call nvptx_vector_reduction for vector.
> 	Use it to adjust the type of ref_to_res.
> 	(nvptx_goacc_reduction_teardown):
> 	* config/nvptx/nvptx.md (cond_uni): New pattern.

> diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
> index 5608bee8a8d..33ec3db1153 100644
> --- a/gcc/config/nvptx/nvptx.c
> +++ b/gcc/config/nvptx/nvptx.c
> @@ -2863,6 +2863,52 @@ nvptx_reorg_uniform_simt ()
>      }
>  }
>  
> +/* UNIFIED is a cond_uni insn.  Find the branch insn it affects, and
> +   mark that as unified.  We expect to be in a single block.  */
> +
> +static void
> +nvptx_propagate_unified (rtx_insn *unified)
> +{
> +  rtx_insn *probe = unified;
> +  rtx cond_reg = SET_DEST (PATTERN (unified));
> +  rtx pat = NULL_RTX;
> +
> +  /* Find the comparison.  (We could skip this and simply scan to he
> +     blocks' terminating branch, if we didn't care for self
> +     checking.)  */
> +  for (;;)
> +    {
> +      probe = next_real_insn (probe);
> +      if (!probe)
> +	break;
> +      pat = PATTERN (probe);
> +
> +      if (GET_CODE (pat) == SET
> +	  && GET_RTX_CLASS (GET_CODE (SET_SRC (pat))) == RTX_COMPARE
> +	  && XEXP (SET_SRC (pat), 0) == cond_reg)
> +	break;
> +      gcc_assert (NONJUMP_INSN_P (probe));
> +    }
> +  gcc_assert (pat);
> +  rtx pred_reg = SET_DEST (pat);
> +
> +  /* Find the branch.  */
> +  do
> +    probe = NEXT_INSN (probe);
> +  while (!JUMP_P (probe));
> +
> +  pat = PATTERN (probe);
> +  rtx itec = XEXP (SET_SRC (pat), 0);
> +  gcc_assert (XEXP (itec, 0) == pred_reg);
> +
> +  /* Mark the branch's condition as unified.  */
> +  rtx unspec = gen_rtx_UNSPEC (BImode, gen_rtvec (1, pred_reg),
> +			       UNSPEC_BR_UNIFIED);
> +  bool ok = validate_change (probe, &XEXP (itec, 0), unspec, false);
> +
> +  gcc_assert (ok);
> +}
> +
>  /* Loop structure of the function.  The entire function is described as
>     a NULL loop.  */
>  
> @@ -2964,6 +3010,9 @@ nvptx_split_blocks (bb_insn_map_t *map)
>  	    continue;
>  	  switch (recog_memoized (insn))
>  	    {
> +	    case CODE_FOR_cond_uni:
> +	      nvptx_propagate_unified (insn);
> +	      /* FALLTHROUGH */
>  	    default:
>  	      seen_insn = true;
>  	      continue;
> @@ -5080,6 +5129,21 @@ nvptx_expand_cmp_swap (tree exp, rtx target,
>    return target;
>  }
>  
> +/* Expander for the compare unified builtin.  */
> +
> +static rtx
> +nvptx_expand_cond_uni (tree exp, rtx target, machine_mode mode, int ignore)
> +{
> +  if (ignore)
> +    return target;
> +  
> +  rtx src = expand_expr (CALL_EXPR_ARG (exp, 0),
> +			 NULL_RTX, mode, EXPAND_NORMAL);
> +
> +  emit_insn (gen_cond_uni (target, src));
> +
> +  return target;
> +}
>  
>  /* Codes for all the NVPTX builtins.  */
>  enum nvptx_builtins
> @@ -5089,6 +5153,7 @@ enum nvptx_builtins
>    NVPTX_BUILTIN_WORKER_ADDR,
>    NVPTX_BUILTIN_CMP_SWAP,
>    NVPTX_BUILTIN_CMP_SWAPLL,
> +  NVPTX_BUILTIN_COND_UNI,
>    NVPTX_BUILTIN_MAX
>  };
>  
> @@ -5126,6 +5191,7 @@ nvptx_init_builtins (void)
>         (PTRVOID, ST, UINT, UINT, NULL_TREE));
>    DEF (CMP_SWAP, "cmp_swap", (UINT, PTRVOID, UINT, UINT, NULL_TREE));
>    DEF (CMP_SWAPLL, "cmp_swapll", (LLUINT, PTRVOID, LLUINT, LLUINT, NULL_TREE));
> +  DEF (COND_UNI, "cond_uni", (integer_type_node, integer_type_node, NULL_TREE));
>  
>  #undef DEF
>  #undef ST
> @@ -5158,6 +5224,9 @@ nvptx_expand_builtin (tree exp, rtx target, rtx ARG_UNUSED (subtarget),
>      case NVPTX_BUILTIN_CMP_SWAPLL:
>        return nvptx_expand_cmp_swap (exp, target, mode, ignore);
>  
> +    case NVPTX_BUILTIN_COND_UNI:
> +      return nvptx_expand_cond_uni (exp, target, mode, ignore);
> +
>      default: gcc_unreachable ();
>      }
>  }
> @@ -5284,7 +5353,7 @@ nvptx_get_worker_red_addr (tree type, tree offset)
>  
>  static void
>  nvptx_generate_vector_shuffle (location_t loc,
> -			       tree dest_var, tree var, unsigned shift,
> +			       tree dest_var, tree var, tree bits,
>  			       gimple_seq *seq)
>  {
>    unsigned fn = NVPTX_BUILTIN_SHUFFLE;
> @@ -5307,7 +5376,6 @@ nvptx_generate_vector_shuffle (location_t loc,
>      }
>    
>    tree call = nvptx_builtin_decl (fn, true);
> -  tree bits = build_int_cst (unsigned_type_node, shift);
>    tree kind = build_int_cst (unsigned_type_node, SHUFFLE_DOWN);
>    tree expr;
>  
> @@ -5583,6 +5651,126 @@ nvptx_reduction_update (location_t loc, gimple_stmt_iterator *gsi,
>      return nvptx_lockfull_update (loc, gsi, ptr, var, op);
>  }
>  
> +/* Emit a vector-level reduction loop.  OLD_VAR is the incoming
> +   variable to reduce (valid in each vector), OP is the reduction
> +   operator.  Return the reduced value (an SSA var).
> +
> +   The code we generate looks like:
> +      unsigned old_shift = DIM_SIZE(VECTOR);
> +      do 
> +	{
> +	  shift = PHI (old_shift, new_shift);
> +	  var = PHI (old_var, new_var);
> +	  new_shift = shift >> 1;
> +	  other_var = VSHUFFLE (var, new_shift);
> +	  new_var = var OP other_var;
> +	  cond_var = builtin_cond_uni (new_shift);
> +	}
> +	while (cond_var > 1);
> +
> +  The builtin_cond_ini expands to a cond_uni instruction, which is
> +  processed in nvpts_split_blocks to mark the loop's terminating
> +  branch instruction.  */
> +
> +static tree
> +nvptx_vector_reduction (location_t loc, gimple_stmt_iterator *gsi,
> +			tree old_var, tree_code op)
> +{
> +  tree var_type = TREE_TYPE (old_var);
> +
> +  /*  Emit old_shift = DIM_SIZE(VECTOR) */
> +  tree old_shift = make_ssa_name (integer_type_node);
> +  tree dim = build_int_cst (integer_type_node, GOMP_DIM_VECTOR);
> +  gcall *call = gimple_build_call_internal (IFN_GOACC_DIM_SIZE, 1, dim);
> +  gimple_set_lhs (call, old_shift);
> +  gimple_set_location (call, loc);
> +  gsi_insert_before (gsi, call, GSI_SAME_STMT);
> +
> +  /* Split the block just after the init stmts.  */
> +  basic_block pre_bb = gsi_bb (*gsi);
> +  edge pre_edge = split_block (pre_bb, call);
> +  basic_block loop_bb = pre_edge->dest;
> +  pre_bb = pre_edge->src;
> +  /* Reset the iterator.  */
> +  *gsi = gsi_for_stmt (gsi_stmt (*gsi));
> +
> +  tree shift = make_ssa_name (integer_type_node);
> +  tree new_shift = make_ssa_name (integer_type_node);
> +  tree var = make_ssa_name (var_type);
> +  tree other_var = make_ssa_name (var_type);
> +  tree new_var = make_ssa_name (var_type);
> +  
> +  /* Build and insert the loop body.  */
> +  gimple_seq loop_seq = NULL;
> +
> +  /* new_shift = shift >> 1 */
> +  tree shift_expr = fold_build2 (RSHIFT_EXPR, integer_type_node,
> +				 shift, integer_one_node);
> +  gimplify_assign (new_shift, shift_expr, &loop_seq);
> +
> +  /* other_var = shuffle (var, shift) */
> +  nvptx_generate_vector_shuffle (loc, other_var, var, new_shift, &loop_seq);
> +  /* new_var = var  OP other_var */
> +  tree red_expr = fold_build2 (op, var_type, var, other_var);
> +  gimplify_assign (new_var, red_expr, &loop_seq);
> +
> +  /* Mark the iterator variable as unified.  */
> +  tree cond_var = make_ssa_name (integer_type_node);
> +  tree uni_fn = nvptx_builtin_decl (NVPTX_BUILTIN_COND_UNI, true);
> +  tree uni_expr = build_call_expr_loc (loc, uni_fn, 1, new_shift);
> +  gimplify_assign (cond_var,  uni_expr, &loop_seq);
> +
> +  gcond *cond = gimple_build_cond (LE_EXPR, cond_var, integer_one_node,
> +				   NULL_TREE, NULL_TREE);
> +  gimple_seq_add_stmt (&loop_seq, cond);
> +  
> +  gsi_insert_seq_before (gsi, loop_seq, GSI_SAME_STMT);
> +
> +  /* Split the block just after the loop stmts.  */
> +  edge post_edge = split_block (loop_bb, cond);
> +  basic_block post_bb = post_edge->dest;
> +  loop_bb = post_edge->src;
> +  *gsi = gsi_for_stmt (gsi_stmt (*gsi));
> +
> +  /* Create the loop.  */
> +  post_edge->flags ^= EDGE_TRUE_VALUE | EDGE_FALLTHRU;

Edges need probabilities, as in nvptx_lockless_update,
nvptx_lockfull_update and nvptx_goacc_reduction_init.

> +  edge loop_edge = make_edge (loop_bb, loop_bb, EDGE_FALSE_VALUE);
> +  set_immediate_dominator (CDI_DOMINATORS, loop_bb, pre_bb);
> +  set_immediate_dominator (CDI_DOMINATORS, post_bb, loop_bb);
> +
> +  gphi *shift_phi = create_phi_node (shift, loop_bb);
> +  add_phi_arg (shift_phi, old_shift, pre_edge, loc);
> +  add_phi_arg (shift_phi, new_shift, loop_edge, loc);
> +
> +  gphi *var_phi = create_phi_node (var, loop_bb);
> +  add_phi_arg (var_phi, old_var, pre_edge, loc);
> +  add_phi_arg (var_phi, new_var, loop_edge, loc);
> +
> +  loop *loop = alloc_loop ();
> +  loop->header = loop_bb;
> +  loop->latch = loop_bb;
> +  add_loop (loop, loop_bb->loop_father);
> +
> +  return new_var;
> +}
> +
> +/* Dummy reduction vars that have GOMP_MAP_FIRSTPRIVATE_POINTER data
> +   mappings gets retyped to (void *).  Adjust the type of VAR to TYPE
> +   as appropriate.  */
> +
> +static tree
> +nvptx_adjust_reduction_type (tree var, tree type, gimple_seq *seq)
> +{
> +  if (TREE_TYPE (TREE_TYPE (var)) == type)
> +    return var;
> +
> +  tree ptype = build_pointer_type (type);
> +  tree t = make_ssa_name (ptype);
> +  tree expr = fold_build1 (NOP_EXPR, ptype, var);
> +  gimple_seq_add_stmt (seq, gimple_build_assign (t, expr));
> +  return t;
> +}
> +
>  /* NVPTX implementation of GOACC_REDUCTION_SETUP.  */
>  
>  static void
> @@ -5602,7 +5790,11 @@ nvptx_goacc_reduction_setup (gcall *call)
>        tree ref_to_res = gimple_call_arg (call, 1);
>  
>        if (!integer_zerop (ref_to_res))
> -	var = build_simple_mem_ref (ref_to_res);
> +	{
> +	  ref_to_res = nvptx_adjust_reduction_type (ref_to_res, TREE_TYPE (var),
> +						    &seq);
> +	  var = build_simple_mem_ref (ref_to_res);
> +	}
>      }
>    
>    if (level == GOMP_DIM_WORKER)
> @@ -5702,7 +5894,11 @@ nvptx_goacc_reduction_init (gcall *call)
>  	    init = var;
>  	}
>  
> -      gimplify_assign (lhs, init, &seq);
> +      /* The LHS may be NULL if a reduction variable on a parallel
> +	 construct is initialized to some constant inside the parallel
> +	 region.  */
> +      if (lhs)
> +	gimplify_assign (lhs, init, &seq);
>      }
>  
>    pop_gimplify_context (NULL);
> @@ -5727,22 +5923,7 @@ nvptx_goacc_reduction_fini (gcall *call)
>    push_gimplify_context (true);
>  
>    if (level == GOMP_DIM_VECTOR)
> -    {
> -      /* Emit binary shuffle tree.  TODO. Emit this as an actual loop,
> -	 but that requires a method of emitting a unified jump at the
> -	 gimple level.  */
> -      for (int shfl = PTX_VECTOR_LENGTH / 2; shfl > 0; shfl = shfl >> 1)
> -	{
> -	  tree other_var = make_ssa_name (TREE_TYPE (var));
> -	  nvptx_generate_vector_shuffle (gimple_location (call),
> -					 other_var, var, shfl, &seq);
> -
> -	  r = make_ssa_name (TREE_TYPE (var));
> -	  gimplify_assign (r, fold_build2 (op, TREE_TYPE (var),
> -					   var, other_var), &seq);
> -	  var = r;
> -	}
> -    }
> +    r = nvptx_vector_reduction (gimple_location (call), &gsi, var, op);
>    else
>      {
>        tree accum = NULL_TREE;
> @@ -5760,7 +5941,11 @@ nvptx_goacc_reduction_fini (gcall *call)
>        else if (integer_zerop (ref_to_res))
>  	r = var;
>        else
> -	accum = ref_to_res;
> +	{
> +	  ref_to_res = nvptx_adjust_reduction_type (ref_to_res, TREE_TYPE (var),
> +						    &seq);
> +	  accum = ref_to_res;
> +	}
>  
>        if (accum)
>  	{
> @@ -5809,7 +5994,11 @@ nvptx_goacc_reduction_teardown (gcall *call)
>        tree ref_to_res = gimple_call_arg (call, 1);
>  
>        if (!integer_zerop (ref_to_res))
> -	gimplify_assign (build_simple_mem_ref (ref_to_res), var, &seq);
> +	{
> +	  ref_to_res = nvptx_adjust_reduction_type (ref_to_res, TREE_TYPE (var),
> +						    &seq);
> +	  gimplify_assign (build_simple_mem_ref (ref_to_res), var, &seq);
> +	}
>      }
>  
>    if (lhs)
> diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md
> index 2988f5dfa91..79c4c061841 100644
> --- a/gcc/config/nvptx/nvptx.md
> +++ b/gcc/config/nvptx/nvptx.md
> @@ -600,6 +600,13 @@
>    "%J0\\tbra.uni\\t%l1;"
>    [(set_attr "predicable" "false")])
>  
> +(define_insn "cond_uni"
> +  [(set (match_operand:SI 0 "nvptx_register_operand" "=R")
> +          (unspec:SI [(match_operand:SI 1 "nvptx_nonmemory_operand" "R")]
> +  		     UNSPEC_BR_UNIFIED))]
> +  ""
> +  "%.\\tmov%t0\\t%0, %1; // unified")
> +
>  (define_expand "cbranch<mode>4"
>    [(set (pc)
>  	(if_then_else (match_operator 0 "nvptx_comparison_operator"

Otherwise, nvptx part LGTM.

Thanks,
- Tom

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [patch] various OpenACC reduction enhancements - ME and nvptx changes
  2018-10-05 14:09   ` Tom de Vries
@ 2018-10-30 20:09     ` Cesar Philippidis
  0 siblings, 0 replies; 14+ messages in thread
From: Cesar Philippidis @ 2018-10-30 20:09 UTC (permalink / raw)
  To: Tom de Vries, gcc-patches, Jakub Jelinek, Schwinge, Thomas,
	Julian Brown, cesar

[-- Attachment #1: Type: text/plain, Size: 770 bytes --]

On 10/5/18 07:07, Tom de Vries wrote:
> On 6/29/18 8:19 PM, Cesar Philippidis wrote:
>> The attached patch includes the nvptx and GCC ME reductions enhancements.
>>
>> Is this patch OK for trunk? It bootstrapped / regression tested cleanly
>> for x86_64 with nvptx offloading.
>>
> 
> These need fixing:
> ...
> === ERROR type #5: trailing whitespace (4 error(s)) ===
> gcc/config/nvptx/nvptx.c:5139:0:██
> gcc/config/nvptx/nvptx.c:5660:8:      doâ–ˆ
> gcc/config/nvptx/nvptx.c:5702:0:██
> gcc/config/nvptx/nvptx.c:5726:0:██
> ...

Sorry. The attached patch fixes that.

> Otherwise, nvptx part LGTM.
Tomorrow's my last day at Mentor, so either Thomas or Julian will need
to commit it once the other patches get approved.

Thanks,
Cesar

[-- Attachment #2: trunk-nvptx-reductions.diff --]
[-- Type: text/x-patch, Size: 11952 bytes --]

	gcc/
	* config/nvptx/nvptx.c (nvptx_propagate_unified): New.
	(nvptx_split_blocks): Call it for cond_uni insn.
	(nvptx_expand_cond_uni): New.
	(enum nvptx_builtins): Add NVPTX_BUILTIN_COND_UNI.
	(nvptx_init_builtins): Initialize it.
	(nvptx_expand_builtin):
	(nvptx_generate_vector_shuffle): Change integral SHIFT operand to
	tree BITS operand.
	(nvptx_vector_reduction): New.
	(nvptx_adjust_reduction_type): New.
	(nvptx_goacc_reduction_setup): Use it to adjust the type of ref_to_res.
	(nvptx_goacc_reduction_init): Don't update LHS if it doesn't exist.
	(nvptx_goacc_reduction_fini): Call nvptx_vector_reduction for vector.
	Use it to adjust the type of ref_to_res.
	(nvptx_goacc_reduction_teardown):
	* config/nvptx/nvptx.md (cond_uni): New pattern.

diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index 9903a273863..acb490a9a90 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -2863,6 +2863,52 @@ nvptx_reorg_uniform_simt ()
     }
 }
 
+/* UNIFIED is a cond_uni insn.  Find the branch insn it affects, and
+   mark that as unified.  We expect to be in a single block.  */
+
+static void
+nvptx_propagate_unified (rtx_insn *unified)
+{
+  rtx_insn *probe = unified;
+  rtx cond_reg = SET_DEST (PATTERN (unified));
+  rtx pat = NULL_RTX;
+
+  /* Find the comparison.  (We could skip this and simply scan to he
+     blocks' terminating branch, if we didn't care for self
+     checking.)  */
+  for (;;)
+    {
+      probe = next_real_insn (probe);
+      if (!probe)
+	break;
+      pat = PATTERN (probe);
+
+      if (GET_CODE (pat) == SET
+	  && GET_RTX_CLASS (GET_CODE (SET_SRC (pat))) == RTX_COMPARE
+	  && XEXP (SET_SRC (pat), 0) == cond_reg)
+	break;
+      gcc_assert (NONJUMP_INSN_P (probe));
+    }
+  gcc_assert (pat);
+  rtx pred_reg = SET_DEST (pat);
+
+  /* Find the branch.  */
+  do
+    probe = NEXT_INSN (probe);
+  while (!JUMP_P (probe));
+
+  pat = PATTERN (probe);
+  rtx itec = XEXP (SET_SRC (pat), 0);
+  gcc_assert (XEXP (itec, 0) == pred_reg);
+
+  /* Mark the branch's condition as unified.  */
+  rtx unspec = gen_rtx_UNSPEC (BImode, gen_rtvec (1, pred_reg),
+			       UNSPEC_BR_UNIFIED);
+  bool ok = validate_change (probe, &XEXP (itec, 0), unspec, false);
+
+  gcc_assert (ok);
+}
+
 /* Loop structure of the function.  The entire function is described as
    a NULL loop.  */
 
@@ -2964,6 +3010,9 @@ nvptx_split_blocks (bb_insn_map_t *map)
 	    continue;
 	  switch (recog_memoized (insn))
 	    {
+	    case CODE_FOR_cond_uni:
+	      nvptx_propagate_unified (insn);
+	      /* FALLTHROUGH */
 	    default:
 	      seen_insn = true;
 	      continue;
@@ -5083,6 +5132,21 @@ nvptx_expand_cmp_swap (tree exp, rtx target,
   return target;
 }
 
+/* Expander for the compare unified builtin.  */
+
+static rtx
+nvptx_expand_cond_uni (tree exp, rtx target, machine_mode mode, int ignore)
+{
+  if (ignore)
+    return target;
+
+  rtx src = expand_expr (CALL_EXPR_ARG (exp, 0),
+			 NULL_RTX, mode, EXPAND_NORMAL);
+
+  emit_insn (gen_cond_uni (target, src));
+
+  return target;
+}
 
 /* Codes for all the NVPTX builtins.  */
 enum nvptx_builtins
@@ -5092,6 +5156,7 @@ enum nvptx_builtins
   NVPTX_BUILTIN_WORKER_ADDR,
   NVPTX_BUILTIN_CMP_SWAP,
   NVPTX_BUILTIN_CMP_SWAPLL,
+  NVPTX_BUILTIN_COND_UNI,
   NVPTX_BUILTIN_MAX
 };
 
@@ -5129,6 +5194,7 @@ nvptx_init_builtins (void)
        (PTRVOID, ST, UINT, UINT, NULL_TREE));
   DEF (CMP_SWAP, "cmp_swap", (UINT, PTRVOID, UINT, UINT, NULL_TREE));
   DEF (CMP_SWAPLL, "cmp_swapll", (LLUINT, PTRVOID, LLUINT, LLUINT, NULL_TREE));
+  DEF (COND_UNI, "cond_uni", (integer_type_node, integer_type_node, NULL_TREE));
 
 #undef DEF
 #undef ST
@@ -5161,6 +5227,9 @@ nvptx_expand_builtin (tree exp, rtx target, rtx ARG_UNUSED (subtarget),
     case NVPTX_BUILTIN_CMP_SWAPLL:
       return nvptx_expand_cmp_swap (exp, target, mode, ignore);
 
+    case NVPTX_BUILTIN_COND_UNI:
+      return nvptx_expand_cond_uni (exp, target, mode, ignore);
+
     default: gcc_unreachable ();
     }
 }
@@ -5284,7 +5353,7 @@ nvptx_get_worker_red_addr (tree type, tree offset)
 
 static void
 nvptx_generate_vector_shuffle (location_t loc,
-			       tree dest_var, tree var, unsigned shift,
+			       tree dest_var, tree var, tree bits,
 			       gimple_seq *seq)
 {
   unsigned fn = NVPTX_BUILTIN_SHUFFLE;
@@ -5307,7 +5376,6 @@ nvptx_generate_vector_shuffle (location_t loc,
     }
   
   tree call = nvptx_builtin_decl (fn, true);
-  tree bits = build_int_cst (unsigned_type_node, shift);
   tree kind = build_int_cst (unsigned_type_node, SHUFFLE_DOWN);
   tree expr;
 
@@ -5583,6 +5651,126 @@ nvptx_reduction_update (location_t loc, gimple_stmt_iterator *gsi,
     return nvptx_lockfull_update (loc, gsi, ptr, var, op);
 }
 
+/* Emit a vector-level reduction loop.  OLD_VAR is the incoming
+   variable to reduce (valid in each vector), OP is the reduction
+   operator.  Return the reduced value (an SSA var).
+
+   The code we generate looks like:
+      unsigned old_shift = DIM_SIZE(VECTOR);
+      do
+	{
+	  shift = PHI (old_shift, new_shift);
+	  var = PHI (old_var, new_var);
+	  new_shift = shift >> 1;
+	  other_var = VSHUFFLE (var, new_shift);
+	  new_var = var OP other_var;
+	  cond_var = builtin_cond_uni (new_shift);
+	}
+	while (cond_var > 1);
+
+  The builtin_cond_ini expands to a cond_uni instruction, which is
+  processed in nvpts_split_blocks to mark the loop's terminating
+  branch instruction.  */
+
+static tree
+nvptx_vector_reduction (location_t loc, gimple_stmt_iterator *gsi,
+			tree old_var, tree_code op)
+{
+  tree var_type = TREE_TYPE (old_var);
+
+  /*  Emit old_shift = DIM_SIZE(VECTOR) */
+  tree old_shift = make_ssa_name (integer_type_node);
+  tree dim = build_int_cst (integer_type_node, GOMP_DIM_VECTOR);
+  gcall *call = gimple_build_call_internal (IFN_GOACC_DIM_SIZE, 1, dim);
+  gimple_set_lhs (call, old_shift);
+  gimple_set_location (call, loc);
+  gsi_insert_before (gsi, call, GSI_SAME_STMT);
+
+  /* Split the block just after the init stmts.  */
+  basic_block pre_bb = gsi_bb (*gsi);
+  edge pre_edge = split_block (pre_bb, call);
+  basic_block loop_bb = pre_edge->dest;
+  pre_bb = pre_edge->src;
+  /* Reset the iterator.  */
+  *gsi = gsi_for_stmt (gsi_stmt (*gsi));
+
+  tree shift = make_ssa_name (integer_type_node);
+  tree new_shift = make_ssa_name (integer_type_node);
+  tree var = make_ssa_name (var_type);
+  tree other_var = make_ssa_name (var_type);
+  tree new_var = make_ssa_name (var_type);
+
+  /* Build and insert the loop body.  */
+  gimple_seq loop_seq = NULL;
+
+  /* new_shift = shift >> 1 */
+  tree shift_expr = fold_build2 (RSHIFT_EXPR, integer_type_node,
+				 shift, integer_one_node);
+  gimplify_assign (new_shift, shift_expr, &loop_seq);
+
+  /* other_var = shuffle (var, shift) */
+  nvptx_generate_vector_shuffle (loc, other_var, var, new_shift, &loop_seq);
+  /* new_var = var  OP other_var */
+  tree red_expr = fold_build2 (op, var_type, var, other_var);
+  gimplify_assign (new_var, red_expr, &loop_seq);
+
+  /* Mark the iterator variable as unified.  */
+  tree cond_var = make_ssa_name (integer_type_node);
+  tree uni_fn = nvptx_builtin_decl (NVPTX_BUILTIN_COND_UNI, true);
+  tree uni_expr = build_call_expr_loc (loc, uni_fn, 1, new_shift);
+  gimplify_assign (cond_var,  uni_expr, &loop_seq);
+
+  gcond *cond = gimple_build_cond (LE_EXPR, cond_var, integer_one_node,
+				   NULL_TREE, NULL_TREE);
+  gimple_seq_add_stmt (&loop_seq, cond);
+
+  gsi_insert_seq_before (gsi, loop_seq, GSI_SAME_STMT);
+
+  /* Split the block just after the loop stmts.  */
+  edge post_edge = split_block (loop_bb, cond);
+  basic_block post_bb = post_edge->dest;
+  loop_bb = post_edge->src;
+  *gsi = gsi_for_stmt (gsi_stmt (*gsi));
+
+  /* Create the loop.  */
+  post_edge->flags ^= EDGE_TRUE_VALUE | EDGE_FALLTHRU;
+  edge loop_edge = make_edge (loop_bb, loop_bb, EDGE_FALSE_VALUE);
+  set_immediate_dominator (CDI_DOMINATORS, loop_bb, pre_bb);
+  set_immediate_dominator (CDI_DOMINATORS, post_bb, loop_bb);
+
+  gphi *shift_phi = create_phi_node (shift, loop_bb);
+  add_phi_arg (shift_phi, old_shift, pre_edge, loc);
+  add_phi_arg (shift_phi, new_shift, loop_edge, loc);
+
+  gphi *var_phi = create_phi_node (var, loop_bb);
+  add_phi_arg (var_phi, old_var, pre_edge, loc);
+  add_phi_arg (var_phi, new_var, loop_edge, loc);
+
+  loop *loop = alloc_loop ();
+  loop->header = loop_bb;
+  loop->latch = loop_bb;
+  add_loop (loop, loop_bb->loop_father);
+
+  return new_var;
+}
+
+/* Dummy reduction vars that have GOMP_MAP_FIRSTPRIVATE_POINTER data
+   mappings gets retyped to (void *).  Adjust the type of VAR to TYPE
+   as appropriate.  */
+
+static tree
+nvptx_adjust_reduction_type (tree var, tree type, gimple_seq *seq)
+{
+  if (TREE_TYPE (TREE_TYPE (var)) == type)
+    return var;
+
+  tree ptype = build_pointer_type (type);
+  tree t = make_ssa_name (ptype);
+  tree expr = fold_build1 (NOP_EXPR, ptype, var);
+  gimple_seq_add_stmt (seq, gimple_build_assign (t, expr));
+  return t;
+}
+
 /* NVPTX implementation of GOACC_REDUCTION_SETUP.  */
 
 static void
@@ -5602,7 +5790,11 @@ nvptx_goacc_reduction_setup (gcall *call)
       tree ref_to_res = gimple_call_arg (call, 1);
 
       if (!integer_zerop (ref_to_res))
-	var = build_simple_mem_ref (ref_to_res);
+	{
+	  ref_to_res = nvptx_adjust_reduction_type (ref_to_res, TREE_TYPE (var),
+						    &seq);
+	  var = build_simple_mem_ref (ref_to_res);
+	}
     }
   
   if (level == GOMP_DIM_WORKER)
@@ -5702,7 +5894,11 @@ nvptx_goacc_reduction_init (gcall *call)
 	    init = var;
 	}
 
-      gimplify_assign (lhs, init, &seq);
+      /* The LHS may be NULL if a reduction variable on a parallel
+	 construct is initialized to some constant inside the parallel
+	 region.  */
+      if (lhs)
+	gimplify_assign (lhs, init, &seq);
     }
 
   pop_gimplify_context (NULL);
@@ -5727,22 +5923,7 @@ nvptx_goacc_reduction_fini (gcall *call)
   push_gimplify_context (true);
 
   if (level == GOMP_DIM_VECTOR)
-    {
-      /* Emit binary shuffle tree.  TODO. Emit this as an actual loop,
-	 but that requires a method of emitting a unified jump at the
-	 gimple level.  */
-      for (int shfl = PTX_VECTOR_LENGTH / 2; shfl > 0; shfl = shfl >> 1)
-	{
-	  tree other_var = make_ssa_name (TREE_TYPE (var));
-	  nvptx_generate_vector_shuffle (gimple_location (call),
-					 other_var, var, shfl, &seq);
-
-	  r = make_ssa_name (TREE_TYPE (var));
-	  gimplify_assign (r, fold_build2 (op, TREE_TYPE (var),
-					   var, other_var), &seq);
-	  var = r;
-	}
-    }
+    r = nvptx_vector_reduction (gimple_location (call), &gsi, var, op);
   else
     {
       tree accum = NULL_TREE;
@@ -5760,7 +5941,11 @@ nvptx_goacc_reduction_fini (gcall *call)
       else if (integer_zerop (ref_to_res))
 	r = var;
       else
-	accum = ref_to_res;
+	{
+	  ref_to_res = nvptx_adjust_reduction_type (ref_to_res, TREE_TYPE (var),
+						    &seq);
+	  accum = ref_to_res;
+	}
 
       if (accum)
 	{
@@ -5809,7 +5994,11 @@ nvptx_goacc_reduction_teardown (gcall *call)
       tree ref_to_res = gimple_call_arg (call, 1);
 
       if (!integer_zerop (ref_to_res))
-	gimplify_assign (build_simple_mem_ref (ref_to_res), var, &seq);
+	{
+	  ref_to_res = nvptx_adjust_reduction_type (ref_to_res, TREE_TYPE (var),
+						    &seq);
+	  gimplify_assign (build_simple_mem_ref (ref_to_res), var, &seq);
+	}
     }
 
   if (lhs)
diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md
index ca00b1d8073..4f7d8ea0e68 100644
--- a/gcc/config/nvptx/nvptx.md
+++ b/gcc/config/nvptx/nvptx.md
@@ -600,6 +600,13 @@
   "%J0\\tbra.uni\\t%l1;"
   [(set_attr "predicable" "false")])
 
+(define_insn "cond_uni"
+  [(set (match_operand:SI 0 "nvptx_register_operand" "=R")
+          (unspec:SI [(match_operand:SI 1 "nvptx_nonmemory_operand" "R")]
+		     UNSPEC_BR_UNIFIED))]
+  ""
+  "%.\\tmov%t0\\t%0, %1; // unified")
+
 (define_expand "cbranch<mode>4"
   [(set (pc)
 	(if_then_else (match_operator 0 "nvptx_comparison_operator"

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [patch] various OpenACC reduction enhancements - ME and nvptx changes
  2018-06-29 18:22 ` [patch] various OpenACC reduction enhancements - ME and nvptx changes Cesar Philippidis
  2018-10-05 14:09   ` Tom de Vries
@ 2018-12-04 12:29   ` Jakub Jelinek
  2018-12-04 15:54     ` Tom de Vries
  1 sibling, 1 reply; 14+ messages in thread
From: Jakub Jelinek @ 2018-12-04 12:29 UTC (permalink / raw)
  To: Cesar Philippidis, Thomas Schwinge, Tom de Vries; +Cc: gcc-patches

On Fri, Jun 29, 2018 at 11:19:53AM -0700, Cesar Philippidis wrote:
> The attached patch includes the nvptx and GCC ME reductions enhancements.
> 
> Is this patch OK for trunk? It bootstrapped / regression tested cleanly
> for x86_64 with nvptx offloading.

This is all OpenACC specific code not really shareable with OpenMP, if
Thomas (for middle-end) and Tom (for NVPTX backend) are ok with it, it is ok
for trunk.

> 2018-06-29  Cesar Philippidis  <cesar@codesourcery.com>
> 	    Nathan Sidwell  <nathan@acm.org>
> 
> 	gcc/
> 	* config/nvptx/nvptx.c (nvptx_propagate_unified): New.
> 	(nvptx_split_blocks): Call it for cond_uni insn.
> 	(nvptx_expand_cond_uni): New.
> 	(enum nvptx_builtins): Add NVPTX_BUILTIN_COND_UNI.
> 	(nvptx_init_builtins): Initialize it.
> 	(nvptx_expand_builtin):
> 	(nvptx_generate_vector_shuffle): Change integral SHIFT operand to
> 	tree BITS operand.
> 	(nvptx_vector_reduction): New.
> 	(nvptx_adjust_reduction_type): New.
> 	(nvptx_goacc_reduction_setup): Use it to adjust the type of ref_to_res.
> 	(nvptx_goacc_reduction_init): Don't update LHS if it doesn't exist.
> 	(nvptx_goacc_reduction_fini): Call nvptx_vector_reduction for vector.
> 	Use it to adjust the type of ref_to_res.
> 	(nvptx_goacc_reduction_teardown):
> 	* config/nvptx/nvptx.md (cond_uni): New pattern.
> 	* omp-general.h (enum oacc_loop_flags): Add OLF_REDUCTION enum.
> 	* omp-low.c (lower_oacc_reductions): Handle reduction decls mapped
> 	with GOMP_MAP_FIRSTPRIVATE_POINTER.
> 	(lower_oacc_head_mark): Use OLF_REDUCTION to mark OpenACC reductions.
> 	* omp-offload.c (oacc_loop_auto_partitions): Don't assign gang
> 	level parallelism to orphan reductions.
> 	(default_goacc_reduction): Retype ref_to_res as necessary.

	Jakub

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [patch] various OpenACC reduction enhancements - FE changes
  2018-06-29 18:23 ` [patch] various OpenACC reduction enhancements - FE changes Cesar Philippidis
@ 2018-12-04 12:57   ` Jakub Jelinek
  2018-12-13 14:12     ` Julian Brown
  0 siblings, 1 reply; 14+ messages in thread
From: Jakub Jelinek @ 2018-12-04 12:57 UTC (permalink / raw)
  To: Cesar Philippidis; +Cc: gcc-patches, Tom de Vries, Fortran List

On Fri, Jun 29, 2018 at 11:22:00AM -0700, Cesar Philippidis wrote:
> 2018-06-29  Cesar Philippidis  <cesar@codesourcery.com>
> 	    Nathan Sidwell  <nathan@acm.org>
> 
> 	gcc/c/
> 	* c-parser.c (c_parser_omp_variable_list): New c_omp_region_type
> 	argument.  Use it to specialize handling of OMP_CLAUSE_REDUCTION for
> 	OpenACC.
> 	(c_parser_omp_clause_reduction): Update call to
> 	c_parser_omp_variable_list.  Propage OpenACC errors as necessary.
> 	(c_parser_oacc_all_clauses): Update call to
> 	p_parser_omp_clause_reduction.
> 	(c_parser_omp_all_clauses): Likewise.
> 	* c-typeck.c (c_finish_omp_clauses): Emit an error on orphan OpenACC
> 	gang reductions.
> 
> 	gcc/cp/
> 	* parser.c (cp_parser_omp_var_list_no_open):  New c_omp_region_type
> 	argument.  Use it to specialize handling of OMP_CLAUSE_REDUCTION for
> 	OpenACC.
> 	(cp_parser_omp_clause_reduction): Update call to
> 	cp_parser_omp_variable_list.  Propage OpenACC errors as necessary.
> 	(cp_parser_oacc_all_clauses): Update call to
> 	cp_parser_omp_clause_reduction.
> 	(cp_parser_omp_all_clauses): Likewise.
> 	* semantics.c (finish_omp_clauses): Emit an error on orphan OpenACC
> 	gang reductions.
> 
> 	gcc/fortran/
> 	* openmp.c (resolve_oacc_loop_blocks): Emit an error on orphan OpenACC
> 	gang reductions.
> 	* trans-openmp.c (gfc_omp_clause_copy_ctor): Permit reductions.
> 
> diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
> index 7a926285f3a..a6f453dae54 100644
> --- a/gcc/c/c-parser.c
> +++ b/gcc/c/c-parser.c
> @@ -965,12 +965,13 @@ class token_pair
>  
>    /* Like token_pair::require_close, except that tokens will be skipped
>       until the desired token is found.  An error message is still produced
> -     if the next token is not as expected.  */
> +     if the next token is not as expected, unless QUIET is set.  */
>  
> -  void skip_until_found_close (c_parser *parser) const
> +  void skip_until_found_close (c_parser *parser, bool quiet = false) const
>    {
>      c_parser_skip_until_found (parser, traits_t::close_token_type,
> -			       traits_t::close_gmsgid, m_open_loc);
> +			       quiet ? NULL : traits_t::close_gmsgid,
> +			       m_open_loc);
>    }

I don't like these changes, why do you need them?  C++ FE doesn't have such
changes either, and it is fine to diagnose missing ) even if there was some
earlier error.  All other spots which require matching parens do it the
same.  Please leave those out.

 static tree                                                                                                                                       
-c_parser_omp_clause_reduction (c_parser *parser, tree list)                                                                                       
+c_parser_omp_clause_reduction (c_parser *parser, tree list,                                                                                       
+                              enum c_omp_region_type ort)                                                                                         

Note, the signature is now different, it is ok to replace is_omp argument
with enum c_omp_region_type if you wish.

>  {
>    location_t clause_loc = c_parser_peek_token (parser)->location;
> +  bool seen_error = false;
> +
>    matching_parens parens;
>    if (parens.require_open (parser))
>      {
> @@ -12855,7 +12876,13 @@ c_parser_omp_clause_reduction (c_parser *parser, tree list)
>  	  tree nl, c;
>  
>  	  nl = c_parser_omp_variable_list (parser, clause_loc,
> -					   OMP_CLAUSE_REDUCTION, list);
> +					   OMP_CLAUSE_REDUCTION, list, ort);
> +	  if (c_parser_peek_token (parser)->type != CPP_CLOSE_PAREN)
> +	    {
> +	      seen_error = true;
> +	      goto cleanup;
> +	    }
> +
>  	  for (c = nl; c != list; c = OMP_CLAUSE_CHAIN (c))
>  	    {
>  	      tree d = OMP_CLAUSE_DECL (c), type;
> @@ -12891,7 +12918,8 @@ c_parser_omp_clause_reduction (c_parser *parser, tree list)
>  
>  	  list = nl;
>  	}
> -      parens.skip_until_found_close (parser);
> +    cleanup:
> +      parens.skip_until_found_close (parser, seen_error);
>      }
>    return list;
>  }

And the above hunks as well.

> @@ -13998,7 +14026,7 @@ c_parser_oacc_all_clauses (c_parser *parser, omp_clause_mask mask,
>  	  c_name = "private";
>  	  break;
>  	case PRAGMA_OACC_CLAUSE_REDUCTION:
> -	  clauses = c_parser_omp_clause_reduction (parser, clauses);
> +	  clauses = c_parser_omp_clause_reduction (parser, clauses, C_ORT_ACC);
>  	  c_name = "reduction";
>  	  break;
>  	case PRAGMA_OACC_CLAUSE_SEQ:
> @@ -14157,7 +14185,7 @@ c_parser_omp_all_clauses (c_parser *parser, omp_clause_mask mask,
>  	  c_name = "private";
>  	  break;
>  	case PRAGMA_OMP_CLAUSE_REDUCTION:
> -	  clauses = c_parser_omp_clause_reduction (parser, clauses);
> +	  clauses = c_parser_omp_clause_reduction (parser, clauses, C_ORT_OMP);
>  	  c_name = "reduction";
>  	  break;
>  	case PRAGMA_OMP_CLAUSE_SCHEDULE:

Note, there are now also the IN_REDUCTION/TASK_REDUCTION clause cases that
need adjustment.

> diff --git a/gcc/c/c-typeck.c b/gcc/c/c-typeck.c
> index 90ae306c99a..944db3fa8be 100644
> --- a/gcc/c/c-typeck.c
> +++ b/gcc/c/c-typeck.c
> @@ -13087,6 +13087,14 @@ c_finish_omp_clauses (tree clauses, enum c_omp_region_type ort)
>  	  goto check_dup_generic;
>  
>  	case OMP_CLAUSE_REDUCTION:
> +	  if (ort == C_ORT_ACC && oacc_get_fn_attrib (current_function_decl)
> +	      && omp_find_clause (clauses, OMP_CLAUSE_GANG))

This is expensive if there are many clauses, we want to avoid O(n^2)
behavior.  For C we only have one loop, so just remember in some variable
whether there are reduction clause(s) that would conflict with gang, and
in another one whether gang clause has been seen, then deal with it at the
end if both the bools are true.

> +	    {
> +	      error_at (OMP_CLAUSE_LOCATION (c),
> +			"gang reduction on an orphan loop");
> +	      remove = true;
> +	      break;
> +	    }
>  	  need_implicitly_determined = true;
>  	  t = OMP_CLAUSE_DECL (c);
>  	  if (TREE_CODE (t) == TREE_LIST)

> @@ -31668,6 +31669,21 @@ cp_parser_omp_var_list_no_open (cp_parser *parser, enum omp_clause_code kind,
>  	      /* FALLTHROUGH.  */
>  	    case OMP_CLAUSE_DEPEND:
>  	    case OMP_CLAUSE_REDUCTION:
> +	      if (kind == OMP_CLAUSE_REDUCTION && ort == C_ORT_ACC)
> +		{
> +		  switch (cp_lexer_peek_token (parser->lexer)->type)
> +		    {
> +		    case CPP_OPEN_PAREN:
> +		    case CPP_OPEN_SQUARE:
> +		    case CPP_DOT:
> +		    case CPP_DEREF:
> +		      error ("invalid reduction variable");
> +		      decl = error_mark_node;
> +		      goto skip_comma;
> +		    default:;
> +		      break;
> +		    }
> +		}

Any reason for the above (ditto in C), rather than just adding
&& ort != C_ORT_ACC to the while loop condition for CPP_OPEN_SQUARE?
(, . or * after id-expression is like any other unhandled characters...

>  	      while (cp_lexer_next_token_is (parser->lexer, CPP_OPEN_SQUARE))
>  		{
>  		  tree low_bound = NULL_TREE, length = NULL_TREE;

> @@ -33868,7 +33885,7 @@ cp_parser_oacc_all_clauses (cp_parser *parser, omp_clause_mask mask,
>  	  c_name = "private";
>  	  break;
>  	case PRAGMA_OACC_CLAUSE_REDUCTION:
> -	  clauses = cp_parser_omp_clause_reduction (parser, clauses);
> +	  clauses = cp_parser_omp_clause_reduction (parser, clauses, C_ORT_ACC);
>  	  c_name = "reduction";
>  	  break;
>  	case PRAGMA_OACC_CLAUSE_SEQ:
> @@ -34055,7 +34072,7 @@ cp_parser_omp_all_clauses (cp_parser *parser, omp_clause_mask mask,
>  	  c_name = "private";
>  	  break;
>  	case PRAGMA_OMP_CLAUSE_REDUCTION:
> -	  clauses = cp_parser_omp_clause_reduction (parser, clauses);
> +	  clauses = cp_parser_omp_clause_reduction (parser, clauses, C_ORT_OMP);
>  	  c_name = "reduction";
>  	  break;
>  	case PRAGMA_OMP_CLAUSE_SCHEDULE:

Again, needs adjustement for IN_REDUCTION/TASK_REDUCTION.

> diff --git a/gcc/cp/semantics.c b/gcc/cp/semantics.c
> index c779137da45..177acdd9cc4 100644
> --- a/gcc/cp/semantics.c
> +++ b/gcc/cp/semantics.c
> @@ -5875,6 +5875,14 @@ finish_omp_clauses (tree clauses, enum c_omp_region_type ort)
>  	  field_ok = ((ort & C_ORT_OMP_DECLARE_SIMD) == C_ORT_OMP);
>  	  goto check_dup_generic;
>  	case OMP_CLAUSE_REDUCTION:
> +	  if (ort == C_ORT_ACC && oacc_get_fn_attrib (current_function_decl)
> +	      && omp_find_clause (clauses, OMP_CLAUSE_GANG))
> +	    {
> +	      error_at (OMP_CLAUSE_LOCATION (c),
> +			"gang reduction on an orphan loop");
> +	      remove = true;
> +	      break;
> +	    }
>  	  field_ok = ((ort & C_ORT_OMP_DECLARE_SIMD) == C_ORT_OMP);
>  	  t = OMP_CLAUSE_DECL (c);
>  	  if (TREE_CODE (t) == TREE_LIST)

In C++ finish_omp_clauses there are 2 loops, so you can easily just remember
if OMP_CLAUSE_GANG has been seen in the first loop and diagnose this in the
second loop only.

	Jakub

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [patch] various OpenACC reduction enhancements - test cases
  2018-06-29 18:38 ` [patch] various OpenACC reduction enhancements - test cases Cesar Philippidis
@ 2018-12-04 12:59   ` Jakub Jelinek
  2018-12-13 14:14     ` Julian Brown
  0 siblings, 1 reply; 14+ messages in thread
From: Jakub Jelinek @ 2018-12-04 12:59 UTC (permalink / raw)
  To: Cesar Philippidis, Thomas Schwinge; +Cc: gcc-patches, Tom de Vries

On Fri, Jun 29, 2018 at 11:23:21AM -0700, Cesar Philippidis wrote:
> Attached are the updated reductions tests cases. Again, these have been
> bootstrapped and regression tested cleanly for x86_64 with nvptx
> offloading. Is it OK for trunk?

If Thomas is ok with this, it is ok for trunk.

> 2018-06-29  Cesar Philippidis  <cesar@codesourcery.com>
> 	    Nathan Sidwell  <nathan@acm.org>
> 
> 	gcc/testsuite/
> 	* c-c++-common/goacc/orphan-reductions-1.c: New test.
> 	* c-c++-common/goacc/reduction-7.c: New test.
> 	* c-c++-common/goacc/routine-4.c: Update.
> 	* g++.dg/goacc/reductions-1.C: New test.
> 	* gcc.dg/goacc/loop-processing-1.c: Update.
> 	* gfortran.dg/goacc/orphan-reductions-1.f90: New test.
> 
> 	libgomp/
> 	* libgomp.oacc-c-c++-common/par-reduction-3.c: New test.
> 	* libgomp.oacc-c-c++-common/reduction-cplx-flt-2.c: New test.
> 	* libgomp.oacc-fortran/reduction-9.f90: New test.

	Jakub

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [patch] various OpenACC reduction enhancements - ME and nvptx changes
  2018-12-04 12:29   ` Jakub Jelinek
@ 2018-12-04 15:54     ` Tom de Vries
  2018-12-13 15:56       ` Julian Brown
  0 siblings, 1 reply; 14+ messages in thread
From: Tom de Vries @ 2018-12-04 15:54 UTC (permalink / raw)
  To: Jakub Jelinek, Cesar Philippidis, Thomas Schwinge
  Cc: gcc-patches, Julian Brown

On 04-12-18 13:29, Jakub Jelinek wrote:
> On Fri, Jun 29, 2018 at 11:19:53AM -0700, Cesar Philippidis wrote:
>> The attached patch includes the nvptx and GCC ME reductions enhancements.
>>
>> Is this patch OK for trunk? It bootstrapped / regression tested cleanly
>> for x86_64 with nvptx offloading.
> This is all OpenACC specific code not really shareable with OpenMP, if
> Thomas (for middle-end) and Tom (for NVPTX backend) are ok with it, it is ok
> for trunk.
> 

Formatting needs to be fixed:
...
There should be exactly one space between function name and parenthesis.
160:+      unsigned old_shift = DIM_SIZE(VECTOR);
...

Also, the updated patch does not address my comment about probabilities
here ( https://gcc.gnu.org/ml/gcc-patches/2018-10/msg00325.html ):
...
> +  /* Create the loop.  */
> +  post_edge->flags ^= EDGE_TRUE_VALUE | EDGE_FALLTHRU;

Edges need probabilities, as in nvptx_lockless_update,
nvptx_lockfull_update and nvptx_goacc_reduction_init.
...

Thanks,
- Tom

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [patch] various OpenACC reduction enhancements - FE changes
  2018-12-04 12:57   ` Jakub Jelinek
@ 2018-12-13 14:12     ` Julian Brown
  2018-12-18 13:06       ` Jakub Jelinek
  0 siblings, 1 reply; 14+ messages in thread
From: Julian Brown @ 2018-12-13 14:12 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Cesar Philippidis, gcc-patches, Tom de Vries, Fortran List

[-- Attachment #1: Type: text/plain, Size: 11489 bytes --]

On Tue, 4 Dec 2018 13:57:24 +0100
Jakub Jelinek <jakub@redhat.com> wrote:

> On Fri, Jun 29, 2018 at 11:22:00AM -0700, Cesar Philippidis wrote:
> > 2018-06-29  Cesar Philippidis  <cesar@codesourcery.com>
> > 	    Nathan Sidwell  <nathan@acm.org>
> > 
> > 	gcc/c/
> > 	* c-parser.c (c_parser_omp_variable_list): New
> > c_omp_region_type argument.  Use it to specialize handling of
> > OMP_CLAUSE_REDUCTION for OpenACC.
> > 	(c_parser_omp_clause_reduction): Update call to
> > 	c_parser_omp_variable_list.  Propage OpenACC errors as
> > necessary. (c_parser_oacc_all_clauses): Update call to
> > 	p_parser_omp_clause_reduction.
> > 	(c_parser_omp_all_clauses): Likewise.
> > 	* c-typeck.c (c_finish_omp_clauses): Emit an error on
> > orphan OpenACC gang reductions.
> > 
> > 	gcc/cp/
> > 	* parser.c (cp_parser_omp_var_list_no_open):  New
> > c_omp_region_type argument.  Use it to specialize handling of
> > OMP_CLAUSE_REDUCTION for OpenACC.
> > 	(cp_parser_omp_clause_reduction): Update call to
> > 	cp_parser_omp_variable_list.  Propage OpenACC errors as
> > necessary. (cp_parser_oacc_all_clauses): Update call to
> > 	cp_parser_omp_clause_reduction.
> > 	(cp_parser_omp_all_clauses): Likewise.
> > 	* semantics.c (finish_omp_clauses): Emit an error on orphan
> > OpenACC gang reductions.
> > 
> > 	gcc/fortran/
> > 	* openmp.c (resolve_oacc_loop_blocks): Emit an error on
> > orphan OpenACC gang reductions.
> > 	* trans-openmp.c (gfc_omp_clause_copy_ctor): Permit
> > reductions.
> > 
> > diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
> > index 7a926285f3a..a6f453dae54 100644
> > --- a/gcc/c/c-parser.c
> > +++ b/gcc/c/c-parser.c
> > @@ -965,12 +965,13 @@ class token_pair
> >  
> >    /* Like token_pair::require_close, except that tokens will be
> > skipped until the desired token is found.  An error message is
> > still produced
> > -     if the next token is not as expected.  */
> > +     if the next token is not as expected, unless QUIET is set.  */
> >  
> > -  void skip_until_found_close (c_parser *parser) const
> > +  void skip_until_found_close (c_parser *parser, bool quiet =
> > false) const {
> >      c_parser_skip_until_found (parser, traits_t::close_token_type,
> > -			       traits_t::close_gmsgid, m_open_loc);
> > +			       quiet ? NULL :
> > traits_t::close_gmsgid,
> > +			       m_open_loc);
> >    }  
> 
> I don't like these changes, why do you need them?  C++ FE doesn't
> have such changes either, and it is fine to diagnose missing ) even
> if there was some earlier error.  All other spots which require
> matching parens do it the same.  Please leave those out.

I've removed these bits.

>  static
> tree -c_parser_omp_clause_reduction (c_parser *parser, tree
> list) +c_parser_omp_clause_reduction (c_parser *parser, tree
> list,                                                                                       
> +                              enum c_omp_region_type
> ort)                                                                                         
> 
> Note, the signature is now different, it is ok to replace is_omp
> argument with enum c_omp_region_type if you wish.

I've done as you suggest.

> >  {
> >    location_t clause_loc = c_parser_peek_token (parser)->location;
> > +  bool seen_error = false;
> > +
> >    matching_parens parens;
> >    if (parens.require_open (parser))
> >      {
> > @@ -12855,7 +12876,13 @@ c_parser_omp_clause_reduction (c_parser
> > *parser, tree list) tree nl, c;
> >  
> >  	  nl = c_parser_omp_variable_list (parser, clause_loc,
> > -					   OMP_CLAUSE_REDUCTION,
> > list);
> > +					   OMP_CLAUSE_REDUCTION,
> > list, ort);
> > +	  if (c_parser_peek_token (parser)->type !=
> > CPP_CLOSE_PAREN)
> > +	    {
> > +	      seen_error = true;
> > +	      goto cleanup;
> > +	    }
> > +
> >  	  for (c = nl; c != list; c = OMP_CLAUSE_CHAIN (c))
> >  	    {
> >  	      tree d = OMP_CLAUSE_DECL (c), type;
> > @@ -12891,7 +12918,8 @@ c_parser_omp_clause_reduction (c_parser
> > *parser, tree list) 
> >  	  list = nl;
> >  	}
> > -      parens.skip_until_found_close (parser);
> > +    cleanup:
> > +      parens.skip_until_found_close (parser, seen_error);
> >      }
> >    return list;
> >  }  
> 
> And the above hunks as well.

Removed.

> > @@ -13998,7 +14026,7 @@ c_parser_oacc_all_clauses (c_parser
> > *parser, omp_clause_mask mask, c_name = "private";
> >  	  break;
> >  	case PRAGMA_OACC_CLAUSE_REDUCTION:
> > -	  clauses = c_parser_omp_clause_reduction (parser,
> > clauses);
> > +	  clauses = c_parser_omp_clause_reduction (parser,
> > clauses, C_ORT_ACC); c_name = "reduction";
> >  	  break;
> >  	case PRAGMA_OACC_CLAUSE_SEQ:
> > @@ -14157,7 +14185,7 @@ c_parser_omp_all_clauses (c_parser *parser,
> > omp_clause_mask mask, c_name = "private";
> >  	  break;
> >  	case PRAGMA_OMP_CLAUSE_REDUCTION:
> > -	  clauses = c_parser_omp_clause_reduction (parser,
> > clauses);
> > +	  clauses = c_parser_omp_clause_reduction (parser,
> > clauses, C_ORT_OMP); c_name = "reduction";
> >  	  break;
> >  	case PRAGMA_OMP_CLAUSE_SCHEDULE:  
> 
> Note, there are now also the IN_REDUCTION/TASK_REDUCTION clause cases
> that need adjustment.

Done.

> > diff --git a/gcc/c/c-typeck.c b/gcc/c/c-typeck.c
> > index 90ae306c99a..944db3fa8be 100644
> > --- a/gcc/c/c-typeck.c
> > +++ b/gcc/c/c-typeck.c
> > @@ -13087,6 +13087,14 @@ c_finish_omp_clauses (tree clauses, enum
> > c_omp_region_type ort) goto check_dup_generic;
> >  
> >  	case OMP_CLAUSE_REDUCTION:
> > +	  if (ort == C_ORT_ACC && oacc_get_fn_attrib
> > (current_function_decl)
> > +	      && omp_find_clause (clauses, OMP_CLAUSE_GANG))  
> 
> This is expensive if there are many clauses, we want to avoid O(n^2)
> behavior.  For C we only have one loop, so just remember in some
> variable whether there are reduction clause(s) that would conflict
> with gang, and in another one whether gang clause has been seen, then
> deal with it at the end if both the bools are true.

There are (now) two loops in the C frontend too -- so I've handled this
the same as C++, below, because there may be multiple reduction clauses
with a "gang" so removing them afterwards (like nogroup_seen) seems
awkward.

> > +	    {
> > +	      error_at (OMP_CLAUSE_LOCATION (c),
> > +			"gang reduction on an orphan loop");
> > +	      remove = true;
> > +	      break;
> > +	    }
> >  	  need_implicitly_determined = true;
> >  	  t = OMP_CLAUSE_DECL (c);
> >  	  if (TREE_CODE (t) == TREE_LIST)  
> 
> > @@ -31668,6 +31669,21 @@ cp_parser_omp_var_list_no_open (cp_parser
> > *parser, enum omp_clause_code kind, /* FALLTHROUGH.  */
> >  	    case OMP_CLAUSE_DEPEND:
> >  	    case OMP_CLAUSE_REDUCTION:
> > +	      if (kind == OMP_CLAUSE_REDUCTION && ort == C_ORT_ACC)
> > +		{
> > +		  switch (cp_lexer_peek_token
> > (parser->lexer)->type)
> > +		    {
> > +		    case CPP_OPEN_PAREN:
> > +		    case CPP_OPEN_SQUARE:
> > +		    case CPP_DOT:
> > +		    case CPP_DEREF:
> > +		      error ("invalid reduction variable");
> > +		      decl = error_mark_node;
> > +		      goto skip_comma;
> > +		    default:;
> > +		      break;
> > +		    }
> > +		}  
> 
> Any reason for the above (ditto in C), rather than just adding
> && ort != C_ORT_ACC to the while loop condition for CPP_OPEN_SQUARE?
> (, . or * after id-expression is like any other unhandled
> characters...

I think the reason was that 'decl' ('t' in the C version) is not set to
error_mark_node if the while loop is skipped, and then the gimplifier
gets confused. I've tried to tackle this in another way, by checking
there aren't any stray characters before the next comma or
close-parenthesis.

I'm not sure if you were objecting to the error message too -- with the
current patch, the user will just get e.g.:

error: expected ')' before '.' token

if they try to use an unsupported type of construct as a reduction
target.

> >  	      while (cp_lexer_next_token_is (parser->lexer,
> > CPP_OPEN_SQUARE)) {
> >  		  tree low_bound = NULL_TREE, length = NULL_TREE;  
> 
> > @@ -33868,7 +33885,7 @@ cp_parser_oacc_all_clauses (cp_parser
> > *parser, omp_clause_mask mask, c_name = "private";
> >  	  break;
> >  	case PRAGMA_OACC_CLAUSE_REDUCTION:
> > -	  clauses = cp_parser_omp_clause_reduction (parser,
> > clauses);
> > +	  clauses = cp_parser_omp_clause_reduction (parser,
> > clauses, C_ORT_ACC); c_name = "reduction";
> >  	  break;
> >  	case PRAGMA_OACC_CLAUSE_SEQ:
> > @@ -34055,7 +34072,7 @@ cp_parser_omp_all_clauses (cp_parser
> > *parser, omp_clause_mask mask, c_name = "private";
> >  	  break;
> >  	case PRAGMA_OMP_CLAUSE_REDUCTION:
> > -	  clauses = cp_parser_omp_clause_reduction (parser,
> > clauses);
> > +	  clauses = cp_parser_omp_clause_reduction (parser,
> > clauses, C_ORT_OMP); c_name = "reduction";
> >  	  break;
> >  	case PRAGMA_OMP_CLAUSE_SCHEDULE:  
> 
> Again, needs adjustement for IN_REDUCTION/TASK_REDUCTION.

Done.

> > diff --git a/gcc/cp/semantics.c b/gcc/cp/semantics.c
> > index c779137da45..177acdd9cc4 100644
> > --- a/gcc/cp/semantics.c
> > +++ b/gcc/cp/semantics.c
> > @@ -5875,6 +5875,14 @@ finish_omp_clauses (tree clauses, enum
> > c_omp_region_type ort) field_ok = ((ort & C_ORT_OMP_DECLARE_SIMD)
> > == C_ORT_OMP); goto check_dup_generic;
> >  	case OMP_CLAUSE_REDUCTION:
> > +	  if (ort == C_ORT_ACC && oacc_get_fn_attrib
> > (current_function_decl)
> > +	      && omp_find_clause (clauses, OMP_CLAUSE_GANG))
> > +	    {
> > +	      error_at (OMP_CLAUSE_LOCATION (c),
> > +			"gang reduction on an orphan loop");
> > +	      remove = true;
> > +	      break;
> > +	    }
> >  	  field_ok = ((ort & C_ORT_OMP_DECLARE_SIMD) == C_ORT_OMP);
> >  	  t = OMP_CLAUSE_DECL (c);
> >  	  if (TREE_CODE (t) == TREE_LIST)  
> 
> In C++ finish_omp_clauses there are 2 loops, so you can easily just
> remember if OMP_CLAUSE_GANG has been seen in the first loop and
> diagnose this in the second loop only.

Done.

Re-tested with offloading to nvptx, and with updates to the new
testcases (to be posted). OK?

Thanks,

Julian

ChangeLog

2018-xx-xx  Cesar Philippidis  <cesar@codesourcery.com>
            Nathan Sidwell  <nathan@acm.org>
            Julian Brown  <julian@codesourcery.com>

        gcc/c/
        * c-parser.c (c_parser_omp_variable_list): New c_omp_region_type
        argument.  Use it to specialize handling of OMP_CLAUSE_REDUCTION for
        OpenACC.
        (c_parser_omp_clause_reduction): Change is_omp boolean parameter to
        c_omp_region_type.  Update call to c_parser_omp_variable_list.
        (c_parser_oacc_all_clauses): Update calls to
        c_parser_omp_clause_reduction.
        (c_parser_omp_all_clauses): Likewise.
        * c-typeck.c (c_finish_omp_clauses): Emit an error on orphan OpenACC
        gang reductions.

        gcc/cp/
        * parser.c (cp_parser_omp_var_list_no_open):  New c_omp_region_type
        argument.  Use it to specialize handling of OMP_CLAUSE_REDUCTION for
        OpenACC.
        (cp_parser_omp_clause_reduction): Change is_omp boolean parameter to
        c_omp_region_type.  Update call to cp_parser_omp_var_list_no_open.
        (cp_parser_oacc_all_clauses): Update call to
        cp_parser_omp_clause_reduction.
        (cp_parser_omp_all_clauses): Likewise.
        * semantics.c (finish_omp_clauses): Emit an error on orphan OpenACC
        gang reductions.

        gcc/fortran/
        * openmp.c (resolve_oacc_loop_blocks): Emit an error on orphan OpenACC
        gang reductions.
        * trans-openmp.c (gfc_omp_clause_copy_ctor): Permit reductions.

[-- Attachment #2: trunk-reductions-fe-2.diff --]
[-- Type: text/x-patch, Size: 13880 bytes --]

commit 0fcaa69b46d2661c3b133c42e0ce73693088b04e
Author: Julian Brown <julian@codesourcery.com>
Date:   Wed Dec 12 11:09:29 2018 -0800

    Various OpenACC reduction enhancements - FE changes
    
    2018-xx-xx  Cesar Philippidis  <cesar@codesourcery.com>
    	    Nathan Sidwell  <nathan@acm.org>
    	    Julian Brown  <julian@codesourcery.com>
    
    	gcc/c/
    	* c-parser.c (c_parser_omp_variable_list): New c_omp_region_type
    	argument.  Use it to specialize handling of OMP_CLAUSE_REDUCTION for
    	OpenACC.
    	(c_parser_omp_clause_reduction): Change is_omp boolean parameter to
    	c_omp_region_type.  Update call to c_parser_omp_variable_list.
    	(c_parser_oacc_all_clauses): Update calls to
    	c_parser_omp_clause_reduction.
    	(c_parser_omp_all_clauses): Likewise.
    	* c-typeck.c (c_finish_omp_clauses): Emit an error on orphan OpenACC
    	gang reductions.
    
    	gcc/cp/
    	* parser.c (cp_parser_omp_var_list_no_open):  New c_omp_region_type
    	argument.  Use it to specialize handling of OMP_CLAUSE_REDUCTION for
    	OpenACC.
    	(cp_parser_omp_clause_reduction): Change is_omp boolean parameter to
    	c_omp_region_type.  Update call to cp_parser_omp_var_list_no_open.
    	(cp_parser_oacc_all_clauses): Update call to
    	cp_parser_omp_clause_reduction.
    	(cp_parser_omp_all_clauses): Likewise.
    	* semantics.c (finish_omp_clauses): Emit an error on orphan OpenACC
    	gang reductions.
    
    	gcc/fortran/
    	* openmp.c (resolve_oacc_loop_blocks): Emit an error on orphan OpenACC
    	gang reductions.
    	* trans-openmp.c (gfc_omp_clause_copy_ctor): Permit reductions.

diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index b875c4f..59a461b 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -11869,7 +11869,8 @@ c_parser_oacc_wait_list (c_parser *parser, location_t clause_loc, tree list)
 static tree
 c_parser_omp_variable_list (c_parser *parser,
 			    location_t clause_loc,
-			    enum omp_clause_code kind, tree list)
+			    enum omp_clause_code kind, tree list,
+			    enum c_omp_region_type ort = C_ORT_OMP)
 {
   auto_vec<c_token> tokens;
   unsigned int tokens_avail = 0;
@@ -12004,7 +12005,8 @@ c_parser_omp_variable_list (c_parser *parser,
 	    case OMP_CLAUSE_REDUCTION:
 	    case OMP_CLAUSE_IN_REDUCTION:
 	    case OMP_CLAUSE_TASK_REDUCTION:
-	      while (c_parser_next_token_is (parser, CPP_OPEN_SQUARE))
+	      while (ort != C_ORT_ACC
+		     && c_parser_next_token_is (parser, CPP_OPEN_SQUARE))
 		{
 		  tree low_bound = NULL_TREE, length = NULL_TREE;
 
@@ -12074,6 +12076,10 @@ c_parser_omp_variable_list (c_parser *parser,
 			}
 		    }
 		}
+	      if (ort == C_ORT_ACC
+	          && c_parser_next_token_is_not (parser, CPP_COMMA)
+		  && c_parser_next_token_is_not (parser, CPP_CLOSE_PAREN))
+		t = error_mark_node;
 	      break;
 	    default:
 	      break;
@@ -13446,7 +13452,7 @@ c_parser_omp_clause_private (c_parser *parser, tree list)
 
 static tree
 c_parser_omp_clause_reduction (c_parser *parser, enum omp_clause_code kind,
-			       bool is_omp, tree list)
+			       enum c_omp_region_type ort, tree list)
 {
   location_t clause_loc = c_parser_peek_token (parser)->location;
   matching_parens parens;
@@ -13457,7 +13463,7 @@ c_parser_omp_clause_reduction (c_parser *parser, enum omp_clause_code kind,
       enum tree_code code = ERROR_MARK;
       tree reduc_id = NULL_TREE;
 
-      if (kind == OMP_CLAUSE_REDUCTION && is_omp)
+      if (kind == OMP_CLAUSE_REDUCTION && ort == C_ORT_OMP)
 	{
 	  if (c_parser_next_token_is_keyword (parser, RID_DEFAULT)
 	      && c_parser_peek_2nd_token (parser)->type == CPP_COMMA)
@@ -13542,7 +13548,8 @@ c_parser_omp_clause_reduction (c_parser *parser, enum omp_clause_code kind,
 	{
 	  tree nl, c;
 
-	  nl = c_parser_omp_variable_list (parser, clause_loc, kind, list);
+	  nl = c_parser_omp_variable_list (parser, clause_loc, kind, list, ort);
+
 	  for (c = nl; c != list; c = OMP_CLAUSE_CHAIN (c))
 	    {
 	      tree d = OMP_CLAUSE_DECL (c), type;
@@ -14847,7 +14854,7 @@ c_parser_oacc_all_clauses (c_parser *parser, omp_clause_mask mask,
 	case PRAGMA_OACC_CLAUSE_REDUCTION:
 	  clauses
 	    = c_parser_omp_clause_reduction (parser, OMP_CLAUSE_REDUCTION,
-					     false, clauses);
+					     C_ORT_ACC, clauses);
 	  c_name = "reduction";
 	  break;
 	case PRAGMA_OACC_CLAUSE_SEQ:
@@ -14976,7 +14983,7 @@ c_parser_omp_all_clauses (c_parser *parser, omp_clause_mask mask,
 	case PRAGMA_OMP_CLAUSE_IN_REDUCTION:
 	  clauses
 	    = c_parser_omp_clause_reduction (parser, OMP_CLAUSE_IN_REDUCTION,
-					     true, clauses);
+					     C_ORT_OMP, clauses);
 	  c_name = "in_reduction";
 	  break;
 	case PRAGMA_OMP_CLAUSE_LASTPRIVATE:
@@ -15014,7 +15021,7 @@ c_parser_omp_all_clauses (c_parser *parser, omp_clause_mask mask,
 	case PRAGMA_OMP_CLAUSE_REDUCTION:
 	  clauses
 	    = c_parser_omp_clause_reduction (parser, OMP_CLAUSE_REDUCTION,
-					     true, clauses);
+					     C_ORT_OMP, clauses);
 	  c_name = "reduction";
 	  break;
 	case PRAGMA_OMP_CLAUSE_SCHEDULE:
@@ -15028,7 +15035,7 @@ c_parser_omp_all_clauses (c_parser *parser, omp_clause_mask mask,
 	case PRAGMA_OMP_CLAUSE_TASK_REDUCTION:
 	  clauses
 	    = c_parser_omp_clause_reduction (parser, OMP_CLAUSE_TASK_REDUCTION,
-					     true, clauses);
+					     C_ORT_OMP, clauses);
 	  c_name = "task_reduction";
 	  break;
 	case PRAGMA_OMP_CLAUSE_UNTIED:
diff --git a/gcc/c/c-typeck.c b/gcc/c/c-typeck.c
index 1a89727..a251447 100644
--- a/gcc/c/c-typeck.c
+++ b/gcc/c/c-typeck.c
@@ -13683,6 +13683,7 @@ c_finish_omp_clauses (tree clauses, enum c_omp_region_type ort)
   bool last_iterators_remove = false;
   tree *nogroup_seen = NULL;
   bool reduction_seen = false;
+  bool oacc_gang_seen = false;
 
   bitmap_obstack_initialize (NULL);
   bitmap_initialize (&generic_head, &bitmap_default_obstack);
@@ -13697,10 +13698,15 @@ c_finish_omp_clauses (tree clauses, enum c_omp_region_type ort)
 
   if (ort & C_ORT_ACC)
     for (c = clauses; c; c = OMP_CLAUSE_CHAIN (c))
-      if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_ASYNC)
-	{
+      switch (OMP_CLAUSE_CODE (c))
+        {
+	case OMP_CLAUSE_ASYNC:
 	  oacc_async = true;
 	  break;
+	case OMP_CLAUSE_GANG:
+	  oacc_gang_seen = true;
+	  break;
+	default:;
 	}
 
   for (pc = &clauses, c = clauses; c ; c = *pc)
@@ -13721,6 +13727,13 @@ c_finish_omp_clauses (tree clauses, enum c_omp_region_type ort)
 	  goto check_dup_generic;
 
 	case OMP_CLAUSE_REDUCTION:
+	  if (oacc_gang_seen && oacc_get_fn_attrib (current_function_decl))
+	    {
+	      error_at (OMP_CLAUSE_LOCATION (c),
+			"gang reduction on an orphan loop");
+	      remove = true;
+	      break;
+	    }
 	  reduction_seen = true;
 	  /* FALLTHRU */
 	case OMP_CLAUSE_IN_REDUCTION:
diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index 8b669a8..71e84ea 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -32111,7 +32111,8 @@ check_no_duplicate_clause (tree clauses, enum omp_clause_code code,
 
 static tree
 cp_parser_omp_var_list_no_open (cp_parser *parser, enum omp_clause_code kind,
-				tree list, bool *colon)
+				tree list, bool *colon,
+				enum c_omp_region_type ort = C_ORT_OMP)
 {
   cp_token *token;
   bool saved_colon_corrects_to_scope_p = parser->colon_corrects_to_scope_p;
@@ -32201,7 +32202,8 @@ cp_parser_omp_var_list_no_open (cp_parser *parser, enum omp_clause_code kind,
 	    case OMP_CLAUSE_REDUCTION:
 	    case OMP_CLAUSE_IN_REDUCTION:
 	    case OMP_CLAUSE_TASK_REDUCTION:
-	      while (cp_lexer_next_token_is (parser->lexer, CPP_OPEN_SQUARE))
+	      while (ort != C_ORT_ACC
+		     && cp_lexer_next_token_is (parser->lexer, CPP_OPEN_SQUARE))
 		{
 		  tree low_bound = NULL_TREE, length = NULL_TREE;
 
@@ -32262,10 +32264,18 @@ cp_parser_omp_var_list_no_open (cp_parser *parser, enum omp_clause_code kind,
 		cp_parser_parse_definitely (parser);
 	    }
 
-	  tree u = build_omp_clause (token->location, kind);
-	  OMP_CLAUSE_DECL (u) = decl;
-	  OMP_CLAUSE_CHAIN (u) = list;
-	  list = u;
+	  if (ort == C_ORT_ACC
+	      && cp_lexer_next_token_is_not (parser->lexer, CPP_COMMA)
+	      && cp_lexer_next_token_is_not (parser->lexer, CPP_CLOSE_PAREN))
+	    decl = error_mark_node;
+
+	  if (decl != error_mark_node)
+	    {
+	      tree u = build_omp_clause (token->location, kind);
+	      OMP_CLAUSE_DECL (u) = decl;
+	      OMP_CLAUSE_CHAIN (u) = list;
+	      list = u;
+	    }
 	}
       else
 	list = tree_cons (decl, NULL_TREE, list);
@@ -33442,7 +33452,7 @@ cp_parser_omp_clause_ordered (cp_parser *parser,
 
 static tree
 cp_parser_omp_clause_reduction (cp_parser *parser, enum omp_clause_code kind,
-				bool is_omp, tree list)
+				enum c_omp_region_type ort, tree list)
 {
   enum tree_code code = ERROR_MARK;
   tree nlist, c, id = NULL_TREE;
@@ -33452,7 +33462,7 @@ cp_parser_omp_clause_reduction (cp_parser *parser, enum omp_clause_code kind,
   if (!cp_parser_require (parser, CPP_OPEN_PAREN, RT_OPEN_PAREN))
     return list;
 
-  if (kind == OMP_CLAUSE_REDUCTION && is_omp)
+  if (kind == OMP_CLAUSE_REDUCTION && ort == C_ORT_OMP)
     {
       if (cp_lexer_next_token_is_keyword (parser->lexer, RID_DEFAULT)
 	  && cp_lexer_nth_token_is (parser->lexer, 2, CPP_COMMA))
@@ -33553,8 +33563,7 @@ cp_parser_omp_clause_reduction (cp_parser *parser, enum omp_clause_code kind,
   if (!cp_parser_require (parser, CPP_COLON, RT_COLON))
     goto resync_fail;
 
-  nlist = cp_parser_omp_var_list_no_open (parser, kind, list,
-					  NULL);
+  nlist = cp_parser_omp_var_list_no_open (parser, kind, list, NULL, ort);
   for (c = nlist; c != list; c = OMP_CLAUSE_CHAIN (c))
     {
       OMP_CLAUSE_REDUCTION_CODE (c) = code;
@@ -34798,7 +34807,7 @@ cp_parser_oacc_all_clauses (cp_parser *parser, omp_clause_mask mask,
 	case PRAGMA_OACC_CLAUSE_REDUCTION:
 	  clauses
 	    = cp_parser_omp_clause_reduction (parser, OMP_CLAUSE_REDUCTION,
-					      false, clauses);
+					      C_ORT_ACC, clauses);
 	  c_name = "reduction";
 	  break;
 	case PRAGMA_OACC_CLAUSE_SEQ:
@@ -34948,7 +34957,7 @@ cp_parser_omp_all_clauses (cp_parser *parser, omp_clause_mask mask,
 	case PRAGMA_OMP_CLAUSE_IN_REDUCTION:
 	  clauses
 	    = cp_parser_omp_clause_reduction (parser, OMP_CLAUSE_IN_REDUCTION,
-					      true, clauses);
+					      C_ORT_OMP, clauses);
 	  c_name = "in_reduction";
 	  break;
 	case PRAGMA_OMP_CLAUSE_LASTPRIVATE:
@@ -34992,7 +35001,7 @@ cp_parser_omp_all_clauses (cp_parser *parser, omp_clause_mask mask,
 	case PRAGMA_OMP_CLAUSE_REDUCTION:
 	  clauses
 	    = cp_parser_omp_clause_reduction (parser, OMP_CLAUSE_REDUCTION,
-					      true, clauses);
+					      C_ORT_OMP, clauses);
 	  c_name = "reduction";
 	  break;
 	case PRAGMA_OMP_CLAUSE_SCHEDULE:
@@ -35009,7 +35018,7 @@ cp_parser_omp_all_clauses (cp_parser *parser, omp_clause_mask mask,
 	  clauses
 	    = cp_parser_omp_clause_reduction (parser,
 					      OMP_CLAUSE_TASK_REDUCTION,
-					      true, clauses);
+					      C_ORT_OMP, clauses);
 	  c_name = "task_reduction";
 	  break;
 	case PRAGMA_OMP_CLAUSE_UNTIED:
diff --git a/gcc/cp/semantics.c b/gcc/cp/semantics.c
index c1240cc..fc63e3c 100644
--- a/gcc/cp/semantics.c
+++ b/gcc/cp/semantics.c
@@ -6055,6 +6055,7 @@ finish_omp_clauses (tree clauses, enum c_omp_region_type ort)
   tree last_iterators = NULL_TREE;
   bool last_iterators_remove = false;
   bool reduction_seen = false;
+  bool oacc_gang_seen = false;
 
   bitmap_obstack_initialize (NULL);
   bitmap_initialize (&generic_head, &bitmap_default_obstack);
@@ -6069,10 +6070,15 @@ finish_omp_clauses (tree clauses, enum c_omp_region_type ort)
 
   if (ort & C_ORT_ACC)
     for (c = clauses; c; c = OMP_CLAUSE_CHAIN (c))
-      if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_ASYNC)
-	{
+      switch (OMP_CLAUSE_CODE (c))
+        {
+	case OMP_CLAUSE_ASYNC:
 	  oacc_async = true;
 	  break;
+	case OMP_CLAUSE_GANG:
+	  oacc_gang_seen = true;
+	  break;
+	default:;
 	}
 
   for (pc = &clauses, c = clauses; c ; c = *pc)
@@ -6089,6 +6095,13 @@ finish_omp_clauses (tree clauses, enum c_omp_region_type ort)
 	  field_ok = ((ort & C_ORT_OMP_DECLARE_SIMD) == C_ORT_OMP);
 	  goto check_dup_generic;
 	case OMP_CLAUSE_REDUCTION:
+	  if (oacc_gang_seen && oacc_get_fn_attrib (current_function_decl))
+	    {
+	      error_at (OMP_CLAUSE_LOCATION (c),
+			"gang reduction on an orphan loop");
+	      remove = true;
+	      break;
+	    }
 	  reduction_seen = true;
 	  /* FALLTHRU */
 	case OMP_CLAUSE_IN_REDUCTION:
diff --git a/gcc/fortran/openmp.c b/gcc/fortran/openmp.c
index 261a54a..ffa04e6 100644
--- a/gcc/fortran/openmp.c
+++ b/gcc/fortran/openmp.c
@@ -5843,6 +5843,18 @@ resolve_oacc_loop_blocks (gfc_code *code)
   if (!oacc_is_loop (code))
     return;
 
+  if (code->op == EXEC_OACC_LOOP
+      && code->ext.omp_clauses->lists[OMP_LIST_REDUCTION]
+      && code->ext.omp_clauses->gang)
+    {
+      fortran_omp_context *c;
+      for (c = omp_current_ctx; c; c = c->previous)
+	if (!oacc_is_loop (c->code))
+	  break;
+      if (c == NULL || !oacc_is_parallel (c->code))
+	gfc_error ("gang reduction on an orphan loop at %L", &code->loc);
+    }
+
   if (code->ext.omp_clauses->tile_list && code->ext.omp_clauses->gang
       && code->ext.omp_clauses->worker && code->ext.omp_clauses->vector)
     gfc_error ("Tiled loop cannot be parallelized across gangs, workers and "
diff --git a/gcc/fortran/trans-openmp.c b/gcc/fortran/trans-openmp.c
index bf3f469..fbe0d3c 100644
--- a/gcc/fortran/trans-openmp.c
+++ b/gcc/fortran/trans-openmp.c
@@ -564,7 +564,8 @@ gfc_omp_clause_copy_ctor (tree clause, tree dest, tree src)
   stmtblock_t block, cond_block;
 
   gcc_assert (OMP_CLAUSE_CODE (clause) == OMP_CLAUSE_FIRSTPRIVATE
-	      || OMP_CLAUSE_CODE (clause) == OMP_CLAUSE_LINEAR);
+	      || OMP_CLAUSE_CODE (clause) == OMP_CLAUSE_LINEAR
+	      || OMP_CLAUSE_CODE (clause) == OMP_CLAUSE_REDUCTION);
 
   if ((! GFC_DESCRIPTOR_TYPE_P (type)
        || GFC_TYPE_ARRAY_AKIND (type) != GFC_ARRAY_ALLOCATABLE)

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [patch] various OpenACC reduction enhancements - test cases
  2018-12-04 12:59   ` Jakub Jelinek
@ 2018-12-13 14:14     ` Julian Brown
  0 siblings, 0 replies; 14+ messages in thread
From: Julian Brown @ 2018-12-13 14:14 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: Cesar Philippidis, Thomas Schwinge, gcc-patches, Tom de Vries

[-- Attachment #1: Type: text/plain, Size: 1243 bytes --]

On Tue, 4 Dec 2018 13:59:33 +0100
Jakub Jelinek <jakub@redhat.com> wrote:

> On Fri, Jun 29, 2018 at 11:23:21AM -0700, Cesar Philippidis wrote:
> > Attached are the updated reductions tests cases. Again, these have
> > been bootstrapped and regression tested cleanly for x86_64 with
> > nvptx offloading. Is it OK for trunk?  
> 
> If Thomas is ok with this, it is ok for trunk.

Here's a new version to go with the FE patch posted here:

https://gcc.gnu.org/ml/gcc-patches/2018-12/msg00930.html

Thanks,

Julian

ChangeLog

2018-xx-xx  Cesar Philippidis  <cesar@codesourcery.com>
            Nathan Sidwell  <nathan@acm.org>
            Julian Brown  <julian@codesourcery.com>

        gcc/testsuite/
        * c-c++-common/goacc/orphan-reductions-1.c: New test.
        * c-c++-common/goacc/reduction-7.c: New test.
        * c-c++-common/goacc/routine-4.c: Update.
        * g++.dg/goacc/reductions-1.C: New test.
        * gcc.dg/goacc/loop-processing-1.c: Update.
        * gfortran.dg/goacc/orphan-reductions-1.f90: New test.

        libgomp/
        * libgomp.oacc-c-c++-common/par-reduction-3.c: New test.
        * libgomp.oacc-c-c++-common/reduction-cplx-flt-2.c: New test.
        * libgomp.oacc-fortran/reduction-9.f90: New test.

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: trunk-reductions-tests-2.diff --]
[-- Type: text/x-patch, Size: 35382 bytes --]

commit 7d445a56d6db96696cec8359e58258d47fa7c9ae
Author: Julian Brown <julian@codesourcery.com>
Date:   Wed Dec 12 11:11:03 2018 -0800

    Various OpenACC reduction enhancements - test cases
    
    2018-xx-xx  Cesar Philippidis  <cesar@codesourcery.com>
    	    Nathan Sidwell  <nathan@acm.org>
    	    Julian Brown  <julian@codesourcery.com>
    
    	gcc/testsuite/
    	* c-c++-common/goacc/orphan-reductions-1.c: New test.
    	* c-c++-common/goacc/reduction-7.c: New test.
    	* c-c++-common/goacc/routine-4.c: Update.
    	* g++.dg/goacc/reductions-1.C: New test.
    	* gcc.dg/goacc/loop-processing-1.c: Update.
    	* gfortran.dg/goacc/orphan-reductions-1.f90: New test.
    
    	libgomp/
    	* libgomp.oacc-c-c++-common/par-reduction-3.c: New test.
    	* libgomp.oacc-c-c++-common/reduction-cplx-flt-2.c: New test.
    	* libgomp.oacc-fortran/reduction-9.f90: New test.

diff --git a/gcc/testsuite/c-c++-common/goacc/orphan-reductions-1.c b/gcc/testsuite/c-c++-common/goacc/orphan-reductions-1.c
new file mode 100644
index 0000000..b0bd4a7
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/orphan-reductions-1.c
@@ -0,0 +1,56 @@
+/* Test orphan reductions.  */
+
+#include <assert.h>
+
+#pragma acc routine seq
+int
+seq_reduction (int n)
+{
+  int i, sum = 0;
+#pragma acc loop seq reduction(+:sum)
+  for (i = 0; i < n; i++)
+    sum = sum + 1;
+
+  return sum;
+}
+
+#pragma acc routine gang
+int
+gang_reduction (int n)
+{
+  int i, s1 = 0, s2 = 0;
+#pragma acc loop gang reduction(+:s1) /* { dg-error "gang reduction on an orphan loop" } */
+  for (i = 0; i < n; i++)
+    s1 = s1 + 2;
+
+#pragma acc loop gang reduction(+:s2) /* { dg-error "gang reduction on an orphan loop" } */
+  for (i = 0; i < n; i++)
+    s2 = s2 + 2;
+
+
+  return s1 + s2;
+}
+
+#pragma acc routine worker
+int
+worker_reduction (int n)
+{
+  int i, sum = 0;
+#pragma acc loop worker reduction(+:sum)
+  for (i = 0; i < n; i++)
+    sum = sum + 3;
+
+  return sum;
+}
+
+#pragma acc routine vector
+int
+vector_reduction (int n)
+{
+  int i, sum = 0;
+#pragma acc loop vector reduction(+:sum)
+  for (i = 0; i < n; i++)
+    sum = sum + 4;
+
+  return sum;
+}
diff --git a/gcc/testsuite/c-c++-common/goacc/reduction-7.c b/gcc/testsuite/c-c++-common/goacc/reduction-7.c
new file mode 100644
index 0000000..eba1d02
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/reduction-7.c
@@ -0,0 +1,111 @@
+/* Exercise invalid reductions on array and struct members.  */
+
+void
+test_parallel ()
+{
+  struct {
+    int a;
+    float b[5];
+  } s1, s2[10];
+
+  int i;
+  double z[100];
+
+#pragma acc parallel reduction(+:s1.a) /* { dg-error "expected '\\\)' before '\\\.' token" } */
+  for (i = 0; i < 10; i++)
+    s1.a += 1;
+
+#pragma acc parallel reduction(+:s1.b[3]) /* { dg-error "expected '\\\)' before '\\\.' token" } */
+  for (i = 0; i < 10; i++)
+    s1.b[3] += 1;
+
+#pragma acc parallel reduction(+:s2[2].a) /* { dg-error "expected '\\\)' before '\\\[' token" } */
+  for (i = 0; i < 10; i++)
+    s2[2].a += 1;
+
+#pragma acc parallel reduction(+:s2[3].b[4]) /* { dg-error "expected '\\\)' before '\\\[' token" } */
+  for (i = 0; i < 10; i++)
+    s2[3].b[4] += 1;
+
+#pragma acc parallel reduction(+:z[5]) /* { dg-error "expected '\\\)' before '\\\[' token" } */
+  for (i = 0; i < 10; i++)
+    z[5] += 1;
+}
+
+void
+test_combined ()
+{
+  struct {
+    int a;
+    float b[5];
+  } s1, s2[10];
+
+  int i;
+  double z[100];
+
+#pragma acc parallel loop reduction(+:s1.a) /* { dg-error "expected '\\\)' before '\\\.' token" } */
+  for (i = 0; i < 10; i++)
+    s1.a += 1;
+
+#pragma acc parallel loop reduction(+:s1.b[3]) /* { dg-error "expected '\\\)' before '\\\.' token" } */
+  for (i = 0; i < 10; i++)
+    s1.b[3] += 1;
+
+#pragma acc parallel loop reduction(+:s2[2].a) /* { dg-error "expected '\\\)' before '\\\[' token" } */
+  for (i = 0; i < 10; i++)
+    s2[2].a += 1;
+
+#pragma acc parallel loop reduction(+:s2[3].b[4]) /* { dg-error "expected '\\\)' before '\\\[' token" } */
+  for (i = 0; i < 10; i++)
+    s2[3].b[4] += 1;
+
+#pragma acc parallel loop reduction(+:z[5]) /* { dg-error "expected '\\\)' before '\\\[' token" } */
+  for (i = 0; i < 10; i++)
+    z[5] += 1;
+
+}
+
+void
+test_loops ()
+{
+  struct {
+    int a;
+    float b[5];
+  } s1, s2[10];
+
+  int i;
+  double z[100];
+
+#pragma acc parallel
+  {
+#pragma acc loop reduction(+:s1.a) /* { dg-error "expected '\\\)' before '\\\.' token" } */
+  for (i = 0; i < 10; i++)
+    s1.a += 1;
+
+#pragma acc loop reduction(+:s1.b[3]) /* { dg-error "expected '\\\)' before '\\\.' token" } */
+  for (i = 0; i < 10; i++)
+    s1.b[3] += 1;
+
+#pragma acc loop reduction(+:s2[2].a) /* { dg-error "expected '\\\)' before '\\\[' token" } */
+  for (i = 0; i < 10; i++)
+    s2[2].a += 1;
+
+#pragma acc loop reduction(+:s2[3].b[4]) /* { dg-error "expected '\\\)' before '\\\[' token" } */
+  for (i = 0; i < 10; i++)
+    s2[3].b[4] += 1;
+
+#pragma acc loop reduction(+:z[5]) /* { dg-error "expected '\\\)' before '\\\[' token" } */
+  for (i = 0; i < 10; i++)
+    z[5] += 1;
+  }
+}
+
+int
+main ()
+{
+  test_parallel ();
+  test_combined ();
+  test_loops ();
+
+  return 0;
+}
diff --git a/gcc/testsuite/c-c++-common/goacc/routine-4.c b/gcc/testsuite/c-c++-common/goacc/routine-4.c
index efc4a0b..91abfb5 100644
--- a/gcc/testsuite/c-c++-common/goacc/routine-4.c
+++ b/gcc/testsuite/c-c++-common/goacc/routine-4.c
@@ -22,7 +22,7 @@ void seq (void)
   for (int i = 0; i < 10; i++)
     red ++;
 
-#pragma acc loop gang reduction (+:red) // { dg-error "disallowed by containing routine" }
+#pragma acc loop seq reduction (+:red)
   for (int i = 0; i < 10; i++)
     red ++;
 
@@ -48,7 +48,7 @@ void vector (void) /* { dg-message "declared here" "1" } */
   for (int i = 0; i < 10; i++)
     red ++;
 
-#pragma acc loop gang reduction (+:red) // { dg-error "disallowed by containing routine" }
+#pragma acc loop seq reduction (+:red)
   for (int i = 0; i < 10; i++)
     red ++;
 
@@ -74,7 +74,7 @@ void worker (void) /* { dg-message "declared here" "2" } */
   for (int i = 0; i < 10; i++)
     red ++;
 
-#pragma acc loop gang reduction (+:red) // { dg-error "disallowed by containing routine" }
+#pragma acc loop seq reduction (+:red)
   for (int i = 0; i < 10; i++)
     red ++;
 
@@ -100,7 +100,7 @@ void gang (void) /* { dg-message "declared here" "3" } */
   for (int i = 0; i < 10; i++)
     red ++;
 
-#pragma acc loop gang reduction (+:red)
+#pragma acc loop seq reduction (+:red)
   for (int i = 0; i < 10; i++)
     red ++;
 
diff --git a/gcc/testsuite/g++.dg/goacc/reductions-1.C b/gcc/testsuite/g++.dg/goacc/reductions-1.C
new file mode 100644
index 0000000..18f43f4
--- /dev/null
+++ b/gcc/testsuite/g++.dg/goacc/reductions-1.C
@@ -0,0 +1,548 @@
+// Test for invalid reduction variables.
+
+class C1
+{
+  int b, d[10];
+
+public:
+  int a, c[10];
+
+  C1 () { a = 0; b = 0; }
+  int& get_b () { return b; }
+  int* get_d () { return d; }
+};
+
+template <typename T>
+class C2
+{
+  T b, d[10];
+
+public:
+  T a, c[10];
+
+  C2 () { a = 0; b = 0; }
+  T& get_b () { return b; }
+  T* get_d () { return d; }
+};
+
+struct S1
+{
+  int a, b, c[10], d[10];
+
+  S1 () { a = 0; b = 0; }
+  int& get_b () { return b; }
+  int* get_d () { return d; }
+};
+
+template <typename T>
+struct S2
+{
+  T a, b, c[10], d[10];
+
+  S2 () { a = 0; b = 0; }
+  T& get_b () { return b; }
+  T* get_d () { return d; }
+};
+
+template <typename T>
+void
+test_parallel ()
+{
+  int i, a[10];
+  T b[10];
+  C1 c1, c1a[10];
+  C2<T> c2, c2a[10];
+  S1 s1, s1a[10];
+  S2<float> s2, s2a[10];
+
+  // Reductions on class members.
+
+#pragma acc parallel reduction(+:c1.a) // { dg-error "expected '\\\)' before '\\\.' token" }
+  for (i = 0; i < 100; i++)
+    c1.a += 1;
+
+#pragma acc parallel reduction(+:c1.get_b ()) // { dg-error "expected '\\\)' before '\\\.' token" }
+  for (i = 0; i < 100; i++)
+    c1.get_b () += 1;
+
+#pragma acc parallel reduction(+:c1.c[1]) // { dg-error "expected '\\\)' before '\\\.' token" }
+  for (i = 0; i < 100; i++)
+    c1.c[1] += 1;
+
+#pragma acc parallel reduction(+:c1.get_d ()[1]) // { dg-error "expected '\\\)' before '\\\.' token" }
+  for (i = 0; i < 100; i++)
+    c1.get_d ()[1] += 1;
+
+#pragma acc parallel reduction(+:c1a[1].a) // { dg-error "expected '\\\)' before '\\\[' token" }
+  for (i = 0; i < 100; i++)
+    c1a[1].a += 1;
+
+#pragma acc parallel reduction(+:c1a[1].get_b ()) // { dg-error "expected '\\\)' before '\\\[' token" }
+  for (i = 0; i < 100; i++)
+    c1a[1].get_b () += 1;
+
+#pragma acc parallel reduction(+:c1a[1].c[1]) // { dg-error "expected '\\\)' before '\\\[' token" }
+  for (i = 0; i < 100; i++)
+    c1a[1].c[1] += 1;
+
+#pragma acc parallel reduction(+:c1a[1].get_d ()[1]) // { dg-error "expected '\\\)' before '\\\[' token" }
+  for (i = 0; i < 100; i++)
+    c1a[1].get_d ()[1] += 1;
+
+
+  // Reductions on a template class member.
+
+#pragma acc parallel reduction(+:c2.a) // { dg-error "expected '\\\)' before '\\\.' token" }
+  for (i = 0; i < 100; i++)
+    c2.a += 1;
+
+#pragma acc parallel reduction(+:c2.get_b ()) // { dg-error "expected '\\\)' before '\\\.' token" }
+  for (i = 0; i < 100; i++)
+    c2.get_b () += 1;
+
+#pragma acc parallel reduction(+:c2.c[1]) // { dg-error "expected '\\\)' before '\\\.' token" }
+  for (i = 0; i < 100; i++)
+    c2.c[1] += 1;
+
+#pragma acc parallel reduction(+:c2.get_d ()[1]) // { dg-error "expected '\\\)' before '\\\.' token" }
+  for (i = 0; i < 100; i++)
+    c2.get_d ()[1] += 1;
+
+
+#pragma acc parallel reduction(+:c2a[1].a) // { dg-error "expected '\\\)' before '\\\[' token" }
+  for (i = 0; i < 100; i++)
+    c2a[1].a += 1;
+
+#pragma acc parallel reduction(+:c2a[1].get_b ()[1]) // { dg-error "expected '\\\)' before '\\\[' token" }
+  for (i = 0; i < 100; i++)
+    c2a[1].get_b () += 1;
+
+#pragma acc parallel reduction(+:c2a[1].c[1]) // { dg-error "expected '\\\)' before '\\\[' token" }
+  for (i = 0; i < 100; i++)
+    c2a[1].c[1] += 1;
+
+#pragma acc parallel reduction(+:c2a[1].get_d ()[1]) // { dg-error "expected '\\\)' before '\\\[' token" }
+  for (i = 0; i < 100; i++)
+    c2a[1].get_d ()[1] += 1;
+
+
+  // Reductions on struct element.
+
+#pragma acc parallel reduction(+:s1.a) // { dg-error "expected '\\\)' before '\\\.' token" }
+  for (i = 0; i < 100; i++)
+    s1.a += 1;
+
+#pragma acc parallel reduction(+:s1.get_b ()) // { dg-error "expected '\\\)' before '\\\.' token" }
+  for (i = 0; i < 100; i++)
+    s1.get_b () += 1;
+
+#pragma acc parallel reduction(+:s1.c[1]) // { dg-error "expected '\\\)' before '\\\.' token" }
+  for (i = 0; i < 100; i++)
+    s1.c[1] += 1;
+
+#pragma acc parallel reduction(+:s1.get_d ()[1]) // { dg-error "expected '\\\)' before '\\\.' token" }
+  for (i = 0; i < 100; i++)
+    s1.get_d ()[1] += 1;
+
+#pragma acc parallel reduction(+:s1a[1].a) // { dg-error "expected '\\\)' before '\\\[' token" }
+  for (i = 0; i < 100; i++)
+    s1a[1].a += 1;
+
+#pragma acc parallel reduction(+:s1a[1].get_b ()) // { dg-error "expected '\\\)' before '\\\[' token" }
+  for (i = 0; i < 100; i++)
+    s1a[1].get_b () += 1;
+
+#pragma acc parallel reduction(+:s1a[1].c[1]) // { dg-error "expected '\\\)' before '\\\[' token" }
+  for (i = 0; i < 100; i++)
+    s1a[1].c[1] += 1;
+
+#pragma acc parallel reduction(+:s1a[1].get_d ()[1]) // { dg-error "expected '\\\)' before '\\\[' token" }
+  for (i = 0; i < 100; i++)
+    s1a[1].get_d ()[1] += 1;
+
+
+  // Reductions on a template struct element.
+
+#pragma acc parallel reduction(+:s2.a) // { dg-error "expected '\\\)' before '\\\.' token" }
+  for (i = 0; i < 100; i++)
+    s2.a += 1;
+
+#pragma acc parallel reduction(+:s2.get_b ()) // { dg-error "expected '\\\)' before '\\\.' token" }
+  for (i = 0; i < 100; i++)
+    s2.get_b () += 1;
+
+#pragma acc parallel reduction(+:s2.c[1]) // { dg-error "expected '\\\)' before '\\\.' token" }
+  for (i = 0; i < 100; i++)
+    s2.c[1] += 1;
+
+#pragma acc parallel reduction(+:s2.get_d ()[1]) // { dg-error "expected '\\\)' before '\\\.' token" }
+  for (i = 0; i < 100; i++)
+    s2.get_d ()[1] += 1;
+
+#pragma acc parallel reduction(+:s2a[1].a) // { dg-error "expected '\\\)' before '\\\[' token" }
+  for (i = 0; i < 100; i++)
+    s2a[1].a += 1;
+
+#pragma acc parallel reduction(+:s2a[1].get_b ()) // { dg-error "expected '\\\)' before '\\\[' token" }
+  for (i = 0; i < 100; i++)
+    s2a[1].get_b () += 1;
+
+#pragma acc parallel reduction(+:s2a[1].c[1]) // { dg-error "expected '\\\)' before '\\\[' token" }
+  for (i = 0; i < 100; i++)
+    s2a[1].c[1] += 1;
+
+#pragma acc parallel reduction(+:s2a[1].get_d ()[1]) // { dg-error "expected '\\\)' before '\\\[' token" }
+  for (i = 0; i < 100; i++)
+    s2a[1].get_d ()[1] += 1;
+
+
+  // Reductions on arrays.
+
+#pragma acc parallel reduction(+:a[10]) // { dg-error "expected '\\\)' before '\\\[' token" }
+  for (i = 0; i < 100; i++)
+    a[10] += 1;
+
+#pragma acc parallel reduction(+:b[10]) // { dg-error "expected '\\\)' before '\\\[' token" }
+  for (i = 0; i < 100; i++)
+    b[10] += 1;
+}
+
+template <typename T>
+void
+test_combined ()
+{
+  int i, a[10];
+  T b[10];
+  C1 c1, c1a[10];
+  C2<T> c2, c2a[10];
+  S1 s1, s1a[10];
+  S2<float> s2, s2a[10];
+
+  // Reductions on class members.
+
+#pragma acc parallel loop reduction(+:c1.a) // { dg-error "expected '\\\)' before '\\\.' token" }
+  for (i = 0; i < 100; i++)
+    c1.a += 1;
+
+#pragma acc parallel loop reduction(+:c1.get_b ()) // { dg-error "expected '\\\)' before '\\\.' token" }
+  for (i = 0; i < 100; i++)
+    c1.get_b () += 1;
+
+#pragma acc parallel loop reduction(+:c1.c[1]) // { dg-error "expected '\\\)' before '\\\.' token" }
+  for (i = 0; i < 100; i++)
+    c1.c[1] += 1;
+
+#pragma acc parallel loop reduction(+:c1.get_d ()[1]) // { dg-error "expected '\\\)' before '\\\.' token" }
+  for (i = 0; i < 100; i++)
+    c1.get_d ()[1] += 1;
+
+#pragma acc parallel loop reduction(+:c1a[1].a) // { dg-error "expected '\\\)' before '\\\[' token" }
+  for (i = 0; i < 100; i++)
+    c1a[1].a += 1;
+
+#pragma acc parallel loop reduction(+:c1a[1].get_b ()) // { dg-error "expected '\\\)' before '\\\[' token" }
+  for (i = 0; i < 100; i++)
+    c1a[1].get_b () += 1;
+
+#pragma acc parallel loop reduction(+:c1a[1].c[1]) // { dg-error "expected '\\\)' before '\\\[' token" }
+  for (i = 0; i < 100; i++)
+    c1a[1].c[1] += 1;
+
+#pragma acc parallel loop reduction(+:c1a[1].get_d ()[1]) // { dg-error "expected '\\\)' before '\\\[' token" }
+  for (i = 0; i < 100; i++)
+    c1a[1].get_d ()[1] += 1;
+
+
+  // Reductions on a template class member.
+
+#pragma acc parallel loop reduction(+:c2.a) // { dg-error "expected '\\\)' before '\\\.' token" }
+  for (i = 0; i < 100; i++)
+    c2.a += 1;
+
+#pragma acc parallel loop reduction(+:c2.get_b ()) // { dg-error "expected '\\\)' before '\\\.' token" }
+  for (i = 0; i < 100; i++)
+    c2.get_b () += 1;
+
+#pragma acc parallel loop reduction(+:c2.c[1]) // { dg-error "expected '\\\)' before '\\\.' token" }
+  for (i = 0; i < 100; i++)
+    c2.c[1] += 1;
+
+#pragma acc parallel loop reduction(+:c2.get_d ()[1]) // { dg-error "expected '\\\)' before '\\\.' token" }
+  for (i = 0; i < 100; i++)
+    c2.get_d ()[1] += 1;
+
+
+#pragma acc parallel loop reduction(+:c2a[1].a) // { dg-error "expected '\\\)' before '\\\[' token" }
+  for (i = 0; i < 100; i++)
+    c2a[1].a += 1;
+
+#pragma acc parallel loop reduction(+:c2a[1].get_b ()[1]) // { dg-error "expected '\\\)' before '\\\[' token" }
+  for (i = 0; i < 100; i++)
+    c2a[1].get_b () += 1;
+
+#pragma acc parallel loop reduction(+:c2a[1].c[1]) // { dg-error "expected '\\\)' before '\\\[' token" }
+  for (i = 0; i < 100; i++)
+    c2a[1].c[1] += 1;
+
+#pragma acc parallel loop reduction(+:c2a[1].get_d ()[1]) // { dg-error "expected '\\\)' before '\\\[' token" }
+  for (i = 0; i < 100; i++)
+    c2a[1].get_d ()[1] += 1;
+
+
+  // Reductions on struct element.
+
+#pragma acc parallel loop reduction(+:s1.a) // { dg-error "expected '\\\)' before '\\\.' token" }
+  for (i = 0; i < 100; i++)
+    s1.a += 1;
+
+#pragma acc parallel loop reduction(+:s1.get_b ()) // { dg-error "expected '\\\)' before '\\\.' token" }
+  for (i = 0; i < 100; i++)
+    s1.get_b () += 1;
+
+#pragma acc parallel loop reduction(+:s1.c[1]) // { dg-error "expected '\\\)' before '\\\.' token" }
+  for (i = 0; i < 100; i++)
+    s1.c[1] += 1;
+
+#pragma acc parallel loop reduction(+:s1.get_d ()[1]) // { dg-error "expected '\\\)' before '\\\.' token" }
+  for (i = 0; i < 100; i++)
+    s1.get_d ()[1] += 1;
+
+#pragma acc parallel loop reduction(+:s1a[1].a) // { dg-error "expected '\\\)' before '\\\[' token" }
+  for (i = 0; i < 100; i++)
+    s1a[1].a += 1;
+
+#pragma acc parallel loop reduction(+:s1a[1].get_b ()) // { dg-error "expected '\\\)' before '\\\[' token" }
+  for (i = 0; i < 100; i++)
+    s1a[1].get_b () += 1;
+
+#pragma acc parallel loop reduction(+:s1a[1].c[1]) // { dg-error "expected '\\\)' before '\\\[' token" }
+  for (i = 0; i < 100; i++)
+    s1a[1].c[1] += 1;
+
+#pragma acc parallel loop reduction(+:s1a[1].get_d ()[1]) // { dg-error "expected '\\\)' before '\\\[' token" }
+  for (i = 0; i < 100; i++)
+    s1a[1].get_d ()[1] += 1;
+
+
+  // Reductions on a template struct element.
+
+#pragma acc parallel loop reduction(+:s2.a) // { dg-error "expected '\\\)' before '\\\.' token" }
+  for (i = 0; i < 100; i++)
+    s2.a += 1;
+
+#pragma acc parallel loop reduction(+:s2.get_b ()) // { dg-error "expected '\\\)' before '\\\.' token" }
+  for (i = 0; i < 100; i++)
+    s2.get_b () += 1;
+
+#pragma acc parallel loop reduction(+:s2.c[1]) // { dg-error "expected '\\\)' before '\\\.' token" }
+  for (i = 0; i < 100; i++)
+    s2.c[1] += 1;
+
+#pragma acc parallel loop reduction(+:s2.get_d ()[1]) // { dg-error "expected '\\\)' before '\\\.' token" }
+  for (i = 0; i < 100; i++)
+    s2.get_d ()[1] += 1;
+
+#pragma acc parallel loop reduction(+:s2a[1].a) // { dg-error "expected '\\\)' before '\\\[' token" }
+  for (i = 0; i < 100; i++)
+    s2a[1].a += 1;
+
+#pragma acc parallel loop reduction(+:s2a[1].get_b ()) // { dg-error "expected '\\\)' before '\\\[' token" }
+  for (i = 0; i < 100; i++)
+    s2a[1].get_b () += 1;
+
+#pragma acc parallel loop reduction(+:s2a[1].c[1]) // { dg-error "expected '\\\)' before '\\\[' token" }
+  for (i = 0; i < 100; i++)
+    s2a[1].c[1] += 1;
+
+#pragma acc parallel loop reduction(+:s2a[1].get_d ()[1]) // { dg-error "expected '\\\)' before '\\\[' token" }
+  for (i = 0; i < 100; i++)
+    s2a[1].get_d ()[1] += 1;
+
+
+  // Reductions on arrays.
+
+#pragma acc parallel loop reduction(+:a[10]) // { dg-error "expected '\\\)' before '\\\[' token" }
+  for (i = 0; i < 100; i++)
+    a[10] += 1;
+
+#pragma acc parallel loop reduction(+:b[10]) // { dg-error "expected '\\\)' before '\\\[' token" }
+  for (i = 0; i < 100; i++)
+    b[10] += 1;
+}
+
+template <typename T>
+void
+test_loop ()
+{
+  int i, a[10];
+  T b[10];
+  C1 c1, c1a[10];
+  C2<T> c2, c2a[10];
+  S1 s1, s1a[10];
+  S2<float> s2, s2a[10];
+
+  // Reductions on class members.
+
+  #pragma acc parallel
+  {
+
+#pragma acc loop reduction(+:c1.a) // { dg-error "expected '\\\)' before '\\\.' token" }
+    for (i = 0; i < 100; i++)
+      c1.a += 1;
+
+#pragma acc loop reduction(+:c1.get_b ()) // { dg-error "expected '\\\)' before '\\\.' token" }
+    for (i = 0; i < 100; i++)
+      c1.get_b () += 1;
+
+#pragma acc loop reduction(+:c1.c[1]) // { dg-error "expected '\\\)' before '\\\.' token" }
+    for (i = 0; i < 100; i++)
+      c1.c[1] += 1;
+
+#pragma acc loop reduction(+:c1.get_d ()[1]) // { dg-error "expected '\\\)' before '\\\.' token" }
+    for (i = 0; i < 100; i++)
+      c1.get_d ()[1] += 1;
+
+#pragma acc loop reduction(+:c1a[1].a) // { dg-error "expected '\\\)' before '\\\[' token" }
+    for (i = 0; i < 100; i++)
+      c1a[1].a += 1;
+
+#pragma acc loop reduction(+:c1a[1].get_b ()) // { dg-error "expected '\\\)' before '\\\[' token" }
+    for (i = 0; i < 100; i++)
+      c1a[1].get_b () += 1;
+
+#pragma acc loop reduction(+:c1a[1].c[1]) // { dg-error "expected '\\\)' before '\\\[' token" }
+    for (i = 0; i < 100; i++)
+      c1a[1].c[1] += 1;
+
+#pragma acc loop reduction(+:c1a[1].get_d ()[1]) // { dg-error "expected '\\\)' before '\\\[' token" }
+    for (i = 0; i < 100; i++)
+      c1a[1].get_d ()[1] += 1;
+
+
+    // Reductions on a template class member.
+
+#pragma acc loop reduction(+:c2.a) // { dg-error "expected '\\\)' before '\\\.' token" }
+    for (i = 0; i < 100; i++)
+      c2.a += 1;
+
+#pragma acc loop reduction(+:c2.get_b ()) // { dg-error "expected '\\\)' before '\\\.' token" }
+    for (i = 0; i < 100; i++)
+      c2.get_b () += 1;
+
+#pragma acc loop reduction(+:c2.c[1]) // { dg-error "expected '\\\)' before '\\\.' token" }
+    for (i = 0; i < 100; i++)
+      c2.c[1] += 1;
+
+#pragma acc loop reduction(+:c2.get_d ()[1]) // { dg-error "expected '\\\)' before '\\\.' token" }
+    for (i = 0; i < 100; i++)
+      c2.get_d ()[1] += 1;
+
+
+#pragma acc loop reduction(+:c2a[1].a) // { dg-error "expected '\\\)' before '\\\[' token" }
+    for (i = 0; i < 100; i++)
+      c2a[1].a += 1;
+
+#pragma acc loop reduction(+:c2a[1].get_b ()[1]) // { dg-error "expected '\\\)' before '\\\[' token" }
+    for (i = 0; i < 100; i++)
+      c2a[1].get_b () += 1;
+
+#pragma acc loop reduction(+:c2a[1].c[1]) // { dg-error "expected '\\\)' before '\\\[' token" }
+    for (i = 0; i < 100; i++)
+      c2a[1].c[1] += 1;
+
+#pragma acc loop reduction(+:c2a[1].get_d ()[1]) // { dg-error "expected '\\\)' before '\\\[' token" }
+    for (i = 0; i < 100; i++)
+      c2a[1].get_d ()[1] += 1;
+
+
+    // Reductions on struct element.
+
+#pragma acc loop reduction(+:s1.a) // { dg-error "expected '\\\)' before '\\\.' token" }
+    for (i = 0; i < 100; i++)
+      s1.a += 1;
+
+#pragma acc loop reduction(+:s1.get_b ()) // { dg-error "expected '\\\)' before '\\\.' token" }
+    for (i = 0; i < 100; i++)
+      s1.get_b () += 1;
+
+#pragma acc loop reduction(+:s1.c[1]) // { dg-error "expected '\\\)' before '\\\.' token" }
+    for (i = 0; i < 100; i++)
+      s1.c[1] += 1;
+
+#pragma acc loop reduction(+:s1.get_d ()[1]) // { dg-error "expected '\\\)' before '\\\.' token" }
+    for (i = 0; i < 100; i++)
+      s1.get_d ()[1] += 1;
+
+#pragma acc loop reduction(+:s1a[1].a) // { dg-error "expected '\\\)' before '\\\[' token" }
+    for (i = 0; i < 100; i++)
+      s1a[1].a += 1;
+
+#pragma acc loop reduction(+:s1a[1].get_b ()) // { dg-error "expected '\\\)' before '\\\[' token" }
+    for (i = 0; i < 100; i++)
+      s1a[1].get_b () += 1;
+
+#pragma acc loop reduction(+:s1a[1].c[1]) // { dg-error "expected '\\\)' before '\\\[' token" }
+    for (i = 0; i < 100; i++)
+      s1a[1].c[1] += 1;
+
+#pragma acc loop reduction(+:s1a[1].get_d ()[1]) // { dg-error "expected '\\\)' before '\\\[' token" }
+    for (i = 0; i < 100; i++)
+      s1a[1].get_d ()[1] += 1;
+
+
+    // Reductions on a template struct element.
+
+#pragma acc loop reduction(+:s2.a) // { dg-error "expected '\\\)' before '\\\.' token" }
+    for (i = 0; i < 100; i++)
+      s2.a += 1;
+
+#pragma acc loop reduction(+:s2.get_b ()) // { dg-error "expected '\\\)' before '\\\.' token" }
+    for (i = 0; i < 100; i++)
+      s2.get_b () += 1;
+
+#pragma acc loop reduction(+:s2.c[1]) // { dg-error "expected '\\\)' before '\\\.' token" }
+    for (i = 0; i < 100; i++)
+      s2.c[1] += 1;
+
+#pragma acc loop reduction(+:s2.get_d ()[1]) // { dg-error "expected '\\\)' before '\\\.' token" }
+    for (i = 0; i < 100; i++)
+      s2.get_d ()[1] += 1;
+
+#pragma acc loop reduction(+:s2a[1].a) // { dg-error "expected '\\\)' before '\\\[' token" }
+    for (i = 0; i < 100; i++)
+      s2a[1].a += 1;
+
+#pragma acc loop reduction(+:s2a[1].get_b ()) // { dg-error "expected '\\\)' before '\\\[' token" }
+    for (i = 0; i < 100; i++)
+      s2a[1].get_b () += 1;
+
+#pragma acc loop reduction(+:s2a[1].c[1]) // { dg-error "expected '\\\)' before '\\\[' token" }
+    for (i = 0; i < 100; i++)
+      s2a[1].c[1] += 1;
+
+#pragma acc loop reduction(+:s2a[1].get_d ()[1]) // { dg-error "expected '\\\)' before '\\\[' token" }
+    for (i = 0; i < 100; i++)
+      s2a[1].get_d ()[1] += 1;
+
+
+    // Reductions on arrays.
+
+#pragma acc loop reduction(+:a[10]) // { dg-error "expected '\\\)' before '\\\[' token" }
+    for (i = 0; i < 100; i++)
+      a[10] += 1;
+
+#pragma acc loop reduction(+:b[10]) // { dg-error "expected '\\\)' before '\\\[' token" }
+    for (i = 0; i < 100; i++)
+      b[10] += 1;
+  }
+}
+
+int
+main ()
+{
+  test_parallel<double> ();
+  test_combined<long> ();
+  test_loop<short> ();
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.dg/goacc/loop-processing-1.c b/gcc/testsuite/gcc.dg/goacc/loop-processing-1.c
index bd4c07e..1d222ab 100644
--- a/gcc/testsuite/gcc.dg/goacc/loop-processing-1.c
+++ b/gcc/testsuite/gcc.dg/goacc/loop-processing-1.c
@@ -15,4 +15,5 @@ void vector_1 (int *ary, int size)
   }
 }
 
-/* { dg-final { scan-tree-dump {OpenACC loops.*Loop 0\(0\).*Loop 24\(1\).*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_HEAD_MARK, 0, 1, 36\);.*Head-0:.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_HEAD_MARK, 0, 1, 36\);.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_FORK, \.data_dep\.[0-9_]+, 0\);.*Tail-0:.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_TAIL_MARK, \.data_dep\.[0-9_]+, 1\);.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_JOIN, \.data_dep\.[0-9_]+, 0\);.*Loop 6\(6\).*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_HEAD_MARK, 0, 2, 6\);.*Head-0:.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_HEAD_MARK, 0, 2, 6\);.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_FORK, \.data_dep\.[0-9_]+, 1\);.*Head-1:.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_HEAD_MARK, \.data_dep\.[0-9_]+, 1\);.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_FORK, \.data_dep\.[0-9_]+, 2\);.*Tail-1:.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_TAIL_MARK, \.data_dep\.[0-9_]+, 2\);.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_JOIN, \.data_dep\.[0-9_]+, 2\);.*Tail-0:.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_TAIL_MARK, \.data_dep\.[0-9_]+, 1\);.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_JOIN, \.data_dep\.[0-9_]+, 1\);} "oaccdevlow" } } */
+/* { dg-final { scan-tree-dump {
+OpenACC loops.*Loop 0\(0\).*Loop [0-9]{2}\(1\).*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_HEAD_MARK, 0, 1, 36\);.*Head-0:.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_HEAD_MARK, 0, 1, 36\);.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_FORK, \.data_dep\.[0-9_]+, 0\);.*Tail-0:.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_TAIL_MARK, \.data_dep\.[0-9_]+, 1\);.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_JOIN, \.data_dep\.[0-9_]+, 0\);.*Loop 6\(6\).*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_HEAD_MARK, 0, 2, 6\);.*Head-0:.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_HEAD_MARK, 0, 2, 6\);.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_FORK, \.data_dep\.[0-9_]+, 1\);.*Head-1:.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_HEAD_MARK, \.data_dep\.[0-9_]+, 1\);.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_FORK, \.data_dep\.[0-9_]+, 2\);.*Tail-1:.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_TAIL_MARK, \.data_dep\.[0-9_]+, 2\);.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_JOIN, \.data_dep\.[0-9_]+, 2\);.*Tail-0:.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_TAIL_MARK, \.data_dep\.[0-9_]+, 1\);.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_JOIN, \.data_dep\.[0-9_]+, 1\);} "oaccdevlow" } } */
diff --git a/gcc/testsuite/gfortran.dg/goacc/orphan-reductions-1.f90 b/gcc/testsuite/gfortran.dg/goacc/orphan-reductions-1.f90
new file mode 100644
index 0000000..7f363d5
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/goacc/orphan-reductions-1.f90
@@ -0,0 +1,204 @@
+! Verify that gang reduction on orphan OpenACC loops reported as errors.
+
+subroutine s1
+  implicit none
+
+  integer, parameter :: n = 100
+  integer :: i, sum
+  sum = 0
+
+  !$acc parallel reduction(+:sum)
+  do i = 1, n
+     sum = sum + 1
+  end do
+  !$acc end parallel
+
+  !$acc parallel loop gang reduction(+:sum)
+  do i = 1, n
+     sum = sum + 1
+  end do
+
+  !$acc parallel
+  !$acc loop gang reduction(+:sum)
+  do i = 1, n
+     sum = sum + 1
+  end do
+  !$acc end parallel
+end subroutine s1
+
+subroutine s2
+  implicit none
+  !$acc routine worker
+
+  integer, parameter :: n = 100
+  integer :: i, j, sum
+  sum = 0
+
+  !$acc loop gang reduction(+:sum) ! { dg-error "gang reduction on an orphan loop" }
+  do i = 1, n
+     sum = sum + 1
+  end do
+
+  !$acc loop reduction(+:sum)
+  do i = 1, n
+     !$acc loop gang reduction(+:sum) ! { dg-error "gang reduction on an orphan loop" }
+     do j = 1, n
+        sum = sum + 1
+     end do
+  end do
+end subroutine s2
+
+integer function f1 ()
+  implicit none
+
+  integer, parameter :: n = 100
+  integer :: i, sum
+  sum = 0
+
+  !$acc parallel reduction(+:sum)
+  do i = 1, n
+     sum = sum + 1
+  end do
+  !$acc end parallel
+
+  !$acc parallel loop gang reduction(+:sum)
+  do i = 1, n
+     sum = sum + 1
+  end do
+
+  !$acc parallel
+  !$acc loop gang reduction(+:sum)
+  do i = 1, n
+     sum = sum + 1
+  end do
+  !$acc end parallel
+
+  f1 = sum
+end function f1
+
+integer function f2 ()
+  implicit none
+  !$acc routine worker
+
+  integer, parameter :: n = 100
+  integer :: i, j, sum
+  sum = 0
+
+  !$acc loop gang reduction(+:sum) ! { dg-error "gang reduction on an orphan loop" }
+  do i = 1, n
+     sum = sum + 1
+  end do
+
+  !$acc loop reduction(+:sum)
+  do i = 1, n
+     !$acc loop gang reduction(+:sum) ! { dg-error "gang reduction on an orphan loop" }
+     do j = 1, n
+        sum = sum + 1
+     end do
+  end do
+
+  f2 = sum
+end function f2
+
+module m
+contains
+  subroutine s3
+    implicit none
+
+    integer, parameter :: n = 100
+    integer :: i, sum
+    sum = 0
+
+    !$acc parallel reduction(+:sum)
+    do i = 1, n
+       sum = sum + 1
+    end do
+    !$acc end parallel
+
+    !$acc parallel loop gang reduction(+:sum)
+    do i = 1, n
+       sum = sum + 1
+    end do
+
+    !$acc parallel
+    !$acc loop gang reduction(+:sum)
+    do i = 1, n
+       sum = sum + 1
+    end do
+    !$acc end parallel
+  end subroutine s3
+
+  subroutine s4
+    implicit none
+    !$acc routine worker
+
+    integer, parameter :: n = 100
+    integer :: i, j, sum
+    sum = 0
+
+    !$acc loop gang reduction(+:sum) ! { dg-error "gang reduction on an orphan loop" }
+    do i = 1, n
+       sum = sum + 1
+    end do
+
+    !$acc loop reduction(+:sum)
+    do i = 1, n
+       !$acc loop gang reduction(+:sum) ! { dg-error "gang reduction on an orphan loop" }
+       do j = 1, n
+          sum = sum + 1
+       end do
+    end do
+  end subroutine s4
+
+  integer function f3 ()
+    implicit none
+
+    integer, parameter :: n = 100
+    integer :: i, sum
+    sum = 0
+
+    !$acc parallel reduction(+:sum)
+    do i = 1, n
+       sum = sum + 1
+    end do
+    !$acc end parallel
+
+    !$acc parallel loop gang reduction(+:sum)
+    do i = 1, n
+       sum = sum + 1
+    end do
+
+    !$acc parallel
+    !$acc loop gang reduction(+:sum)
+    do i = 1, n
+       sum = sum + 1
+    end do
+    !$acc end parallel
+
+    f3 = sum
+  end function f3
+
+  integer function f4 ()
+    implicit none
+    !$acc routine worker
+
+    integer, parameter :: n = 100
+    integer :: i, j, sum
+    sum = 0
+
+    !$acc loop gang reduction(+:sum) ! { dg-error "gang reduction on an orphan loop" }
+    do i = 1, n
+       sum = sum + 1
+    end do
+
+    !$acc loop reduction(+:sum)
+    do i = 1, n
+       !$acc loop gang reduction(+:sum) ! { dg-error "gang reduction on an orphan loop" }
+       do j = 1, n
+          sum = sum + 1
+       end do
+    end do
+
+    f4 = sum
+  end function f4
+end module m
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/par-reduction-3.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/par-reduction-3.c
new file mode 100644
index 0000000..856ef0e
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/par-reduction-3.c
@@ -0,0 +1,29 @@
+/* Check a parallel reduction which is are explicitly initialized by
+   the user.  */
+
+#include <assert.h>
+
+int
+main ()
+{
+  int n = 10;
+  float accel = 1.0, host = 1.0;
+  int i;
+
+#pragma acc parallel copyin(n) reduction(*:accel)
+  {
+    accel = 1.0;
+#pragma acc loop gang reduction(*:accel)
+    for( i = 1; i <= n; i++)
+      {
+	accel *= 2.0;
+      }
+  }
+
+  for (i = 1; i <= n; i++)
+    host *= 2.0;
+
+  assert (accel == host);
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-cplx-flt-2.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-cplx-flt-2.c
new file mode 100644
index 0000000..350174a
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-cplx-flt-2.c
@@ -0,0 +1,32 @@
+#include <complex.h>
+#include <stdio.h>
+#include <stdlib.h>
+
+typedef float _Complex Type;
+
+#define N 32
+
+int
+main (void)
+{
+  Type ary[N];
+
+  for (int ix = 0; ix < N;  ix++)
+    ary[ix] = 1.0 + 1.0j;
+
+  Type tprod = 1.0;
+
+#pragma acc parallel vector_length(32)
+  {
+#pragma acc loop vector reduction (*:tprod)
+    for (int ix = 0; ix < N; ix++)
+      tprod *= ary[ix];
+  }
+
+  Type expected = 65536.0;
+
+  if (tprod != expected)
+    abort ();
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/reduction-9.f90 b/libgomp/testsuite/libgomp.oacc-fortran/reduction-9.f90
new file mode 100644
index 0000000..fd64d88
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-fortran/reduction-9.f90
@@ -0,0 +1,54 @@
+! Test gang reductions on dummy variables.
+
+! { dg-do run }
+
+program main
+  implicit none
+
+  integer g, w, v, c
+
+  g = 0
+  w = 0
+  v = 0
+  c = 0
+
+  call reduction (g, w, v, c)
+
+  if (g /= 10) call abort
+  if (w /= 10) call abort
+  if (v /= 10) call abort
+  if (c /= 100) call abort
+end program main
+
+subroutine reduction (g, w, v, c)
+  implicit none
+
+  integer g, w, v, c, i
+
+  !$acc parallel
+  !$acc loop reduction(+:g) gang
+  do i = 1, 10
+     g = g + 1
+  end do
+  !$acc end parallel
+
+  !$acc parallel
+  !$acc loop reduction(+:w) worker
+  do i = 1, 10
+     w = w + 1
+  end do
+  !$acc end parallel
+
+  !$acc parallel
+  !$acc loop reduction(+:v) vector
+  do i = 1, 10
+     v = v + 1
+  end do
+  !$acc end parallel
+
+  !$acc parallel loop reduction(+:c) gang worker vector
+  do i = 1, 100
+     c = c + 1
+  end do
+  !$acc end parallel loop
+end subroutine reduction

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [patch] various OpenACC reduction enhancements - ME and nvptx changes
  2018-12-04 15:54     ` Tom de Vries
@ 2018-12-13 15:56       ` Julian Brown
  0 siblings, 0 replies; 14+ messages in thread
From: Julian Brown @ 2018-12-13 15:56 UTC (permalink / raw)
  To: Tom de Vries
  Cc: Jakub Jelinek, Cesar Philippidis, Thomas Schwinge, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 2338 bytes --]

On Tue, 4 Dec 2018 16:55:04 +0100
Tom de Vries <tdevries@suse.de> wrote:

> On 04-12-18 13:29, Jakub Jelinek wrote:
> > On Fri, Jun 29, 2018 at 11:19:53AM -0700, Cesar Philippidis wrote:  
> >> The attached patch includes the nvptx and GCC ME reductions
> >> enhancements.
> >>
> >> Is this patch OK for trunk? It bootstrapped / regression tested
> >> cleanly for x86_64 with nvptx offloading.  
> > This is all OpenACC specific code not really shareable with OpenMP,
> > if Thomas (for middle-end) and Tom (for NVPTX backend) are ok with
> > it, it is ok for trunk.
> >   
> 
> Formatting needs to be fixed:
> ...
> There should be exactly one space between function name and
> parenthesis. 160:+      unsigned old_shift = DIM_SIZE(VECTOR);
> ...
> 
> Also, the updated patch does not address my comment about
> probabilities here
> ( https://gcc.gnu.org/ml/gcc-patches/2018-10/msg00325.html ): ...
> > +  /* Create the loop.  */
> > +  post_edge->flags ^= EDGE_TRUE_VALUE | EDGE_FALLTHRU;  
> 
> Edges need probabilities, as in nvptx_lockless_update,
> nvptx_lockfull_update and nvptx_goacc_reduction_init.
> ...

Something like the attached?

Tested alongside other revised patches in the series:

https://gcc.gnu.org/ml/gcc-patches/2018-12/msg00930.html
https://gcc.gnu.org/ml/gcc-patches/2018-12/msg00931.html

(except the lines adding edge probabilities, which I've
smoke-tested but haven't yet gone through a full test cycle).

Thanks,

Julian

ChangeLog

        gcc/
        * config/nvptx/nvptx.c (nvptx_propagate_unified): New.
        (nvptx_split_blocks): Call it for cond_uni insn.
        (nvptx_expand_cond_uni): New.
        (enum nvptx_builtins): Add NVPTX_BUILTIN_COND_UNI.
        (nvptx_init_builtins): Initialize it.
        (nvptx_expand_builtin):
        (nvptx_generate_vector_shuffle): Change integral SHIFT operand to
        tree BITS operand.
        (nvptx_vector_reduction): New.
        (nvptx_adjust_reduction_type): New.
        (nvptx_goacc_reduction_setup): Use it to adjust the type of ref_to_res.
        (nvptx_goacc_reduction_init): Don't update LHS if it doesn't exist.
        (nvptx_goacc_reduction_fini): Call nvptx_vector_reduction for vector.
        Use it to adjust the type of ref_to_res.
        (nvptx_goacc_reduction_teardown):
        * config/nvptx/nvptx.md (cond_uni): New pattern.


[-- Attachment #2: trunk-nvptx-reductions-2.diff --]
[-- Type: text/x-patch, Size: 12332 bytes --]

commit 401876d422c4fa7f02c1b899e81568eea6ad7531
Author: Julian Brown <julian@codesourcery.com>
Date:   Tue Dec 11 13:35:52 2018 -0800

    Various OpenACC reduction enhancements - ME and nvptx changes
    
    	gcc/
    	* config/nvptx/nvptx.c (nvptx_propagate_unified): New.
    	(nvptx_split_blocks): Call it for cond_uni insn.
    	(nvptx_expand_cond_uni): New.
    	(enum nvptx_builtins): Add NVPTX_BUILTIN_COND_UNI.
    	(nvptx_init_builtins): Initialize it.
    	(nvptx_expand_builtin):
    	(nvptx_generate_vector_shuffle): Change integral SHIFT operand to
    	tree BITS operand.
    	(nvptx_vector_reduction): New.
    	(nvptx_adjust_reduction_type): New.
    	(nvptx_goacc_reduction_setup): Use it to adjust the type of ref_to_res.
    	(nvptx_goacc_reduction_init): Don't update LHS if it doesn't exist.
    	(nvptx_goacc_reduction_fini): Call nvptx_vector_reduction for vector.
    	Use it to adjust the type of ref_to_res.
    	(nvptx_goacc_reduction_teardown):
    	* config/nvptx/nvptx.md (cond_uni): New pattern.

diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index 9903a27..0023dad 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -2863,6 +2863,52 @@ nvptx_reorg_uniform_simt ()
     }
 }
 
+/* UNIFIED is a cond_uni insn.  Find the branch insn it affects, and
+   mark that as unified.  We expect to be in a single block.  */
+
+static void
+nvptx_propagate_unified (rtx_insn *unified)
+{
+  rtx_insn *probe = unified;
+  rtx cond_reg = SET_DEST (PATTERN (unified));
+  rtx pat = NULL_RTX;
+
+  /* Find the comparison.  (We could skip this and simply scan to he
+     blocks' terminating branch, if we didn't care for self
+     checking.)  */
+  for (;;)
+    {
+      probe = next_real_insn (probe);
+      if (!probe)
+	break;
+      pat = PATTERN (probe);
+
+      if (GET_CODE (pat) == SET
+	  && GET_RTX_CLASS (GET_CODE (SET_SRC (pat))) == RTX_COMPARE
+	  && XEXP (SET_SRC (pat), 0) == cond_reg)
+	break;
+      gcc_assert (NONJUMP_INSN_P (probe));
+    }
+  gcc_assert (pat);
+  rtx pred_reg = SET_DEST (pat);
+
+  /* Find the branch.  */
+  do
+    probe = NEXT_INSN (probe);
+  while (!JUMP_P (probe));
+
+  pat = PATTERN (probe);
+  rtx itec = XEXP (SET_SRC (pat), 0);
+  gcc_assert (XEXP (itec, 0) == pred_reg);
+
+  /* Mark the branch's condition as unified.  */
+  rtx unspec = gen_rtx_UNSPEC (BImode, gen_rtvec (1, pred_reg),
+			       UNSPEC_BR_UNIFIED);
+  bool ok = validate_change (probe, &XEXP (itec, 0), unspec, false);
+
+  gcc_assert (ok);
+}
+
 /* Loop structure of the function.  The entire function is described as
    a NULL loop.  */
 
@@ -2964,6 +3010,9 @@ nvptx_split_blocks (bb_insn_map_t *map)
 	    continue;
 	  switch (recog_memoized (insn))
 	    {
+	    case CODE_FOR_cond_uni:
+	      nvptx_propagate_unified (insn);
+	      /* FALLTHROUGH */
 	    default:
 	      seen_insn = true;
 	      continue;
@@ -5083,6 +5132,21 @@ nvptx_expand_cmp_swap (tree exp, rtx target,
   return target;
 }
 
+/* Expander for the compare unified builtin.  */
+
+static rtx
+nvptx_expand_cond_uni (tree exp, rtx target, machine_mode mode, int ignore)
+{
+  if (ignore)
+    return target;
+
+  rtx src = expand_expr (CALL_EXPR_ARG (exp, 0),
+			 NULL_RTX, mode, EXPAND_NORMAL);
+
+  emit_insn (gen_cond_uni (target, src));
+
+  return target;
+}
 
 /* Codes for all the NVPTX builtins.  */
 enum nvptx_builtins
@@ -5092,6 +5156,7 @@ enum nvptx_builtins
   NVPTX_BUILTIN_WORKER_ADDR,
   NVPTX_BUILTIN_CMP_SWAP,
   NVPTX_BUILTIN_CMP_SWAPLL,
+  NVPTX_BUILTIN_COND_UNI,
   NVPTX_BUILTIN_MAX
 };
 
@@ -5129,6 +5194,7 @@ nvptx_init_builtins (void)
        (PTRVOID, ST, UINT, UINT, NULL_TREE));
   DEF (CMP_SWAP, "cmp_swap", (UINT, PTRVOID, UINT, UINT, NULL_TREE));
   DEF (CMP_SWAPLL, "cmp_swapll", (LLUINT, PTRVOID, LLUINT, LLUINT, NULL_TREE));
+  DEF (COND_UNI, "cond_uni", (integer_type_node, integer_type_node, NULL_TREE));
 
 #undef DEF
 #undef ST
@@ -5161,6 +5227,9 @@ nvptx_expand_builtin (tree exp, rtx target, rtx ARG_UNUSED (subtarget),
     case NVPTX_BUILTIN_CMP_SWAPLL:
       return nvptx_expand_cmp_swap (exp, target, mode, ignore);
 
+    case NVPTX_BUILTIN_COND_UNI:
+      return nvptx_expand_cond_uni (exp, target, mode, ignore);
+
     default: gcc_unreachable ();
     }
 }
@@ -5284,7 +5353,7 @@ nvptx_get_worker_red_addr (tree type, tree offset)
 
 static void
 nvptx_generate_vector_shuffle (location_t loc,
-			       tree dest_var, tree var, unsigned shift,
+			       tree dest_var, tree var, tree bits,
 			       gimple_seq *seq)
 {
   unsigned fn = NVPTX_BUILTIN_SHUFFLE;
@@ -5307,7 +5376,6 @@ nvptx_generate_vector_shuffle (location_t loc,
     }
   
   tree call = nvptx_builtin_decl (fn, true);
-  tree bits = build_int_cst (unsigned_type_node, shift);
   tree kind = build_int_cst (unsigned_type_node, SHUFFLE_DOWN);
   tree expr;
 
@@ -5583,6 +5651,128 @@ nvptx_reduction_update (location_t loc, gimple_stmt_iterator *gsi,
     return nvptx_lockfull_update (loc, gsi, ptr, var, op);
 }
 
+/* Emit a vector-level reduction loop.  OLD_VAR is the incoming
+   variable to reduce (valid in each vector), OP is the reduction
+   operator.  Return the reduced value (an SSA var).
+
+   The code we generate looks like:
+      unsigned old_shift = DIM_SIZE (VECTOR);
+      do
+	{
+	  shift = PHI (old_shift, new_shift);
+	  var = PHI (old_var, new_var);
+	  new_shift = shift >> 1;
+	  other_var = VSHUFFLE (var, new_shift);
+	  new_var = var OP other_var;
+	  cond_var = builtin_cond_uni (new_shift);
+	}
+      while (cond_var > 1);
+
+  The builtin_cond_ini expands to a cond_uni instruction, which is
+  processed in nvptx_split_blocks to mark the loop's terminating
+  branch instruction.  */
+
+static tree
+nvptx_vector_reduction (location_t loc, gimple_stmt_iterator *gsi,
+			tree old_var, tree_code op)
+{
+  tree var_type = TREE_TYPE (old_var);
+
+  /*  Emit old_shift = DIM_SIZE (VECTOR) */
+  tree old_shift = make_ssa_name (integer_type_node);
+  tree dim = build_int_cst (integer_type_node, GOMP_DIM_VECTOR);
+  gcall *call = gimple_build_call_internal (IFN_GOACC_DIM_SIZE, 1, dim);
+  gimple_set_lhs (call, old_shift);
+  gimple_set_location (call, loc);
+  gsi_insert_before (gsi, call, GSI_SAME_STMT);
+
+  /* Split the block just after the init stmts.  */
+  basic_block pre_bb = gsi_bb (*gsi);
+  edge pre_edge = split_block (pre_bb, call);
+  basic_block loop_bb = pre_edge->dest;
+  pre_bb = pre_edge->src;
+  /* Reset the iterator.  */
+  *gsi = gsi_for_stmt (gsi_stmt (*gsi));
+
+  tree shift = make_ssa_name (integer_type_node);
+  tree new_shift = make_ssa_name (integer_type_node);
+  tree var = make_ssa_name (var_type);
+  tree other_var = make_ssa_name (var_type);
+  tree new_var = make_ssa_name (var_type);
+
+  /* Build and insert the loop body.  */
+  gimple_seq loop_seq = NULL;
+
+  /* new_shift = shift >> 1 */
+  tree shift_expr = fold_build2 (RSHIFT_EXPR, integer_type_node,
+				 shift, integer_one_node);
+  gimplify_assign (new_shift, shift_expr, &loop_seq);
+
+  /* other_var = shuffle (var, shift) */
+  nvptx_generate_vector_shuffle (loc, other_var, var, new_shift, &loop_seq);
+  /* new_var = var OP other_var */
+  tree red_expr = fold_build2 (op, var_type, var, other_var);
+  gimplify_assign (new_var, red_expr, &loop_seq);
+
+  /* Mark the iterator variable as unified.  */
+  tree cond_var = make_ssa_name (integer_type_node);
+  tree uni_fn = nvptx_builtin_decl (NVPTX_BUILTIN_COND_UNI, true);
+  tree uni_expr = build_call_expr_loc (loc, uni_fn, 1, new_shift);
+  gimplify_assign (cond_var,  uni_expr, &loop_seq);
+
+  gcond *cond = gimple_build_cond (LE_EXPR, cond_var, integer_one_node,
+				   NULL_TREE, NULL_TREE);
+  gimple_seq_add_stmt (&loop_seq, cond);
+
+  gsi_insert_seq_before (gsi, loop_seq, GSI_SAME_STMT);
+
+  /* Split the block just after the loop stmts.  */
+  edge post_edge = split_block (loop_bb, cond);
+  post_edge->probability = profile_probability::even ();
+  basic_block post_bb = post_edge->dest;
+  loop_bb = post_edge->src;
+  *gsi = gsi_for_stmt (gsi_stmt (*gsi));
+
+  /* Create the loop.  */
+  post_edge->flags ^= EDGE_TRUE_VALUE | EDGE_FALLTHRU;
+  edge loop_edge = make_edge (loop_bb, loop_bb, EDGE_FALSE_VALUE);
+  loop_edge->probability = profile_probability::even ();
+  set_immediate_dominator (CDI_DOMINATORS, loop_bb, pre_bb);
+  set_immediate_dominator (CDI_DOMINATORS, post_bb, loop_bb);
+
+  gphi *shift_phi = create_phi_node (shift, loop_bb);
+  add_phi_arg (shift_phi, old_shift, pre_edge, loc);
+  add_phi_arg (shift_phi, new_shift, loop_edge, loc);
+
+  gphi *var_phi = create_phi_node (var, loop_bb);
+  add_phi_arg (var_phi, old_var, pre_edge, loc);
+  add_phi_arg (var_phi, new_var, loop_edge, loc);
+
+  loop *loop = alloc_loop ();
+  loop->header = loop_bb;
+  loop->latch = loop_bb;
+  add_loop (loop, loop_bb->loop_father);
+
+  return new_var;
+}
+
+/* Dummy reduction vars that have GOMP_MAP_FIRSTPRIVATE_POINTER data
+   mappings gets retyped to (void *).  Adjust the type of VAR to TYPE
+   as appropriate.  */
+
+static tree
+nvptx_adjust_reduction_type (tree var, tree type, gimple_seq *seq)
+{
+  if (TREE_TYPE (TREE_TYPE (var)) == type)
+    return var;
+
+  tree ptype = build_pointer_type (type);
+  tree t = make_ssa_name (ptype);
+  tree expr = fold_build1 (NOP_EXPR, ptype, var);
+  gimple_seq_add_stmt (seq, gimple_build_assign (t, expr));
+  return t;
+}
+
 /* NVPTX implementation of GOACC_REDUCTION_SETUP.  */
 
 static void
@@ -5602,7 +5792,11 @@ nvptx_goacc_reduction_setup (gcall *call)
       tree ref_to_res = gimple_call_arg (call, 1);
 
       if (!integer_zerop (ref_to_res))
-	var = build_simple_mem_ref (ref_to_res);
+	{
+	  ref_to_res = nvptx_adjust_reduction_type (ref_to_res, TREE_TYPE (var),
+						    &seq);
+	  var = build_simple_mem_ref (ref_to_res);
+	}
     }
   
   if (level == GOMP_DIM_WORKER)
@@ -5702,7 +5896,11 @@ nvptx_goacc_reduction_init (gcall *call)
 	    init = var;
 	}
 
-      gimplify_assign (lhs, init, &seq);
+      /* The LHS may be NULL if a reduction variable on a parallel
+	 construct is initialized to some constant inside the parallel
+	 region.  */
+      if (lhs)
+	gimplify_assign (lhs, init, &seq);
     }
 
   pop_gimplify_context (NULL);
@@ -5727,22 +5925,7 @@ nvptx_goacc_reduction_fini (gcall *call)
   push_gimplify_context (true);
 
   if (level == GOMP_DIM_VECTOR)
-    {
-      /* Emit binary shuffle tree.  TODO. Emit this as an actual loop,
-	 but that requires a method of emitting a unified jump at the
-	 gimple level.  */
-      for (int shfl = PTX_VECTOR_LENGTH / 2; shfl > 0; shfl = shfl >> 1)
-	{
-	  tree other_var = make_ssa_name (TREE_TYPE (var));
-	  nvptx_generate_vector_shuffle (gimple_location (call),
-					 other_var, var, shfl, &seq);
-
-	  r = make_ssa_name (TREE_TYPE (var));
-	  gimplify_assign (r, fold_build2 (op, TREE_TYPE (var),
-					   var, other_var), &seq);
-	  var = r;
-	}
-    }
+    r = nvptx_vector_reduction (gimple_location (call), &gsi, var, op);
   else
     {
       tree accum = NULL_TREE;
@@ -5760,7 +5943,11 @@ nvptx_goacc_reduction_fini (gcall *call)
       else if (integer_zerop (ref_to_res))
 	r = var;
       else
-	accum = ref_to_res;
+	{
+	  ref_to_res = nvptx_adjust_reduction_type (ref_to_res, TREE_TYPE (var),
+						    &seq);
+	  accum = ref_to_res;
+	}
 
       if (accum)
 	{
@@ -5809,7 +5996,11 @@ nvptx_goacc_reduction_teardown (gcall *call)
       tree ref_to_res = gimple_call_arg (call, 1);
 
       if (!integer_zerop (ref_to_res))
-	gimplify_assign (build_simple_mem_ref (ref_to_res), var, &seq);
+	{
+	  ref_to_res = nvptx_adjust_reduction_type (ref_to_res, TREE_TYPE (var),
+						    &seq);
+	  gimplify_assign (build_simple_mem_ref (ref_to_res), var, &seq);
+	}
     }
 
   if (lhs)
diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md
index ca00b1d..4f7d8ea 100644
--- a/gcc/config/nvptx/nvptx.md
+++ b/gcc/config/nvptx/nvptx.md
@@ -600,6 +600,13 @@
   "%J0\\tbra.uni\\t%l1;"
   [(set_attr "predicable" "false")])
 
+(define_insn "cond_uni"
+  [(set (match_operand:SI 0 "nvptx_register_operand" "=R")
+          (unspec:SI [(match_operand:SI 1 "nvptx_nonmemory_operand" "R")]
+		     UNSPEC_BR_UNIFIED))]
+  ""
+  "%.\\tmov%t0\\t%0, %1; // unified")
+
 (define_expand "cbranch<mode>4"
   [(set (pc)
 	(if_then_else (match_operator 0 "nvptx_comparison_operator"

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [patch] various OpenACC reduction enhancements - FE changes
  2018-12-13 14:12     ` Julian Brown
@ 2018-12-18 13:06       ` Jakub Jelinek
  0 siblings, 0 replies; 14+ messages in thread
From: Jakub Jelinek @ 2018-12-18 13:06 UTC (permalink / raw)
  To: Julian Brown; +Cc: Cesar Philippidis, gcc-patches, Tom de Vries, Fortran List

On Thu, Dec 13, 2018 at 02:11:31PM +0000, Julian Brown wrote:
> > Any reason for the above (ditto in C), rather than just adding
> > && ort != C_ORT_ACC to the while loop condition for CPP_OPEN_SQUARE?
> > (, . or * after id-expression is like any other unhandled
> > characters...
> 
> I think the reason was that 'decl' ('t' in the C version) is not set to
> error_mark_node if the while loop is skipped, and then the gimplifier
> gets confused. I've tried to tackle this in another way, by checking
> there aren't any stray characters before the next comma or
> close-parenthesis.
> 
> I'm not sure if you were objecting to the error message too -- with the
> current patch, the user will just get e.g.:
> 
> error: expected ')' before '.' token
> 
> if they try to use an unsupported type of construct as a reduction
> target.

> @@ -12004,7 +12005,8 @@ c_parser_omp_variable_list (c_parser *parser,
>  	    case OMP_CLAUSE_REDUCTION:
>  	    case OMP_CLAUSE_IN_REDUCTION:
>  	    case OMP_CLAUSE_TASK_REDUCTION:
> -	      while (c_parser_next_token_is (parser, CPP_OPEN_SQUARE))
> +	      while (ort != C_ORT_ACC
> +		     && c_parser_next_token_is (parser, CPP_OPEN_SQUARE))
>  		{
>  		  tree low_bound = NULL_TREE, length = NULL_TREE;
>  
> @@ -12074,6 +12076,10 @@ c_parser_omp_variable_list (c_parser *parser,
>  			}
>  		    }
>  		}
> +	      if (ort == C_ORT_ACC
> +	          && c_parser_next_token_is_not (parser, CPP_COMMA)
> +		  && c_parser_next_token_is_not (parser, CPP_CLOSE_PAREN))
> +		t = error_mark_node;
>  	      break;
>  	    default:
>  	      break;

I still don't understand this at all, sorry.
So, t is guaranteed to be non-error_mark_node before entering this spot.
If you have reduction (decl[0]) etc. vs. reduction (decl), why do you care whether
it is added to the returned list or not for error recovery?  If it is something
that causes ICE in the gimplifier, then user could have written just
reduction (decl) or reduction (decl, ) and have it added to the list anyway,
so the bug would be that it isn't diagnosed as something incorrect in
c_finish_omp_clauses (or whatever the problem with it is).
If there is any kind of garbage after the decl, it will just return to the
caller at that point and the caller should do the error recovery, the same
for reduction (decl[0]) as well as for reduction (decl, [0]).

	Jakub

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2018-12-18 13:06 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-06-29 18:20 [patch] various OpenACC reduction enhancements Cesar Philippidis
2018-06-29 18:22 ` [patch] various OpenACC reduction enhancements - ME and nvptx changes Cesar Philippidis
2018-10-05 14:09   ` Tom de Vries
2018-10-30 20:09     ` Cesar Philippidis
2018-12-04 12:29   ` Jakub Jelinek
2018-12-04 15:54     ` Tom de Vries
2018-12-13 15:56       ` Julian Brown
2018-06-29 18:23 ` [patch] various OpenACC reduction enhancements - FE changes Cesar Philippidis
2018-12-04 12:57   ` Jakub Jelinek
2018-12-13 14:12     ` Julian Brown
2018-12-18 13:06       ` Jakub Jelinek
2018-06-29 18:38 ` [patch] various OpenACC reduction enhancements - test cases Cesar Philippidis
2018-12-04 12:59   ` Jakub Jelinek
2018-12-13 14:14     ` Julian Brown

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).