public inbox for gcc-patches@gcc.gnu.org
* openacc reference reductions
@ 2016-02-09 15:14 Cesar Philippidis
  2016-02-09 15:33 ` Nathan Sidwell
  2021-04-26 10:35 ` [OpenACC] Don't compile libgomp testcases with '-w' (was: openacc reference reductions) Thomas Schwinge
  0 siblings, 2 replies; 19+ messages in thread
From: Cesar Philippidis @ 2016-02-09 15:14 UTC (permalink / raw)
  To: gcc-patches, Jakub Jelinek, Nathan Sidwell

[-- Attachment #1: Type: text/plain, Size: 781 bytes --]

This patch teaches omp-lower how to handle reference-typed reductions,
which are common in Fortran subroutines. Unlike the implementation on
the gomp4 branch, this patch doesn't rewrite the reference reduction
variables as local variables. Instead, a local copy is created for
each reduction variable.
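
As a rough illustration only (this is not code from the patch, and the
function name sum_by_reference is hypothetical), the effect of reducing
on a by-reference variable, such as a Fortran dummy argument, can be
pictured in C as reducing into a local copy and storing the result back
through the reference once at the end:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical illustration, not code from the patch.  RESULT plays the
   role of a Fortran dummy argument, which is passed by reference.  */
void
sum_by_reference (const int *array, size_t n, int *result)
{
  assert (array != NULL && result != NULL);

  /* Local copy of the value behind the reference.  */
  int local = *result;

  /* The directive is ignored unless the file is built with -fopenacc;
     the loop then simply runs sequentially on the host.  */
#pragma acc parallel loop reduction (+:local) copyin (array[0:n])
  for (size_t i = 0; i < n; i++)
    local += array[i];

  /* Store the combined value back through the reference.  */
  *result = local;
}
```

Calling sum_by_reference on a four-element array {1, 2, 3, 4} with
*result initially 10 leaves 20 in *result, since the incoming value
takes part in the reduction.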

There are two things that stick out in this patch. First, I took care
not to remap any reduction variable appearing on a parallel directive
inside an offloaded region in order to keep it private. Second, you'll
notice that I'm creating quite a few temporary pointers inside
lower_oacc_reductions. Without those separate pointers, I'd get SSA
validation errors because those pointers get dereferenced multiple times.
I didn't investigate that problem further.
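
For readers unfamiliar with the first point, here is a minimal sketch of
a reduction clause placed directly on a parallel directive. It is an
assumption-laden example rather than one of the testcases in this patch,
and the function name count_gangs is hypothetical. Each gang works on
its own private copy of the reduction variable, and the copies are
combined when the region ends:

```c
#include <assert.h>

/* Hypothetical example, not taken from the patch.  Every gang executes
   the body once on its private copy of SUM; the private copies are
   combined with '+' at the end of the parallel region.  */
int
count_gangs (int gangs)
{
  assert (gangs > 0);

  int sum = 0;

  /* Ignored unless built with -fopenacc, in which case the body runs
     once on the host.  The number of gangs actually launched is up to
     the implementation and may be fewer than requested.  */
#pragma acc parallel num_gangs (gangs) reduction (+:sum)
  {
    sum += 1;
  }

  return sum;
}
```

The return value equals the number of gangs that actually ran the
region, so it is at least 1 and depends on the target and on whether
offloading is enabled.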

Is this patch ok for trunk?

Cesar

[-- Attachment #2: trunk-reductions-20160208.diff --]
[-- Type: text/x-patch, Size: 133115 bytes --]

2016-02-09  Cesar Philippidis  <cesar@codesourcery.com>

	gcc/
	* omp-low.c (is_oacc_parallel_reduction): New function.
	(scan_sharing_clauses): Use it to prevent installing local variables
	for those used in acc parallel reductions.
	(lower_rec_input_clauses): Remove dead code.
	(lower_oacc_reductions): Add support for reference reductions.
	(lower_reduction_clauses): Remove dead code.
	(lower_omp_target): Don't remap variables appearing in acc parallel
	reductions.

	gcc/testsuite/
	* c-c++-common/goacc/reduction-1.c: Add more test coverage.
	* c-c++-common/goacc/reduction-2.c: Likewise.
	* c-c++-common/goacc/reduction-3.c: Likewise.
	* c-c++-common/goacc/reduction-4.c: Likewise.

	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/data-clauses.h: New test.
	* testsuite/libgomp.oacc-c-c++-common/loop-default-compile.c: New test.
	* testsuite/libgomp.oacc-c-c++-common/loop-default.h: New test.
	* testsuite/libgomp.oacc-c-c++-common/loop-g-1.c: Add more test
	coverage.
	* testsuite/libgomp.oacc-c-c++-common/loop-g-2.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-red-g-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-red-w-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-red-w-2.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-gang-np-1.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-gv-np-1.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-gw-np-1.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-1.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-2.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-3.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-4.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-vector-p-1.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-vector-p-2.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-worker-p-1.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-wv-p-1.c: New test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-wv-p-2.c: New test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-wv-p-3.c: New test.
	* testsuite/libgomp.oacc-c-c++-common/loop-w-1.c: Add more test
	coverage.
	* testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-1.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-2.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-3.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-4.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/par-reduction-1.c: Add more test
	coverage.
	* testsuite/libgomp.oacc-c-c++-common/par-reduction-2.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/parallel-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/parallel-dims.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/parallel-reduction.c: New test.
	* testsuite/libgomp.oacc-c-c++-common/reduction-1.c: Add more test
	coverage.
	* testsuite/libgomp.oacc-c-c++-common/reduction-2.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/reduction-3.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/reduction-4.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/reduction-5.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/reduction-6.c: New test.
	* testsuite/libgomp.oacc-c-c++-common/reduction.h: New test.
	* testsuite/libgomp.oacc-fortran/parallel-loop-1.f90: New test.
	* testsuite/libgomp.oacc-fortran/parallel-reduction.f90: New test.
	* testsuite/libgomp.oacc-fortran/reduction-1.f90: Add more test
	coverage.
	* testsuite/libgomp.oacc-fortran/reduction-2.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/reduction-3.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/reduction-4.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/reduction-5.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/reduction-6.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/reduction-7.f90: New test.


diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index d41688b..8a66760 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -308,6 +308,28 @@ is_oacc_kernels (omp_context *ctx)
 	      == GF_OMP_TARGET_KIND_OACC_KERNELS));
 }
 
+/* Return true if CTX corresponds to an oacc parallel region and if
+   VAR is used in a reduction.  */
+
+static bool
+is_oacc_parallel_reduction (tree var, omp_context *ctx)
+{
+  if (!is_oacc_parallel (ctx))
+    return false;
+
+  tree clauses = gimple_omp_target_clauses (ctx->stmt);
+
+  /* Don't install a local copy of the decl if it is used
+     inside an acc parallel reduction.  */
+  if (is_oacc_parallel (ctx))
+    for (tree c = clauses; c; c = OMP_CLAUSE_CHAIN (c))
+      if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_REDUCTION
+	  && OMP_CLAUSE_DECL (c) == var)
+	return true;
+
+  return false;
+}
+
 /* If DECL is the artificial dummy VAR_DECL created for non-static
    data member privatization, return the underlying "this" parameter,
    otherwise return NULL.  */
@@ -2121,7 +2143,8 @@ scan_sharing_clauses (tree clauses, omp_context *ctx,
 		  else
 		    install_var_field (decl, true, 3, ctx,
 				       base_pointers_restrict);
-		  if (is_gimple_omp_offloaded (ctx->stmt))
+		  if (is_gimple_omp_offloaded (ctx->stmt)
+		      && !is_oacc_parallel_reduction (decl, ctx))
 		    install_var_local (decl, ctx);
 		}
 	    }
@@ -4821,7 +4844,7 @@ lower_rec_input_clauses (tree clauses, gimple_seq *ilist, gimple_seq *dlist,
 		  gimplify_assign (ptr, x, ilist);
 		}
 	    }
-	  else if (is_reference (var) && !is_oacc_parallel (ctx))
+	  else if (is_reference (var))
 	    {
 	      /* For references that are being privatized for Fortran,
 		 allocate new backing storage for the new pointer
@@ -5557,7 +5580,8 @@ lower_oacc_reductions (location_t loc, tree clauses, tree level, bool inner,
 	tree orig = OMP_CLAUSE_DECL (c);
 	tree var = maybe_lookup_decl (orig, ctx);
 	tree ref_to_res = NULL_TREE;
-	tree incoming, outgoing;
+	tree incoming, outgoing, v1, v2, v3;
+	bool is_private = false;
 
 	enum tree_code rcode = OMP_CLAUSE_REDUCTION_CODE (c);
 	if (rcode == MINUS_EXPR)
@@ -5570,7 +5594,6 @@ lower_oacc_reductions (location_t loc, tree clauses, tree level, bool inner,
 
 	if (!var)
 	  var = orig;
-	gcc_assert (!is_reference (var));
 
 	incoming = outgoing = var;
 	
@@ -5606,22 +5629,38 @@ lower_oacc_reductions (location_t loc, tree clauses, tree level, bool inner,
 		for (; cls;  cls = OMP_CLAUSE_CHAIN (cls))
 		  if (OMP_CLAUSE_CODE (cls) == OMP_CLAUSE_REDUCTION
 		      && orig == OMP_CLAUSE_DECL (cls))
-		    goto has_outer_reduction;
+		    {
+		      incoming = outgoing = lookup_decl (orig, probe);
+		      goto has_outer_reduction;
+		    }
+		  else if ((OMP_CLAUSE_CODE (cls) == OMP_CLAUSE_FIRSTPRIVATE
+			    || OMP_CLAUSE_CODE (cls) == OMP_CLAUSE_PRIVATE)
+			   && orig == OMP_CLAUSE_DECL (cls))
+		    {
+		      is_private = true;
+		      goto do_lookup;
+		    }
 	      }
 
 	  do_lookup:
 	    /* This is the outermost construct with this reduction,
 	       see if there's a mapping for it.  */
 	    if (gimple_code (outer->stmt) == GIMPLE_OMP_TARGET
-		&& maybe_lookup_field (orig, outer))
+		&& maybe_lookup_field (orig, outer) && !is_private)
 	      {
 		ref_to_res = build_receiver_ref (orig, false, outer);
 		if (is_reference (orig))
 		  ref_to_res = build_simple_mem_ref (ref_to_res);
 
+		tree type = TREE_TYPE (var);
+		if (POINTER_TYPE_P (type))
+		  type = TREE_TYPE (type);
+
 		outgoing = var;
-		incoming = omp_reduction_init_op (loc, rcode, TREE_TYPE (var));
+		incoming = omp_reduction_init_op (loc, rcode, type);
 	      }
+	    else if (ctx->outer)
+	      incoming = outgoing = lookup_decl (orig, ctx->outer);
 	    else
 	      incoming = outgoing = orig;
 	      
@@ -5631,6 +5670,37 @@ lower_oacc_reductions (location_t loc, tree clauses, tree level, bool inner,
 	if (!ref_to_res)
 	  ref_to_res = integer_zero_node;
 
+        if (is_reference (orig))
+	  {
+	    tree type = TREE_TYPE (var);
+	    const char *id = IDENTIFIER_POINTER (DECL_NAME (var));
+
+	    if (!inner)
+	      {
+		tree x = create_tmp_var (TREE_TYPE (type), id);
+		gimplify_assign (var, build_fold_addr_expr (x), fork_seq);
+	      }
+
+	    v1 = create_tmp_var (type, id);
+	    v2 = create_tmp_var (type, id);
+	    v3 = create_tmp_var (type, id);
+
+	    gimplify_assign (v1, var, fork_seq);
+	    gimplify_assign (v2, var, fork_seq);
+	    gimplify_assign (v3, var, fork_seq);
+
+	    var = build_simple_mem_ref (var);
+	    v1 = build_simple_mem_ref (v1);
+	    v2 = build_simple_mem_ref (v2);
+	    v3 = build_simple_mem_ref (v3);
+	    outgoing = build_simple_mem_ref (outgoing);
+
+	    if (TREE_CODE (incoming) != INTEGER_CST)
+	      incoming = build_simple_mem_ref (incoming);
+	  }
+	else
+	  v1 = v2 = v3 = var;
+
 	/* Determine position in reduction buffer, which may be used
 	   by target.  */
 	enum machine_mode mode = TYPE_MODE (TREE_TYPE (var));
@@ -5660,20 +5730,20 @@ lower_oacc_reductions (location_t loc, tree clauses, tree level, bool inner,
 	  = build_call_expr_internal_loc (loc, IFN_GOACC_REDUCTION,
 					  TREE_TYPE (var), 6, init_code,
 					  unshare_expr (ref_to_res),
-					  var, level, op, off);
+					  v1, level, op, off);
 	tree fini_call
 	  = build_call_expr_internal_loc (loc, IFN_GOACC_REDUCTION,
 					  TREE_TYPE (var), 6, fini_code,
 					  unshare_expr (ref_to_res),
-					  var, level, op, off);
+					  v2, level, op, off);
 	tree teardown_call
 	  = build_call_expr_internal_loc (loc, IFN_GOACC_REDUCTION,
 					  TREE_TYPE (var), 6, teardown_code,
-					  ref_to_res, var, level, op, off);
+					  ref_to_res, v3, level, op, off);
 
-	gimplify_assign (var, setup_call, &before_fork);
-	gimplify_assign (var, init_call, &after_fork);
-	gimplify_assign (var, fini_call, &before_join);
+	gimplify_assign (v1, setup_call, &before_fork);
+	gimplify_assign (v2, init_call, &after_fork);
+	gimplify_assign (v3, fini_call, &before_join);
 	gimplify_assign (outgoing, teardown_call, &after_join);
       }
 
@@ -5915,9 +5985,6 @@ lower_reduction_clauses (tree clauses, gimple_seq *stmt_seqp, omp_context *ctx)
 	}
     }
 
-  if (is_gimple_omp_oacc (ctx->stmt))
-    return;
-
   stmt = gimple_build_call (builtin_decl_explicit (BUILT_IN_GOMP_ATOMIC_START),
 			    0);
   gimple_seq_add_stmt (stmt_seqp, stmt);
@@ -15804,7 +15871,9 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 	if (!maybe_lookup_field (var, ctx))
 	  continue;
 
-	if (offloaded)
+	/* Don't remap oacc parallel reduction variables, because the
+	   intermediate result must be local to each gang.  */
+	if (offloaded && !is_oacc_parallel_reduction (var, ctx))
 	  {
 	    x = build_receiver_ref (var, true, ctx);
 	    tree new_var = lookup_decl (var, ctx);
diff --git a/gcc/testsuite/c-c++-common/goacc/reduction-1.c b/gcc/testsuite/c-c++-common/goacc/reduction-1.c
index de97125..59cb6f4 100644
--- a/gcc/testsuite/c-c++-common/goacc/reduction-1.c
+++ b/gcc/testsuite/c-c++-common/goacc/reduction-1.c
@@ -1,70 +1,66 @@
-/* { dg-require-effective-target alloca } */
 /* Integer reductions.  */
 
 #define vl 32
+#define n 1000
 
 int
 main(void)
 {
-  const int n = 1000;
   int i;
   int result, array[n];
   int lresult;
 
   /* '+' reductions.  */
 #pragma acc parallel vector_length (vl)
-#pragma acc loop reduction (+:result)
+#pragma acc loop vector reduction (+:result)
   for (i = 0; i < n; i++)
     result += array[i];
 
   /* '*' reductions.  */
 #pragma acc parallel vector_length (vl)
-#pragma acc loop reduction (*:result)
+#pragma acc loop vector reduction (*:result)
   for (i = 0; i < n; i++)
     result *= array[i];
 
-//   result = 0;
-//   vresult = 0;
-// 
-//   /* 'max' reductions.  */
-// #pragma acc parallel vector_length (vl)
-// #pragma acc loop reduction (+:result)
-//   for (i = 0; i < n; i++)
-//       result = result > array[i] ? result : array[i];
-//
-//   /* 'min' reductions.  */
-// #pragma acc parallel vector_length (vl)
-// #pragma acc loop reduction (+:result)
-//   for (i = 0; i < n; i++)
-//       result = result < array[i] ? result : array[i];
+  /* 'max' reductions.  */
+#pragma acc parallel vector_length (vl)
+#pragma acc loop vector reduction (max:result)
+  for (i = 0; i < n; i++)
+    result = result > array[i] ? result : array[i];
+
+  /* 'min' reductions.  */
+#pragma acc parallel vector_length (vl)
+#pragma acc loop vector reduction (min:result)
+  for (i = 0; i < n; i++)
+    result = result < array[i] ? result : array[i];
 
   /* '&' reductions.  */
 #pragma acc parallel vector_length (vl)
-#pragma acc loop reduction (&:result)
+#pragma acc loop vector reduction (&:result)
   for (i = 0; i < n; i++)
     result &= array[i];
 
   /* '|' reductions.  */
 #pragma acc parallel vector_length (vl)
-#pragma acc loop reduction (|:result)
+#pragma acc loop vector reduction (|:result)
   for (i = 0; i < n; i++)
     result |= array[i];
 
   /* '^' reductions.  */
 #pragma acc parallel vector_length (vl)
-#pragma acc loop reduction (^:result)
+#pragma acc loop vector reduction (^:result)
   for (i = 0; i < n; i++)
     result ^= array[i];
 
   /* '&&' reductions.  */
 #pragma acc parallel vector_length (vl)
-#pragma acc loop reduction (&&:lresult)
+#pragma acc loop vector reduction (&&:lresult)
   for (i = 0; i < n; i++)
     lresult = lresult && (result > array[i]);
 
   /* '||' reductions.  */
 #pragma acc parallel vector_length (vl)
-#pragma acc loop reduction (||:lresult)
+#pragma acc loop vector reduction (||:lresult)
   for (i = 0; i < n; i++)
     lresult = lresult || (result > array[i]);
 
diff --git a/gcc/testsuite/c-c++-common/goacc/reduction-2.c b/gcc/testsuite/c-c++-common/goacc/reduction-2.c
index 2964236..4889241 100644
--- a/gcc/testsuite/c-c++-common/goacc/reduction-2.c
+++ b/gcc/testsuite/c-c++-common/goacc/reduction-2.c
@@ -1,49 +1,48 @@
-/* { dg-require-effective-target alloca } */
 /* float reductions.  */
 
 #define vl 32
+#define n 1000
 
 int
 main(void)
 {
-  const int n = 1000;
   int i;
   float result, array[n];
   int lresult;
 
   /* '+' reductions.  */
 #pragma acc parallel vector_length (vl)
-#pragma acc loop reduction (+:result)
+#pragma acc loop vector reduction (+:result)
   for (i = 0; i < n; i++)
     result += array[i];
 
   /* '*' reductions.  */
 #pragma acc parallel vector_length (vl)
-#pragma acc loop reduction (*:result)
+#pragma acc loop vector reduction (*:result)
   for (i = 0; i < n; i++)
     result *= array[i];
 
-//   /* 'max' reductions.  */
-// #pragma acc parallel vector_length (vl)
-// #pragma acc loop reduction (+:result)
-//   for (i = 0; i < n; i++)
-//       result = result > array[i] ? result : array[i];
-// 
-//   /* 'min' reductions.  */
-// #pragma acc parallel vector_length (vl)
-// #pragma acc loop reduction (+:result)
-//   for (i = 0; i < n; i++)
-//       result = result < array[i] ? result : array[i];
+  /* 'max' reductions.  */
+#pragma acc parallel vector_length (vl)
+#pragma acc loop vector reduction (max:result)
+  for (i = 0; i < n; i++)
+    result = result > array[i] ? result : array[i];
+
+  /* 'min' reductions.  */
+#pragma acc parallel vector_length (vl)
+#pragma acc loop vector reduction (min:result)
+  for (i = 0; i < n; i++)
+    result = result < array[i] ? result : array[i];
 
   /* '&&' reductions.  */
 #pragma acc parallel vector_length (vl)
-#pragma acc loop reduction (&&:lresult)
+#pragma acc loop vector reduction (&&:lresult)
   for (i = 0; i < n; i++)
     lresult = lresult && (result > array[i]);
 
   /* '||' reductions.  */
 #pragma acc parallel vector_length (vl)
-#pragma acc loop reduction (||:lresult)
+#pragma acc loop vector reduction (||:lresult)
   for (i = 0; i < n; i++)
     lresult = lresult || (result > array[i]);
 
diff --git a/gcc/testsuite/c-c++-common/goacc/reduction-3.c b/gcc/testsuite/c-c++-common/goacc/reduction-3.c
index 34c51c2..b19224e2 100644
--- a/gcc/testsuite/c-c++-common/goacc/reduction-3.c
+++ b/gcc/testsuite/c-c++-common/goacc/reduction-3.c
@@ -1,49 +1,48 @@
-/* { dg-require-effective-target alloca } */
 /* double reductions.  */
 
 #define vl 32
+#define n 1000
 
 int
 main(void)
 {
-  const int n = 1000;
   int i;
   double result, array[n];
   int lresult;
 
   /* '+' reductions.  */
 #pragma acc parallel vector_length (vl)
-#pragma acc loop reduction (+:result)
+#pragma acc loop vector reduction (+:result)
   for (i = 0; i < n; i++)
     result += array[i];
 
   /* '*' reductions.  */
 #pragma acc parallel vector_length (vl)
-#pragma acc loop reduction (*:result)
+#pragma acc loop vector reduction (*:result)
   for (i = 0; i < n; i++)
     result *= array[i];
 
-//   /* 'max' reductions.  */
-// #pragma acc parallel vector_length (vl)
-// #pragma acc loop reduction (+:result)
-//   for (i = 0; i < n; i++)
-//       result = result > array[i] ? result : array[i];
-// 
-//   /* 'min' reductions.  */
-// #pragma acc parallel vector_length (vl)
-// #pragma acc loop reduction (+:result)
-//   for (i = 0; i < n; i++)
-//       result = result < array[i] ? result : array[i];
+  /* 'max' reductions.  */
+#pragma acc parallel vector_length (vl)
+#pragma acc loop vector reduction (max:result)
+  for (i = 0; i < n; i++)
+    result = result > array[i] ? result : array[i];
+
+  /* 'min' reductions.  */
+#pragma acc parallel vector_length (vl)
+#pragma acc loop vector reduction (min:result)
+  for (i = 0; i < n; i++)
+    result = result < array[i] ? result : array[i];
 
   /* '&&' reductions.  */
 #pragma acc parallel vector_length (vl)
-#pragma acc loop reduction (&&:lresult)
+#pragma acc loop vector reduction (&&:lresult)
   for (i = 0; i < n; i++)
     lresult = lresult && (result > array[i]);
 
   /* '||' reductions.  */
 #pragma acc parallel vector_length (vl)
-#pragma acc loop reduction (||:lresult)
+#pragma acc loop vector reduction (||:lresult)
   for (i = 0; i < n; i++)
     lresult = lresult || (result > array[i]);
 
diff --git a/gcc/testsuite/c-c++-common/goacc/reduction-4.c b/gcc/testsuite/c-c++-common/goacc/reduction-4.c
index 328c0d4..88d7f70 100644
--- a/gcc/testsuite/c-c++-common/goacc/reduction-4.c
+++ b/gcc/testsuite/c-c++-common/goacc/reduction-4.c
@@ -1,51 +1,54 @@
-/* { dg-require-effective-target alloca } */
 /* complex reductions.  */
 
 #define vl 32
+#define n 1000
 
 int
 main(void)
 {
-  const int n = 1000;
   int i;
   __complex__ double result, array[n];
   int lresult;
 
   /* '+' reductions.  */
 #pragma acc parallel vector_length (vl)
-#pragma acc loop reduction (+:result)
+#pragma acc loop vector reduction (+:result)
   for (i = 0; i < n; i++)
     result += array[i];
 
-  /* Needs support for complex multiplication.  */
-
-//   /* '*' reductions.  */
-// #pragma acc parallel vector_length (vl)
-// #pragma acc loop reduction (*:result)
-//   for (i = 0; i < n; i++)
-//     result *= array[i];
-//
-//   /* 'max' reductions.  */
-// #pragma acc parallel vector_length (vl)
-// #pragma acc loop reduction (+:result)
-//   for (i = 0; i < n; i++)
-//       result = result > array[i] ? result : array[i];
-// 
-//   /* 'min' reductions.  */
-// #pragma acc parallel vector_length (vl)
-// #pragma acc loop reduction (+:result)
-//   for (i = 0; i < n; i++)
-//       result = result < array[i] ? result : array[i];
+  /* '*' reductions.  */
+#pragma acc parallel vector_length (vl)
+#pragma acc loop vector reduction (*:result)
+  for (i = 0; i < n; i++)
+    result *= array[i];
+
+  /* 'max' reductions.  */
+#if 0
+  // error: 'result' has invalid type for 'reduction(max)'
+#pragma acc parallel vector_length (vl)
+#pragma acc loop vector reduction (max:result)
+  for (i = 0; i < n; i++)
+    result = result > array[i] ? result : array[i];
+#endif
+
+  /* 'min' reductions.  */
+#if 0
+  // error: 'result' has invalid type for 'reduction(min)'
+#pragma acc parallel vector_length (vl)
+#pragma acc loop vector reduction (min:result)
+  for (i = 0; i < n; i++)
+    result = result < array[i] ? result : array[i];
+#endif
 
   /* '&&' reductions.  */
 #pragma acc parallel vector_length (vl)
-#pragma acc loop reduction (&&:lresult)
+#pragma acc loop vector reduction (&&:lresult)
   for (i = 0; i < n; i++)
     lresult = lresult && (__real__(result) > __real__(array[i]));
 
   /* '||' reductions.  */
 #pragma acc parallel vector_length (vl)
-#pragma acc loop reduction (||:lresult)
+#pragma acc loop vector reduction (||:lresult)
   for (i = 0; i < n; i++)
     lresult = lresult || (__real__(result) > __real__(array[i]));
 
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/data-clauses.h b/libgomp/testsuite/libgomp.oacc-c-c++-common/data-clauses.h
new file mode 100644
index 0000000..8341053
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/data-clauses.h
@@ -0,0 +1,202 @@
+int i;
+
+int main(void)
+{
+  int j, v;
+
+  i = -1;
+  j = -2;
+  v = 0;
+#pragma acc EXEC_DIRECTIVE /* copyout */ present_or_copyout (v) copyin (i, j)
+  {
+    if (i != -1 || j != -2)
+      abort ();
+    i = 2;
+    j = 1;
+    if (i != 2 || j != 1)
+      abort ();
+    v = 1;
+  }
+#if ACC_MEM_SHARED
+  if (v != 1 || i != 2 || j != 1)
+    abort ();
+#else
+  if (v != 1 || i != -1 || j != -2)
+    abort ();
+#endif
+
+  i = -1;
+  j = -2;
+  v = 0;
+#pragma acc EXEC_DIRECTIVE /* copyout */ present_or_copyout (v) copyout (i, j)
+  {
+    i = 2;
+    j = 1;
+    if (i != 2 || j != 1)
+      abort ();
+    v = 1;
+  }
+  if (v != 1 || i != 2 || j != 1)
+    abort ();
+
+  i = -1;
+  j = -2;
+  v = 0;
+#pragma acc EXEC_DIRECTIVE /* copyout */ present_or_copyout (v) copy (i, j)
+  {
+    if (i != -1 || j != -2)
+      abort ();
+    i = 2;
+    j = 1;
+    if (i != 2 || j != 1)
+      abort ();
+    v = 1;
+  }
+  if (v != 1 || i != 2 || j != 1)
+    abort ();
+
+  i = -1;
+  j = -2;
+  v = 0;
+#pragma acc EXEC_DIRECTIVE /* copyout */ present_or_copyout (v) create (i, j)
+  {
+    i = 2;
+    j = 1;
+    if (i != 2 || j != 1)
+      abort ();
+    v = 1;
+  }
+#if ACC_MEM_SHARED
+  if (v != 1 || i != 2 || j != 1)
+    abort ();
+#else
+  if (v != 1 || i != -1 || j != -2)
+    abort ();
+#endif
+
+  i = -1;
+  j = -2;
+  v = 0;
+#pragma acc EXEC_DIRECTIVE /* copyout */ present_or_copyout (v) present_or_copyin (i, j)
+  {
+    if (i != -1 || j != -2)
+      abort ();
+    i = 2;
+    j = 1;
+    if (i != 2 || j != 1)
+      abort ();
+    v = 1;
+  }
+  if (v != 1)
+    abort ();
+#if ACC_MEM_SHARED
+  if (v != 1 || i != 2 || j != 1)
+    abort ();
+#else
+  if (v != 1 || i != -1 || j != -2)
+    abort ();
+#endif
+
+  i = -1;
+  j = -2;
+  v = 0;
+#pragma acc EXEC_DIRECTIVE /* copyout */ present_or_copyout (v) present_or_copyout (i, j)
+  {
+    i = 2;
+    j = 1;
+    if (i != 2 || j != 1)
+      abort ();
+    v = 1;
+  }
+  if (v != 1 || i != 2 || j != 1)
+    abort ();
+
+  i = -1;
+  j = -2;
+  v = 0;
+#pragma acc EXEC_DIRECTIVE /* copyout */ present_or_copyout (v) present_or_copy (i, j)
+  {
+    if (i != -1 || j != -2)
+      abort ();
+    i = 2;
+    j = 1;
+    if (i != 2 || j != 1)
+      abort ();
+    v = 1;
+  }
+  if (v != 1 || i != 2 || j != 1)
+    abort ();
+
+  i = -1;
+  j = -2;
+  v = 0;
+#pragma acc EXEC_DIRECTIVE /* copyout */ present_or_copyout (v) present_or_create (i, j)
+  {
+    i = 2;
+    j = 1;
+    if (i != 2 || j != 1)
+      abort ();
+    v = 1;
+  }
+  if (v != 1)
+    abort ();
+#if ACC_MEM_SHARED
+  if (v != 1 || i != 2 || j != 1)
+    abort ();
+#else
+  if (v != 1 || i != -1 || j != -2)
+    abort ();
+#endif
+
+  i = -1;
+  j = -2;
+  v = 0;
+
+#pragma acc data copyin (i, j)
+  {
+#pragma acc EXEC_DIRECTIVE /* copyout */ present_or_copyout (v) present (i, j)
+    {
+      if (i != -1 || j != -2)
+	abort ();
+      i = 2;
+      j = 1;
+      if (i != 2 || j != 1)
+	abort ();
+      v = 1;
+    }
+  }
+#if ACC_MEM_SHARED
+  if (v != 1 || i != 2 || j != 1)
+    abort ();
+#else
+  if (v != 1 || i != -1 || j != -2)
+    abort ();
+#endif
+
+  i = -1;
+  j = -2;
+  v = 0;
+
+#pragma acc data copyin(i, j)
+  {
+#pragma acc EXEC_DIRECTIVE /* copyout */ present_or_copyout (v)
+    {
+      if (i != -1 || j != -2)
+	abort ();
+      i = 2;
+      j = 1;
+      if (i != 2 || j != 1)
+	abort ();
+      v = 1;
+    }
+  }
+#if ACC_MEM_SHARED
+  if (v != 1 || i != 2 || j != 1)
+    abort ();
+#else
+  if (v != 1 || i != -1 || j != -2)
+    abort ();
+#endif
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-default-compile.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-default-compile.c
new file mode 100644
index 0000000..640d827
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-default-compile.c
@@ -0,0 +1,13 @@
+
+/* { dg-additional-options "-O2 -fopenacc-dim=16:16" } */
+
+#include "loop-default.h"
+#include <stdlib.h>
+
+int main ()
+{
+  /* Environment should be ignored.  */
+  setenv ("GOMP_OPENACC_DIM", "8:8",  1);
+  
+  return test_1 (16, 16, 32);
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-default.h b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-default.h
new file mode 100644
index 0000000..55de04b
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-default.h
@@ -0,0 +1,125 @@
+#include <openacc.h>
+#include <alloca.h>
+#include <string.h>
+#include <stdio.h>
+
+#pragma acc routine
+static int __attribute__ ((noinline)) coord ()
+{
+  int res = 0;
+
+  if (acc_on_device (acc_device_nvidia))
+    {
+      int g = 0, w = 0, v = 0;
+
+      __asm__ volatile ("mov.u32 %0,%%ctaid.x;" : "=r" (g));
+      __asm__ volatile ("mov.u32 %0,%%tid.y;" : "=r" (w));
+      __asm__ volatile ("mov.u32 %0,%%tid.x;" : "=r" (v));
+      res = (1 << 24) | (g << 16) | (w << 8) | v;
+    }
+  return res;
+}
+
+
+int check (const int *ary, int size, int gp, int wp, int vp)
+{
+  int exit = 0;
+  int ix;
+  int *gangs = (int *)alloca (gp * sizeof (int));
+  int *workers = (int *)alloca (wp * sizeof (int));
+  int *vectors = (int *)alloca (vp * sizeof (int));
+  int offloaded = 0;
+  
+  memset (gangs, 0, gp * sizeof (int));
+  memset (workers, 0, wp * sizeof (int));
+  memset (vectors, 0, vp * sizeof (int));
+  
+  for (ix = 0; ix < size; ix++)
+    {
+      int g = (ary[ix] >> 16) & 0xff;
+      int w = (ary[ix] >> 8) & 0xff;
+      int v = (ary[ix] >> 0) & 0xff;
+
+      if (g >= gp || w >= wp || v >= vp)
+	{
+	  printf ("unexpected cpu %#x used\n", ary[ix]);
+	  exit = 1;
+	}
+      else
+	{
+	  vectors[v]++;
+	  workers[w]++;
+	  gangs[g]++;
+	}
+      offloaded += ary[ix] >> 24;
+    }
+
+  if (!offloaded)
+    return 0;
+
+  if (offloaded != size)
+    {
+      printf ("offloaded %d times,  expected %d\n", offloaded, size);
+      return 1;
+    }
+
+  for (ix = 0; ix < gp; ix++)
+    if (gangs[ix] != gangs[0])
+      {
+	printf ("gang %d not used %d times\n", ix, gangs[0]);
+	exit = 1;
+      }
+  
+  for (ix = 0; ix < wp; ix++)
+    if (workers[ix] != workers[0])
+      {
+	printf ("worker %d not used %d times\n", ix, workers[0]);
+	exit = 1;
+      }
+  
+  for (ix = 0; ix < vp; ix++)
+    if (vectors[ix] != vectors[0])
+      {
+	printf ("vector %d not used %d times\n", ix, vectors[0]);
+	exit = 1;
+      }
+  
+  return exit;
+}
+
+#define N (32 *32*32)
+
+int test_1 (int gp, int wp, int vp)
+{
+  int ary[N];
+  int exit = 0;
+  
+#pragma acc parallel copyout (ary)
+  {
+#pragma acc loop gang (static:1)
+    for (int ix = 0; ix < N; ix++)
+      ary[ix] = coord ();
+  }
+
+  exit |= check (ary, N, gp, 1, 1);
+
+#pragma  acc parallel copyout (ary)
+  {
+#pragma acc loop worker
+    for (int ix = 0; ix < N; ix++)
+      ary[ix] = coord ();
+  }
+
+  exit |= check (ary, N, 1, wp, 1);
+
+#pragma  acc parallel copyout (ary)
+  {
+#pragma acc loop vector
+    for (int ix = 0; ix < N; ix++)
+      ary[ix] = coord ();
+  }
+
+  exit |= check (ary, N, 1, 1, vp);
+
+  return exit;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-g-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-g-1.c
index 23c2a75..7afb89b 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-g-1.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-g-1.c
@@ -1,5 +1,5 @@
 /* { dg-do run } */
-/* { dg-additional-options "-O2" } */
+/* { dg-additional-options "-O2 -w" } */
 
 #include <stdio.h>
 
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-g-2.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-g-2.c
index 1a93db3..db83692 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-g-2.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-g-2.c
@@ -1,5 +1,5 @@
 /* { dg-do run } */
-/* { dg-additional-options "-O2" } */
+/* { dg-additional-options "-O2 -w" } */
 
 #include <stdio.h>
 
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-g-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-g-1.c
index c14bddd..129a8c8 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-g-1.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-g-1.c
@@ -1,5 +1,5 @@
 /* { dg-do run } */
-/* { dg-additional-options "-O2" } */
+/* { dg-additional-options "-O2 -w" } */
 
 #include <stdio.h>
 
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-w-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-w-1.c
index 706d0d8..fadf7d5 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-w-1.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-w-1.c
@@ -1,5 +1,5 @@
 /* { dg-do run } */
-/* { dg-additional-options "-O2" } */
+/* { dg-additional-options "-O2 -w" } */
 
 #include <stdio.h>
 
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-w-2.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-w-2.c
index a073ac8..68d3d7a 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-w-2.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-w-2.c
@@ -1,5 +1,5 @@
 /* { dg-do run } */
-/* { dg-additional-options "-O2" } */
+/* { dg-additional-options "-O2 -w" } */
 
 #include <stdio.h>
 
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gang-np-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gang-np-1.c
new file mode 100644
index 0000000..55ab3c9
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gang-np-1.c
@@ -0,0 +1,45 @@
+/* { dg-additional-options "-w" } */
+
+#include <assert.h>
+
+/* Test of reduction on loop directive (gangs, non-private reduction
+   variable).  */
+
+int
+main (int argc, char *argv[])
+{
+  int i, arr[1024], res = 0, hres = 0;
+
+  for (i = 0; i < 1024; i++)
+    arr[i] = i;
+
+  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
+		       copy(res)
+  {
+    #pragma acc loop gang reduction(+:res)
+    for (i = 0; i < 1024; i++)
+      res += arr[i];
+  }
+
+  for (i = 0; i < 1024; i++)
+    hres += arr[i];
+
+  assert (res == hres);
+
+  res = hres = 1;
+
+  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
+		       copy(res)
+  {
+    #pragma acc loop gang reduction(*:res)
+    for (i = 0; i < 12; i++)
+      res *= arr[i];
+  }
+
+  for (i = 0; i < 12; i++)
+    hres *= arr[i];
+
+  assert (res == hres);
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gv-np-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gv-np-1.c
new file mode 100644
index 0000000..d4341e9
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gv-np-1.c
@@ -0,0 +1,30 @@
+/* { dg-additional-options "-w" } */
+
+#include <assert.h>
+
+/* Test of reduction on loop directive (gangs and vectors, non-private
+   reduction variable).  */
+
+int
+main (int argc, char *argv[])
+{
+  int i, arr[1024], res = 0, hres = 0;
+
+  for (i = 0; i < 1024; i++)
+    arr[i] = i;
+
+  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
+		       copy(res)
+  {
+    #pragma acc loop gang vector reduction(+:res)
+    for (i = 0; i < 1024; i++)
+      res += arr[i];
+  }
+
+  for (i = 0; i < 1024; i++)
+    hres += arr[i];
+
+  assert (res == hres);
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gw-np-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gw-np-1.c
new file mode 100644
index 0000000..2e5668b
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gw-np-1.c
@@ -0,0 +1,30 @@
+/* { dg-additional-options "-w" } */
+
+#include <assert.h>
+
+/* Test of reduction on loop directive (gangs and workers, non-private
+   reduction variable).  */
+
+int
+main (int argc, char *argv[])
+{
+  int i, arr[1024], res = 0, hres = 0;
+
+  for (i = 0; i < 1024; i++)
+    arr[i] = i;
+
+  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
+		       copy(res)
+  {
+    #pragma acc loop gang worker reduction(+:res)
+    for (i = 0; i < 1024; i++)
+      res += arr[i];
+  }
+
+  for (i = 0; i < 1024; i++)
+    hres += arr[i];
+
+  assert (res == hres);
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-1.c
new file mode 100644
index 0000000..d610373
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-1.c
@@ -0,0 +1,28 @@
+#include <assert.h>
+
+/* Test of reduction on loop directive (gangs, workers and vectors, non-private
+   reduction variable).  */
+
+int
+main (int argc, char *argv[])
+{
+  int i, arr[1024], res = 0, hres = 0;
+
+  for (i = 0; i < 1024; i++)
+    arr[i] = i;
+
+  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
+		       copy(res)
+  {
+    #pragma acc loop gang worker vector reduction(+:res)
+    for (i = 0; i < 1024; i++)
+      res += arr[i];
+  }
+
+  for (i = 0; i < 1024; i++)
+    hres += arr[i];
+
+  assert (res == hres);
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-2.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-2.c
new file mode 100644
index 0000000..ea5c151
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-2.c
@@ -0,0 +1,34 @@
+#include <assert.h>
+
+/* Test of reduction on loop directive (gangs, workers and vectors, non-private
+   reduction variable: separate gang and worker/vector loops).  */
+
+int
+main (int argc, char *argv[])
+{
+  int i, j, arr[32768], res = 0, hres = 0;
+
+  for (i = 0; i < 32768; i++)
+    arr[i] = i;
+
+  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
+		       copy(res)
+  {
+    #pragma acc loop gang reduction(+:res)
+    for (j = 0; j < 32; j++)
+      {
+        #pragma acc loop worker vector reduction(+:res)
+        for (i = 0; i < 1024; i++)
+	  res += arr[j * 1024 + i];
+      }
+    /* "res" is non-private, and is not available until after the parallel
+       region.  */
+  }
+
+  for (i = 0; i < 32768; i++)
+    hres += arr[i];
+
+  assert (res == hres);
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-3.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-3.c
new file mode 100644
index 0000000..0056f3c
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-3.c
@@ -0,0 +1,33 @@
+#include <assert.h>
+
+/* Test of reduction on loop directive (gangs, workers and vectors, non-private
+   reduction variable: separate gang and worker/vector loops, double type).  */
+
+int
+main (int argc, char *argv[])
+{
+  int i, j;
+  double arr[32768], res = 0, hres = 0;
+
+  for (i = 0; i < 32768; i++)
+    arr[i] = i;
+
+  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
+		       copyin(arr) copy(res)
+  {
+    #pragma acc loop gang reduction(+:res)
+    for (j = 0; j < 32; j++)
+      {
+        #pragma acc loop worker vector reduction(+:res)
+        for (i = 0; i < 1024; i++)
+	  res += arr[j * 1024 + i];
+      }
+  }
+
+  for (i = 0; i < 32768; i++)
+    hres += arr[i];
+
+  assert (res == hres);
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-4.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-4.c
new file mode 100644
index 0000000..e69d0ec
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-4.c
@@ -0,0 +1,55 @@
+#include <assert.h>
+
+/* Test of reduction on loop directive (gangs, workers and vectors, multiple
+   non-private reduction variables, float type).  */
+
+int
+main (int argc, char *argv[])
+{
+  int i, j;
+  float arr[32768];
+  float res = 0, mres = 0, hres = 0, hmres = 0;
+
+  for (i = 0; i < 32768; i++)
+    arr[i] = i;
+
+  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
+		       copy(res, mres)
+  {
+    #pragma acc loop gang reduction(+:res) reduction(max:mres)
+    for (j = 0; j < 32; j++)
+      {
+	#pragma acc loop worker vector reduction(+:res) reduction(max:mres)
+	for (i = 0; i < 1024; i++)
+	  {
+	    res += arr[j * 1024 + i];
+	    if (arr[j * 1024 + i] > mres)
+	      mres = arr[j * 1024 + i];
+	  }
+
+	#pragma acc loop worker vector reduction(+:res) reduction(max:mres)
+	for (i = 0; i < 1024; i++)
+	  {
+	    res += arr[j * 1024 + (1023 - i)];
+	    if (arr[j * 1024 + (1023 - i)] > mres)
+	      mres = arr[j * 1024 + (1023 - i)];
+	  }
+      }
+  }
+
+  for (j = 0; j < 32; j++)
+    for (i = 0; i < 1024; i++)
+      {
+        hres += arr[j * 1024 + i];
+	hres += arr[j * 1024 + (1023 - i)];
+	if (arr[j * 1024 + i] > hmres)
+	  hmres = arr[j * 1024 + i];
+	if (arr[j * 1024 + (1023 - i)] > hmres)
+	  hmres = arr[j * 1024 + (1023 - i)];
+      }
+
+  assert (res == hres);
+  assert (mres == hmres);
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-vector-p-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-vector-p-1.c
new file mode 100644
index 0000000..dd181ef
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-vector-p-1.c
@@ -0,0 +1,43 @@
+/* { dg-additional-options "-w" } */
+
+#include <assert.h>
+
+/* Test of reduction on loop directive (vectors, private reduction
+   variable).  */
+
+int
+main (int argc, char *argv[])
+{
+  int i, j, arr[1024], out[32], res = 0, hres = 0;
+
+  for (i = 0; i < 1024; i++)
+    arr[i] = i;
+
+  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
+		       private(res) copyout(out)
+  {
+    #pragma acc loop gang
+    for (j = 0; j < 32; j++)
+      {
+        res = 0;
+
+	#pragma acc loop vector reduction(+:res)
+	for (i = 0; i < 32; i++)
+	  res += arr[j * 32 + i];
+
+	out[j] = res;
+      }
+  }
+
+  for (j = 0; j < 32; j++)
+    {
+      hres = 0;
+
+      for (i = 0; i < 32; i++)
+	hres += arr[j * 32 + i];
+
+      assert (out[j] == hres);
+    }
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-vector-p-2.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-vector-p-2.c
new file mode 100644
index 0000000..15f0053
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-vector-p-2.c
@@ -0,0 +1,41 @@
+#include <assert.h>
+
+/* Test of reduction on loop directive (vector reduction in
+   gang-partitioned/worker-partitioned mode, private reduction variable).  */
+
+int
+main (int argc, char *argv[])
+{
+  int i, j, k;
+  double ina[1024], inb[1024], out[1024], acc;
+
+  for (j = 0; j < 32; j++)
+    for (i = 0; i < 32; i++)
+      {
+        ina[j * 32 + i] = (i == j) ? 2.0 : 0.0;
+	inb[j * 32 + i] = (double) (i + j);
+      }
+
+  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
+		       private(acc) copyin(ina, inb) copyout(out)
+  {
+    #pragma acc loop gang worker
+    for (k = 0; k < 32; k++)
+      for (j = 0; j < 32; j++)
+        {
+	  acc = 0;
+
+	  #pragma acc loop vector reduction(+:acc)
+	  for (i = 0; i < 32; i++)
+	    acc += ina[k * 32 + i] * inb[i * 32 + j];
+
+	  out[k * 32 + j] = acc;
+	}
+  }
+
+  for (j = 0; j < 32; j++)
+    for (i = 0; i < 32; i++)
+      assert (out[j * 32 + i] == (i + j) * 2);
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-worker-p-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-worker-p-1.c
new file mode 100644
index 0000000..4864acd
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-worker-p-1.c
@@ -0,0 +1,43 @@
+/* { dg-additional-options "-w" } */
+
+#include <assert.h>
+
+/* Test of reduction on loop directive (workers, private reduction
+   variable).  */
+
+int
+main (int argc, char *argv[])
+{
+  int i, j, arr[1024], out[32], res = 0, hres = 0;
+
+  for (i = 0; i < 1024; i++)
+    arr[i] = i;
+
+  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
+		       private(res) copyout(out)
+  {
+    #pragma acc loop gang
+    for (j = 0; j < 32; j++)
+      {
+        res = 0;
+
+	#pragma acc loop worker reduction(+:res)
+	for (i = 0; i < 32; i++)
+	  res += arr[j * 32 + i];
+
+	out[j] = res;
+      }
+  }
+
+  for (j = 0; j < 32; j++)
+    {
+      hres = 0;
+
+      for (i = 0; i < 32; i++)
+	hres += arr[j * 32 + i];
+
+      assert (out[j] == hres);
+    }
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-wv-p-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-wv-p-1.c
new file mode 100644
index 0000000..2765908
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-wv-p-1.c
@@ -0,0 +1,41 @@
+#include <assert.h>
+
+/* Test of reduction on loop directive (workers and vectors, private reduction
+   variable).  */
+
+int
+main (int argc, char *argv[])
+{
+  int i, j, arr[1024], out[32], res = 0, hres = 0;
+
+  for (i = 0; i < 1024; i++)
+    arr[i] = i;
+
+  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
+		       private(res) copyout(out)
+  {
+    #pragma acc loop gang
+    for (j = 0; j < 32; j++)
+      {
+        res = 0;
+
+	#pragma acc loop worker vector reduction(+:res)
+	for (i = 0; i < 32; i++)
+	  res += arr[j * 32 + i];
+
+	out[j] = res;
+      }
+  }
+
+  for (j = 0; j < 32; j++)
+    {
+      hres = 0;
+
+      for (i = 0; i < 32; i++)
+	hres += arr[j * 32 + i];
+
+      assert (out[j] == hres);
+    }
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-wv-p-2.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-wv-p-2.c
new file mode 100644
index 0000000..c30b0e7
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-wv-p-2.c
@@ -0,0 +1,45 @@
+#include <assert.h>
+
+/* Test of reduction on loop directive (workers and vectors, private reduction
+   variable: separate worker and vector loops).  */
+
+int
+main (int argc, char *argv[])
+{
+  int i, j, arr[32768], out[32], res = 0, hres = 0;
+
+  for (i = 0; i < 32768; i++)
+    arr[i] = i;
+
+  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
+		       private(res) copyout(out)
+  {
+    #pragma acc loop gang
+    for (j = 0; j < 32; j++)
+      {
+        res = j;
+
+	#pragma acc loop worker reduction(+:res)
+	for (i = 0; i < 1024; i++)
+	  res += arr[j * 1024 + i];
+
+	#pragma acc loop vector reduction(+:res)
+	for (i = 1023; i >= 0; i--)
+	  res += arr[j * 1024 + i];
+
+	out[j] = res;
+      }
+  }
+
+  for (j = 0; j < 32; j++)
+    {
+      hres = j;
+
+      for (i = 0; i < 1024; i++)
+	hres += arr[j * 1024 + i] * 2;
+
+      assert (out[j] == hres);
+    }
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-wv-p-3.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-wv-p-3.c
new file mode 100644
index 0000000..b5e28fb
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-wv-p-3.c
@@ -0,0 +1,38 @@
+#include <assert.h>
+
+/* Test of reduction on loop directive (workers and vectors, private reduction
+   variable: gang-redundant mode).  */
+
+int
+main (int argc, char *argv[])
+{
+  int i, arr[1024], out[32], res = 0, hres = 0;
+
+  for (i = 0; i < 1024; i++)
+    arr[i] = i ^ 33;
+
+  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
+		       private(res) copyin(arr) copyout(out)
+  {
+    /* Private variables aren't initialized by default in OpenACC.  */
+    res = 0;
+
+    /* "res" should be available at the end of the following loop (and should
+       have the same value redundantly in each gang).  */
+    #pragma acc loop worker vector reduction(+:res)
+    for (i = 0; i < 1024; i++)
+      res += arr[i];
+
+    #pragma acc loop gang (static: 1)
+    for (i = 0; i < 32; i++)
+      out[i] = res;
+  }
+
+  for (i = 0; i < 1024; i++)
+    hres += arr[i];
+
+  for (i = 0; i < 32; i++)
+    assert (out[i] == hres);
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-w-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-w-1.c
index 539e41d..28c6d0b 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-w-1.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-w-1.c
@@ -1,5 +1,5 @@
 /* { dg-do run } */
-/* { dg-additional-options "-O2" } */
+/* { dg-additional-options "-O2 -w" } */
 
 #include <stdio.h>
 
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-1.c
new file mode 100644
index 0000000..5e82e1d
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-1.c
@@ -0,0 +1,38 @@
+#include <assert.h>
+
+/* Test of reduction on both parallel and loop directives (worker and
+   vector-partitioned loops individually in gang-partitioned mode, int
+   type).  */
+
+int
+main (int argc, char *argv[])
+{
+  int i, j, arr[32768], res = 0, hres = 0;
+
+  for (i = 0; i < 32768; i++)
+    arr[i] = i;
+
+  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
+    reduction(+:res) copy(res)
+  {
+    #pragma acc loop gang
+    for (j = 0; j < 32; j++)
+      {
+	#pragma acc loop worker reduction(+:res)
+	for (i = 0; i < 1024; i++)
+	  res += arr[j * 1024 + i];
+
+	#pragma acc loop vector reduction(+:res)
+	for (i = 1023; i >= 0; i--)
+	  res += arr[j * 1024 + i];
+      }
+  }
+
+  for (j = 0; j < 32; j++)
+    for (i = 0; i < 1024; i++)
+      hres += arr[j * 1024 + i] * 2;
+
+  assert (res == hres);
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-2.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-2.c
new file mode 100644
index 0000000..a7a75a9
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-2.c
@@ -0,0 +1,40 @@
+#include <assert.h>
+
+/* Test of reduction on both parallel and loop directives (workers and vectors
+   in gang-partitioned mode, int type with XOR).  */
+
+int
+main (int argc, char *argv[])
+{
+  int i, j, arr[32768], res = 0, hres = 0;
+
+  for (i = 0; i < 32768; i++)
+    arr[i] = i;
+
+  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
+		       reduction(^:res)
+  {
+    #pragma acc loop gang
+    for (j = 0; j < 32; j++)
+      {
+	#pragma acc loop worker vector reduction(^:res)
+	for (i = 0; i < 1024; i++)
+	  res ^= arr[j * 1024 + i];
+
+	#pragma acc loop worker vector reduction(^:res)
+	for (i = 0; i < 1024; i++)
+	  res ^= arr[j * 1024 + (1023 - i)];
+      }
+  }
+
+  for (j = 0; j < 32; j++)
+    for (i = 0; i < 1024; i++)
+      {
+        hres ^= arr[j * 1024 + i];
+	hres ^= arr[j * 1024 + (1023 - i)];
+      }
+
+  assert (res == hres);
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-3.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-3.c
new file mode 100644
index 0000000..8d85fed
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-3.c
@@ -0,0 +1,42 @@
+#include <assert.h>
+
+/* Test of reduction on both parallel and loop directives (workers and vectors
+   together in gang-partitioned mode, float type).  */
+
+int
+main (int argc, char *argv[])
+{
+  int i, j;
+  float arr[32768];
+  float res = 0, hres = 0;
+
+  for (i = 0; i < 32768; i++)
+    arr[i] = i;
+
+  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
+    reduction(+:res) copy(res)
+  {
+    #pragma acc loop gang
+    for (j = 0; j < 32; j++)
+      {
+	#pragma acc loop worker vector reduction(+:res)
+	for (i = 0; i < 1024; i++)
+	  res += arr[j * 1024 + i];
+
+	#pragma acc loop worker vector reduction(+:res)
+	for (i = 0; i < 1024; i++)
+	  res += arr[j * 1024 + (1023 - i)];
+      }
+  }
+
+  for (j = 0; j < 32; j++)
+    for (i = 0; i < 1024; i++)
+      {
+        hres += arr[j * 1024 + i];
+	hres += arr[j * 1024 + (1023 - i)];
+      }
+
+  assert (res == hres);
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-4.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-4.c
new file mode 100644
index 0000000..1904b4a
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-4.c
@@ -0,0 +1,55 @@
+#include <assert.h>
+
+/* Test of reduction on both parallel and loop directives (workers and vectors
+   together in gang-partitioned mode, float type, multiple reductions).  */
+
+int
+main (int argc, char *argv[])
+{
+  int i, j;
+  float arr[32768];
+  float res = 0, mres = 0, hres = 0, hmres = 0;
+
+  for (i = 0; i < 32768; i++)
+    arr[i] = i;
+
+  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
+    reduction(+:res) reduction(max:mres) copy(res, mres)
+  {
+    #pragma acc loop gang
+    for (j = 0; j < 32; j++)
+      {
+	#pragma acc loop worker vector reduction(+:res) reduction(max:mres)
+	for (i = 0; i < 1024; i++)
+	  {
+	    res += arr[j * 1024 + i];
+	    if (arr[j * 1024 + i] > mres)
+	      mres = arr[j * 1024 + i];
+	  }
+
+	#pragma acc loop worker vector reduction(+:res) reduction(max:mres)
+	for (i = 0; i < 1024; i++)
+	  {
+	    res += arr[j * 1024 + (1023 - i)];
+	    if (arr[j * 1024 + (1023 - i)] > mres)
+	      mres = arr[j * 1024 + (1023 - i)];
+	  }
+      }
+  }
+
+  for (j = 0; j < 32; j++)
+    for (i = 0; i < 1024; i++)
+      {
+        hres += arr[j * 1024 + i];
+	hres += arr[j * 1024 + (1023 - i)];
+	if (arr[j * 1024 + i] > hmres)
+	  hmres = arr[j * 1024 + i];
+	if (arr[j * 1024 + (1023 - i)] > hmres)
+	  hmres = arr[j * 1024 + (1023 - i)];
+      }
+
+  assert (res == hres);
+  assert (mres == hmres);
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/par-reduction-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/par-reduction-1.c
index dceac39..a88b60f 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/par-reduction-1.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/par-reduction-1.c
@@ -1,40 +1,54 @@
+/* { dg-additional-options "-w" } */
+
 #include <assert.h>
 
+/* Test of reduction on parallel directive.  */
+
+#define ACTUAL_GANGS 256
+
 int
 main (int argc, char *argv[])
 {
-  int res, res2 = 0;
+  int res, res1 = 0, res2 = 0;
 
 #if defined(ACC_DEVICE_TYPE_host)
 # define GANGS 1
 #else
 # define GANGS 256
 #endif
-  #pragma acc parallel num_gangs(GANGS) copy(res2)
+  #pragma acc parallel num_gangs(GANGS) num_workers(32) vector_length(32) \
+    reduction(+:res1) copy(res2, res1)
   {
+    res1 += 5;
+
     #pragma acc atomic
     res2 += 5;
   }
   res = GANGS * 5;
 
+  assert (res == res1);
   assert (res == res2);
 #undef GANGS
 
-  res = res2 = 1;
+  res = res1 = res2 = 1;
 
 #if defined(ACC_DEVICE_TYPE_host)
 # define GANGS 1
 #else
 # define GANGS 8
 #endif
-  #pragma acc parallel num_gangs(GANGS) copy(res2)
+  #pragma acc parallel num_gangs(GANGS) num_workers(32) vector_length(32) \
+    reduction(*:res1) copy(res1, res2)
   {
+    res1 *= 5;
+
     #pragma acc atomic
     res2 *= 5;
   }
   for (int i = 0; i < GANGS; ++i)
     res *= 5;
 
+  assert (res == res1);
   assert (res == res2);
 #undef GANGS
 
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/par-reduction-2.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/par-reduction-2.c
index bd5715c..911b76c 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/par-reduction-2.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/par-reduction-2.c
@@ -1,18 +1,25 @@
+/* { dg-additional-options "-w" } */
+
 #include <assert.h>
 #include <openacc.h>
 
+/* Test of reduction on parallel directive (with async).  */
+
 int
 main (int argc, char *argv[])
 {
-  int res, res2 = 0;
+  int res, res1 = 0, res2 = 0;
 
 #if defined(ACC_DEVICE_TYPE_host)
 # define GANGS 1
 #else
 # define GANGS 256
 #endif
-  #pragma acc parallel num_gangs(GANGS) copy(res2) async(1)
+  #pragma acc parallel num_gangs(GANGS) num_workers(32) vector_length(32) \
+    reduction(+:res1) copy(res1, res2) async(1)
   {
+    res1 += 5;
+
     #pragma acc atomic
     res2 += 5;
   }
@@ -20,18 +27,22 @@ main (int argc, char *argv[])
 
   acc_wait (1);
 
+  assert (res == res1);
   assert (res == res2);
 #undef GANGS
 
-  res = res2 = 1;
+  res = res1 = res2 = 1;
 
 #if defined(ACC_DEVICE_TYPE_host)
 # define GANGS 1
 #else
 # define GANGS 8
 #endif
-  #pragma acc parallel num_gangs(GANGS) copy(res2) async(1)
+  #pragma acc parallel num_gangs(GANGS) num_workers(32) vector_length(32) \
+    reduction(*:res1) copy(res1, res2) async(1)
   {
+    res1 *= 5;
+
     #pragma acc atomic
     res2 *= 5;
   }
@@ -40,6 +51,7 @@ main (int argc, char *argv[])
 
   acc_wait (1);
 
+  assert (res == res1);
   assert (res == res2);
 
   return 0;
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-1.c
index fd9df33..9a411fe 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-1.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-1.c
@@ -2,205 +2,5 @@
 
 #include <stdlib.h>
 
-int i;
-
-int main(void)
-{
-  int j, v;
-
-  i = -1;
-  j = -2;
-  v = 0;
-#pragma acc parallel /* copyout */ present_or_copyout (v) copyin (i, j)
-  {
-    if (i != -1 || j != -2)
-      abort ();
-    i = 2;
-    j = 1;
-    if (i != 2 || j != 1)
-      abort ();
-    v = 1;
-  }
-#if ACC_MEM_SHARED
-  if (v != 1 || i != 2 || j != 1)
-    abort ();
-#else
-  if (v != 1 || i != -1 || j != -2)
-    abort ();
-#endif
-
-  i = -1;
-  j = -2;
-  v = 0;
-#pragma acc parallel /* copyout */ present_or_copyout (v) copyout (i, j)
-  {
-    i = 2;
-    j = 1;
-    if (i != 2 || j != 1)
-      abort ();
-    v = 1;
-  }
-  if (v != 1 || i != 2 || j != 1)
-    abort ();
-
-  i = -1;
-  j = -2;
-  v = 0;
-#pragma acc parallel /* copyout */ present_or_copyout (v) copy (i, j)
-  {
-    if (i != -1 || j != -2)
-      abort ();
-    i = 2;
-    j = 1;
-    if (i != 2 || j != 1)
-      abort ();
-    v = 1;
-  }
-  if (v != 1 || i != 2 || j != 1)
-    abort ();
-
-  i = -1;
-  j = -2;
-  v = 0;
-#pragma acc parallel /* copyout */ present_or_copyout (v) create (i, j)
-  {
-    i = 2;
-    j = 1;
-    if (i != 2 || j != 1)
-      abort ();
-    v = 1;
-  }
-#if ACC_MEM_SHARED
-  if (v != 1 || i != 2 || j != 1)
-    abort ();
-#else
-  if (v != 1 || i != -1 || j != -2)
-    abort ();
-#endif
-
-  i = -1;
-  j = -2;
-  v = 0;
-#pragma acc parallel /* copyout */ present_or_copyout (v) present_or_copyin (i, j)
-  {
-    if (i != -1 || j != -2)
-      abort ();
-    i = 2;
-    j = 1;
-    if (i != 2 || j != 1)
-      abort ();
-    v = 1;
-  }
-  if (v != 1)
-    abort ();
-#if ACC_MEM_SHARED
-  if (v != 1 || i != 2 || j != 1)
-    abort ();
-#else
-  if (v != 1 || i != -1 || j != -2)
-    abort ();
-#endif
-
-  i = -1;
-  j = -2;
-  v = 0;
-#pragma acc parallel /* copyout */ present_or_copyout (v) present_or_copyout (i, j)
-  {
-    i = 2;
-    j = 1;
-    if (i != 2 || j != 1)
-      abort ();
-    v = 1;
-  }
-  if (v != 1 || i != 2 || j != 1)
-    abort ();
-
-  i = -1;
-  j = -2;
-  v = 0;
-#pragma acc parallel /* copyout */ present_or_copyout (v) present_or_copy (i, j)
-  {
-    if (i != -1 || j != -2)
-      abort ();
-    i = 2;
-    j = 1;
-    if (i != 2 || j != 1)
-      abort ();
-    v = 1;
-  }
-  if (v != 1 || i != 2 || j != 1)
-    abort ();
-
-  i = -1;
-  j = -2;
-  v = 0;
-#pragma acc parallel /* copyout */ present_or_copyout (v) present_or_create (i, j)
-  {
-    i = 2;
-    j = 1;
-    if (i != 2 || j != 1)
-      abort ();
-    v = 1;
-  }
-  if (v != 1)
-    abort ();
-#if ACC_MEM_SHARED
-  if (v != 1 || i != 2 || j != 1)
-    abort ();
-#else
-  if (v != 1 || i != -1 || j != -2)
-    abort ();
-#endif
-
-  i = -1;
-  j = -2;
-  v = 0;
-
-#pragma acc data copyin (i, j)
-  {
-#pragma acc parallel /* copyout */ present_or_copyout (v) present (i, j)
-    {
-      if (i != -1 || j != -2)
-        abort ();
-      i = 2;
-      j = 1;
-      if (i != 2 || j != 1)
-        abort ();
-      v = 1;
-    }
-  }
-#if ACC_MEM_SHARED
-  if (v != 1 || i != 2 || j != 1)
-    abort ();
-#else
-  if (v != 1 || i != -1 || j != -2)
-    abort ();
-#endif
-
-  i = -1;
-  j = -2;
-  v = 0;
-
-#pragma acc data copyin(i, j)
-  {
-#pragma acc parallel /* copyout */ present_or_copyout (v)
-    {
-      if (i != -1 || j != -2)
-        abort ();
-      i = 2;
-      j = 1;
-      if (i != 2 || j != 1)
-        abort ();
-      v = 1;
-    }
-  }
-#if ACC_MEM_SHARED
-  if (v != 1 || i != 2 || j != 1)
-    abort ();
-#else
-  if (v != 1 || i != -1 || j != -2)
-    abort ();
-#endif
-
-  return 0;
-}
+#define EXEC_DIRECTIVE parallel
+#include "data-clauses.h"
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-dims.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-dims.c
index 39357ce..ecf78c7 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-dims.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-dims.c
@@ -5,12 +5,20 @@
 
 int main ()
 {
+  int dummy[10];
+
 #pragma acc parallel num_workers (2<<20) /* { dg-error "using num_workers" } */
   {
+#pragma acc loop worker
+    for (int i = 0; i < 10; i++)
+      dummy[i] = i;
   }
 
 #pragma acc parallel vector_length (2<<20) /* { dg-error "using vector_length" } */
   {
+#pragma acc loop vector
+    for (int i = 0; i < 10; i++)
+      dummy[i] = i;
   }
 
   return 0;
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-reduction.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-reduction.c
new file mode 100644
index 0000000..b2c60e5
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-reduction.c
@@ -0,0 +1,72 @@
+/* { dg-do run } */
+/* { dg-additional-options "-w" } */
+
+#include <stdlib.h>
+#include <openacc.h>
+
+#define N 10
+
+int
+main ()
+{
+  int s1 = 0, s2 = 0;
+  int i;
+  int dummy = 0;
+
+#pragma acc data copy (dummy)
+  {
+#pragma acc parallel num_gangs (N) reduction (+:s1) copy(s1)
+    {
+      s1++;
+    }
+  }
+
+  if (acc_get_device_type () != acc_device_nvidia)
+    {
+      if (s1 != 1)
+	abort ();
+    }
+  else
+    {
+      if (s1 != N)
+	abort ();
+    }
+
+  s1 = 0;
+  s2 = 0;
+
+#pragma acc parallel num_gangs (10) reduction (+:s1, s2) copy(s1, s2)
+  {
+    s1++;
+    s2 += N;
+  }
+
+  if (acc_get_device_type () != acc_device_nvidia)
+    {
+      if (s1 != 1)
+	abort ();
+      if (s2 != N)
+	abort ();
+    }
+  else
+    {
+      if (s1 != N)
+	abort ();
+      if (s2 != N*N)
+	abort ();
+    }
+
+  s1 = 0;
+
+#pragma acc parallel num_gangs (10) reduction (+:s1) copy(s1)
+  {
+#pragma acc loop gang reduction (+:s1)
+    for (i = 0; i < 10; i++)
+      s1++;
+  }
+
+  if (s1 != N)
+    abort ();
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-1.c
index e557931..10eb278 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-1.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-1.c
@@ -1,46 +1,59 @@
 /* { dg-do run } */
 
+/* Ignore vector_length warnings for offloaded (nvptx) targets.  */
+/* { dg-additional-options "-foffload=-w" } */
+
 /* Integer reductions.  */
 
 #include <stdlib.h>
-#include <stdbool.h>
-
-#define vl 32
-
-#define DO_PRAGMA(x) _Pragma (#x)
-
-#define check_reduction_op(type, op, init, b)	\
-  {						\
-    type res, vres;				\
-    res = (init);				\
-    DO_PRAGMA (acc parallel vector_length (vl) copy(res))	\
-DO_PRAGMA (acc loop reduction (op:res))\
-    for (i = 0; i < n; i++)			\
-      res = res op (b);				\
-						\
-    vres = (init);				\
-    for (i = 0; i < n; i++)			\
-      vres = vres op (b);			\
-						\
-    if (res != vres)				\
-      abort ();					\
-  }
+#include "reduction.h"
+
+const int ng = 8;
+const int nw = 4;
+const int vl = 32;
 
 static void
-test_reductions_int (void)
+test_reductions (void)
 {
-  const int n = 1000;
+  const int n = 100;
   int i;
   int array[n];
 
   for (i = 0; i < n; i++)
-    array[i] = i;
-
-  check_reduction_op (int, +, 0, array[i]);
-  check_reduction_op (int, *, 1, array[i]);
-  check_reduction_op (int, &, -1, array[i]);
-  check_reduction_op (int, |, 0, array[i]);
-  check_reduction_op (int, ^, 0, array[i]);
+    array[i] = i+1;
+
+  /* Gang reductions.  */
+  check_reduction_op (int, +, 0, array[i], num_gangs (ng), gang);
+  check_reduction_op (int, *, 1, array[i], num_gangs (ng), gang);
+  check_reduction_op (int, &, -1, array[i], num_gangs (ng), gang);
+  check_reduction_op (int, |, 0, array[i], num_gangs (ng), gang);
+  check_reduction_op (int, ^, 0, array[i], num_gangs (ng), gang);
+
+  /* Worker reductions.  */
+  check_reduction_op (int, +, 0, array[i], num_workers (nw), worker);
+  check_reduction_op (int, *, 1, array[i], num_workers (nw), worker);
+  check_reduction_op (int, &, -1, array[i], num_workers (nw), worker);
+  check_reduction_op (int, |, 0, array[i], num_workers (nw), worker);
+  check_reduction_op (int, ^, 0, array[i], num_workers (nw), worker);
+
+  /* Vector reductions.  */
+  check_reduction_op (int, +, 0, array[i], vector_length (vl), vector);
+  check_reduction_op (int, *, 1, array[i], vector_length (vl), vector);
+  check_reduction_op (int, &, -1, array[i], vector_length (vl), vector);
+  check_reduction_op (int, |, 0, array[i], vector_length (vl), vector);
+  check_reduction_op (int, ^, 0, array[i], vector_length (vl), vector);
+
+  /* Combined reductions.  */
+  check_reduction_op (int, +, 0, array[i], num_gangs (ng) num_workers (nw)
+		      vector_length (vl), gang worker vector);
+  check_reduction_op (int, *, 1, array[i], num_gangs (ng) num_workers (nw)
+		      vector_length (vl), gang worker vector);
+  check_reduction_op (int, &, -1, array[i], num_gangs (ng) num_workers (nw)
+		      vector_length (vl), gang worker vector);
+  check_reduction_op (int, |, 0, array[i], num_gangs (ng) num_workers (nw)
+		      vector_length (vl), gang worker vector);
+  check_reduction_op (int, ^, 0, array[i], num_gangs (ng) num_workers (nw)
+		      vector_length (vl), gang worker vector);
 }
 
 static void
@@ -55,29 +68,31 @@ test_reductions_bool (void)
     array[i] = i;
 
   cmp_val = 5;
-  check_reduction_op (bool, &&, true, (cmp_val > array[i]));
-  check_reduction_op (bool, ||, false, (cmp_val > array[i]));
-}
 
-#define check_reduction_macro(type, op, init, b)	\
-  {							\
-    type res, vres;					\
-    res = (init);					\
-DO_PRAGMA (acc parallel vector_length (vl) copy(res))\
-DO_PRAGMA (acc loop reduction (op:res))\
-    for (i = 0; i < n; i++)				\
-      res = op (res, (b));				\
-							\
-    vres = (init);					\
-    for (i = 0; i < n; i++)				\
-      vres = op (vres, (b));				\
-							\
-    if (res != vres)					\
-      abort ();						\
-  }
-
-#define max(a, b) (((a) > (b)) ? (a) : (b))
-#define min(a, b) (((a) < (b)) ? (a) : (b))
+  /* Gang reductions.  */
+  check_reduction_op (int, &&, 1, (cmp_val > array[i]), num_gangs (ng),
+		      gang);
+  check_reduction_op (int, ||, 0, (cmp_val > array[i]), num_gangs (ng),
+		      gang);
+
+  /* Worker reductions.  */
+  check_reduction_op (int, &&, 1, (cmp_val > array[i]), num_workers (nw),
+		      worker);
+  check_reduction_op (int, ||, 0, (cmp_val > array[i]), num_workers (nw),
+		      worker);
+
+  /* Vector reductions.  */
+  check_reduction_op (int, &&, 1, (cmp_val > array[i]), vector_length (vl),
+		      vector);
+  check_reduction_op (int, ||, 0, (cmp_val > array[i]), vector_length (vl),
+		      vector);
+
+  /* Combined reductions.  */
+  check_reduction_op (int, &&, 1, (cmp_val > array[i]), num_gangs (ng)
+		      num_workers (nw) vector_length (vl), gang worker vector);
+  check_reduction_op (int, ||, 0, (cmp_val > array[i]), num_gangs (ng)
+		      num_workers (nw) vector_length (vl), gang worker vector);
+}
 
 static void
 test_reductions_minmax (void)
@@ -89,14 +104,32 @@ test_reductions_minmax (void)
   for (i = 0; i < n; i++)
     array[i] = i;
 
-  check_reduction_macro (int, min, n + 1, array[i]);
-  check_reduction_macro (int, max, -1, array[i]);
+  /* Gang reductions.  */
+  check_reduction_macro (int, min, n + 1, array[i], num_gangs (ng), gang);
+  check_reduction_macro (int, max, -1, array[i], num_gangs (ng), gang);
+
+  /* Worker reductions.  */
+  check_reduction_macro (int, min, n + 1, array[i], num_workers (nw), worker);
+  check_reduction_macro (int, max, -1, array[i], num_workers (nw), worker);
+
+  /* Vector reductions.  */
+  check_reduction_macro (int, min, n + 1, array[i], vector_length (vl),
+			 vector);
+  check_reduction_macro (int, max, -1, array[i], vector_length (vl), vector);
+
+  /* Combined reductions.  */
+  check_reduction_macro (int, min, n + 1, array[i], num_gangs (ng)
+			 num_workers (nw) vector_length (vl), gang worker
+			 vector);
+  check_reduction_macro (int, max, -1, array[i], num_gangs (ng)
+			 num_workers (nw) vector_length (vl), gang worker
+			 vector);
 }
 
 int
 main (void)
 {
-  test_reductions_int ();
+  test_reductions ();
   test_reductions_bool ();
   test_reductions_minmax ();
   return 0;
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-2.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-2.c
index 8a0b0d6..7cb9497 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-2.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-2.c
@@ -1,124 +1,83 @@
 /* { dg-do run } */
 
+/* Ignore vector_length warnings for offloaded (nvptx) targets.  */
+/* { dg-additional-options "-foffload=-w" } */
+
 /* float reductions.  */
 
 #include <stdlib.h>
-#include <stdbool.h>
-#include <math.h>
+#include "reduction.h"
 
-#define vl 32
+const int ng = 8;
+const int nw = 4;
+const int vl = 32;
 
-int
-main(void)
+static void
+test_reductions (void)
 {
-  const int n = 1000;
+  const int n = 100;
   int i;
-  float vresult, result, array[n];
-  bool lvresult, lresult;
+  float array[n];
 
   for (i = 0; i < n; i++)
-    array[i] = i;
-
-  result = 0;
-  vresult = 0;
+    array[i] = i+1;
 
-  /* '+' reductions.  */
-#pragma acc parallel vector_length (vl) copy(result)
-#pragma acc loop reduction (+:result)
-  for (i = 0; i < n; i++)
-    result += array[i];
+  /* Gang reductions.  */
+  check_reduction_op (float, +, 0, array[i], num_gangs (ng), gang);
+  check_reduction_op (float, *, 1, array[i], num_gangs (ng), gang);
 
-  /* Verify the reduction.  */
-  for (i = 0; i < n; i++)
-    vresult += array[i];
-
-  if (result != vresult)
-    abort ();
-
-  result = 0;
-  vresult = 0;
-
-  /* '*' reductions.  */
-#pragma acc parallel vector_length (vl) copy(result)
-#pragma acc loop reduction (*:result)
-  for (i = 0; i < n; i++)
-    result *= array[i];
+  /* Worker reductions.  */
+  check_reduction_op (float, +, 0, array[i], num_workers (nw), worker);
+  check_reduction_op (float, *, 1, array[i], num_workers (nw), worker);
 
-  /* Verify the reduction.  */
-  for (i = 0; i < n; i++)
-    vresult *= array[i];
-
-  if (fabs(result - vresult) > .0001)
-    abort ();
-  result = 0;
-  vresult = 0;
-
-  /* 'max' reductions.  */
-#pragma acc parallel vector_length (vl) copy(result)
-#pragma acc loop reduction (max:result)
-  for (i = 0; i < n; i++)
-    result = result > array[i] ? result : array[i];
-
-  /* Verify the reduction.  */
-  for (i = 0; i < n; i++)
-    vresult = vresult > array[i] ? vresult : array[i];
-
-  if (result != vresult)
-    abort ();
-
-  result = 0;
-  vresult = 0;
-
-  /* 'min' reductions.  */
-#pragma acc parallel vector_length (vl) copy(result)
-#pragma acc loop reduction (min:result)
-  for (i = 0; i < n; i++)
-    result = result < array[i] ? result : array[i];
+  /* Vector reductions.  */
+  check_reduction_op (float, +, 0, array[i], vector_length (vl), vector);
+  check_reduction_op (float, *, 1, array[i], vector_length (vl), vector);
 
-  /* Verify the reduction.  */
-  for (i = 0; i < n; i++)
-    vresult = vresult < array[i] ? vresult : array[i];
-
-  if (result != vresult)
-    abort ();
-
-  result = 5;
-  vresult = 5;
-
-  lresult = false;
-  lvresult = false;
-
-  /* '&&' reductions.  */
-#pragma acc parallel vector_length (vl) copy(lresult)
-#pragma acc loop reduction (&&:lresult)
-  for (i = 0; i < n; i++)
-    lresult = lresult && (result > array[i]);
-
-  /* Verify the reduction.  */
-  for (i = 0; i < n; i++)
-    lvresult = lresult && (result > array[i]);
-
-  if (lresult != lvresult)
-    abort ();
-
-  result = 5;
-  vresult = 5;
-
-  lresult = false;
-  lvresult = false;
+  /* Combined reductions.  */
+  check_reduction_op (float, +, 0, array[i], num_gangs (ng) num_workers (nw)
+		      vector_length (vl), gang worker vector);
+  check_reduction_op (float, *, 1, array[i], num_gangs (ng) num_workers (nw)
+		      vector_length (vl), gang worker vector);
+}
 
-  /* '||' reductions.  */
-#pragma acc parallel vector_length (vl) copy(lresult)
-#pragma acc loop reduction (||:lresult)
-  for (i = 0; i < n; i++)
-    lresult = lresult || (result > array[i]);
+static void
+test_reductions_minmax (void)
+{
+  const int n = 1000;
+  int i;
+  float array[n];
 
-  /* Verify the reduction.  */
   for (i = 0; i < n; i++)
-    lvresult = lresult || (result > array[i]);
+    array[i] = i;
 
-  if (lresult != lvresult)
-    abort ();
+  /* Gang reductions.  */
+  check_reduction_macro (float, min, n + 1, array[i], num_gangs (ng), gang);
+  check_reduction_macro (float, max, -1, array[i], num_gangs (ng), gang);
+
+  /* Worker reductions.  */
+  check_reduction_macro (float, min, n + 1, array[i], num_workers (nw),
+			 worker);
+  check_reduction_macro (float, max, -1, array[i], num_workers (nw), worker);
+
+  /* Vector reductions.  */
+  check_reduction_macro (float, min, n + 1, array[i], vector_length (vl),
+			 vector);
+  check_reduction_macro (float, max, -1, array[i], vector_length (vl), vector);
+
+  /* Combined reductions.  */
+  check_reduction_macro (float, min, n + 1, array[i], num_gangs (ng)
+			 num_workers (nw) vector_length (vl), gang worker
+			 vector);
+  check_reduction_macro (float, max, -1, array[i], num_gangs (ng)
+			 num_workers (nw) vector_length (vl), gang worker
+			 vector);
+}
 
+int
+main (void)
+{
+  test_reductions ();
+  test_reductions_minmax ();
   return 0;
 }
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-3.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-3.c
index a233e29..1b948be 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-3.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-3.c
@@ -1,93 +1,84 @@
 /* { dg-do run } */
 
+/* Ignore vector_length warnings for offloaded (nvptx) targets.  */
+/* { dg-additional-options "-foffload=-w" } */
+
 /* double reductions.  */
 
 #include <stdlib.h>
-#include <stdbool.h>
-#include <math.h>
+#include "reduction.h"
 
-#define vl 32
+const int ng = 8;
+const int nw = 4;
+const int vl = 32;
 
-int
-main(void)
+static void
+test_reductions (void)
 {
-  const int n = 1000;
+  const int n = 10;
   int i;
-  double vresult, result, array[n];
-  bool lvresult, lresult;
-
-  for (i = 0; i < n; i++)
-    array[i] = i;
-
-  result = 0;
-  vresult = 0;
-
-  /* 'max' reductions.  */
-#pragma acc parallel vector_length (vl) copy(result)
-#pragma acc loop reduction (max:result)
-  for (i = 0; i < n; i++)
-    result = result > array[i] ? result : array[i];
-
-  /* Verify the reduction.  */
-  for (i = 0; i < n; i++)
-    vresult = vresult > array[i] ? vresult : array[i];
-
-  if (result != vresult)
-    abort ();
-
-  result = 0;
-  vresult = 0;
-
-  /* 'min' reductions.  */
-#pragma acc parallel vector_length (vl) copy(result)
-#pragma acc loop reduction (min:result)
-  for (i = 0; i < n; i++)
-    result = result < array[i] ? result : array[i];
+  double array[n];
 
-  /* Verify the reduction.  */
   for (i = 0; i < n; i++)
-    vresult = vresult < array[i] ? vresult : array[i];
+    array[i] = i+1;
 
-  if (result != vresult)
-    abort ();
+  /* Gang reductions.  */
+  check_reduction_op (double, +, 0, array[i], num_gangs (ng), gang);
+  check_reduction_op (double, *, 1, array[i], num_gangs (ng), gang);
 
-  result = 5;
-  vresult = 5;
+  /* Worker reductions.  */
+  check_reduction_op (double, +, 0, array[i], num_workers (nw), worker);
+  check_reduction_op (double, *, 1, array[i], num_workers (nw), worker);
 
-  lresult = false;
-  lvresult = false;
+  /* Vector reductions.  */
+  check_reduction_op (double, +, 0, array[i], vector_length (vl), vector);
+  check_reduction_op (double, *, 1, array[i], vector_length (vl), vector);
 
-  /* '&&' reductions.  */
-#pragma acc parallel vector_length (vl) copy(lresult)
-#pragma acc loop reduction (&&:lresult)
-  for (i = 0; i < n; i++)
-    lresult = lresult && (result > array[i]);
-
-  /* Verify the reduction.  */
-  for (i = 0; i < n; i++)
-    lvresult = lresult && (result > array[i]);
-
-  if (lresult != lvresult)
-    abort ();
-
-  result = 5;
-  vresult = 5;
-
-  lresult = false;
-  lvresult = false;
+  /* Combined reductions.  */
+  check_reduction_op (double, +, 0, array[i], num_gangs (ng) num_workers (nw)
+		      vector_length (vl), gang worker vector);
+  check_reduction_op (double, *, 1, array[i], num_gangs (ng) num_workers (nw)
+		      vector_length (vl), gang worker vector);
+}
 
-  /* '||' reductions.  */
-#pragma acc parallel vector_length (vl) copy(lresult)
-#pragma acc loop reduction (||:lresult)
-  for (i = 0; i < n; i++)
-    lresult = lresult || (result > array[i]);
+static void
+test_reductions_minmax (void)
+{
+  const int n = 1000;
+  int i;
+  double array[n];
 
-  /* Verify the reduction.  */
   for (i = 0; i < n; i++)
-    lvresult = lresult || (result > array[i]);
+    array[i] = i;
 
-  if (lresult != lvresult)
-    abort ();
+  /* Gang reductions.  */
+  check_reduction_macro (double, min, n + 1, array[i], num_gangs (ng), gang);
+  check_reduction_macro (double, max, -1, array[i], num_gangs (ng), gang);
+
+  /* Worker reductions.  */
+  check_reduction_macro (double, min, n + 1, array[i], num_workers (nw),
+			 worker);
+  check_reduction_macro (double, max, -1, array[i], num_workers (nw), worker);
+
+  /* Vector reductions.  */
+  check_reduction_macro (double, min, n + 1, array[i], vector_length (vl),
+			 vector);
+  check_reduction_macro (double, max, -1, array[i], vector_length (vl),
+			 vector);
+
+  /* Combined reductions.  */
+  check_reduction_macro (double, min, n + 1, array[i], num_gangs (ng)
+			 num_workers (nw) vector_length (vl), gang worker
+			 vector);
+  check_reduction_macro (double, max, -1, array[i], num_gangs (ng)
+			 num_workers (nw) vector_length (vl), gang worker
+			 vector);
+}
 
+int
+main (void)
+{
+  test_reductions ();
+  test_reductions_minmax ();
   return 0;
 }
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-4.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-4.c
index 59d49c1..79355ed 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-4.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-4.c
@@ -1,59 +1,56 @@
 /* { dg-do run { target { ! { hppa*-*-hpux* } } } } */
 
+/* Ignore vector_length warnings for offloaded (nvptx) targets.  */
+/* { dg-additional-options "-foffload=-w" } */
+
 /* complex reductions.  */
 
 #include <stdlib.h>
-#include <stdbool.h>
-#include <math.h>
 #include <complex.h>
+#include "reduction.h"
 
-#define vl 32
+const int ng = 8;
+const int nw = 4;
+const int vl = 32;
 
-int
-main(void)
+static void
+test_reductions (void)
 {
-  const int n = 1000;
+  const int n = 10;
   int i;
-  double _Complex vresult, result, array[n];
-  bool lvresult, lresult;
-
-  for (i = 0; i < n; i++)
-    array[i] = i;
-
-  result = 0;
-  vresult = 0;
-
-  /* '&&' reductions.  */
-#pragma acc parallel vector_length (vl) copy(lresult)
-#pragma acc loop reduction (&&:lresult)
-  for (i = 0; i < n; i++)
-    lresult = lresult && (creal(result) > creal(array[i]));
-
-  /* Verify the reduction.  */
-  for (i = 0; i < n; i++)
-    lvresult = lresult && (creal(result) > creal(array[i]));
+  double _Complex array[n];
 
-  if (lresult != lvresult)
-    abort ();
-
-  result = 5;
-  vresult = 5;
-
-  lresult = false;
-  lvresult = false;
-
-  /* '||' reductions.  */
-#pragma acc parallel vector_length (vl) copy(lresult)
-#pragma acc loop reduction (||:lresult)
-  for (i = 0; i < n; i++)
-    lresult = lresult || (creal(result) > creal(array[i]));
-
-  /* Verify the reduction.  */
   for (i = 0; i < n; i++)
-    lvresult = lresult || (creal(result) > creal(array[i]));
-
-  if (lresult != lvresult)
-    abort ();
+    array[i] = i+1;
+
+  /* Gang reductions.  */
+  check_reduction_op (double, +, 0, creal (array[i]), num_gangs (ng), gang);
+  check_reduction_op (double, *, 1, creal (array[i]), num_gangs (ng), gang);
+
+  /* Worker reductions.  */
+  check_reduction_op (double, +, 0, creal (array[i]), num_workers (nw),
+		      worker);
+  check_reduction_op (double, *, 1, creal (array[i]), num_workers (nw),
+		      worker);
+
+  /* Vector reductions.  */
+  check_reduction_op (double, +, 0, creal (array[i]), vector_length (vl),
+		      vector);
+  check_reduction_op (double, *, 1, creal (array[i]), vector_length (vl),
+		      vector);
+
+  /* Combined reductions.  */
+  check_reduction_op (double, +, 0, creal (array[i]), num_gangs (ng)
+		      num_workers (nw) vector_length (vl), gang worker
+		      vector);
+  check_reduction_op (double, *, 1, creal (array[i]), num_gangs (ng)
+		      num_workers (nw) vector_length (vl), gang worker
+		      vector);
+}
 
+int
+main (void)
+{
+  test_reductions ();
   return 0;
 }
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-5.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-5.c
index efe8702..46b553a 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-5.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-5.c
@@ -1,32 +1,57 @@
+/* { dg-do run } */
+/* { dg-additional-options "-w" } */
+
+/* Ignore vector_length warnings for offloaded (nvptx) targets.  */
+/* { dg-additional-options "-foffload=-w" } */
+
+/* Multiple reductions.  */
+
 #include <stdio.h>
 #include <stdlib.h>
 
+const int ng = 8;
+const int nw = 4;
+const int vl = 32;
+
+const int n = 100;
+
+#define DO_PRAGMA(x) _Pragma (#x)
+
+#define check_reduction(gwv_par, gwv_loop)		\
+  {							\
+  s1 = 2; s2 = 5;					\
+DO_PRAGMA (acc parallel gwv_par copy (s1, s2))		\
+DO_PRAGMA (acc loop gwv_loop reduction (+:s1, s2))	\
+    for (i = 0; i < n; i++)				\
+      {							\
+         s1 = s1 + 3;					\
+         s2 = s2 + 2;					\
+      }							\
+							\
+    if (s1 != v1 || s2 != v2)				\
+      abort ();						\
+  }
+
 int
 main (void)
 {
   int s1 = 2, s2 = 5, v1 = 2, v2 = 5;
-  int n = 100;
   int i;
 
-#pragma acc parallel vector_length (32) copy(s1,s2)
-#pragma acc loop reduction (+:s1, s2)
-  for (i = 0; i < n; i++)
-    {
-      s1 = s1 + 3;
-      s2 = s2 + 2;
-    }
-
   for (i = 0; i < n; i++)
     {
       v1 = v1 + 3;
       v2 = v2 + 2;
     }
-  
-  if (s1 != v1)
-    abort ();
-  
-  if (s2 != v2)
-    abort ();
-    
+
+  check_reduction (num_gangs (ng), gang);
+
+  /* Nvptx targets require a vector_length of 32 in order to allow spinlocks
+     with gangs.  */
+  check_reduction (num_workers (nw) vector_length (vl), worker);
+  check_reduction (vector_length (vl), vector);
+  check_reduction (num_gangs (ng) num_workers (nw) vector_length (vl), gang
+		   worker vector);
+
   return 0;
 }
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-6.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-6.c
new file mode 100644
index 0000000..af30b31
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-6.c
@@ -0,0 +1,36 @@
+/* { dg-do run } */
+/* { dg-additional-options "-w" } */
+
+/* Test reductions on explicitly private variables.  */
+
+#include <assert.h>
+
+int
+main ()
+{
+  int i, j, red[10];
+  int v;
+
+  for (i = 0; i < 10; i++)
+    red[i] = -1;
+
+#pragma acc parallel copyout(red)
+  {
+#pragma acc loop gang private(v)
+    for (j = 0; j < 10; j++)
+      {
+	v = j;
+
+#pragma acc loop vector reduction (+:v)
+	for (i = 0; i < 100; i++)
+	  v++;
+
+	red[j] = v;
+      }
+  }
+
+  for (i = 0; i < 10; i++)
+    assert (red[i] == i + 100);
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction.h b/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction.h
new file mode 100644
index 0000000..1b3f8d4
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction.h
@@ -0,0 +1,43 @@
+#ifndef REDUCTION_H
+#define REDUCTION_H
+
+#define DO_PRAGMA(x) _Pragma (#x)
+
+#define check_reduction_op(type, op, init, b, gwv_par, gwv_loop)	\
+  {									\
+    type res, vres;							\
+    res = (init);							\
+DO_PRAGMA (acc parallel gwv_par copy (res))				\
+DO_PRAGMA (acc loop gwv_loop reduction (op:res))			\
+    for (i = 0; i < n; i++)						\
+      res = res op (b);							\
+									\
+    vres = (init);							\
+    for (i = 0; i < n; i++)						\
+      vres = vres op (b);						\
+									\
+    if (res != vres)							\
+      abort ();								\
+  }
+
+#define check_reduction_macro(type, op, init, b, gwv_par, gwv_loop)	\
+  {									\
+    type res, vres;							\
+    res = (init);							\
+DO_PRAGMA (acc parallel gwv_par copy (res))				\
+DO_PRAGMA (acc loop gwv_loop reduction (op:res))			\
+    for (i = 0; i < n; i++)						\
+      res = op (res, (b));						\
+									\
+    vres = (init);							\
+    for (i = 0; i < n; i++)						\
+      vres = op (vres, (b));						\
+									\
+    if (res != vres)							\
+      abort ();								\
+  }
+
+#define max(a, b) (((a) > (b)) ? (a) : (b))
+#define min(a, b) (((a) < (b)) ? (a) : (b))
+
+#endif
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/parallel-loop-1.f90 b/libgomp/testsuite/libgomp.oacc-fortran/parallel-loop-1.f90
new file mode 100644
index 0000000..4c86ada
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-fortran/parallel-loop-1.f90
@@ -0,0 +1,75 @@
+! Exercise the auto, independent, seq and tile loop clauses inside
+! parallel regions.
+
+program loops
+  integer, parameter     :: n = 20, c = 10
+  integer                :: i, a(n), b(n)
+
+  a(:) = 0
+  b(:) = 0
+
+  ! COPY
+
+  !$acc parallel copy (a)
+  !$acc loop auto
+  do i = 1, n
+     a(i) = i
+  end do
+  !$acc end parallel
+
+  do i = 1, n
+     b(i) = i
+  end do
+
+  call check (a, b, n)
+
+  ! COPYOUT
+
+  a(:) = 0
+
+  !$acc parallel copyout (a)
+  !$acc loop independent
+  do i = 1, n
+     a(i) = i
+  end do
+  !$acc end parallel
+
+  do i = 1, n
+     if (a(i) .ne. b(i)) call abort
+  end do
+  call check (a, b, n)
+
+  ! COPYIN
+
+  a(:) = 0
+
+  !$acc parallel copyout (a) copyin (b)
+  !$acc loop seq
+  do i = 1, n
+     a(i) = i
+  end do
+  !$acc end parallel
+
+  call check (a, b, n)
+
+  ! PRESENT_OR_COPY
+
+  !$acc parallel pcopy (a)
+  !$acc loop tile (*)
+  do i = 1, n
+     a(i) = i
+  end do
+  !$acc end parallel
+
+  call check (a, b, n)
+
+end program loops
+
+subroutine check (a, b, n)
+  integer :: n, a(n), b(n)
+  integer :: i
+
+  do i = 1, n
+     if (a(i) .ne. b(i)) call abort
+  end do
+end subroutine check
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/parallel-reduction.f90 b/libgomp/testsuite/libgomp.oacc-fortran/parallel-reduction.f90
new file mode 100644
index 0000000..f49ed73
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-fortran/parallel-reduction.f90
@@ -0,0 +1,39 @@
+! { dg-do run }
+! { dg-additional-options "-w" }
+
+program reduction
+  integer, parameter :: n = 10
+  integer sum
+  include "openacc_lib.h"
+
+  sum = 0
+
+  !$acc parallel reduction(+:sum) num_gangs (n) copy(sum)
+  sum = sum + 1
+  !$acc end parallel
+
+  if (acc_get_device_type () .eq. acc_device_nvidia) then
+     if (sum .ne. n) call abort
+  else
+     if (sum .ne. 1) call abort
+  end if
+
+  ! Test reductions inside subroutines
+
+  sum = 0
+  call redsub (sum, n)
+
+  if (acc_get_device_type () .eq. acc_device_nvidia) then
+     if (sum .ne. n) call abort
+  else
+     if (sum .ne. 1) call abort
+  end if
+end program reduction
+
+subroutine redsub(sum, n)
+  integer :: sum, n
+
+  !$acc parallel reduction(+:sum) num_gangs (10) copy(sum)
+  sum = sum + 1
+  !$acc end parallel
+end subroutine redsub
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/reduction-1.f90 b/libgomp/testsuite/libgomp.oacc-fortran/reduction-1.f90
index db0a52d..e51509f 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/reduction-1.f90
+++ b/libgomp/testsuite/libgomp.oacc-fortran/reduction-1.f90
@@ -1,28 +1,55 @@
 ! { dg-do run }
+! { dg-additional-options "-w" }
 
 ! Integer reductions
 
 program reduction_1
   implicit none
 
-  integer, parameter    :: n = 10, vl = 32
-  integer               :: i, vresult, result
-  logical               :: lresult, lvresult
+  integer, parameter    :: n = 10, ng = 8, nw = 4, vl = 32
+  integer               :: i, vresult, rg, rw, rv, rc
+  logical               :: lrg, lrw, lrv, lrc, lvresult
   integer, dimension (n) :: array
 
   do i = 1, n
      array(i) = i
   end do
 
-  result = 0
+  !
+  ! '+' reductions
+  !
+
+  rg = 0
+  rw = 0
+  rv = 0
+  rc = 0
   vresult = 0
 
-  ! '+' reductions
+  !$acc parallel num_gangs(ng) copy(rg)
+  !$acc loop reduction(+:rg) gang
+  do i = 1, n
+     rg = rg + array(i)
+  end do
+  !$acc end parallel
 
-  !$acc parallel vector_length(vl) num_gangs(1) copy(result)
-  !$acc loop reduction(+:result)
+  !$acc parallel num_workers(nw) copy(rw)
+  !$acc loop reduction(+:rw) worker
   do i = 1, n
-     result = result + array(i)
+     rw = rw + array(i)
+  end do
+  !$acc end parallel
+
+  !$acc parallel vector_length(vl) copy(rv)
+  !$acc loop reduction(+:rv) vector
+  do i = 1, n
+     rv = rv + array(i)
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_gangs(ng) num_workers(nw) vector_length(vl) copy(rc)
+  !$acc loop reduction(+:rc) gang worker vector
+  do i = 1, n
+     rc = rc + array(i)
   end do
   !$acc end parallel
 
@@ -31,17 +58,46 @@ program reduction_1
      vresult = vresult + array(i)
   end do
 
-  if (result.ne.vresult) call abort
-
-  result = 0
-  vresult = 0
+  if (rg .ne. vresult) call abort
+  if (rw .ne. vresult) call abort
+  if (rv .ne. vresult) call abort
+  if (rc .ne. vresult) call abort
 
+  !
   ! '*' reductions
+  !
 
-  !$acc parallel vector_length(vl) num_gangs(1) copy(result)
-  !$acc loop reduction(*:result)
+  rg = 1
+  rw = 1
+  rv = 1
+  rc = 1
+  vresult = 1
+
+  !$acc parallel num_gangs(ng) copy(rg)
+  !$acc loop reduction(*:rg) gang
   do i = 1, n
-     result = result * array(i)
+     rg = rg * array(i)
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_workers(nw) copy(rw)
+  !$acc loop reduction(*:rw) worker
+  do i = 1, n
+     rw = rw * array(i)
+  end do
+  !$acc end parallel
+
+  !$acc parallel vector_length(vl) copy(rv)
+  !$acc loop reduction(*:rv) vector
+  do i = 1, n
+     rv = rv * array(i)
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_gangs(ng) num_workers(nw) vector_length(vl) copy(rc)
+  !$acc loop reduction(*:rc) gang worker vector
+  do i = 1, n
+     rc = rc * array(i)
   end do
   !$acc end parallel
 
@@ -50,17 +106,46 @@ program reduction_1
      vresult = vresult * array(i)
   end do
 
-  if (result.ne.vresult) call abort
+  if (rg .ne. vresult) call abort
+  if (rw .ne. vresult) call abort
+  if (rv .ne. vresult) call abort
+  if (rc .ne. vresult) call abort
+
+  !
+  ! 'max' reductions
+  !
 
-  result = 0
+  rg = 0
+  rw = 0
+  rv = 0
+  rc = 0
   vresult = 0
 
-  ! 'max' reductions
+  !$acc parallel num_gangs(ng) copy(rg)
+  !$acc loop reduction(max:rg) gang
+  do i = 1, n
+     rg = max (rg, array(i))
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_workers(nw) copy(rw)
+  !$acc loop reduction(max:rw) worker
+  do i = 1, n
+     rw = max (rw, array(i))
+  end do
+  !$acc end parallel
+
+  !$acc parallel vector_length(vl) copy(rv)
+  !$acc loop reduction(max:rv) vector
+  do i = 1, n
+     rv = max (rv, array(i))
+  end do
+  !$acc end parallel
 
-  !$acc parallel vector_length(vl) num_gangs(1) copy(result)
-  !$acc loop reduction(max:result)
+  !$acc parallel num_gangs(ng) num_workers(nw) vector_length(vl) copy(rc)
+  !$acc loop reduction(max:rc) gang worker vector
   do i = 1, n
-     result = max (result, array(i))
+     rc = max (rc, array(i))
   end do
   !$acc end parallel
 
@@ -69,17 +154,46 @@ program reduction_1
      vresult = max (vresult, array(i))
   end do
 
-  if (result.ne.vresult) call abort
-
-  result = 1
-  vresult = 1
+  if (rg .ne. vresult) call abort
+  if (rw .ne. vresult) call abort
+  if (rv .ne. vresult) call abort
+  if (rc .ne. vresult) call abort
 
+  !
   ! 'min' reductions
+  !
+
+  rg = 0
+  rw = 0
+  rv = 0
+  rc = 0
+  vresult = 0
+
+  !$acc parallel num_gangs(ng) copy(rg)
+  !$acc loop reduction(min:rg) gang
+  do i = 1, n
+     rg = min (rg, array(i))
+  end do
+  !$acc end parallel
 
-  !$acc parallel vector_length(vl) num_gangs(1) copy(result)
-  !$acc loop reduction(min:result)
+  !$acc parallel num_workers(nw) copy(rw)
+  !$acc loop reduction(min:rw) worker
   do i = 1, n
-     result = min (result, array(i))
+     rw = min (rw, array(i))
+  end do
+  !$acc end parallel
+
+  !$acc parallel vector_length(vl) copy(rv)
+  !$acc loop reduction(min:rv) vector
+  do i = 1, n
+     rv = min (rv, array(i))
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_gangs(ng) num_workers(nw) vector_length(vl) copy(rc)
+  !$acc loop reduction(min:rc) gang worker vector
+  do i = 1, n
+     rc = min (rc, array(i))
   end do
   !$acc end parallel
 
@@ -88,17 +202,46 @@ program reduction_1
      vresult = min (vresult, array(i))
   end do
 
-  if (result.ne.vresult) call abort
+  if (rg .ne. vresult) call abort
+  if (rw .ne. vresult) call abort
+  if (rv .ne. vresult) call abort
+  if (rc .ne. vresult) call abort
+
+  !
+  ! 'iand' reductions
+  !
 
-  result = 1
+  rg = 1
+  rw = 1
+  rv = 1
+  rc = 1
   vresult = 1
 
-  ! 'iand' reductions
+  !$acc parallel num_gangs(ng) copy(rg)
+  !$acc loop reduction(iand:rg) gang
+  do i = 1, n
+     rg = iand (rg, array(i))
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_workers(nw) copy(rw)
+  !$acc loop reduction(iand:rw) worker
+  do i = 1, n
+     rw = iand (rw, array(i))
+  end do
+  !$acc end parallel
+
+  !$acc parallel vector_length(vl) copy(rv)
+  !$acc loop reduction(iand:rv) vector
+  do i = 1, n
+     rv = iand (rv, array(i))
+  end do
+  !$acc end parallel
 
-  !$acc parallel vector_length(vl) num_gangs(1) copy(result)
-  !$acc loop reduction(iand:result)
+  !$acc parallel num_gangs(ng) num_workers(nw) vector_length(vl) copy(rc)
+  !$acc loop reduction(iand:rc) gang worker vector
   do i = 1, n
-     result = iand (result, array(i))
+     rc = iand (rc, array(i))
   end do
   !$acc end parallel
 
@@ -107,17 +250,46 @@ program reduction_1
      vresult = iand (vresult, array(i))
   end do
 
-  if (result.ne.vresult) call abort
-
-  result = 1
-  vresult = 1
+  if (rg .ne. vresult) call abort
+  if (rw .ne. vresult) call abort
+  if (rv .ne. vresult) call abort
+  if (rc .ne. vresult) call abort
 
+  !
   ! 'ior' reductions
+  !
+
+  rg = 0
+  rw = 0
+  rv = 0
+  rc = 0
+  vresult = 0
+
+  !$acc parallel num_gangs(ng) copy(rg)
+  !$acc loop reduction(ior:rg) gang
+  do i = 1, n
+     rg = ior (rg, array(i))
+  end do
+  !$acc end parallel
 
-  !$acc parallel vector_length(vl) num_gangs(1) copy(result)
-  !$acc loop reduction(ior:result)
+  !$acc parallel num_workers(nw) copy(rw)
+  !$acc loop reduction(ior:rw) worker
   do i = 1, n
-     result = ior (result, array(i))
+     rw = ior (rw, array(i))
+  end do
+  !$acc end parallel
+
+  !$acc parallel vector_length(vl) copy(rv)
+  !$acc loop reduction(ior:rv) vector
+  do i = 1, n
+     rv = ior (rv, array(i))
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_gangs(ng) num_workers(nw) vector_length(vl) copy(rc)
+  !$acc loop reduction(ior:rc) gang worker vector
+  do i = 1, n
+     rc = ior (rc, array(i))
   end do
   !$acc end parallel
 
@@ -126,17 +298,46 @@ program reduction_1
      vresult = ior (vresult, array(i))
   end do
 
-  if (result.ne.vresult) call abort
+  if (rg .ne. vresult) call abort
+  if (rw .ne. vresult) call abort
+  if (rv .ne. vresult) call abort
+  if (rc .ne. vresult) call abort
 
-  result = 0
+  !
+  ! 'ieor' reductions
+  !
+
+  rg = 0
+  rw = 0
+  rv = 0
+  rc = 0
   vresult = 0
 
-  ! 'ieor' reductions
+  !$acc parallel num_gangs(ng) copy(rg)
+  !$acc loop reduction(ieor:rg) gang
+  do i = 1, n
+     rg = ieor (rg, array(i))
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_workers(nw) copy(rw)
+  !$acc loop reduction(ieor:rw) worker
+  do i = 1, n
+     rw = ieor (rw, array(i))
+  end do
+  !$acc end parallel
 
-  !$acc parallel vector_length(vl) num_gangs(1) copy(result)
-  !$acc loop reduction(ieor:result)
+  !$acc parallel vector_length(vl) copy(rv)
+  !$acc loop reduction(ieor:rv) vector
   do i = 1, n
-     result = ieor (result, array(i))
+     rv = ieor (rv, array(i))
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_gangs(ng) num_workers(nw) vector_length(vl) copy(rc)
+  !$acc loop reduction(ieor:rc) gang worker vector
+  do i = 1, n
+     rc = ieor (rc, array(i))
   end do
   !$acc end parallel
 
@@ -145,17 +346,46 @@ program reduction_1
      vresult = ieor (vresult, array(i))
   end do
 
-  if (result.ne.vresult) call abort
-
-  lresult = .false.
-  lvresult = .false.
+  if (rg .ne. vresult) call abort
+  if (rw .ne. vresult) call abort
+  if (rv .ne. vresult) call abort
+  if (rc .ne. vresult) call abort
 
+  !
   ! '.and.' reductions
+  !
+
+  lrg = .true.
+  lrw = .true.
+  lrv = .true.
+  lrc = .true.
+  lvresult = .true.
+
+  !$acc parallel num_gangs(ng) copy(lrg)
+  !$acc loop reduction(.and.:lrg) gang
+  do i = 1, n
+     lrg = lrg .and. (array(i) .ge. 5)
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_workers(nw) copy(lrw)
+  !$acc loop reduction(.and.:lrw) worker
+  do i = 1, n
+     lrw = lrw .and. (array(i) .ge. 5)
+  end do
+  !$acc end parallel
+
+  !$acc parallel vector_length(vl) copy(lrv)
+  !$acc loop reduction(.and.:lrv) vector
+  do i = 1, n
+     lrv = lrv .and. (array(i) .ge. 5)
+  end do
+  !$acc end parallel
 
-  !$acc parallel vector_length(vl) num_gangs(1) copy(lresult)
-  !$acc loop reduction(.and.:lresult)
+  !$acc parallel num_gangs(ng) num_workers(nw) vector_length(vl) copy(lrc)
+  !$acc loop reduction(.and.:lrc) gang worker vector
   do i = 1, n
-     lresult = lresult .and. (array(i) .ge. 5)
+     lrc = lrc .and. (array(i) .ge. 5)
   end do
   !$acc end parallel
 
@@ -164,17 +394,46 @@ program reduction_1
      lvresult = lvresult .and. (array(i) .ge. 5)
   end do
 
-  if (result.ne.vresult) call abort
+  if (lrg .neqv. lvresult) call abort
+  if (lrw .neqv. lvresult) call abort
+  if (lrv .neqv. lvresult) call abort
+  if (lrc .neqv. lvresult) call abort
+
+  !
+  ! '.or.' reductions
+  !
 
-  lresult = .false.
+  lrg = .false.
+  lrw = .false.
+  lrv = .false.
+  lrc = .false.
   lvresult = .false.
 
-  ! '.or.' reductions
+  !$acc parallel num_gangs(ng) copy(lrg)
+  !$acc loop reduction(.or.:lrg) gang
+  do i = 1, n
+     lrg = lrg .or. (array(i) .ge. 5)
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_workers(nw) copy(lrw)
+  !$acc loop reduction(.or.:lrw) worker
+  do i = 1, n
+     lrw = lrw .or. (array(i) .ge. 5)
+  end do
+  !$acc end parallel
+
+  !$acc parallel vector_length(vl) copy(lrv)
+  !$acc loop reduction(.or.:lrv) vector
+  do i = 1, n
+     lrv = lrv .or. (array(i) .ge. 5)
+  end do
+  !$acc end parallel
 
-  !$acc parallel vector_length(vl) num_gangs(1) copy(lresult)
-  !$acc loop reduction(.or.:lresult)
+  !$acc parallel num_gangs(ng) num_workers(nw) vector_length(vl) copy(lrc)
+  !$acc loop reduction(.or.:lrc) gang worker vector
   do i = 1, n
-     lresult = lresult .or. (array(i) .ge. 5)
+     lrc = lrc .or. (array(i) .ge. 5)
   end do
   !$acc end parallel
 
@@ -183,17 +442,46 @@ program reduction_1
      lvresult = lvresult .or. (array(i) .ge. 5)
   end do
 
-  if (result.ne.vresult) call abort
-
-  lresult = .false.
-  lvresult = .false.
+  if (lrg .neqv. lvresult) call abort
+  if (lrw .neqv. lvresult) call abort
+  if (lrv .neqv. lvresult) call abort
+  if (lrc .neqv. lvresult) call abort
 
+  !
   ! '.eqv.' reductions
+  !
+
+  lrg = .true.
+  lrw = .true.
+  lrv = .true.
+  lrc = .true.
+  lvresult = .true.
+
+  !$acc parallel num_gangs(ng) copy(lrg)
+  !$acc loop reduction(.eqv.:lrg) gang
+  do i = 1, n
+     lrg = lrg .eqv. (array(i) .ge. 5)
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_workers(nw) copy(lrw)
+  !$acc loop reduction(.eqv.:lrw) worker
+  do i = 1, n
+     lrw = lrw .eqv. (array(i) .ge. 5)
+  end do
+  !$acc end parallel
+
+  !$acc parallel vector_length(vl) copy(lrv)
+  !$acc loop reduction(.eqv.:lrv) vector
+  do i = 1, n
+     lrv = lrv .eqv. (array(i) .ge. 5)
+  end do
+  !$acc end parallel
 
-  !$acc parallel vector_length(vl) num_gangs(1) copy(lresult)
-  !$acc loop reduction(.eqv.:lresult)
+  !$acc parallel num_gangs(ng) num_workers(nw) vector_length(vl) copy(lrc)
+  !$acc loop reduction(.eqv.:lrc) gang worker vector
   do i = 1, n
-     lresult = lresult .eqv. (array(i) .ge. 5)
+     lrc = lrc .eqv. (array(i) .ge. 5)
   end do
   !$acc end parallel
 
@@ -202,17 +490,46 @@ program reduction_1
      lvresult = lvresult .eqv. (array(i) .ge. 5)
   end do
 
-  if (result.ne.vresult) call abort
-
-  lresult = .false.
-  lvresult = .false.
+  if (lrg .neqv. lvresult) call abort
+  if (lrw .neqv. lvresult) call abort
+  if (lrv .neqv. lvresult) call abort
+  if (lrc .neqv. lvresult) call abort
 
+  !
   ! '.neqv.' reductions
+  !
+
+  lrg = .true.
+  lrw = .true.
+  lrv = .true.
+  lrc = .true.
+  lvresult = .true.
+
+  !$acc parallel num_gangs(ng) copy(lrg)
+  !$acc loop reduction(.neqv.:lrg) gang
+  do i = 1, n
+     lrg = lrg .neqv. (array(i) .ge. 5)
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_workers(nw) copy(lrw)
+  !$acc loop reduction(.neqv.:lrw) worker
+  do i = 1, n
+     lrw = lrw .neqv. (array(i) .ge. 5)
+  end do
+  !$acc end parallel
+
+  !$acc parallel vector_length(vl) copy(lrv)
+  !$acc loop reduction(.neqv.:lrv) vector
+  do i = 1, n
+     lrv = lrv .neqv. (array(i) .ge. 5)
+  end do
+  !$acc end parallel
 
-  !$acc parallel vector_length(vl) num_gangs(1) copy(lresult)
-  !$acc loop reduction(.neqv.:lresult)
+  !$acc parallel num_gangs(ng) num_workers(nw) vector_length(vl) copy(lrc)
+  !$acc loop reduction(.neqv.:lrc) gang worker vector
   do i = 1, n
-     lresult = lresult .neqv. (array(i) .ge. 5)
+     lrc = lrc .neqv. (array(i) .ge. 5)
   end do
   !$acc end parallel
 
@@ -221,5 +538,8 @@ program reduction_1
      lvresult = lvresult .neqv. (array(i) .ge. 5)
   end do
 
-  if (result.ne.vresult) call abort
+  if (lrg .neqv. lvresult) call abort
+  if (lrw .neqv. lvresult) call abort
+  if (lrv .neqv. lvresult) call abort
+  if (lrc .neqv. lvresult) call abort
 end program reduction_1
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/reduction-2.f90 b/libgomp/testsuite/libgomp.oacc-fortran/reduction-2.f90
index 3d99668..cd09099 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/reduction-2.f90
+++ b/libgomp/testsuite/libgomp.oacc-fortran/reduction-2.f90
@@ -5,26 +5,52 @@
 program reduction_2
   implicit none
 
-  integer, parameter    :: n = 10, vl = 32
+  integer, parameter    :: n = 10, ng = 8, nw = 4, vl = 32
   integer               :: i
-  real, parameter       :: e = .001
-  real                  :: vresult, result
-  logical               :: lresult, lvresult
-  real, dimension (n) :: array
+  real                  :: vresult, rg, rw, rv, rc
+  real, parameter       :: e = 0.001
+  logical               :: lrg, lrw, lrv, lrc, lvresult
+  real, dimension (n)   :: array
 
   do i = 1, n
      array(i) = i
   end do
 
-  result = 0
+  !
+  ! '+' reductions
+  !
+
+  rg = 0
+  rw = 0
+  rv = 0
+  rc = 0
   vresult = 0
 
-  ! '+' reductions
+  !$acc parallel num_gangs(ng) copy(rg)
+  !$acc loop reduction(+:rg) gang
+  do i = 1, n
+     rg = rg + array(i)
+  end do
+  !$acc end parallel
 
-  !$acc parallel vector_length(vl) num_gangs(1) copy(result)
-  !$acc loop reduction(+:result)
+  !$acc parallel num_workers(nw) copy(rw)
+  !$acc loop reduction(+:rw) worker
   do i = 1, n
-     result = result + array(i)
+     rw = rw + array(i)
+  end do
+  !$acc end parallel
+
+  !$acc parallel vector_length(vl) copy(rv)
+  !$acc loop reduction(+:rv) vector
+  do i = 1, n
+     rv = rv + array(i)
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_gangs(ng) num_workers(nw) vector_length(vl) copy(rc)
+  !$acc loop reduction(+:rc) gang worker vector
+  do i = 1, n
+     rc = rc + array(i)
   end do
   !$acc end parallel
 
@@ -33,17 +59,46 @@ program reduction_2
      vresult = vresult + array(i)
   end do
 
-  if (abs (result - vresult) .ge. e) call abort
+  if (abs (rg - vresult) .ge. e) call abort
+  if (abs (rw - vresult) .ge. e) call abort
+  if (abs (rv - vresult) .ge. e) call abort
+  if (abs (rc - vresult) .ge. e) call abort
+
+  !
+  ! '*' reductions
+  !
 
-  result = 1
+  rg = 1
+  rw = 1
+  rv = 1
+  rc = 1
   vresult = 1
 
-  ! '*' reductions
+  !$acc parallel num_gangs(ng) copy(rg)
+  !$acc loop reduction(*:rg) gang
+  do i = 1, n
+     rg = rg * array(i)
+  end do
+  !$acc end parallel
 
-  !$acc parallel vector_length(vl) num_gangs(1) copy(result)
-  !$acc loop reduction(*:result)
+  !$acc parallel num_workers(nw) copy(rw)
+  !$acc loop reduction(*:rw) worker
   do i = 1, n
-     result = result * array(i)
+     rw = rw * array(i)
+  end do
+  !$acc end parallel
+
+  !$acc parallel vector_length(vl) copy(rv)
+  !$acc loop reduction(*:rv) vector
+  do i = 1, n
+     rv = rv * array(i)
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_gangs(ng) num_workers(nw) vector_length(vl) copy(rc)
+  !$acc loop reduction(*:rc) gang worker vector
+  do i = 1, n
+     rc = rc * array(i)
   end do
   !$acc end parallel
 
@@ -52,17 +107,46 @@ program reduction_2
      vresult = vresult * array(i)
   end do
 
-  if (result.ne.vresult) call abort
+  if (abs (rg - vresult) .ge. e) call abort
+  if (abs (rw - vresult) .ge. e) call abort
+  if (abs (rv - vresult) .ge. e) call abort
+  if (abs (rc - vresult) .ge. e) call abort
+
+  !
+  ! 'max' reductions
+  !
 
-  result = 0
+  rg = 0
+  rw = 0
+  rv = 0
+  rc = 0
   vresult = 0
 
-  ! 'max' reductions
+  !$acc parallel num_gangs(ng) copy(rg)
+  !$acc loop reduction(max:rg) gang
+  do i = 1, n
+     rg = max (rg, array(i))
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_workers(nw) copy(rw)
+  !$acc loop reduction(max:rw) worker
+  do i = 1, n
+     rw = max (rw, array(i))
+  end do
+  !$acc end parallel
 
-  !$acc parallel vector_length(vl) num_gangs(1) copy(result)
-  !$acc loop reduction(max:result)
+  !$acc parallel vector_length(vl) copy(rv)
+  !$acc loop reduction(max:rv) vector
   do i = 1, n
-     result = max (result, array(i))
+     rv = max (rv, array(i))
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_gangs(ng) num_workers(nw) vector_length(vl) copy(rc)
+  !$acc loop reduction(max:rc) gang worker vector
+  do i = 1, n
+     rc = max (rc, array(i))
   end do
   !$acc end parallel
 
@@ -71,17 +155,46 @@ program reduction_2
      vresult = max (vresult, array(i))
   end do
 
-  if (result.ne.vresult) call abort
-
-  result = 1
-  vresult = 1
+  if (abs (rg - vresult) .ge. e) call abort
+  if (abs (rw - vresult) .ge. e) call abort
+  if (abs (rv - vresult) .ge. e) call abort
+  if (abs (rc - vresult) .ge. e) call abort
 
+  !
   ! 'min' reductions
+  !
 
-  !$acc parallel vector_length(vl) num_gangs(1) copy(result)
-  !$acc loop reduction(min:result)
+  rg = 0
+  rw = 0
+  rv = 0
+  rc = 0
+  vresult = 0
+
+  !$acc parallel num_gangs(ng) copy(rg)
+  !$acc loop reduction(min:rg) gang
+  do i = 1, n
+     rg = min (rg, array(i))
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_workers(nw) copy(rw)
+  !$acc loop reduction(min:rw) worker
+  do i = 1, n
+     rw = min (rw, array(i))
+  end do
+  !$acc end parallel
+
+  !$acc parallel vector_length(vl) copy(rv)
+  !$acc loop reduction(min:rv) vector
   do i = 1, n
-     result = min (result, array(i))
+     rv = min (rv, array(i))
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_gangs(ng) num_workers(nw) vector_length(vl) copy(rc)
+  !$acc loop reduction(min:rc) gang worker vector
+  do i = 1, n
+     rc = min (rc, array(i))
   end do
   !$acc end parallel
 
@@ -90,17 +203,46 @@ program reduction_2
      vresult = min (vresult, array(i))
   end do
 
-  if (result.ne.vresult) call abort
-
-  result = 1
-  vresult = 1
+  if (abs (rg - vresult) .ge. e) call abort
+  if (abs (rw - vresult) .ge. e) call abort
+  if (abs (rv - vresult) .ge. e) call abort
+  if (abs (rc - vresult) .ge. e) call abort
 
+  !
   ! '.and.' reductions
+  !
+
+  lrg = .true.
+  lrw = .true.
+  lrv = .true.
+  lrc = .true.
+  lvresult = .true.
+
+  !$acc parallel num_gangs(ng) copy(lrg)
+  !$acc loop reduction(.and.:lrg) gang
+  do i = 1, n
+     lrg = lrg .and. (array(i) .ge. 5)
+  end do
+  !$acc end parallel
 
-  !$acc parallel vector_length(vl) num_gangs(1) copy(lresult)
-  !$acc loop reduction(.and.:lresult)
+  !$acc parallel num_workers(nw) copy(lrw)
+  !$acc loop reduction(.and.:lrw) worker
   do i = 1, n
-     lresult = lresult .and. (array(i) .ge. 5)
+     lrw = lrw .and. (array(i) .ge. 5)
+  end do
+  !$acc end parallel
+
+  !$acc parallel vector_length(vl) copy(lrv)
+  !$acc loop reduction(.and.:lrv) vector
+  do i = 1, n
+     lrv = lrv .and. (array(i) .ge. 5)
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_gangs(ng) num_workers(nw) vector_length(vl) copy(lrc)
+  !$acc loop reduction(.and.:lrc) gang worker vector
+  do i = 1, n
+     lrc = lrc .and. (array(i) .ge. 5)
   end do
   !$acc end parallel
 
@@ -109,17 +251,46 @@ program reduction_2
      lvresult = lvresult .and. (array(i) .ge. 5)
   end do
 
-  if (result.ne.vresult) call abort
+  if (lrg .neqv. lvresult) call abort
+  if (lrw .neqv. lvresult) call abort
+  if (lrv .neqv. lvresult) call abort
+  if (lrc .neqv. lvresult) call abort
 
-  lresult = .false.
+  !
+  ! '.or.' reductions
+  !
+
+  lrg = .false.
+  lrw = .false.
+  lrv = .false.
+  lrc = .false.
   lvresult = .false.
 
-  ! '.or.' reductions
+  !$acc parallel num_gangs(ng) copy(lrg)
+  !$acc loop reduction(.or.:lrg) gang
+  do i = 1, n
+     lrg = lrg .or. (array(i) .ge. 5)
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_workers(nw) copy(lrw)
+  !$acc loop reduction(.or.:lrw) worker
+  do i = 1, n
+     lrw = lrw .or. (array(i) .ge. 5)
+  end do
+  !$acc end parallel
 
-  !$acc parallel vector_length(vl) num_gangs(1) copy(lresult)
-  !$acc loop reduction(.or.:lresult)
+  !$acc parallel vector_length(vl) copy(lrv)
+  !$acc loop reduction(.or.:lrv) vector
   do i = 1, n
-     lresult = lresult .or. (array(i) .ge. 5)
+     lrv = lrv .or. (array(i) .ge. 5)
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_gangs(ng) num_workers(nw) vector_length(vl) copy(lrc)
+  !$acc loop reduction(.or.:lrc) gang worker vector
+  do i = 1, n
+     lrc = lrc .or. (array(i) .ge. 5)
   end do
   !$acc end parallel
 
@@ -128,17 +299,46 @@ program reduction_2
      lvresult = lvresult .or. (array(i) .ge. 5)
   end do
 
-  if (result.ne.vresult) call abort
-
-  lresult = .false.
-  lvresult = .false.
+  if (lrg .neqv. lvresult) call abort
+  if (lrw .neqv. lvresult) call abort
+  if (lrv .neqv. lvresult) call abort
+  if (lrc .neqv. lvresult) call abort
 
+  !
   ! '.eqv.' reductions
+  !
+
+  lrg = .true.
+  lrw = .true.
+  lrv = .true.
+  lrc = .true.
+  lvresult = .true.
+
+  !$acc parallel num_gangs(ng) copy(lrg)
+  !$acc loop reduction(.eqv.:lrg) gang
+  do i = 1, n
+     lrg = lrg .eqv. (array(i) .ge. 5)
+  end do
+  !$acc end parallel
 
-  !$acc parallel vector_length(vl) num_gangs(1) copy(lresult)
-  !$acc loop reduction(.eqv.:lresult)
+  !$acc parallel num_workers(nw) copy(lrw)
+  !$acc loop reduction(.eqv.:lrw) worker
   do i = 1, n
-     lresult = lresult .eqv. (array(i) .ge. 5)
+     lrw = lrw .eqv. (array(i) .ge. 5)
+  end do
+  !$acc end parallel
+
+  !$acc parallel vector_length(vl) copy(lrv)
+  !$acc loop reduction(.eqv.:lrv) vector
+  do i = 1, n
+     lrv = lrv .eqv. (array(i) .ge. 5)
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_gangs(ng) num_workers(nw) vector_length(vl) copy(lrc)
+  !$acc loop reduction(.eqv.:lrc) gang worker vector
+  do i = 1, n
+     lrc = lrc .eqv. (array(i) .ge. 5)
   end do
   !$acc end parallel
 
@@ -147,17 +347,46 @@ program reduction_2
      lvresult = lvresult .eqv. (array(i) .ge. 5)
   end do
 
-  if (result.ne.vresult) call abort
-
-  lresult = .false.
-  lvresult = .false.
+  if (lrg .neqv. lvresult) call abort
+  if (lrw .neqv. lvresult) call abort
+  if (lrv .neqv. lvresult) call abort
+  if (lrc .neqv. lvresult) call abort
 
+  !
   ! '.neqv.' reductions
+  !
+
+  lrg = .true.
+  lrw = .true.
+  lrv = .true.
+  lrc = .true.
+  lvresult = .true.
+
+  !$acc parallel num_gangs(ng) copy(lrg)
+  !$acc loop reduction(.neqv.:lrg) gang
+  do i = 1, n
+     lrg = lrg .neqv. (array(i) .ge. 5)
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_workers(nw) copy(lrw)
+  !$acc loop reduction(.neqv.:lrw) worker
+  do i = 1, n
+     lrw = lrw .neqv. (array(i) .ge. 5)
+  end do
+  !$acc end parallel
+
+  !$acc parallel vector_length(vl) copy(lrv)
+  !$acc loop reduction(.neqv.:lrv) vector
+  do i = 1, n
+     lrv = lrv .neqv. (array(i) .ge. 5)
+  end do
+  !$acc end parallel
 
-  !$acc parallel vector_length(vl) num_gangs(1) copy(lresult)
-  !$acc loop reduction(.neqv.:lresult)
+  !$acc parallel num_gangs(ng) num_workers(nw) vector_length(vl) copy(lrc)
+  !$acc loop reduction(.neqv.:lrc) gang worker vector
   do i = 1, n
-     lresult = lresult .neqv. (array(i) .ge. 5)
+     lrc = lrc .neqv. (array(i) .ge. 5)
   end do
   !$acc end parallel
 
@@ -166,5 +395,8 @@ program reduction_2
      lvresult = lvresult .neqv. (array(i) .ge. 5)
   end do
 
-  if (result.ne.vresult) call abort
+  if (lrg .neqv. lvresult) call abort
+  if (lrw .neqv. lvresult) call abort
+  if (lrv .neqv. lvresult) call abort
+  if (lrc .neqv. lvresult) call abort
 end program reduction_2
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/reduction-3.f90 b/libgomp/testsuite/libgomp.oacc-fortran/reduction-3.f90
index d0b590e..a7dbf2b 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/reduction-3.f90
+++ b/libgomp/testsuite/libgomp.oacc-fortran/reduction-3.f90
@@ -5,26 +5,52 @@
 program reduction_3
   implicit none
 
-  integer, parameter    :: n = 10, vl = 32
+  integer, parameter    :: n = 10, ng = 8, nw = 4, vl = 32
   integer               :: i
-  double precision, parameter :: e = .001
-  double precision      :: vresult, result
-  logical               :: lresult, lvresult
+  double precision      :: vresult, rg, rw, rv, rc
+  double precision, parameter :: e = 0.001
+  logical               :: lrg, lrw, lrv, lrc, lvresult
   double precision, dimension (n) :: array
 
   do i = 1, n
      array(i) = i
   end do
 
-  result = 0
+  !
+  ! '+' reductions
+  !
+
+  rg = 0
+  rw = 0
+  rv = 0
+  rc = 0
   vresult = 0
 
-  ! '+' reductions
+  !$acc parallel num_gangs(ng) copy(rg)
+  !$acc loop reduction(+:rg) gang
+  do i = 1, n
+     rg = rg + array(i)
+  end do
+  !$acc end parallel
 
-  !$acc parallel vector_length(vl) num_gangs(1) copy(result)
-  !$acc loop reduction(+:result)
+  !$acc parallel num_workers(nw) copy(rw)
+  !$acc loop reduction(+:rw) worker
   do i = 1, n
-     result = result + array(i)
+     rw = rw + array(i)
+  end do
+  !$acc end parallel
+
+  !$acc parallel vector_length(vl) copy(rv)
+  !$acc loop reduction(+:rv) vector
+  do i = 1, n
+     rv = rv + array(i)
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_gangs(ng) num_workers(nw) vector_length(vl) copy(rc)
+  !$acc loop reduction(+:rc) gang worker vector
+  do i = 1, n
+     rc = rc + array(i)
   end do
   !$acc end parallel
 
@@ -33,17 +59,46 @@ program reduction_3
      vresult = vresult + array(i)
   end do
 
-  if (abs (result - vresult) .ge. e) call abort
+  if (abs (rg - vresult) .ge. e) call abort
+  if (abs (rw - vresult) .ge. e) call abort
+  if (abs (rv - vresult) .ge. e) call abort
+  if (abs (rc - vresult) .ge. e) call abort
+
+  !
+  ! '*' reductions
+  !
 
-  result = 1
+  rg = 1
+  rw = 1
+  rv = 1
+  rc = 1
   vresult = 1
 
-  ! '*' reductions
+  !$acc parallel num_gangs(ng) copy(rg)
+  !$acc loop reduction(*:rg) gang
+  do i = 1, n
+     rg = rg * array(i)
+  end do
+  !$acc end parallel
 
-  !$acc parallel vector_length(vl) num_gangs(1) copy(result)
-  !$acc loop reduction(*:result)
+  !$acc parallel num_workers(nw) copy(rw)
+  !$acc loop reduction(*:rw) worker
   do i = 1, n
-     result = result * array(i)
+     rw = rw * array(i)
+  end do
+  !$acc end parallel
+
+  !$acc parallel vector_length(vl) copy(rv)
+  !$acc loop reduction(*:rv) vector
+  do i = 1, n
+     rv = rv * array(i)
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_gangs(ng) num_workers(nw) vector_length(vl) copy(rc)
+  !$acc loop reduction(*:rc) gang worker vector
+  do i = 1, n
+     rc = rc * array(i)
   end do
   !$acc end parallel
 
@@ -52,17 +107,46 @@ program reduction_3
      vresult = vresult * array(i)
   end do
 
-  if (result.ne.vresult) call abort
+  if (abs (rg - vresult) .ge. e) call abort
+  if (abs (rw - vresult) .ge. e) call abort
+  if (abs (rv - vresult) .ge. e) call abort
+  if (abs (rc - vresult) .ge. e) call abort
+
+  !
+  ! 'max' reductions
+  !
 
-  result = 0
+  rg = 0
+  rw = 0
+  rv = 0
+  rc = 0
   vresult = 0
 
-  ! 'max' reductions
+  !$acc parallel num_gangs(ng) copy(rg)
+  !$acc loop reduction(max:rg) gang
+  do i = 1, n
+     rg = max (rg, array(i))
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_workers(nw) copy(rw)
+  !$acc loop reduction(max:rw) worker
+  do i = 1, n
+     rw = max (rw, array(i))
+  end do
+  !$acc end parallel
 
-  !$acc parallel vector_length(vl) num_gangs(1) copy(result)
-  !$acc loop reduction(max:result)
+  !$acc parallel vector_length(vl) copy(rv)
+  !$acc loop reduction(max:rv) vector
   do i = 1, n
-     result = max (result, array(i))
+     rv = max (rv, array(i))
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_gangs(ng) num_workers(nw) vector_length(vl) copy(rc)
+  !$acc loop reduction(max:rc) gang worker vector
+  do i = 1, n
+     rc = max (rc, array(i))
   end do
   !$acc end parallel
 
@@ -71,17 +155,46 @@ program reduction_3
      vresult = max (vresult, array(i))
   end do
 
-  if (result.ne.vresult) call abort
-
-  result = 1
-  vresult = 1
+  if (abs (rg - vresult) .ge. e) call abort
+  if (abs (rw - vresult) .ge. e) call abort
+  if (abs (rv - vresult) .ge. e) call abort
+  if (abs (rc - vresult) .ge. e) call abort
 
+  !
   ! 'min' reductions
+  !
 
-  !$acc parallel vector_length(vl) num_gangs(1) copy(result)
-  !$acc loop reduction(min:result)
+  rg = 0
+  rw = 0
+  rv = 0
+  rc = 0
+  vresult = 0
+
+  !$acc parallel num_gangs(ng) copy(rg)
+  !$acc loop reduction(min:rg) gang
+  do i = 1, n
+     rg = min (rg, array(i))
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_workers(nw) copy(rw)
+  !$acc loop reduction(min:rw) worker
+  do i = 1, n
+     rw = min (rw, array(i))
+  end do
+  !$acc end parallel
+
+  !$acc parallel vector_length(vl) copy(rv)
+  !$acc loop reduction(min:rv) vector
   do i = 1, n
-     result = min (result, array(i))
+     rv = min (rv, array(i))
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_gangs(ng) num_workers(nw) vector_length(vl) copy(rc)
+  !$acc loop reduction(min:rc) gang worker vector
+  do i = 1, n
+     rc = min (rc, array(i))
   end do
   !$acc end parallel
 
@@ -90,17 +203,46 @@ program reduction_3
      vresult = min (vresult, array(i))
   end do
 
-  if (result.ne.vresult) call abort
-
-  result = 1
-  vresult = 1
+  if (abs (rg - vresult) .ge. e) call abort
+  if (abs (rw - vresult) .ge. e) call abort
+  if (abs (rv - vresult) .ge. e) call abort
+  if (abs (rc - vresult) .ge. e) call abort
 
+  !
   ! '.and.' reductions
+  !
+
+  lrg = .true.
+  lrw = .true.
+  lrv = .true.
+  lrc = .true.
+  lvresult = .true.
+
+  !$acc parallel num_gangs(ng) copy(lrg)
+  !$acc loop reduction(.and.:lrg) gang
+  do i = 1, n
+     lrg = lrg .and. (array(i) .ge. 5)
+  end do
+  !$acc end parallel
 
-  !$acc parallel vector_length(vl) num_gangs(1) copy(lresult)
-  !$acc loop reduction(.and.:lresult)
+  !$acc parallel num_workers(nw) copy(lrw)
+  !$acc loop reduction(.and.:lrw) worker
   do i = 1, n
-     lresult = lresult .and. (array(i) .ge. 5)
+     lrw = lrw .and. (array(i) .ge. 5)
+  end do
+  !$acc end parallel
+
+  !$acc parallel vector_length(vl) copy(lrv)
+  !$acc loop reduction(.and.:lrv) vector
+  do i = 1, n
+     lrv = lrv .and. (array(i) .ge. 5)
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_gangs(ng) num_workers(nw) vector_length(vl) copy(lrc)
+  !$acc loop reduction(.and.:lrc) gang worker vector
+  do i = 1, n
+     lrc = lrc .and. (array(i) .ge. 5)
   end do
   !$acc end parallel
 
@@ -109,17 +251,46 @@ program reduction_3
      lvresult = lvresult .and. (array(i) .ge. 5)
   end do
 
-  if (result.ne.vresult) call abort
+  if (lrg .neqv. lvresult) call abort
+  if (lrw .neqv. lvresult) call abort
+  if (lrv .neqv. lvresult) call abort
+  if (lrc .neqv. lvresult) call abort
 
-  lresult = .false.
+  !
+  ! '.or.' reductions
+  !
+
+  lrg = .false.
+  lrw = .false.
+  lrv = .false.
+  lrc = .false.
   lvresult = .false.
 
-  ! '.or.' reductions
+  !$acc parallel num_gangs(ng) copy(lrg)
+  !$acc loop reduction(.or.:lrg) gang
+  do i = 1, n
+     lrg = lrg .or. (array(i) .ge. 5)
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_workers(nw) copy(lrw)
+  !$acc loop reduction(.or.:lrw) worker
+  do i = 1, n
+     lrw = lrw .or. (array(i) .ge. 5)
+  end do
+  !$acc end parallel
 
-  !$acc parallel vector_length(vl) num_gangs(1) copy(lresult)
-  !$acc loop reduction(.or.:lresult)
+  !$acc parallel vector_length(vl) copy(lrv)
+  !$acc loop reduction(.or.:lrv) vector
   do i = 1, n
-     lresult = lresult .or. (array(i) .ge. 5)
+     lrv = lrv .or. (array(i) .ge. 5)
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_gangs(ng) num_workers(nw) vector_length(vl) copy(lrc)
+  !$acc loop reduction(.or.:lrc) gang worker vector
+  do i = 1, n
+     lrc = lrc .or. (array(i) .ge. 5)
   end do
   !$acc end parallel
 
@@ -128,17 +299,46 @@ program reduction_3
      lvresult = lvresult .or. (array(i) .ge. 5)
   end do
 
-  if (result.ne.vresult) call abort
-
-  lresult = .false.
-  lvresult = .false.
+  if (lrg .neqv. lvresult) call abort
+  if (lrw .neqv. lvresult) call abort
+  if (lrv .neqv. lvresult) call abort
+  if (lrc .neqv. lvresult) call abort
 
+  !
   ! '.eqv.' reductions
+  !
+
+  lrg = .true.
+  lrw = .true.
+  lrv = .true.
+  lrc = .true.
+  lvresult = .true.
+
+  !$acc parallel num_gangs(ng) copy(lrg)
+  !$acc loop reduction(.eqv.:lrg) gang
+  do i = 1, n
+     lrg = lrg .eqv. (array(i) .ge. 5)
+  end do
+  !$acc end parallel
 
-  !$acc parallel vector_length(vl) num_gangs(1) copy(lresult)
-  !$acc loop reduction(.eqv.:lresult)
+  !$acc parallel num_workers(nw) copy(lrw)
+  !$acc loop reduction(.eqv.:lrw) worker
   do i = 1, n
-     lresult = lresult .eqv. (array(i) .ge. 5)
+     lrw = lrw .eqv. (array(i) .ge. 5)
+  end do
+  !$acc end parallel
+
+  !$acc parallel vector_length(vl) copy(lrv)
+  !$acc loop reduction(.eqv.:lrv) vector
+  do i = 1, n
+     lrv = lrv .eqv. (array(i) .ge. 5)
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_gangs(ng) num_workers(nw) vector_length(vl) copy(lrc)
+  !$acc loop reduction(.eqv.:lrc) gang worker vector
+  do i = 1, n
+     lrc = lrc .eqv. (array(i) .ge. 5)
   end do
   !$acc end parallel
 
@@ -147,17 +347,46 @@ program reduction_3
      lvresult = lvresult .eqv. (array(i) .ge. 5)
   end do
 
-  if (result.ne.vresult) call abort
-
-  lresult = .false.
-  lvresult = .false.
+  if (lrg .neqv. lvresult) call abort
+  if (lrw .neqv. lvresult) call abort
+  if (lrv .neqv. lvresult) call abort
+  if (lrc .neqv. lvresult) call abort
 
+  !
   ! '.neqv.' reductions
+  !
+
+  lrg = .true.
+  lrw = .true.
+  lrv = .true.
+  lrc = .true.
+  lvresult = .true.
+
+  !$acc parallel num_gangs(ng) copy(lrg)
+  !$acc loop reduction(.neqv.:lrg) gang
+  do i = 1, n
+     lrg = lrg .neqv. (array(i) .ge. 5)
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_workers(nw) copy(lrw)
+  !$acc loop reduction(.neqv.:lrw) worker
+  do i = 1, n
+     lrw = lrw .neqv. (array(i) .ge. 5)
+  end do
+  !$acc end parallel
+
+  !$acc parallel vector_length(vl) copy(lrv)
+  !$acc loop reduction(.neqv.:lrv) vector
+  do i = 1, n
+     lrv = lrv .neqv. (array(i) .ge. 5)
+  end do
+  !$acc end parallel
 
-  !$acc parallel vector_length(vl) num_gangs(1) copy(lresult)
-  !$acc loop reduction(.neqv.:lresult)
+  !$acc parallel num_gangs(ng) num_workers(nw) vector_length(vl) copy(lrc)
+  !$acc loop reduction(.neqv.:lrc) gang worker vector
   do i = 1, n
-     lresult = lresult .neqv. (array(i) .ge. 5)
+     lrc = lrc .neqv. (array(i) .ge. 5)
   end do
   !$acc end parallel
 
@@ -166,5 +395,8 @@ program reduction_3
      lvresult = lvresult .neqv. (array(i) .ge. 5)
   end do
 
-  if (result.ne.vresult) call abort
+  if (lrg .neqv. lvresult) call abort
+  if (lrw .neqv. lvresult) call abort
+  if (lrv .neqv. lvresult) call abort
+  if (lrc .neqv. lvresult) call abort
 end program reduction_3
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/reduction-4.f90 b/libgomp/testsuite/libgomp.oacc-fortran/reduction-4.f90
index 8c99fdb..c3bdaf6 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/reduction-4.f90
+++ b/libgomp/testsuite/libgomp.oacc-fortran/reduction-4.f90
@@ -5,50 +5,108 @@
 program reduction_4
   implicit none
 
-  integer, parameter    :: n = 10, vl = 32
+  integer, parameter    :: n = 10, ng = 8, nw = 4, vl = 32
   integer               :: i
-  complex               :: vresult, result
+  real                  :: vresult, rg, rw, rv, rc
   complex, dimension (n) :: array
 
   do i = 1, n
      array(i) = i
   end do
 
-  result = 0
+  !
+  ! '+' reductions
+  !
+
+  rg = 0
+  rw = 0
+  rv = 0
+  rc = 0
   vresult = 0
 
-  ! '+' reductions
+  !$acc parallel num_gangs(ng) copy(rg)
+  !$acc loop reduction(+:rg) gang
+  do i = 1, n
+     rg = rg + REAL(array(i))
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_workers(nw) copy(rw)
+  !$acc loop reduction(+:rw) worker
+  do i = 1, n
+     rw = rw + REAL(array(i))
+  end do
+  !$acc end parallel
+
+  !$acc parallel vector_length(vl) copy(rv)
+  !$acc loop reduction(+:rv) vector
+  do i = 1, n
+     rv = rv + REAL(array(i))
+  end do
+  !$acc end parallel
 
-  !$acc parallel vector_length(vl) num_gangs(1) copy(result)
-  !$acc loop reduction(+:result)
+  !$acc parallel num_gangs(ng) num_workers(nw) vector_length(vl) copy(rc)
+  !$acc loop reduction(+:rc) gang worker vector
   do i = 1, n
-     result = result + array(i)
+     rc = rc + REAL(array(i))
   end do
   !$acc end parallel
 
   ! Verify the results
   do i = 1, n
-     vresult = vresult + array(i)
+     vresult = vresult + REAL(array(i))
   end do
 
-  if (result .ne. vresult) call abort
+  if (rg .ne. vresult) call abort
+  if (rw .ne. vresult) call abort
+  if (rv .ne. vresult) call abort
+  if (rc .ne. vresult) call abort
 
-  result = 1
+  !
+  ! '*' reductions
+  !
+
+  rg = 1
+  rw = 1
+  rv = 1
+  rc = 1
   vresult = 1
 
-!  ! '*' reductions
-!
-!  !$acc parallel vector_length(vl)
-!  !$acc loop reduction(*:result)
-!  do i = 1, n
-!     result = result * array(i)
-!  end do
-!  !$acc end parallel
-!
-!  ! Verify the results
-!  do i = 1, n
-!     vresult = vresult * array(i)
-!  end do
-!
-!  if (result.ne.vresult) call abort
+  !$acc parallel num_gangs(ng) copy(rg)
+  !$acc loop reduction(*:rg) gang
+  do i = 1, n
+     rg = rg * REAL(array(i))
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_workers(nw) copy(rw)
+  !$acc loop reduction(*:rw) worker
+  do i = 1, n
+     rw = rw * REAL(array(i))
+  end do
+  !$acc end parallel
+
+  !$acc parallel vector_length(vl) copy(rv)
+  !$acc loop reduction(*:rv) vector
+  do i = 1, n
+     rv = rv * REAL(array(i))
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_gangs(ng) num_workers(nw) vector_length(vl) copy(rc)
+  !$acc loop reduction(*:rc) gang worker vector
+  do i = 1, n
+     rc = rc * REAL(array(i))
+  end do
+  !$acc end parallel
+
+  ! Verify the results
+  do i = 1, n
+     vresult = vresult * REAL(array(i))
+  end do
+
+  if (rg .ne. vresult) call abort
+  if (rw .ne. vresult) call abort
+  if (rv .ne. vresult) call abort
+  if (rc .ne. vresult) call abort
 end program reduction_4
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/reduction-5.f90 b/libgomp/testsuite/libgomp.oacc-fortran/reduction-5.f90
index ec13e4e..4210648 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/reduction-5.f90
+++ b/libgomp/testsuite/libgomp.oacc-fortran/reduction-5.f90
@@ -1,12 +1,17 @@
 ! { dg-do run }
+! { dg-additional-options "-w" }
 
 ! subroutine reduction
 
 program reduction
   integer, parameter    :: n = 40, c = 10
-  integer               :: i, vsum, sum
+  integer               :: i, vsum, gs, ws, vs, cs, ns
 
-  call redsub (sum, n, c)
+  call redsub_gang (gs, n, c)
+  call redsub_worker (ws, n, c)
+  call redsub_vector (vs, n, c)
+  call redsub_combined (cs, n, c)
+  call redsub_nested (ns, n, c)
 
   vsum = 0
 
@@ -15,21 +20,80 @@ program reduction
      vsum = vsum + c
   end do
 
-  if (sum.ne.vsum) call abort ()
+  if (gs .ne. vsum) call abort ()
+  if (ws .ne. vsum) call abort ()
+  if (vs .ne. vsum) call abort ()
+  if (cs .ne. vsum) call abort ()
+  if (ns .ne. vsum) call abort ()
 end program reduction
 
-subroutine redsub(sum, n, c)
+subroutine redsub_gang(sum, n, c)
   integer :: sum, n, c
 
-  integer :: s
-  s = 0
+  sum = 0
 
-  !$acc parallel vector_length(32) copyin (n, c) copy (s) num_gangs(1)
-  !$acc loop reduction(+:s)
+  !$acc parallel copyin (n, c) num_gangs(n) copy(sum)
+  !$acc loop reduction(+:sum) gang
   do i = 1, n
-     s = s + c
+     sum = sum + c
   end do
   !$acc end parallel
+end subroutine redsub_gang
 
-  sum = s
-end subroutine redsub
+subroutine redsub_worker(sum, n, c)
+  integer :: sum, n, c
+
+  sum = 0
+
+  !$acc parallel copyin (n, c) num_workers(4) vector_length (32) copy(sum)
+  !$acc loop reduction(+:sum) worker
+  do i = 1, n
+     sum = sum + c
+  end do
+  !$acc end parallel
+end subroutine redsub_worker
+
+subroutine redsub_vector(sum, n, c)
+  integer :: sum, n, c
+
+  sum = 0
+
+  !$acc parallel copyin (n, c) vector_length(32) copy(sum)
+  !$acc loop reduction(+:sum) vector
+  do i = 1, n
+     sum = sum + c
+  end do
+  !$acc end parallel
+end subroutine redsub_vector
+
+subroutine redsub_combined(sum, n, c)
+  integer :: sum, n, c
+
+  sum = 0
+
+  !$acc parallel num_gangs (8) num_workers (4) vector_length(32) copy(sum)
+  !$acc loop reduction(+:sum) gang worker vector
+  do i = 1, n
+     sum = sum + c
+  end do
+  !$acc end parallel
+end subroutine redsub_combined
+
+subroutine redsub_nested(sum, n, c)
+  integer :: sum, n, c
+  integer :: ii, jj
+
+  ii = n / 10;
+  jj = 10;
+  sum = 0
+
+  !$acc parallel num_gangs (8) copy(sum)
+  !$acc loop reduction(+:sum) gang
+  do i = 1, ii
+     !$acc loop reduction(+:sum) vector
+     do j = 1, jj
+        sum = sum + c
+     end do
+  end do
+  !$acc end parallel
+end subroutine redsub_nested
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/reduction-6.f90 b/libgomp/testsuite/libgomp.oacc-fortran/reduction-6.f90
index 2ff6f5f..f3ed275 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/reduction-6.f90
+++ b/libgomp/testsuite/libgomp.oacc-fortran/reduction-6.f90
@@ -1,30 +1,94 @@
 ! { dg-do run }
+! { dg-additional-options "-cpp -w" }
 
 program reduction
   implicit none
 
-  integer, parameter    :: n = 100
-  integer               :: i, s1, s2, vs1, vs2
+  integer, parameter    :: n = 100, n2 = 1000, chunksize = 10
+  integer               :: i, gs1, gs2, ws1, ws2, vs1, vs2, cs1, cs2, hs1, hs2
+  integer               :: j, red, vred
 
-  s1 = 0
-  s2 = 0
+  gs1 = 0
+  gs2 = 0
+  ws1 = 0
+  ws2 = 0
   vs1 = 0
   vs2 = 0
+  cs1 = 0
+  cs2 = 0
+  hs1 = 0
+  hs2 = 0
 
-  !$acc parallel vector_length (32) copy(s1, s2)
-  !$acc loop reduction(+:s1, s2)
+  !$acc parallel num_gangs (1000) copy(gs1, gs2)
+  !$acc loop reduction(+:gs1, gs2) gang
   do i = 1, n
-     s1 = s1 + 1
-     s2 = s2 + 2
+     gs1 = gs1 + 1
+     gs2 = gs2 + 2
   end do
   !$acc end parallel
 
-  ! Verify the results
+  !$acc parallel num_workers (4) vector_length (32) copy(ws1, ws2)
+  !$acc loop reduction(+:ws1, ws2) worker
+  do i = 1, n
+     ws1 = ws1 + 1
+     ws2 = ws2 + 2
+  end do
+  !$acc end parallel
+
+  !$acc parallel vector_length (32) copy(vs1, vs2)
+  !$acc loop reduction(+:vs1, vs2) vector
   do i = 1, n
      vs1 = vs1 + 1
      vs2 = vs2 + 2
   end do
+  !$acc end parallel
+
+  !$acc parallel num_gangs(8) num_workers(4) vector_length(32) copy(cs1, cs2)
+  !$acc loop reduction(+:cs1, cs2) gang worker vector
+  do i = 1, n
+     cs1 = cs1 + 1
+     cs2 = cs2 + 2
+  end do
+  !$acc end parallel
+
+  ! Verify the results on the host
+  do i = 1, n
+     hs1 = hs1 + 1
+     hs2 = hs2 + 2
+  end do
+
+  if (gs1 .ne. hs1) call abort ()
+  if (gs2 .ne. hs2) call abort ()
+
+  if (ws1 .ne. hs1) call abort ()
+  if (ws2 .ne. hs2) call abort ()
+
+  if (vs1 .ne. hs1) call abort ()
+  if (vs2 .ne. hs2) call abort ()
+
+  if (cs1 .ne. hs1) call abort ()
+  if (cs2 .ne. hs2) call abort ()
+
+  ! Nested reductions.
+
+  red = 0
+  vred = 0
+
+  !$acc parallel num_gangs(10) vector_length(32) copy(red)
+  !$acc loop reduction(+:red) gang
+  do i = 1, n/chunksize
+     !$acc loop reduction(+:red) vector
+     do j = 1, chunksize
+        red = red + chunksize
+     end do
+  end do
+  !$acc end parallel
+
+  do i = 1, n/chunksize
+     do j = 1, chunksize
+        vred = vred + chunksize
+     end do
+  end do
 
-  if (s1.ne.vs1) call abort ()
-  if (s2.ne.vs2) call abort ()
+  if (red .ne. vred) call abort ()
 end program reduction
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/reduction-7.f90 b/libgomp/testsuite/libgomp.oacc-fortran/reduction-7.f90
new file mode 100644
index 0000000..8ec36ad
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-fortran/reduction-7.f90
@@ -0,0 +1,88 @@
+! { dg-do run }
+! { dg-additional-options "-w" }
+
+! subroutine reduction with private and firstprivate variables
+
+program reduction
+  integer, parameter    :: n = 100
+  integer               :: i, j, vsum, cs, arr(n)
+
+  call redsub_private (cs, n, arr)
+  call redsub_bogus (cs, n)
+  call redsub_combined (cs, n, arr)
+
+  vsum = 0
+
+  ! Verify the results
+  do i = 1, n
+     vsum = i
+     do j = 1, n
+        vsum = vsum + 1;
+     end do
+     if (vsum .ne. arr(i)) call abort ()
+  end do
+end program reduction
+
+! This subroutine tests a reduction with an explicit private variable.
+
+subroutine redsub_private(sum, n, arr)
+  integer :: sum, n, arr(n)
+  integer :: i, j, v
+
+  !$acc parallel copyout (arr)
+  !$acc loop gang private (v)
+  do j = 1, n
+     v = j
+
+     !$acc loop vector reduction (+:v)
+     do i = 1, 100
+        v = v + 1
+     end do
+
+     arr(j) = v
+  end do
+  !$acc end parallel
+
+  ! verify the results
+  do i = 1, 10
+     if (arr(i) .ne. 100+i) call abort ()
+  end do
+end subroutine redsub_private
+
+
+! Bogus reduction on an implicitly firstprivate variable.  The results do
+! survive the parallel region.  The goal here is to ensure that gfortran
+! doesn't ICE.
+
+subroutine redsub_bogus(sum, n)
+  integer :: sum, n, arr(n)
+  integer :: i
+
+  !$acc parallel
+  !$acc loop gang worker vector reduction (+:sum)
+  do i = 1, n
+     sum = sum + 1
+  end do
+  !$acc end parallel
+end subroutine redsub_bogus
+
+! This reduction involving a firstprivate variable yields legitimate results.
+
+subroutine redsub_combined(sum, n, arr)
+  integer :: sum, n, arr(n)
+  integer :: i, j
+
+  !$acc parallel copy (arr)
+  !$acc loop gang
+  do i = 1, n
+     sum = i;
+
+     !$acc loop reduction(+:sum)
+     do j = 1, n
+        sum = sum + 1
+     end do
+
+     arr(i) = sum
+  end do
+  !$acc end parallel
+end subroutine redsub_combined


* Re: openacc reference reductions
  2016-02-09 15:14 openacc reference reductions Cesar Philippidis
@ 2016-02-09 15:33 ` Nathan Sidwell
  2016-02-09 16:17   ` Cesar Philippidis
  2021-04-26 10:35 ` [OpenACC] Don't compile libgomp testcases with '-w' (was: openacc reference reductions) Thomas Schwinge
  1 sibling, 1 reply; 19+ messages in thread
From: Nathan Sidwell @ 2016-02-09 15:33 UTC (permalink / raw)
  To: Cesar Philippidis, gcc-patches, Jakub Jelinek, Nathan Sidwell

While I've not looked at the rest of the patch, this bit stood out:

> +static bool
> +is_oacc_parallel_reduction (tree var, omp_context *ctx)
> +{
> +  if (!is_oacc_parallel (ctx))
> +    return false;
> +
> +  tree clauses = gimple_omp_target_clauses (ctx->stmt);
> +
> +  /* Don't install a local copy of the decl if it used
> +     inside a acc parallel reduction.  */

^^ comment is misleading -- this routine's not installing anything

> +  if (is_oacc_parallel (ctx))

^^ already checked above.

> +    for (tree c = clauses; c; c = OMP_CLAUSE_CHAIN (c))
> +      if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_REDUCTION
> +	  && OMP_CLAUSE_DECL (c) == var)
> +	return true;
> +
> +  return false;
> +}
> +



* Re: openacc reference reductions
  2016-02-09 15:33 ` Nathan Sidwell
@ 2016-02-09 16:17   ` Cesar Philippidis
  2016-02-22 15:34     ` Cesar Philippidis
  0 siblings, 1 reply; 19+ messages in thread
From: Cesar Philippidis @ 2016-02-09 16:17 UTC (permalink / raw)
  To: Nathan Sidwell, gcc-patches, Jakub Jelinek, Nathan Sidwell

[-- Attachment #1: Type: text/plain, Size: 966 bytes --]

On 02/09/2016 07:33 AM, Nathan Sidwell wrote:
> While I've not looked at the rest of the patch, this bit stood out:
> 
>> +static bool
>> +is_oacc_parallel_reduction (tree var, omp_context *ctx)
>> +{
>> +  if (!is_oacc_parallel (ctx))
>> +    return false;
>> +
>> +  tree clauses = gimple_omp_target_clauses (ctx->stmt);
>> +
>> +  /* Don't install a local copy of the decl if it used
>> +     inside a acc parallel reduction.  */
> 
> ^^ comment is misleading -- this routine's not installing anything
> 
>> +  if (is_oacc_parallel (ctx))
> 
> ^^ already checked above.
> 
>> +    for (tree c = clauses; c; c = OMP_CLAUSE_CHAIN (c))
>> +      if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_REDUCTION
>> +      && OMP_CLAUSE_DECL (c) == var)
>> +    return true;
>> +
>> +  return false;
>> +}
>> +

Thanks for catching that. Both are leftovers from when this code lived
entirely inside scan_sharing_clauses. The updated patch rewrites the
comment and drops the redundant is_oacc_parallel check.

Cesar


[-- Attachment #2: trunk-reductions-20160209.diff --]
[-- Type: text/x-patch, Size: 132984 bytes --]

2016-02-09  Cesar Philippidis  <cesar@codesourcery.com>

	gcc/
	* omp-low.c (is_oacc_parallel_reduction): New function.
	(scan_sharing_clauses): Use it to prevent installing local variables
	for those used in acc parallel reductions.
	(lower_rec_input_clauses): Remove dead code.
	(lower_oacc_reductions): Add support for reference reductions.
	(lower_reduction_clauses): Remove dead code.
	(lower_omp_target): Don't remap variables appearing in acc parallel
	reductions.

	gcc/testsuite/
	* c-c++-common/goacc/reduction-1.c: Add more test coverage.
	* c-c++-common/goacc/reduction-2.c: Likewise.
	* c-c++-common/goacc/reduction-3.c: Likewise.
	* c-c++-common/goacc/reduction-4.c: Likewise.

	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/data-clauses.h: New test.
	* testsuite/libgomp.oacc-c-c++-common/loop-default-compile.c: New test.
	* testsuite/libgomp.oacc-c-c++-common/loop-default.h: New test.
	* testsuite/libgomp.oacc-c-c++-common/loop-g-1.c: Add more test
	coverage.
	* testsuite/libgomp.oacc-c-c++-common/loop-g-2.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-red-g-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-red-w-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-red-w-2.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-gang-np-1.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-gv-np-1.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-gw-np-1.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-1.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-2.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-3.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-4.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-vector-p-1.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-vector-p-2.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-worker-p-1.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-wv-p-1.c: New test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-wv-p-2.c: New test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-wv-p-3.c: New test.
	* testsuite/libgomp.oacc-c-c++-common/loop-w-1.c: Add more test
	coverage.
	* testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-1.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-2.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-3.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-4.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/par-reduction-1.c: Add more test
	coverage.
	* testsuite/libgomp.oacc-c-c++-common/par-reduction-2.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/parallel-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/parallel-dims.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/parallel-reduction.c: New test.
	* testsuite/libgomp.oacc-c-c++-common/reduction-1.c: Add more test
	coverage.
	* testsuite/libgomp.oacc-c-c++-common/reduction-2.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/reduction-3.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/reduction-4.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/reduction-5.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/reduction-6.c: New test.
	* testsuite/libgomp.oacc-c-c++-common/reduction.h: New test.
	* testsuite/libgomp.oacc-fortran/parallel-loop-1.f90: New test.
	* testsuite/libgomp.oacc-fortran/parallel-reduction.f90: New test.
	* testsuite/libgomp.oacc-fortran/reduction-1.f90: Add more test
	coverage.
	* testsuite/libgomp.oacc-fortran/reduction-2.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/reduction-3.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/reduction-4.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/reduction-5.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/reduction-6.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/reduction-7.f90: New test.


diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index d41688b..592e64b 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -308,6 +308,25 @@ is_oacc_kernels (omp_context *ctx)
 	      == GF_OMP_TARGET_KIND_OACC_KERNELS));
 }
 
+/* Return true if CTX corresponds to an oacc parallel region and if
+   VAR is used in a reduction.  */
+
+static bool
+is_oacc_parallel_reduction (tree var, omp_context *ctx)
+{
+  if (!is_oacc_parallel (ctx))
+    return false;
+
+  tree clauses = gimple_omp_target_clauses (ctx->stmt);
+
+  for (tree c = clauses; c; c = OMP_CLAUSE_CHAIN (c))
+    if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_REDUCTION
+	&& OMP_CLAUSE_DECL (c) == var)
+      return true;
+
+  return false;
+}
+
 /* If DECL is the artificial dummy VAR_DECL created for non-static
    data member privatization, return the underlying "this" parameter,
    otherwise return NULL.  */
@@ -2121,7 +2140,8 @@ scan_sharing_clauses (tree clauses, omp_context *ctx,
 		  else
 		    install_var_field (decl, true, 3, ctx,
 				       base_pointers_restrict);
-		  if (is_gimple_omp_offloaded (ctx->stmt))
+		  if (is_gimple_omp_offloaded (ctx->stmt)
+		      && !is_oacc_parallel_reduction (decl, ctx))
 		    install_var_local (decl, ctx);
 		}
 	    }
@@ -4821,7 +4841,7 @@ lower_rec_input_clauses (tree clauses, gimple_seq *ilist, gimple_seq *dlist,
 		  gimplify_assign (ptr, x, ilist);
 		}
 	    }
-	  else if (is_reference (var) && !is_oacc_parallel (ctx))
+	  else if (is_reference (var))
 	    {
 	      /* For references that are being privatized for Fortran,
 		 allocate new backing storage for the new pointer
@@ -5557,7 +5577,8 @@ lower_oacc_reductions (location_t loc, tree clauses, tree level, bool inner,
 	tree orig = OMP_CLAUSE_DECL (c);
 	tree var = maybe_lookup_decl (orig, ctx);
 	tree ref_to_res = NULL_TREE;
-	tree incoming, outgoing;
+	tree incoming, outgoing, v1, v2, v3;
+	bool is_private = false;
 
 	enum tree_code rcode = OMP_CLAUSE_REDUCTION_CODE (c);
 	if (rcode == MINUS_EXPR)
@@ -5570,7 +5591,6 @@ lower_oacc_reductions (location_t loc, tree clauses, tree level, bool inner,
 
 	if (!var)
 	  var = orig;
-	gcc_assert (!is_reference (var));
 
 	incoming = outgoing = var;
 	
@@ -5606,22 +5626,38 @@ lower_oacc_reductions (location_t loc, tree clauses, tree level, bool inner,
 		for (; cls;  cls = OMP_CLAUSE_CHAIN (cls))
 		  if (OMP_CLAUSE_CODE (cls) == OMP_CLAUSE_REDUCTION
 		      && orig == OMP_CLAUSE_DECL (cls))
-		    goto has_outer_reduction;
+		    {
+		      incoming = outgoing = lookup_decl (orig, probe);
+		      goto has_outer_reduction;
+		    }
+		  else if ((OMP_CLAUSE_CODE (cls) == OMP_CLAUSE_FIRSTPRIVATE
+			    || OMP_CLAUSE_CODE (cls) == OMP_CLAUSE_PRIVATE)
+			   && orig == OMP_CLAUSE_DECL (cls))
+		    {
+		      is_private = true;
+		      goto do_lookup;
+		    }
 	      }
 
 	  do_lookup:
 	    /* This is the outermost construct with this reduction,
 	       see if there's a mapping for it.  */
 	    if (gimple_code (outer->stmt) == GIMPLE_OMP_TARGET
-		&& maybe_lookup_field (orig, outer))
+		&& maybe_lookup_field (orig, outer) && !is_private)
 	      {
 		ref_to_res = build_receiver_ref (orig, false, outer);
 		if (is_reference (orig))
 		  ref_to_res = build_simple_mem_ref (ref_to_res);
 
+		tree type = TREE_TYPE (var);
+		if (POINTER_TYPE_P (type))
+		  type = TREE_TYPE (type);
+
 		outgoing = var;
-		incoming = omp_reduction_init_op (loc, rcode, TREE_TYPE (var));
+		incoming = omp_reduction_init_op (loc, rcode, type);
 	      }
+	    else if (ctx->outer)
+	      incoming = outgoing = lookup_decl (orig, ctx->outer);
 	    else
 	      incoming = outgoing = orig;
 	      
@@ -5631,6 +5667,37 @@ lower_oacc_reductions (location_t loc, tree clauses, tree level, bool inner,
 	if (!ref_to_res)
 	  ref_to_res = integer_zero_node;
 
+        if (is_reference (orig))
+	  {
+	    tree type = TREE_TYPE (var);
+	    const char *id = IDENTIFIER_POINTER (DECL_NAME (var));
+
+	    if (!inner)
+	      {
+		tree x = create_tmp_var (TREE_TYPE (type), id);
+		gimplify_assign (var, build_fold_addr_expr (x), fork_seq);
+	      }
+
+	    v1 = create_tmp_var (type, id);
+	    v2 = create_tmp_var (type, id);
+	    v3 = create_tmp_var (type, id);
+
+	    gimplify_assign (v1, var, fork_seq);
+	    gimplify_assign (v2, var, fork_seq);
+	    gimplify_assign (v3, var, fork_seq);
+
+	    var = build_simple_mem_ref (var);
+	    v1 = build_simple_mem_ref (v1);
+	    v2 = build_simple_mem_ref (v2);
+	    v3 = build_simple_mem_ref (v3);
+	    outgoing = build_simple_mem_ref (outgoing);
+
+	    if (TREE_CODE (incoming) != INTEGER_CST)
+	      incoming = build_simple_mem_ref (incoming);
+	  }
+	else
+	  v1 = v2 = v3 = var;
+
 	/* Determine position in reduction buffer, which may be used
 	   by target.  */
 	enum machine_mode mode = TYPE_MODE (TREE_TYPE (var));
@@ -5660,20 +5727,20 @@ lower_oacc_reductions (location_t loc, tree clauses, tree level, bool inner,
 	  = build_call_expr_internal_loc (loc, IFN_GOACC_REDUCTION,
 					  TREE_TYPE (var), 6, init_code,
 					  unshare_expr (ref_to_res),
-					  var, level, op, off);
+					  v1, level, op, off);
 	tree fini_call
 	  = build_call_expr_internal_loc (loc, IFN_GOACC_REDUCTION,
 					  TREE_TYPE (var), 6, fini_code,
 					  unshare_expr (ref_to_res),
-					  var, level, op, off);
+					  v2, level, op, off);
 	tree teardown_call
 	  = build_call_expr_internal_loc (loc, IFN_GOACC_REDUCTION,
 					  TREE_TYPE (var), 6, teardown_code,
-					  ref_to_res, var, level, op, off);
+					  ref_to_res, v3, level, op, off);
 
-	gimplify_assign (var, setup_call, &before_fork);
-	gimplify_assign (var, init_call, &after_fork);
-	gimplify_assign (var, fini_call, &before_join);
+	gimplify_assign (v1, setup_call, &before_fork);
+	gimplify_assign (v2, init_call, &after_fork);
+	gimplify_assign (v3, fini_call, &before_join);
 	gimplify_assign (outgoing, teardown_call, &after_join);
       }
 
@@ -5915,9 +5982,6 @@ lower_reduction_clauses (tree clauses, gimple_seq *stmt_seqp, omp_context *ctx)
 	}
     }
 
-  if (is_gimple_omp_oacc (ctx->stmt))
-    return;
-
   stmt = gimple_build_call (builtin_decl_explicit (BUILT_IN_GOMP_ATOMIC_START),
 			    0);
   gimple_seq_add_stmt (stmt_seqp, stmt);
@@ -15804,7 +15868,9 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 	if (!maybe_lookup_field (var, ctx))
 	  continue;
 
-	if (offloaded)
+	/* Don't remap oacc parallel reduction variables, because the
+	   intermediate result must be local to each gang.  */
+	if (offloaded && !is_oacc_parallel_reduction (var, ctx))
 	  {
 	    x = build_receiver_ref (var, true, ctx);
 	    tree new_var = lookup_decl (var, ctx);
diff --git a/gcc/testsuite/c-c++-common/goacc/reduction-1.c b/gcc/testsuite/c-c++-common/goacc/reduction-1.c
index de97125..59cb6f4 100644
--- a/gcc/testsuite/c-c++-common/goacc/reduction-1.c
+++ b/gcc/testsuite/c-c++-common/goacc/reduction-1.c
@@ -1,70 +1,66 @@
-/* { dg-require-effective-target alloca } */
 /* Integer reductions.  */
 
 #define vl 32
+#define n 1000
 
 int
 main(void)
 {
-  const int n = 1000;
   int i;
   int result, array[n];
   int lresult;
 
   /* '+' reductions.  */
 #pragma acc parallel vector_length (vl)
-#pragma acc loop reduction (+:result)
+#pragma acc loop vector reduction (+:result)
   for (i = 0; i < n; i++)
     result += array[i];
 
   /* '*' reductions.  */
 #pragma acc parallel vector_length (vl)
-#pragma acc loop reduction (*:result)
+#pragma acc loop vector reduction (*:result)
   for (i = 0; i < n; i++)
     result *= array[i];
 
-//   result = 0;
-//   vresult = 0;
-// 
-//   /* 'max' reductions.  */
-// #pragma acc parallel vector_length (vl)
-// #pragma acc loop reduction (+:result)
-//   for (i = 0; i < n; i++)
-//       result = result > array[i] ? result : array[i];
-//
-//   /* 'min' reductions.  */
-// #pragma acc parallel vector_length (vl)
-// #pragma acc loop reduction (+:result)
-//   for (i = 0; i < n; i++)
-//       result = result < array[i] ? result : array[i];
+  /* 'max' reductions.  */
+#pragma acc parallel vector_length (vl)
+#pragma acc loop vector reduction (max:result)
+  for (i = 0; i < n; i++)
+    result = result > array[i] ? result : array[i];
+
+  /* 'min' reductions.  */
+#pragma acc parallel vector_length (vl)
+#pragma acc loop vector reduction (min:result)
+  for (i = 0; i < n; i++)
+    result = result < array[i] ? result : array[i];
 
   /* '&' reductions.  */
 #pragma acc parallel vector_length (vl)
-#pragma acc loop reduction (&:result)
+#pragma acc loop vector reduction (&:result)
   for (i = 0; i < n; i++)
     result &= array[i];
 
   /* '|' reductions.  */
 #pragma acc parallel vector_length (vl)
-#pragma acc loop reduction (|:result)
+#pragma acc loop vector reduction (|:result)
   for (i = 0; i < n; i++)
     result |= array[i];
 
   /* '^' reductions.  */
 #pragma acc parallel vector_length (vl)
-#pragma acc loop reduction (^:result)
+#pragma acc loop vector reduction (^:result)
   for (i = 0; i < n; i++)
     result ^= array[i];
 
   /* '&&' reductions.  */
 #pragma acc parallel vector_length (vl)
-#pragma acc loop reduction (&&:lresult)
+#pragma acc loop vector reduction (&&:lresult)
   for (i = 0; i < n; i++)
     lresult = lresult && (result > array[i]);
 
   /* '||' reductions.  */
 #pragma acc parallel vector_length (vl)
-#pragma acc loop reduction (||:lresult)
+#pragma acc loop vector reduction (||:lresult)
   for (i = 0; i < n; i++)
     lresult = lresult || (result > array[i]);
 
diff --git a/gcc/testsuite/c-c++-common/goacc/reduction-2.c b/gcc/testsuite/c-c++-common/goacc/reduction-2.c
index 2964236..4889241 100644
--- a/gcc/testsuite/c-c++-common/goacc/reduction-2.c
+++ b/gcc/testsuite/c-c++-common/goacc/reduction-2.c
@@ -1,49 +1,48 @@
-/* { dg-require-effective-target alloca } */
 /* float reductions.  */
 
 #define vl 32
+#define n 1000
 
 int
 main(void)
 {
-  const int n = 1000;
   int i;
   float result, array[n];
   int lresult;
 
   /* '+' reductions.  */
 #pragma acc parallel vector_length (vl)
-#pragma acc loop reduction (+:result)
+#pragma acc loop vector reduction (+:result)
   for (i = 0; i < n; i++)
     result += array[i];
 
   /* '*' reductions.  */
 #pragma acc parallel vector_length (vl)
-#pragma acc loop reduction (*:result)
+#pragma acc loop vector reduction (*:result)
   for (i = 0; i < n; i++)
     result *= array[i];
 
-//   /* 'max' reductions.  */
-// #pragma acc parallel vector_length (vl)
-// #pragma acc loop reduction (+:result)
-//   for (i = 0; i < n; i++)
-//       result = result > array[i] ? result : array[i];
-// 
-//   /* 'min' reductions.  */
-// #pragma acc parallel vector_length (vl)
-// #pragma acc loop reduction (+:result)
-//   for (i = 0; i < n; i++)
-//       result = result < array[i] ? result : array[i];
+  /* 'max' reductions.  */
+#pragma acc parallel vector_length (vl)
+#pragma acc loop vector reduction (max:result)
+  for (i = 0; i < n; i++)
+    result = result > array[i] ? result : array[i];
+
+  /* 'min' reductions.  */
+#pragma acc parallel vector_length (vl)
+#pragma acc loop vector reduction (min:result)
+  for (i = 0; i < n; i++)
+    result = result < array[i] ? result : array[i];
 
   /* '&&' reductions.  */
 #pragma acc parallel vector_length (vl)
-#pragma acc loop reduction (&&:lresult)
+#pragma acc loop vector reduction (&&:lresult)
   for (i = 0; i < n; i++)
     lresult = lresult && (result > array[i]);
 
   /* '||' reductions.  */
 #pragma acc parallel vector_length (vl)
-#pragma acc loop reduction (||:lresult)
+#pragma acc loop vector reduction (||:lresult)
   for (i = 0; i < n; i++)
     lresult = lresult || (result > array[i]);
 
diff --git a/gcc/testsuite/c-c++-common/goacc/reduction-3.c b/gcc/testsuite/c-c++-common/goacc/reduction-3.c
index 34c51c2..b19224e2 100644
--- a/gcc/testsuite/c-c++-common/goacc/reduction-3.c
+++ b/gcc/testsuite/c-c++-common/goacc/reduction-3.c
@@ -1,49 +1,48 @@
-/* { dg-require-effective-target alloca } */
 /* double reductions.  */
 
 #define vl 32
+#define n 1000
 
 int
 main(void)
 {
-  const int n = 1000;
   int i;
   double result, array[n];
   int lresult;
 
   /* '+' reductions.  */
 #pragma acc parallel vector_length (vl)
-#pragma acc loop reduction (+:result)
+#pragma acc loop vector reduction (+:result)
   for (i = 0; i < n; i++)
     result += array[i];
 
   /* '*' reductions.  */
 #pragma acc parallel vector_length (vl)
-#pragma acc loop reduction (*:result)
+#pragma acc loop vector reduction (*:result)
   for (i = 0; i < n; i++)
     result *= array[i];
 
-//   /* 'max' reductions.  */
-// #pragma acc parallel vector_length (vl)
-// #pragma acc loop reduction (+:result)
-//   for (i = 0; i < n; i++)
-//       result = result > array[i] ? result : array[i];
-// 
-//   /* 'min' reductions.  */
-// #pragma acc parallel vector_length (vl)
-// #pragma acc loop reduction (+:result)
-//   for (i = 0; i < n; i++)
-//       result = result < array[i] ? result : array[i];
+  /* 'max' reductions.  */
+#pragma acc parallel vector_length (vl)
+#pragma acc loop vector reduction (max:result)
+  for (i = 0; i < n; i++)
+    result = result > array[i] ? result : array[i];
+
+  /* 'min' reductions.  */
+#pragma acc parallel vector_length (vl)
+#pragma acc loop vector reduction (min:result)
+  for (i = 0; i < n; i++)
+    result = result < array[i] ? result : array[i];
 
   /* '&&' reductions.  */
 #pragma acc parallel vector_length (vl)
-#pragma acc loop reduction (&&:lresult)
+#pragma acc loop vector reduction (&&:lresult)
   for (i = 0; i < n; i++)
     lresult = lresult && (result > array[i]);
 
   /* '||' reductions.  */
 #pragma acc parallel vector_length (vl)
-#pragma acc loop reduction (||:lresult)
+#pragma acc loop vector reduction (||:lresult)
   for (i = 0; i < n; i++)
     lresult = lresult || (result > array[i]);
 
diff --git a/gcc/testsuite/c-c++-common/goacc/reduction-4.c b/gcc/testsuite/c-c++-common/goacc/reduction-4.c
index 328c0d4..88d7f70 100644
--- a/gcc/testsuite/c-c++-common/goacc/reduction-4.c
+++ b/gcc/testsuite/c-c++-common/goacc/reduction-4.c
@@ -1,51 +1,54 @@
-/* { dg-require-effective-target alloca } */
 /* complex reductions.  */
 
 #define vl 32
+#define n 1000
 
 int
 main(void)
 {
-  const int n = 1000;
   int i;
   __complex__ double result, array[n];
   int lresult;
 
   /* '+' reductions.  */
 #pragma acc parallel vector_length (vl)
-#pragma acc loop reduction (+:result)
+#pragma acc loop vector reduction (+:result)
   for (i = 0; i < n; i++)
     result += array[i];
 
-  /* Needs support for complex multiplication.  */
-
-//   /* '*' reductions.  */
-// #pragma acc parallel vector_length (vl)
-// #pragma acc loop reduction (*:result)
-//   for (i = 0; i < n; i++)
-//     result *= array[i];
-//
-//   /* 'max' reductions.  */
-// #pragma acc parallel vector_length (vl)
-// #pragma acc loop reduction (+:result)
-//   for (i = 0; i < n; i++)
-//       result = result > array[i] ? result : array[i];
-// 
-//   /* 'min' reductions.  */
-// #pragma acc parallel vector_length (vl)
-// #pragma acc loop reduction (+:result)
-//   for (i = 0; i < n; i++)
-//       result = result < array[i] ? result : array[i];
+  /* '*' reductions.  */
+#pragma acc parallel vector_length (vl)
+#pragma acc loop vector reduction (*:result)
+  for (i = 0; i < n; i++)
+    result *= array[i];
+
+  /* 'max' reductions.  */
+#if 0
+  // error: 'result' has invalid type for 'reduction(max)'
+#pragma acc parallel vector_length (vl)
+#pragma acc loop vector reduction (max:result)
+  for (i = 0; i < n; i++)
+    result = result > array[i] ? result : array[i];
+#endif
+
+  /* 'min' reductions.  */
+#if 0
+  // error: 'result' has invalid type for 'reduction(min)'
+#pragma acc parallel vector_length (vl)
+#pragma acc loop vector reduction (min:result)
+  for (i = 0; i < n; i++)
+    result = result < array[i] ? result : array[i];
+#endif
 
   /* '&&' reductions.  */
 #pragma acc parallel vector_length (vl)
-#pragma acc loop reduction (&&:lresult)
+#pragma acc loop vector reduction (&&:lresult)
   for (i = 0; i < n; i++)
     lresult = lresult && (__real__(result) > __real__(array[i]));
 
   /* '||' reductions.  */
 #pragma acc parallel vector_length (vl)
-#pragma acc loop reduction (||:lresult)
+#pragma acc loop vector reduction (||:lresult)
   for (i = 0; i < n; i++)
     lresult = lresult || (__real__(result) > __real__(array[i]));
 
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/data-clauses.h b/libgomp/testsuite/libgomp.oacc-c-c++-common/data-clauses.h
new file mode 100644
index 0000000..8341053
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/data-clauses.h
@@ -0,0 +1,202 @@
+int i;
+
+int main(void)
+{
+  int j, v;
+
+  i = -1;
+  j = -2;
+  v = 0;
+#pragma acc EXEC_DIRECTIVE /* copyout */ present_or_copyout (v) copyin (i, j)
+  {
+    if (i != -1 || j != -2)
+      abort ();
+    i = 2;
+    j = 1;
+    if (i != 2 || j != 1)
+      abort ();
+    v = 1;
+  }
+#if ACC_MEM_SHARED
+  if (v != 1 || i != 2 || j != 1)
+    abort ();
+#else
+  if (v != 1 || i != -1 || j != -2)
+    abort ();
+#endif
+
+  i = -1;
+  j = -2;
+  v = 0;
+#pragma acc EXEC_DIRECTIVE /* copyout */ present_or_copyout (v) copyout (i, j)
+  {
+    i = 2;
+    j = 1;
+    if (i != 2 || j != 1)
+      abort ();
+    v = 1;
+  }
+  if (v != 1 || i != 2 || j != 1)
+    abort ();
+
+  i = -1;
+  j = -2;
+  v = 0;
+#pragma acc EXEC_DIRECTIVE /* copyout */ present_or_copyout (v) copy (i, j)
+  {
+    if (i != -1 || j != -2)
+      abort ();
+    i = 2;
+    j = 1;
+    if (i != 2 || j != 1)
+      abort ();
+    v = 1;
+  }
+  if (v != 1 || i != 2 || j != 1)
+    abort ();
+
+  i = -1;
+  j = -2;
+  v = 0;
+#pragma acc EXEC_DIRECTIVE /* copyout */ present_or_copyout (v) create (i, j)
+  {
+    i = 2;
+    j = 1;
+    if (i != 2 || j != 1)
+      abort ();
+    v = 1;
+  }
+#if ACC_MEM_SHARED
+  if (v != 1 || i != 2 || j != 1)
+    abort ();
+#else
+  if (v != 1 || i != -1 || j != -2)
+    abort ();
+#endif
+
+  i = -1;
+  j = -2;
+  v = 0;
+#pragma acc EXEC_DIRECTIVE /* copyout */ present_or_copyout (v) present_or_copyin (i, j)
+  {
+    if (i != -1 || j != -2)
+      abort ();
+    i = 2;
+    j = 1;
+    if (i != 2 || j != 1)
+      abort ();
+    v = 1;
+  }
+  if (v != 1)
+    abort ();
+#if ACC_MEM_SHARED
+  if (v != 1 || i != 2 || j != 1)
+    abort ();
+#else
+  if (v != 1 || i != -1 || j != -2)
+    abort ();
+#endif
+
+  i = -1;
+  j = -2;
+  v = 0;
+#pragma acc EXEC_DIRECTIVE /* copyout */ present_or_copyout (v) present_or_copyout (i, j)
+  {
+    i = 2;
+    j = 1;
+    if (i != 2 || j != 1)
+      abort ();
+    v = 1;
+  }
+  if (v != 1 || i != 2 || j != 1)
+    abort ();
+
+  i = -1;
+  j = -2;
+  v = 0;
+#pragma acc EXEC_DIRECTIVE /* copyout */ present_or_copyout (v) present_or_copy (i, j)
+  {
+    if (i != -1 || j != -2)
+      abort ();
+    i = 2;
+    j = 1;
+    if (i != 2 || j != 1)
+      abort ();
+    v = 1;
+  }
+  if (v != 1 || i != 2 || j != 1)
+    abort ();
+
+  i = -1;
+  j = -2;
+  v = 0;
+#pragma acc EXEC_DIRECTIVE /* copyout */ present_or_copyout (v) present_or_create (i, j)
+  {
+    i = 2;
+    j = 1;
+    if (i != 2 || j != 1)
+      abort ();
+    v = 1;
+  }
+  if (v != 1)
+    abort ();
+#if ACC_MEM_SHARED
+  if (v != 1 || i != 2 || j != 1)
+    abort ();
+#else
+  if (v != 1 || i != -1 || j != -2)
+    abort ();
+#endif
+
+  i = -1;
+  j = -2;
+  v = 0;
+
+#pragma acc data copyin (i, j)
+  {
+#pragma acc EXEC_DIRECTIVE /* copyout */ present_or_copyout (v) present (i, j)
+    {
+      if (i != -1 || j != -2)
+	abort ();
+      i = 2;
+      j = 1;
+      if (i != 2 || j != 1)
+	abort ();
+      v = 1;
+    }
+  }
+#if ACC_MEM_SHARED
+  if (v != 1 || i != 2 || j != 1)
+    abort ();
+#else
+  if (v != 1 || i != -1 || j != -2)
+    abort ();
+#endif
+
+  i = -1;
+  j = -2;
+  v = 0;
+
+#pragma acc data copyin(i, j)
+  {
+#pragma acc EXEC_DIRECTIVE /* copyout */ present_or_copyout (v)
+    {
+      if (i != -1 || j != -2)
+	abort ();
+      i = 2;
+      j = 1;
+      if (i != 2 || j != 1)
+	abort ();
+      v = 1;
+    }
+  }
+#if ACC_MEM_SHARED
+  if (v != 1 || i != 2 || j != 1)
+    abort ();
+#else
+  if (v != 1 || i != -1 || j != -2)
+    abort ();
+#endif
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-default-compile.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-default-compile.c
new file mode 100644
index 0000000..640d827
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-default-compile.c
@@ -0,0 +1,13 @@
+
+/* { dg-additional-options "-O2 -fopenacc-dim=16:16" } */
+
+#include "loop-default.h"
+#include <stdlib.h>
+
+int main ()
+{
+  /* Environment should be ignored.  */
+  setenv ("GOMP_OPENACC_DIM", "8:8", 1);
+  
+  return test_1 (16, 16, 32);
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-default.h b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-default.h
new file mode 100644
index 0000000..55de04b
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-default.h
@@ -0,0 +1,125 @@
+#include <openacc.h>
+#include <alloca.h>
+#include <string.h>
+#include <stdio.h>
+
+#pragma acc routine
+static int __attribute__ ((noinline)) coord ()
+{
+  int res = 0;
+
+  if (acc_on_device (acc_device_nvidia))
+    {
+      int g = 0, w = 0, v = 0;
+
+      __asm__ volatile ("mov.u32 %0,%%ctaid.x;" : "=r" (g));
+      __asm__ volatile ("mov.u32 %0,%%tid.y;" : "=r" (w));
+      __asm__ volatile ("mov.u32 %0,%%tid.x;" : "=r" (v));
+      res = (1 << 24) | (g << 16) | (w << 8) | v;
+    }
+  return res;
+}
+
+
+int check (const int *ary, int size, int gp, int wp, int vp)
+{
+  int exit = 0;
+  int ix;
+  int *gangs = (int *)alloca (gp * sizeof (int));
+  int *workers = (int *)alloca (wp * sizeof (int));
+  int *vectors = (int *)alloca (vp * sizeof (int));
+  int offloaded = 0;
+  
+  memset (gangs, 0, gp * sizeof (int));
+  memset (workers, 0, wp * sizeof (int));
+  memset (vectors, 0, vp * sizeof (int));
+  
+  for (ix = 0; ix < size; ix++)
+    {
+      int g = (ary[ix] >> 16) & 0xff;
+      int w = (ary[ix] >> 8) & 0xff;
+      int v = (ary[ix] >> 0) & 0xff;
+
+      if (g >= gp || w >= wp || v >= vp)
+	{
+	  printf ("unexpected cpu %#x used\n", ary[ix]);
+	  exit = 1;
+	}
+      else
+	{
+	  vectors[v]++;
+	  workers[w]++;
+	  gangs[g]++;
+	}
+      offloaded += ary[ix] >> 24;
+    }
+
+  if (!offloaded)
+    return 0;
+
+  if (offloaded != size)
+    {
+      printf ("offloaded %d times, expected %d\n", offloaded, size);
+      return 1;
+    }
+
+  for (ix = 0; ix < gp; ix++)
+    if (gangs[ix] != gangs[0])
+      {
+	printf ("gang %d not used %d times\n", ix, gangs[0]);
+	exit = 1;
+      }
+  
+  for (ix = 0; ix < wp; ix++)
+    if (workers[ix] != workers[0])
+      {
+	printf ("worker %d not used %d times\n", ix, workers[0]);
+	exit = 1;
+      }
+  
+  for (ix = 0; ix < vp; ix++)
+    if (vectors[ix] != vectors[0])
+      {
+	printf ("vector %d not used %d times\n", ix, vectors[0]);
+	exit = 1;
+      }
+  
+  return exit;
+}
+
+#define N (32 *32*32)
+
+int test_1 (int gp, int wp, int vp)
+{
+  int ary[N];
+  int exit = 0;
+  
+#pragma acc parallel copyout (ary)
+  {
+#pragma acc loop gang (static:1)
+    for (int ix = 0; ix < N; ix++)
+      ary[ix] = coord ();
+  }
+
+  exit |= check (ary, N, gp, 1, 1);
+
+#pragma acc parallel copyout (ary)
+  {
+#pragma acc loop worker
+    for (int ix = 0; ix < N; ix++)
+      ary[ix] = coord ();
+  }
+
+  exit |= check (ary, N, 1, wp, 1);
+
+#pragma acc parallel copyout (ary)
+  {
+#pragma acc loop vector
+    for (int ix = 0; ix < N; ix++)
+      ary[ix] = coord ();
+  }
+
+  exit |= check (ary, N, 1, 1, vp);
+
+  return exit;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-g-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-g-1.c
index 23c2a75..7afb89b 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-g-1.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-g-1.c
@@ -1,5 +1,5 @@
 /* { dg-do run } */
-/* { dg-additional-options "-O2" } */
+/* { dg-additional-options "-O2 -w" } */
 
 #include <stdio.h>
 
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-g-2.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-g-2.c
index 1a93db3..db83692 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-g-2.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-g-2.c
@@ -1,5 +1,5 @@
 /* { dg-do run } */
-/* { dg-additional-options "-O2" } */
+/* { dg-additional-options "-O2 -w" } */
 
 #include <stdio.h>
 
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-g-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-g-1.c
index c14bddd..129a8c8 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-g-1.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-g-1.c
@@ -1,5 +1,5 @@
 /* { dg-do run } */
-/* { dg-additional-options "-O2" } */
+/* { dg-additional-options "-O2 -w" } */
 
 #include <stdio.h>
 
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-w-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-w-1.c
index 706d0d8..fadf7d5 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-w-1.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-w-1.c
@@ -1,5 +1,5 @@
 /* { dg-do run } */
-/* { dg-additional-options "-O2" } */
+/* { dg-additional-options "-O2 -w" } */
 
 #include <stdio.h>
 
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-w-2.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-w-2.c
index a073ac8..68d3d7a 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-w-2.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-w-2.c
@@ -1,5 +1,5 @@
 /* { dg-do run } */
-/* { dg-additional-options "-O2" } */
+/* { dg-additional-options "-O2 -w" } */
 
 #include <stdio.h>
 
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gang-np-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gang-np-1.c
new file mode 100644
index 0000000..55ab3c9
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gang-np-1.c
@@ -0,0 +1,45 @@
+/* { dg-additional-options "-w" } */
+
+#include <assert.h>
+
+/* Test of reduction on loop directive (gangs, non-private reduction
+   variable).  */
+
+int
+main (int argc, char *argv[])
+{
+  int i, arr[1024], res = 0, hres = 0;
+
+  for (i = 0; i < 1024; i++)
+    arr[i] = i;
+
+  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
+		       copy(res)
+  {
+    #pragma acc loop gang reduction(+:res)
+    for (i = 0; i < 1024; i++)
+      res += arr[i];
+  }
+
+  for (i = 0; i < 1024; i++)
+    hres += arr[i];
+
+  assert (res == hres);
+
+  res = hres = 1;
+
+  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
+		       copy(res)
+  {
+    #pragma acc loop gang reduction(*:res)
+    for (i = 0; i < 12; i++)
+      res *= arr[i];
+  }
+
+  for (i = 0; i < 12; i++)
+    hres *= arr[i];
+
+  assert (res == hres);
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gv-np-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gv-np-1.c
new file mode 100644
index 0000000..d4341e9
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gv-np-1.c
@@ -0,0 +1,30 @@
+/* { dg-additional-options "-w" } */
+
+#include <assert.h>
+
+/* Test of reduction on loop directive (gangs and vectors, non-private
+   reduction variable).  */
+
+int
+main (int argc, char *argv[])
+{
+  int i, arr[1024], res = 0, hres = 0;
+
+  for (i = 0; i < 1024; i++)
+    arr[i] = i;
+
+  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
+		       copy(res)
+  {
+    #pragma acc loop gang vector reduction(+:res)
+    for (i = 0; i < 1024; i++)
+      res += arr[i];
+  }
+
+  for (i = 0; i < 1024; i++)
+    hres += arr[i];
+
+  assert (res == hres);
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gw-np-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gw-np-1.c
new file mode 100644
index 0000000..2e5668b
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gw-np-1.c
@@ -0,0 +1,30 @@
+/* { dg-additional-options "-w" } */
+
+#include <assert.h>
+
+/* Test of reduction on loop directive (gangs and workers, non-private
+   reduction variable).  */
+
+int
+main (int argc, char *argv[])
+{
+  int i, arr[1024], res = 0, hres = 0;
+
+  for (i = 0; i < 1024; i++)
+    arr[i] = i;
+
+  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
+		       copy(res)
+  {
+    #pragma acc loop gang worker reduction(+:res)
+    for (i = 0; i < 1024; i++)
+      res += arr[i];
+  }
+
+  for (i = 0; i < 1024; i++)
+    hres += arr[i];
+
+  assert (res == hres);
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-1.c
new file mode 100644
index 0000000..d610373
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-1.c
@@ -0,0 +1,28 @@
+#include <assert.h>
+
+/* Test of reduction on loop directive (gangs, workers and vectors, non-private
+   reduction variable).  */
+
+int
+main (int argc, char *argv[])
+{
+  int i, arr[1024], res = 0, hres = 0;
+
+  for (i = 0; i < 1024; i++)
+    arr[i] = i;
+
+  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
+		       copy(res)
+  {
+    #pragma acc loop gang worker vector reduction(+:res)
+    for (i = 0; i < 1024; i++)
+      res += arr[i];
+  }
+
+  for (i = 0; i < 1024; i++)
+    hres += arr[i];
+
+  assert (res == hres);
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-2.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-2.c
new file mode 100644
index 0000000..ea5c151
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-2.c
@@ -0,0 +1,34 @@
+#include <assert.h>
+
+/* Test of reduction on loop directive (gangs, workers and vectors, non-private
+   reduction variable: separate gang and worker/vector loops).  */
+
+int
+main (int argc, char *argv[])
+{
+  int i, j, arr[32768], res = 0, hres = 0;
+
+  for (i = 0; i < 32768; i++)
+    arr[i] = i;
+
+  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
+		       copy(res)
+  {
+    #pragma acc loop gang reduction(+:res)
+    for (j = 0; j < 32; j++)
+      {
+        #pragma acc loop worker vector reduction(+:res)
+        for (i = 0; i < 1024; i++)
+	  res += arr[j * 1024 + i];
+      }
+    /* "res" is non-private, and is not available until after the parallel
+       region.  */
+  }
+
+  for (i = 0; i < 32768; i++)
+    hres += arr[i];
+
+  assert (res == hres);
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-3.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-3.c
new file mode 100644
index 0000000..0056f3c
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-3.c
@@ -0,0 +1,33 @@
+#include <assert.h>
+
+/* Test of reduction on loop directive (gangs, workers and vectors, non-private
+   reduction variable: separate gang and worker/vector loops, double type).  */
+
+int
+main (int argc, char *argv[])
+{
+  int i, j;
+  double arr[32768], res = 0, hres = 0;
+
+  for (i = 0; i < 32768; i++)
+    arr[i] = i;
+
+  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
+		       copyin(arr) copy(res)
+  {
+    #pragma acc loop gang reduction(+:res)
+    for (j = 0; j < 32; j++)
+      {
+        #pragma acc loop worker vector reduction(+:res)
+        for (i = 0; i < 1024; i++)
+	  res += arr[j * 1024 + i];
+      }
+  }
+
+  for (i = 0; i < 32768; i++)
+    hres += arr[i];
+
+  assert (res == hres);
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-4.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-4.c
new file mode 100644
index 0000000..e69d0ec
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-4.c
@@ -0,0 +1,55 @@
+#include <assert.h>
+
+/* Test of reduction on loop directive (gangs, workers and vectors, multiple
+   non-private reduction variables, float type).  */
+
+int
+main (int argc, char *argv[])
+{
+  int i, j;
+  float arr[32768];
+  float res = 0, mres = 0, hres = 0, hmres = 0;
+
+  for (i = 0; i < 32768; i++)
+    arr[i] = i;
+
+  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
+		       copy(res, mres)
+  {
+    #pragma acc loop gang reduction(+:res) reduction(max:mres)
+    for (j = 0; j < 32; j++)
+      {
+	#pragma acc loop worker vector reduction(+:res) reduction(max:mres)
+	for (i = 0; i < 1024; i++)
+	  {
+	    res += arr[j * 1024 + i];
+	    if (arr[j * 1024 + i] > mres)
+	      mres = arr[j * 1024 + i];
+	  }
+
+	#pragma acc loop worker vector reduction(+:res) reduction(max:mres)
+	for (i = 0; i < 1024; i++)
+	  {
+	    res += arr[j * 1024 + (1023 - i)];
+	    if (arr[j * 1024 + (1023 - i)] > mres)
+	      mres = arr[j * 1024 + (1023 - i)];
+	  }
+      }
+  }
+
+  for (j = 0; j < 32; j++)
+    for (i = 0; i < 1024; i++)
+      {
+        hres += arr[j * 1024 + i];
+	hres += arr[j * 1024 + (1023 - i)];
+	if (arr[j * 1024 + i] > hmres)
+	  hmres = arr[j * 1024 + i];
+	if (arr[j * 1024 + (1023 - i)] > hmres)
+	  hmres = arr[j * 1024 + (1023 - i)];
+      }
+
+  assert (res == hres);
+  assert (mres == hmres);
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-vector-p-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-vector-p-1.c
new file mode 100644
index 0000000..dd181ef
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-vector-p-1.c
@@ -0,0 +1,43 @@
+/* { dg-additional-options "-w" } */
+
+#include <assert.h>
+
+/* Test of reduction on loop directive (vectors, private reduction
+   variable).  */
+
+int
+main (int argc, char *argv[])
+{
+  int i, j, arr[1024], out[32], res = 0, hres = 0;
+
+  for (i = 0; i < 1024; i++)
+    arr[i] = i;
+
+  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
+		       private(res) copyout(out)
+  {
+    #pragma acc loop gang
+    for (j = 0; j < 32; j++)
+      {
+        res = 0;
+
+	#pragma acc loop vector reduction(+:res)
+	for (i = 0; i < 32; i++)
+	  res += arr[j * 32 + i];
+	
+	out[j] = res;
+      }
+  }
+
+  for (j = 0; j < 32; j++)
+    {
+      hres = 0;
+      
+      for (i = 0; i < 32; i++)
+	hres += arr[j * 32 + i];
+
+      assert (out[j] == hres);
+    }
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-vector-p-2.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-vector-p-2.c
new file mode 100644
index 0000000..15f0053
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-vector-p-2.c
@@ -0,0 +1,41 @@
+#include <assert.h>
+
+/* Test of reduction on loop directive (vector reduction in
+   gang-partitioned/worker-partitioned mode, private reduction variable).  */
+
+int
+main (int argc, char *argv[])
+{
+  int i, j, k;
+  double ina[1024], inb[1024], out[1024], acc;
+
+  for (j = 0; j < 32; j++)
+    for (i = 0; i < 32; i++)
+      {
+        ina[j * 32 + i] = (i == j) ? 2.0 : 0.0;
+	inb[j * 32 + i] = (double) (i + j);
+      }
+
+  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
+		       private(acc) copyin(ina, inb) copyout(out)
+  {
+    #pragma acc loop gang worker
+    for (k = 0; k < 32; k++)
+      for (j = 0; j < 32; j++)
+        {
+	  acc = 0;
+
+	  #pragma acc loop vector reduction(+:acc)
+	  for (i = 0; i < 32; i++)
+	    acc += ina[k * 32 + i] * inb[i * 32 + j];
+
+	  out[k * 32 + j] = acc;
+	}
+  }
+
+  for (j = 0; j < 32; j++)
+    for (i = 0; i < 32; i++)
+      assert (out[j * 32 + i] == (i + j) * 2);
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-worker-p-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-worker-p-1.c
new file mode 100644
index 0000000..4864acd
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-worker-p-1.c
@@ -0,0 +1,43 @@
+/* { dg-additional-options "-w" } */
+
+#include <assert.h>
+
+/* Test of reduction on loop directive (workers, private reduction
+   variable).  */
+
+int
+main (int argc, char *argv[])
+{
+  int i, j, arr[1024], out[32], res = 0, hres = 0;
+
+  for (i = 0; i < 1024; i++)
+    arr[i] = i;
+
+  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
+		       private(res) copyout(out)
+  {
+    #pragma acc loop gang
+    for (j = 0; j < 32; j++)
+      {
+        res = 0;
+
+	#pragma acc loop worker reduction(+:res)
+	for (i = 0; i < 32; i++)
+	  res += arr[j * 32 + i];
+	
+	out[j] = res;
+      }
+  }
+
+  for (j = 0; j < 32; j++)
+    {
+      hres = 0;
+      
+      for (i = 0; i < 32; i++)
+	hres += arr[j * 32 + i];
+
+      assert (out[j] == hres);
+    }
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-wv-p-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-wv-p-1.c
new file mode 100644
index 0000000..2765908
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-wv-p-1.c
@@ -0,0 +1,41 @@
+#include <assert.h>
+
+/* Test of reduction on loop directive (workers and vectors, private reduction
+   variable).  */
+
+int
+main (int argc, char *argv[])
+{
+  int i, j, arr[1024], out[32], res = 0, hres = 0;
+
+  for (i = 0; i < 1024; i++)
+    arr[i] = i;
+
+  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
+		       private(res) copyout(out)
+  {
+    #pragma acc loop gang
+    for (j = 0; j < 32; j++)
+      {
+        res = 0;
+
+	#pragma acc loop worker vector reduction(+:res)
+	for (i = 0; i < 32; i++)
+	  res += arr[j * 32 + i];
+	
+	out[j] = res;
+      }
+  }
+
+  for (j = 0; j < 32; j++)
+    {
+      hres = 0;
+      
+      for (i = 0; i < 32; i++)
+	hres += arr[j * 32 + i];
+
+      assert (out[j] == hres);
+    }
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-wv-p-2.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-wv-p-2.c
new file mode 100644
index 0000000..c30b0e7
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-wv-p-2.c
@@ -0,0 +1,45 @@
+#include <assert.h>
+
+/* Test of reduction on loop directive (workers and vectors, private reduction
+   variable: separate worker and vector loops).  */
+
+int
+main (int argc, char *argv[])
+{
+  int i, j, arr[32768], out[32], res = 0, hres = 0;
+
+  for (i = 0; i < 32768; i++)
+    arr[i] = i;
+
+  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
+		       private(res) copyout(out)
+  {
+    #pragma acc loop gang
+    for (j = 0; j < 32; j++)
+      {
+        res = j;
+
+	#pragma acc loop worker reduction(+:res)
+	for (i = 0; i < 1024; i++)
+	  res += arr[j * 1024 + i];
+
+	#pragma acc loop vector reduction(+:res)
+	for (i = 1023; i >= 0; i--)
+	  res += arr[j * 1024 + i];
+
+	out[j] = res;
+      }
+  }
+
+  for (j = 0; j < 32; j++)
+    {
+      hres = j;
+      
+      for (i = 0; i < 1024; i++)
+	hres += arr[j * 1024 + i] * 2;
+
+      assert (out[j] == hres);
+    }
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-wv-p-3.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-wv-p-3.c
new file mode 100644
index 0000000..b5e28fb
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-wv-p-3.c
@@ -0,0 +1,38 @@
+#include <assert.h>
+
+/* Test of reduction on loop directive (workers and vectors, private reduction
+   variable: gang-redundant mode).  */
+
+int
+main (int argc, char *argv[])
+{
+  int i, arr[1024], out[32], res = 0, hres = 0;
+
+  for (i = 0; i < 1024; i++)
+    arr[i] = i ^ 33;
+
+  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
+		       private(res) copyin(arr) copyout(out)
+  {
+    /* Private variables aren't initialized by default in OpenACC.  */
+    res = 0;
+
+    /* "res" should be available at the end of the following loop (and should
+       have the same value redundantly in each gang).  */
+    #pragma acc loop worker vector reduction(+:res)
+    for (i = 0; i < 1024; i++)
+      res += arr[i];
+    
+    #pragma acc loop gang (static: 1)
+    for (i = 0; i < 32; i++)
+      out[i] = res;
+  }
+
+  for (i = 0; i < 1024; i++)
+    hres += arr[i];
+
+  for (i = 0; i < 32; i++)
+    assert (out[i] == hres);
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-w-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-w-1.c
index 539e41d..28c6d0b 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-w-1.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-w-1.c
@@ -1,5 +1,5 @@
 /* { dg-do run } */
-/* { dg-additional-options "-O2" } */
+/* { dg-additional-options "-O2 -w" } */
 
 #include <stdio.h>
 
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-1.c
new file mode 100644
index 0000000..5e82e1d
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-1.c
@@ -0,0 +1,38 @@
+#include <assert.h>
+
+/* Test of reduction on both parallel and loop directives (worker and
+   vector-partitioned loops individually in gang-partitioned mode, int
+   type).  */
+
+int
+main (int argc, char *argv[])
+{
+  int i, j, arr[32768], res = 0, hres = 0;
+
+  for (i = 0; i < 32768; i++)
+    arr[i] = i;
+
+  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
+    reduction(+:res) copy(res)
+  {
+    #pragma acc loop gang
+    for (j = 0; j < 32; j++)
+      {
+	#pragma acc loop worker reduction(+:res)
+	for (i = 0; i < 1024; i++)
+	  res += arr[j * 1024 + i];
+
+	#pragma acc loop vector reduction(+:res)
+	for (i = 1023; i >= 0; i--)
+	  res += arr[j * 1024 + i];
+      }
+  }
+
+  for (j = 0; j < 32; j++)
+    for (i = 0; i < 1024; i++)
+      hres += arr[j * 1024 + i] * 2;
+
+  assert (res == hres);
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-2.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-2.c
new file mode 100644
index 0000000..a7a75a9
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-2.c
@@ -0,0 +1,40 @@
+#include <assert.h>
+
+/* Test of reduction on both parallel and loop directives (workers and vectors
+   in gang-partitioned mode, int type with XOR).  */
+
+int
+main (int argc, char *argv[])
+{
+  int i, j, arr[32768], res = 0, hres = 0;
+
+  for (i = 0; i < 32768; i++)
+    arr[i] = i;
+
+  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
+		       reduction(^:res)
+  {
+    #pragma acc loop gang
+    for (j = 0; j < 32; j++)
+      {
+	#pragma acc loop worker vector reduction(^:res)
+	for (i = 0; i < 1024; i++)
+	  res ^= arr[j * 1024 + i];
+
+	#pragma acc loop worker vector reduction(^:res)
+	for (i = 0; i < 1024; i++)
+	  res ^= arr[j * 1024 + (1023 - i)];
+      }
+  }
+
+  for (j = 0; j < 32; j++)
+    for (i = 0; i < 1024; i++)
+      {
+        hres ^= arr[j * 1024 + i];
+	hres ^= arr[j * 1024 + (1023 - i)];
+      }
+
+  assert (res == hres);
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-3.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-3.c
new file mode 100644
index 0000000..8d85fed
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-3.c
@@ -0,0 +1,42 @@
+#include <assert.h>
+
+/* Test of reduction on both parallel and loop directives (workers and vectors
+   together in gang-partitioned mode, float type).  */
+
+int
+main (int argc, char *argv[])
+{
+  int i, j;
+  float arr[32768];
+  float res = 0, hres = 0;
+
+  for (i = 0; i < 32768; i++)
+    arr[i] = i;
+
+  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
+    reduction(+:res) copy(res)
+  {
+    #pragma acc loop gang
+    for (j = 0; j < 32; j++)
+      {
+	#pragma acc loop worker vector reduction(+:res)
+	for (i = 0; i < 1024; i++)
+	  res += arr[j * 1024 + i];
+
+	#pragma acc loop worker vector reduction(+:res)
+	for (i = 0; i < 1024; i++)
+	  res += arr[j * 1024 + (1023 - i)];
+      }
+  }
+
+  for (j = 0; j < 32; j++)
+    for (i = 0; i < 1024; i++)
+      {
+        hres += arr[j * 1024 + i];
+	hres += arr[j * 1024 + (1023 - i)];
+      }
+
+  assert (res == hres);
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-4.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-4.c
new file mode 100644
index 0000000..1904b4a
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-4.c
@@ -0,0 +1,55 @@
+#include <assert.h>
+
+/* Test of reduction on both parallel and loop directives (workers and vectors
+   together in gang-partitioned mode, float type, multiple reductions).  */
+
+int
+main (int argc, char *argv[])
+{
+  int i, j;
+  float arr[32768];
+  float res = 0, mres = 0, hres = 0, hmres = 0;
+
+  for (i = 0; i < 32768; i++)
+    arr[i] = i;
+
+  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
+    reduction(+:res) reduction(max:mres) copy(res, mres)
+  {
+    #pragma acc loop gang
+    for (j = 0; j < 32; j++)
+      {
+	#pragma acc loop worker vector reduction(+:res) reduction(max:mres)
+	for (i = 0; i < 1024; i++)
+	  {
+	    res += arr[j * 1024 + i];
+	    if (arr[j * 1024 + i] > mres)
+	      mres = arr[j * 1024 + i];
+	  }
+
+	#pragma acc loop worker vector reduction(+:res) reduction(max:mres)
+	for (i = 0; i < 1024; i++)
+	  {
+	    res += arr[j * 1024 + (1023 - i)];
+	    if (arr[j * 1024 + (1023 - i)] > mres)
+	      mres = arr[j * 1024 + (1023 - i)];
+	  }
+      }
+  }
+
+  for (j = 0; j < 32; j++)
+    for (i = 0; i < 1024; i++)
+      {
+        hres += arr[j * 1024 + i];
+	hres += arr[j * 1024 + (1023 - i)];
+	if (arr[j * 1024 + i] > hmres)
+	  hmres = arr[j * 1024 + i];
+	if (arr[j * 1024 + (1023 - i)] > hmres)
+	  hmres = arr[j * 1024 + (1023 - i)];
+      }
+
+  assert (res == hres);
+  assert (mres == hmres);
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/par-reduction-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/par-reduction-1.c
index dceac39..a88b60f 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/par-reduction-1.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/par-reduction-1.c
@@ -1,40 +1,54 @@
+/* { dg-additional-options "-w" } */
+
 #include <assert.h>
 
+/* Test of reduction on parallel directive.  */
+
+#define ACTUAL_GANGS 256
+
 int
 main (int argc, char *argv[])
 {
-  int res, res2 = 0;
+  int res, res1 = 0, res2 = 0;
 
 #if defined(ACC_DEVICE_TYPE_host)
 # define GANGS 1
 #else
 # define GANGS 256
 #endif
-  #pragma acc parallel num_gangs(GANGS) copy(res2)
+  #pragma acc parallel num_gangs(GANGS) num_workers(32) vector_length(32) \
+    reduction(+:res1) copy(res2, res1)
   {
+    res1 += 5;
+
     #pragma acc atomic
     res2 += 5;
   }
   res = GANGS * 5;
 
+  assert (res == res1);
   assert (res == res2);
 #undef GANGS
 
-  res = res2 = 1;
+  res = res1 = res2 = 1;
 
 #if defined(ACC_DEVICE_TYPE_host)
 # define GANGS 1
 #else
 # define GANGS 8
 #endif
-  #pragma acc parallel num_gangs(GANGS) copy(res2)
+  #pragma acc parallel num_gangs(GANGS) num_workers(32) vector_length(32) \
+    reduction(*:res1) copy(res1, res2)
   {
+    res1 *= 5;
+
     #pragma acc atomic
     res2 *= 5;
   }
   for (int i = 0; i < GANGS; ++i)
     res *= 5;
 
+  assert (res == res1);
   assert (res == res2);
 #undef GANGS
 
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/par-reduction-2.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/par-reduction-2.c
index bd5715c..911b76c 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/par-reduction-2.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/par-reduction-2.c
@@ -1,18 +1,25 @@
+/* { dg-additional-options "-w" } */
+
 #include <assert.h>
 #include <openacc.h>
 
+/* Test of reduction on parallel directive (with async).  */
+
 int
 main (int argc, char *argv[])
 {
-  int res, res2 = 0;
+  int res, res1 = 0, res2 = 0;
 
 #if defined(ACC_DEVICE_TYPE_host)
 # define GANGS 1
 #else
 # define GANGS 256
 #endif
-  #pragma acc parallel num_gangs(GANGS) copy(res2) async(1)
+  #pragma acc parallel num_gangs(GANGS) num_workers(32) vector_length(32) \
+    reduction(+:res1) copy(res1, res2) async(1)
   {
+    res1 += 5;
+
     #pragma acc atomic
     res2 += 5;
   }
@@ -20,18 +27,22 @@ main (int argc, char *argv[])
 
   acc_wait (1);
 
+  assert (res == res1);
   assert (res == res2);
 #undef GANGS
 
-  res = res2 = 1;
+  res = res1 = res2 = 1;
 
 #if defined(ACC_DEVICE_TYPE_host)
 # define GANGS 1
 #else
 # define GANGS 8
 #endif
-  #pragma acc parallel num_gangs(GANGS) copy(res2) async(1)
+  #pragma acc parallel num_gangs(GANGS) num_workers(32) vector_length(32) \
+    reduction(*:res1) copy(res1, res2) async(1)
   {
+    res1 *= 5;
+
     #pragma acc atomic
     res2 *= 5;
   }
@@ -40,6 +51,7 @@ main (int argc, char *argv[])
 
   acc_wait (1);
 
+  assert (res == res1);
   assert (res == res2);
 
   return 0;
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-1.c
index fd9df33..9a411fe 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-1.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-1.c
@@ -2,205 +2,5 @@
 
 #include <stdlib.h>
 
-int i;
-
-int main(void)
-{
-  int j, v;
-
-  i = -1;
-  j = -2;
-  v = 0;
-#pragma acc parallel /* copyout */ present_or_copyout (v) copyin (i, j)
-  {
-    if (i != -1 || j != -2)
-      abort ();
-    i = 2;
-    j = 1;
-    if (i != 2 || j != 1)
-      abort ();
-    v = 1;
-  }
-#if ACC_MEM_SHARED
-  if (v != 1 || i != 2 || j != 1)
-    abort ();
-#else
-  if (v != 1 || i != -1 || j != -2)
-    abort ();
-#endif
-
-  i = -1;
-  j = -2;
-  v = 0;
-#pragma acc parallel /* copyout */ present_or_copyout (v) copyout (i, j)
-  {
-    i = 2;
-    j = 1;
-    if (i != 2 || j != 1)
-      abort ();
-    v = 1;
-  }
-  if (v != 1 || i != 2 || j != 1)
-    abort ();
-
-  i = -1;
-  j = -2;
-  v = 0;
-#pragma acc parallel /* copyout */ present_or_copyout (v) copy (i, j)
-  {
-    if (i != -1 || j != -2)
-      abort ();
-    i = 2;
-    j = 1;
-    if (i != 2 || j != 1)
-      abort ();
-    v = 1;
-  }
-  if (v != 1 || i != 2 || j != 1)
-    abort ();
-
-  i = -1;
-  j = -2;
-  v = 0;
-#pragma acc parallel /* copyout */ present_or_copyout (v) create (i, j)
-  {
-    i = 2;
-    j = 1;
-    if (i != 2 || j != 1)
-      abort ();
-    v = 1;
-  }
-#if ACC_MEM_SHARED
-  if (v != 1 || i != 2 || j != 1)
-    abort ();
-#else
-  if (v != 1 || i != -1 || j != -2)
-    abort ();
-#endif
-
-  i = -1;
-  j = -2;
-  v = 0;
-#pragma acc parallel /* copyout */ present_or_copyout (v) present_or_copyin (i, j)
-  {
-    if (i != -1 || j != -2)
-      abort ();
-    i = 2;
-    j = 1;
-    if (i != 2 || j != 1)
-      abort ();
-    v = 1;
-  }
-  if (v != 1)
-    abort ();
-#if ACC_MEM_SHARED
-  if (v != 1 || i != 2 || j != 1)
-    abort ();
-#else
-  if (v != 1 || i != -1 || j != -2)
-    abort ();
-#endif
-
-  i = -1;
-  j = -2;
-  v = 0;
-#pragma acc parallel /* copyout */ present_or_copyout (v) present_or_copyout (i, j)
-  {
-    i = 2;
-    j = 1;
-    if (i != 2 || j != 1)
-      abort ();
-    v = 1;
-  }
-  if (v != 1 || i != 2 || j != 1)
-    abort ();
-
-  i = -1;
-  j = -2;
-  v = 0;
-#pragma acc parallel /* copyout */ present_or_copyout (v) present_or_copy (i, j)
-  {
-    if (i != -1 || j != -2)
-      abort ();
-    i = 2;
-    j = 1;
-    if (i != 2 || j != 1)
-      abort ();
-    v = 1;
-  }
-  if (v != 1 || i != 2 || j != 1)
-    abort ();
-
-  i = -1;
-  j = -2;
-  v = 0;
-#pragma acc parallel /* copyout */ present_or_copyout (v) present_or_create (i, j)
-  {
-    i = 2;
-    j = 1;
-    if (i != 2 || j != 1)
-      abort ();
-    v = 1;
-  }
-  if (v != 1)
-    abort ();
-#if ACC_MEM_SHARED
-  if (v != 1 || i != 2 || j != 1)
-    abort ();
-#else
-  if (v != 1 || i != -1 || j != -2)
-    abort ();
-#endif
-
-  i = -1;
-  j = -2;
-  v = 0;
-
-#pragma acc data copyin (i, j)
-  {
-#pragma acc parallel /* copyout */ present_or_copyout (v) present (i, j)
-    {
-      if (i != -1 || j != -2)
-        abort ();
-      i = 2;
-      j = 1;
-      if (i != 2 || j != 1)
-        abort ();
-      v = 1;
-    }
-  }
-#if ACC_MEM_SHARED
-  if (v != 1 || i != 2 || j != 1)
-    abort ();
-#else
-  if (v != 1 || i != -1 || j != -2)
-    abort ();
-#endif
-
-  i = -1;
-  j = -2;
-  v = 0;
-
-#pragma acc data copyin(i, j)
-  {
-#pragma acc parallel /* copyout */ present_or_copyout (v)
-    {
-      if (i != -1 || j != -2)
-        abort ();
-      i = 2;
-      j = 1;
-      if (i != 2 || j != 1)
-        abort ();
-      v = 1;
-    }
-  }
-#if ACC_MEM_SHARED
-  if (v != 1 || i != 2 || j != 1)
-    abort ();
-#else
-  if (v != 1 || i != -1 || j != -2)
-    abort ();
-#endif
-
-  return 0;
-}
+#define EXEC_DIRECTIVE parallel
+#include "data-clauses.h"
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-dims.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-dims.c
index 39357ce..ecf78c7 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-dims.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-dims.c
@@ -5,12 +5,20 @@
 
 int main ()
 {
+  int dummy[10];
+
 #pragma acc parallel num_workers (2<<20) /* { dg-error "using num_workers" } */
   {
+#pragma acc loop worker
+    for (int i = 0; i < 10; i++)
+      dummy[i] = i;
   }
 
 #pragma acc parallel vector_length (2<<20) /* { dg-error "using vector_length" } */
   {
+#pragma acc loop vector
+    for (int i = 0; i < 10; i++)
+      dummy[i] = i;
   }
 
   return 0;
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-reduction.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-reduction.c
new file mode 100644
index 0000000..b2c60e5
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-reduction.c
@@ -0,0 +1,72 @@
+/* { dg-do run } */
+/* { dg-additional-options "-w" } */
+
+#include <stdlib.h>
+#include <openacc.h>
+
+#define N 10
+
+int
+main ()
+{
+  int s1 = 0, s2 = 0;
+  int i;
+  int dummy = 0;
+
+#pragma acc data copy (dummy)
+  {
+#pragma acc parallel num_gangs (N) reduction (+:s1) copy(s1)
+    {
+      s1++;
+    }
+  }
+
+  if (acc_get_device_type () != acc_device_nvidia)
+    {
+      if (s1 != 1)
+	abort ();
+    }
+  else
+    {
+      if (s1 != N)
+	abort ();
+    }
+
+  s1 = 0;
+  s2 = 0;
+
+#pragma acc parallel num_gangs (10) reduction (+:s1, s2) copy(s1, s2)
+  {
+    s1++;
+    s2 += N;
+  }
+
+  if (acc_get_device_type () != acc_device_nvidia)
+    {
+      if (s1 != 1)
+	abort ();
+      if (s2 != N)
+	abort ();
+    }
+  else
+    {
+      if (s1 != N)
+	abort ();
+      if (s2 != N*N)
+	abort ();
+    }
+
+  s1 = 0;
+
+#pragma acc parallel num_gangs (10) reduction (+:s1) copy(s1)
+  {
+#pragma acc loop gang reduction (+:s1)
+    for (i = 0; i < 10; i++)
+      s1++;
+  }
+
+  if (s1 != N)
+    abort ();
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-1.c
index e557931..10eb278 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-1.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-1.c
@@ -1,46 +1,59 @@
 /* { dg-do run } */
 
+/* Ignore vector_length warnings for offloaded (nvptx) targets.  */
+/* { dg-additional-options "-foffload=-w" } */
+
 /* Integer reductions.  */
 
 #include <stdlib.h>
-#include <stdbool.h>
-
-#define vl 32
-
-#define DO_PRAGMA(x) _Pragma (#x)
-
-#define check_reduction_op(type, op, init, b)	\
-  {						\
-    type res, vres;				\
-    res = (init);				\
-    DO_PRAGMA (acc parallel vector_length (vl) copy(res))	\
-DO_PRAGMA (acc loop reduction (op:res))\
-    for (i = 0; i < n; i++)			\
-      res = res op (b);				\
-						\
-    vres = (init);				\
-    for (i = 0; i < n; i++)			\
-      vres = vres op (b);			\
-						\
-    if (res != vres)				\
-      abort ();					\
-  }
+#include "reduction.h"
+
+const int ng = 8;
+const int nw = 4;
+const int vl = 32;
 
 static void
-test_reductions_int (void)
+test_reductions (void)
 {
-  const int n = 1000;
+  const int n = 100;
   int i;
   int array[n];
 
   for (i = 0; i < n; i++)
-    array[i] = i;
-
-  check_reduction_op (int, +, 0, array[i]);
-  check_reduction_op (int, *, 1, array[i]);
-  check_reduction_op (int, &, -1, array[i]);
-  check_reduction_op (int, |, 0, array[i]);
-  check_reduction_op (int, ^, 0, array[i]);
+    array[i] = i+1;
+
+  /* Gang reductions.  */
+  check_reduction_op (int, +, 0, array[i], num_gangs (ng), gang);
+  check_reduction_op (int, *, 1, array[i], num_gangs (ng), gang);
+  check_reduction_op (int, &, -1, array[i], num_gangs (ng), gang);
+  check_reduction_op (int, |, 0, array[i], num_gangs (ng), gang);
+  check_reduction_op (int, ^, 0, array[i], num_gangs (ng), gang);
+
+  /* Worker reductions.  */
+  check_reduction_op (int, +, 0, array[i], num_workers (nw), worker);
+  check_reduction_op (int, *, 1, array[i], num_workers (nw), worker);
+  check_reduction_op (int, &, -1, array[i], num_workers (nw), worker);
+  check_reduction_op (int, |, 0, array[i], num_workers (nw), worker);
+  check_reduction_op (int, ^, 0, array[i], num_workers (nw), worker);
+
+  /* Vector reductions.  */
+  check_reduction_op (int, +, 0, array[i], vector_length (vl), vector);
+  check_reduction_op (int, *, 1, array[i], vector_length (vl), vector);
+  check_reduction_op (int, &, -1, array[i], vector_length (vl), vector);
+  check_reduction_op (int, |, 0, array[i], vector_length (vl), vector);
+  check_reduction_op (int, ^, 0, array[i], vector_length (vl), vector);
+
+  /* Combined reductions.  */
+  check_reduction_op (int, +, 0, array[i], num_gangs (ng) num_workers (nw)
+		      vector_length (vl), gang worker vector);
+  check_reduction_op (int, *, 1, array[i], num_gangs (ng) num_workers (nw)
+		      vector_length (vl), gang worker vector);
+  check_reduction_op (int, &, -1, array[i], num_gangs (ng) num_workers (nw)
+		      vector_length (vl), gang worker vector);
+  check_reduction_op (int, |, 0, array[i], num_gangs (ng) num_workers (nw)
+		      vector_length (vl), gang worker vector);
+  check_reduction_op (int, ^, 0, array[i], num_gangs (ng) num_workers (nw)
+		      vector_length (vl), gang worker vector);
 }
 
 static void
@@ -55,29 +68,31 @@ test_reductions_bool (void)
     array[i] = i;
 
   cmp_val = 5;
-  check_reduction_op (bool, &&, true, (cmp_val > array[i]));
-  check_reduction_op (bool, ||, false, (cmp_val > array[i]));
-}
 
-#define check_reduction_macro(type, op, init, b)	\
-  {							\
-    type res, vres;					\
-    res = (init);					\
-DO_PRAGMA (acc parallel vector_length (vl) copy(res))\
-DO_PRAGMA (acc loop reduction (op:res))\
-    for (i = 0; i < n; i++)				\
-      res = op (res, (b));				\
-							\
-    vres = (init);					\
-    for (i = 0; i < n; i++)				\
-      vres = op (vres, (b));				\
-							\
-    if (res != vres)					\
-      abort ();						\
-  }
-
-#define max(a, b) (((a) > (b)) ? (a) : (b))
-#define min(a, b) (((a) < (b)) ? (a) : (b))
+  /* Gang reductions.  */
+  check_reduction_op (int, &&, 1, (cmp_val > array[i]), num_gangs (ng),
+		      gang);
+  check_reduction_op (int, ||, 0, (cmp_val > array[i]), num_gangs (ng),
+		      gang);
+
+  /* Worker reductions.  */
+  check_reduction_op (int, &&, 1, (cmp_val > array[i]), num_workers (nw),
+		      worker);
+  check_reduction_op (int, ||, 0, (cmp_val > array[i]), num_workers (nw),
+		      worker);
+
+  /* Vector reductions.  */
+  check_reduction_op (int, &&, 1, (cmp_val > array[i]), vector_length (vl),
+		      vector);
+  check_reduction_op (int, ||, 0, (cmp_val > array[i]), vector_length (vl),
+		      vector);
+
+  /* Combined reductions.  */
+  check_reduction_op (int, &&, 1, (cmp_val > array[i]), num_gangs (ng)
+		      num_workers (nw) vector_length (vl), gang worker vector);
+  check_reduction_op (int, ||, 0, (cmp_val > array[i]), num_gangs (ng)
+		      num_workers (nw) vector_length (vl), gang worker vector);
+}
 
 static void
 test_reductions_minmax (void)
@@ -89,14 +104,32 @@ test_reductions_minmax (void)
   for (i = 0; i < n; i++)
     array[i] = i;
 
-  check_reduction_macro (int, min, n + 1, array[i]);
-  check_reduction_macro (int, max, -1, array[i]);
+  /* Gang reductions.  */
+  check_reduction_macro (int, min, n + 1, array[i], num_gangs (ng), gang);
+  check_reduction_macro (int, max, -1, array[i], num_gangs (ng), gang);
+
+  /* Worker reductions.  */
+  check_reduction_macro (int, min, n + 1, array[i], num_workers (nw), worker);
+  check_reduction_macro (int, max, -1, array[i], num_workers (nw), worker);
+
+  /* Vector reductions.  */
+  check_reduction_macro (int, min, n + 1, array[i], vector_length (vl),
+			 vector);
+  check_reduction_macro (int, max, -1, array[i], vector_length (vl), vector);
+
+  /* Combined reductions.  */
+  check_reduction_macro (int, min, n + 1, array[i], num_gangs (ng)
+			 num_workers (nw) vector_length (vl), gang worker
+			 vector);
+  check_reduction_macro (int, max, -1, array[i], num_gangs (ng)
+			 num_workers (nw) vector_length (vl), gang worker
+			 vector);
 }
 
 int
 main (void)
 {
-  test_reductions_int ();
+  test_reductions ();
   test_reductions_bool ();
   test_reductions_minmax ();
   return 0;
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-2.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-2.c
index 8a0b0d6..7cb9497 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-2.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-2.c
@@ -1,124 +1,83 @@
 /* { dg-do run } */
 
+/* Ignore vector_length warnings for offloaded (nvptx) targets.  */
+/* { dg-additional-options "-foffload=-w" } */
+
 /* float reductions.  */
 
 #include <stdlib.h>
-#include <stdbool.h>
-#include <math.h>
+#include "reduction.h"
 
-#define vl 32
+const int ng = 8;
+const int nw = 4;
+const int vl = 32;
 
-int
-main(void)
+static void
+test_reductions (void)
 {
-  const int n = 1000;
+  const int n = 100;
   int i;
-  float vresult, result, array[n];
-  bool lvresult, lresult;
+  float array[n];
 
   for (i = 0; i < n; i++)
-    array[i] = i;
-
-  result = 0;
-  vresult = 0;
+    array[i] = i+1;
 
-  /* '+' reductions.  */
-#pragma acc parallel vector_length (vl) copy(result)
-#pragma acc loop reduction (+:result)
-  for (i = 0; i < n; i++)
-    result += array[i];
+  /* Gang reductions.  */
+  check_reduction_op (float, +, 0, array[i], num_gangs (ng), gang);
+  check_reduction_op (float, *, 1, array[i], num_gangs (ng), gang);
 
-  /* Verify the reduction.  */
-  for (i = 0; i < n; i++)
-    vresult += array[i];
-
-  if (result != vresult)
-    abort ();
-
-  result = 0;
-  vresult = 0;
-
-  /* '*' reductions.  */
-#pragma acc parallel vector_length (vl) copy(result)
-#pragma acc loop reduction (*:result)
-  for (i = 0; i < n; i++)
-    result *= array[i];
+  /* Worker reductions.  */
+  check_reduction_op (float, +, 0, array[i], num_workers (nw), worker);
+  check_reduction_op (float, *, 1, array[i], num_workers (nw), worker);
 
-  /* Verify the reduction.  */
-  for (i = 0; i < n; i++)
-    vresult *= array[i];
-
-  if (fabs(result - vresult) > .0001)
-    abort ();
-  result = 0;
-  vresult = 0;
-
-  /* 'max' reductions.  */
-#pragma acc parallel vector_length (vl) copy(result)
-#pragma acc loop reduction (max:result)
-  for (i = 0; i < n; i++)
-    result = result > array[i] ? result : array[i];
-
-  /* Verify the reduction.  */
-  for (i = 0; i < n; i++)
-    vresult = vresult > array[i] ? vresult : array[i];
-
-  if (result != vresult)
-    abort ();
-
-  result = 0;
-  vresult = 0;
-
-  /* 'min' reductions.  */
-#pragma acc parallel vector_length (vl) copy(result)
-#pragma acc loop reduction (min:result)
-  for (i = 0; i < n; i++)
-    result = result < array[i] ? result : array[i];
+  /* Vector reductions.  */
+  check_reduction_op (float, +, 0, array[i], vector_length (vl), vector);
+  check_reduction_op (float, *, 1, array[i], vector_length (vl), vector);
 
-  /* Verify the reduction.  */
-  for (i = 0; i < n; i++)
-    vresult = vresult < array[i] ? vresult : array[i];
-
-  if (result != vresult)
-    abort ();
-
-  result = 5;
-  vresult = 5;
-
-  lresult = false;
-  lvresult = false;
-
-  /* '&&' reductions.  */
-#pragma acc parallel vector_length (vl) copy(lresult)
-#pragma acc loop reduction (&&:lresult)
-  for (i = 0; i < n; i++)
-    lresult = lresult && (result > array[i]);
-
-  /* Verify the reduction.  */
-  for (i = 0; i < n; i++)
-    lvresult = lresult && (result > array[i]);
-
-  if (lresult != lvresult)
-    abort ();
-
-  result = 5;
-  vresult = 5;
-
-  lresult = false;
-  lvresult = false;
+  /* Combined reductions.  */
+  check_reduction_op (float, +, 0, array[i], num_gangs (ng) num_workers (nw)
+		      vector_length (vl), gang worker vector);
+  check_reduction_op (float, *, 1, array[i], num_gangs (ng) num_workers (nw)
+		      vector_length (vl), gang worker vector);
+}
 
-  /* '||' reductions.  */
-#pragma acc parallel vector_length (vl) copy(lresult)
-#pragma acc loop reduction (||:lresult)
-  for (i = 0; i < n; i++)
-    lresult = lresult || (result > array[i]);
+static void
+test_reductions_minmax (void)
+{
+  const int n = 1000;
+  int i;
+  float array[n];
 
-  /* Verify the reduction.  */
   for (i = 0; i < n; i++)
-    lvresult = lresult || (result > array[i]);
+    array[i] = i;
 
-  if (lresult != lvresult)
-    abort ();
+  /* Gang reductions.  */
+  check_reduction_macro (float, min, n + 1, array[i], num_gangs (ng), gang);
+  check_reduction_macro (float, max, -1, array[i], num_gangs (ng), gang);
+
+  /* Worker reductions.  */
+  check_reduction_macro (float, min, n + 1, array[i], num_workers (nw),
+			 worker);
+  check_reduction_macro (float, max, -1, array[i], num_workers (nw), worker);
+
+  /* Vector reductions.  */
+  check_reduction_macro (float, min, n + 1, array[i], vector_length (vl),
+			 vector);
+  check_reduction_macro (float, max, -1, array[i], vector_length (vl), vector);
+
+  /* Combined reductions.  */
+  check_reduction_macro (float, min, n + 1, array[i], num_gangs (ng)
+			 num_workers (nw) vector_length (vl), gang worker
+			 vector);
+  check_reduction_macro (float, max, -1, array[i], num_gangs (ng)
+			 num_workers (nw) vector_length (vl), gang worker
+			 vector);
+}
 
+int
+main (void)
+{
+  test_reductions ();
+  test_reductions_minmax ();
   return 0;
 }
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-3.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-3.c
index a233e29..1b948be 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-3.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-3.c
@@ -1,93 +1,84 @@
 /* { dg-do run } */
 
+/* Ignore vector_length warnings for offloaded (nvptx) targets.  */
+/* { dg-additional-options "-foffload=-w" } */
+
 /* double reductions.  */
 
 #include <stdlib.h>
-#include <stdbool.h>
-#include <math.h>
+#include "reduction.h"
 
-#define vl 32
+const int ng = 8;
+const int nw = 4;
+const int vl = 32;
 
-int
-main(void)
+static void
+test_reductions (void)
 {
-  const int n = 1000;
+  const int n = 10;
   int i;
-  double vresult, result, array[n];
-  bool lvresult, lresult;
-
-  for (i = 0; i < n; i++)
-    array[i] = i;
-
-  result = 0;
-  vresult = 0;
-
-  /* 'max' reductions.  */
-#pragma acc parallel vector_length (vl) copy(result)
-#pragma acc loop reduction (max:result)
-  for (i = 0; i < n; i++)
-    result = result > array[i] ? result : array[i];
-
-  /* Verify the reduction.  */
-  for (i = 0; i < n; i++)
-    vresult = vresult > array[i] ? vresult : array[i];
-
-  if (result != vresult)
-    abort ();
-
-  result = 0;
-  vresult = 0;
-
-  /* 'min' reductions.  */
-#pragma acc parallel vector_length (vl) copy(result)
-#pragma acc loop reduction (min:result)
-  for (i = 0; i < n; i++)
-    result = result < array[i] ? result : array[i];
+  double array[n];
 
-  /* Verify the reduction.  */
   for (i = 0; i < n; i++)
-    vresult = vresult < array[i] ? vresult : array[i];
+    array[i] = i+1;
 
-  if (result != vresult)
-    abort ();
+  /* Gang reductions.  */
+  check_reduction_op (double, +, 0, array[i], num_gangs (ng), gang);
+  check_reduction_op (double, *, 1, array[i], num_gangs (ng), gang);
 
-  result = 5;
-  vresult = 5;
+  /* Worker reductions.  */
+  check_reduction_op (double, +, 0, array[i], num_workers (nw), worker);
+  check_reduction_op (double, *, 1, array[i], num_workers (nw), worker);
 
-  lresult = false;
-  lvresult = false;
+  /* Vector reductions.  */
+  check_reduction_op (double, +, 0, array[i], vector_length (vl), vector);
+  check_reduction_op (double, *, 1, array[i], vector_length (vl), vector);
 
-  /* '&&' reductions.  */
-#pragma acc parallel vector_length (vl) copy(lresult)
-#pragma acc loop reduction (&&:lresult)
-  for (i = 0; i < n; i++)
-    lresult = lresult && (result > array[i]);
-
-  /* Verify the reduction.  */
-  for (i = 0; i < n; i++)
-    lvresult = lresult && (result > array[i]);
-
-  if (lresult != lvresult)
-    abort ();
-
-  result = 5;
-  vresult = 5;
-
-  lresult = false;
-  lvresult = false;
+  /* Combined reductions.  */
+  check_reduction_op (double, +, 0, array[i], num_gangs (ng) num_workers (nw)
+		      vector_length (vl), gang worker vector);
+  check_reduction_op (double, *, 1, array[i], num_gangs (ng) num_workers (nw)
+		      vector_length (vl), gang worker vector);
+}
 
-  /* '||' reductions.  */
-#pragma acc parallel vector_length (vl) copy(lresult)
-#pragma acc loop reduction (||:lresult)
-  for (i = 0; i < n; i++)
-    lresult = lresult || (result > array[i]);
+static void
+test_reductions_minmax (void)
+{
+  const int n = 1000;
+  int i;
+  double array[n];
 
-  /* Verify the reduction.  */
   for (i = 0; i < n; i++)
-    lvresult = lresult || (result > array[i]);
+    array[i] = i;
 
-  if (lresult != lvresult)
-    abort ();
+  /* Gang reductions.  */
+  check_reduction_macro (double, min, n + 1, array[i], num_gangs (ng), gang);
+  check_reduction_macro (double, max, -1, array[i], num_gangs (ng), gang);
+
+  /* Worker reductions.  */
+  check_reduction_macro (double, min, n + 1, array[i], num_workers (nw),
+			 worker);
+  check_reduction_macro (double, max, -1, array[i], num_workers (nw), worker);
+
+  /* Vector reductions.  */
+  check_reduction_macro (double, min, n + 1, array[i], vector_length (vl),
+			 vector);
+  check_reduction_macro (double, max, -1, array[i], vector_length (vl),
+			 vector);
+
+  /* Combined reductions.  */
+  check_reduction_macro (double, min, n + 1, array[i], num_gangs (ng)
+			 num_workers (nw) vector_length (vl), gang worker
+			 vector);
+  check_reduction_macro (double, max, -1, array[i], num_gangs (ng)
+			 num_workers (nw) vector_length (vl), gang worker
+			 vector);
+}
 
+int
+main (void)
+{
+  test_reductions ();
+  test_reductions_minmax ();
   return 0;
 }
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-4.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-4.c
index 59d49c1..79355ed 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-4.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-4.c
@@ -1,59 +1,56 @@
 /* { dg-do run { target { ! { hppa*-*-hpux* } } } } */
 
+/* Ignore vector_length warnings for offloaded (nvptx) targets.  */
+/* { dg-additional-options "-foffload=-w" } */
+
 /* complex reductions.  */
 
 #include <stdlib.h>
-#include <stdbool.h>
-#include <math.h>
 #include <complex.h>
+#include "reduction.h"
 
-#define vl 32
+const int ng = 8;
+const int nw = 4;
+const int vl = 32;
 
-int
-main(void)
+static void
+test_reductions (void)
 {
-  const int n = 1000;
+  const int n = 10;
   int i;
-  double _Complex vresult, result, array[n];
-  bool lvresult, lresult;
-
-  for (i = 0; i < n; i++)
-    array[i] = i;
-
-  result = 0;
-  vresult = 0;
-
-  /* '&&' reductions.  */
-#pragma acc parallel vector_length (vl) copy(lresult)
-#pragma acc loop reduction (&&:lresult)
-  for (i = 0; i < n; i++)
-    lresult = lresult && (creal(result) > creal(array[i]));
-
-  /* Verify the reduction.  */
-  for (i = 0; i < n; i++)
-    lvresult = lresult && (creal(result) > creal(array[i]));
+  double _Complex array[n];
 
-  if (lresult != lvresult)
-    abort ();
-
-  result = 5;
-  vresult = 5;
-
-  lresult = false;
-  lvresult = false;
-
-  /* '||' reductions.  */
-#pragma acc parallel vector_length (vl) copy(lresult)
-#pragma acc loop reduction (||:lresult)
-  for (i = 0; i < n; i++)
-    lresult = lresult || (creal(result) > creal(array[i]));
-
-  /* Verify the reduction.  */
   for (i = 0; i < n; i++)
-    lvresult = lresult || (creal(result) > creal(array[i]));
-
-  if (lresult != lvresult)
-    abort ();
+    array[i] = i+1;
+
+  /* Gang reductions.  */
+  check_reduction_op (double, +, 0, creal (array[i]), num_gangs (ng), gang);
+  check_reduction_op (double, *, 1, creal (array[i]), num_gangs (ng), gang);
+
+  /* Worker reductions.  */
+  check_reduction_op (double, +, 0, creal (array[i]), num_workers (nw),
+		      worker);
+  check_reduction_op (double, *, 1, creal (array[i]), num_workers (nw),
+		      worker);
+
+  /* Vector reductions.  */
+  check_reduction_op (double, +, 0, creal (array[i]), vector_length (vl),
+		      vector);
+  check_reduction_op (double, *, 1, creal (array[i]), vector_length (vl),
+		      vector);
+
+  /* Combined reductions.  */
+  check_reduction_op (double, +, 0, creal (array[i]), num_gangs (ng)
+		      num_workers (nw) vector_length (vl), gang worker
+		      vector);
+  check_reduction_op (double, *, 1, creal (array[i]), num_gangs (ng)
+		      num_workers (nw) vector_length (vl), gang worker
+		      vector);
+}
 
+int
+main (void)
+{
+  test_reductions ();
   return 0;
 }
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-5.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-5.c
index efe8702..46b553a 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-5.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-5.c
@@ -1,32 +1,57 @@
+/* { dg-do run } */
+/* { dg-additional-options "-w" } */
+
+/* Ignore vector_length warnings for offloaded (nvptx) targets.  */
+/* { dg-additional-options "-foffload=-w" } */
+
+/* Multiple reductions.  */
+
 #include <stdio.h>
 #include <stdlib.h>
 
+const int ng = 8;
+const int nw = 4;
+const int vl = 32;
+
+const int n = 100;
+
+#define DO_PRAGMA(x) _Pragma (#x)
+
+#define check_reduction(gwv_par, gwv_loop)		\
+  {							\
+  s1 = 2; s2 = 5;					\
+DO_PRAGMA (acc parallel gwv_par copy (s1, s2))		\
+DO_PRAGMA (acc loop gwv_loop reduction (+:s1, s2))	\
+    for (i = 0; i < n; i++)				\
+      {							\
+         s1 = s1 + 3;					\
+         s2 = s2 + 2;					\
+      }							\
+							\
+    if (s1 != v1 || s2 != v2)				\
+      abort ();						\
+  }
+
 int
 main (void)
 {
   int s1 = 2, s2 = 5, v1 = 2, v2 = 5;
-  int n = 100;
   int i;
 
-#pragma acc parallel vector_length (32) copy(s1,s2)
-#pragma acc loop reduction (+:s1, s2)
-  for (i = 0; i < n; i++)
-    {
-      s1 = s1 + 3;
-      s2 = s2 + 2;
-    }
-
   for (i = 0; i < n; i++)
     {
       v1 = v1 + 3;
       v2 = v2 + 2;
     }
-  
-  if (s1 != v1)
-    abort ();
-  
-  if (s2 != v2)
-    abort ();
-    
+
+  check_reduction (num_gangs (ng), gang);
+
+  /* Nvptx targets require a vector_length of 32 in order to allow spinlocks
+     with gangs.  */
+  check_reduction (num_workers (nw) vector_length (vl), worker);
+  check_reduction (vector_length (vl), vector);
+  check_reduction (num_gangs (ng) num_workers (nw) vector_length (vl), gang
+		   worker vector);
+
   return 0;
 }
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-6.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-6.c
new file mode 100644
index 0000000..af30b31
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-6.c
@@ -0,0 +1,36 @@
+/* { dg-do run } */
+/* { dg-additional-options "-w" } */
+
+/* Test reductions on explicitly private variables.  */
+
+#include <assert.h>
+
+int
+main ()
+{
+  int i, j, red[10];
+  int v;
+
+  for (i = 0; i < 10; i++)
+    red[i] = -1;
+
+#pragma acc parallel copyout(red)
+  {
+#pragma acc loop gang private(v)
+    for (j = 0; j < 10; j++)
+      {
+	v = j;
+
+#pragma acc loop vector reduction (+:v)
+	for (i = 0; i < 100; i++)
+	  v++;
+
+	red[j] = v;
+      }
+  }
+
+  for (i = 0; i < 10; i++)
+    assert (red[i] == i + 100);
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction.h b/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction.h
new file mode 100644
index 0000000..1b3f8d4
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction.h
@@ -0,0 +1,43 @@
+#ifndef REDUCTION_H
+#define REDUCTION_H
+
+#define DO_PRAGMA(x) _Pragma (#x)
+
+#define check_reduction_op(type, op, init, b, gwv_par, gwv_loop)	\
+  {									\
+    type res, vres;							\
+    res = (init);							\
+DO_PRAGMA (acc parallel gwv_par copy (res))				\
+DO_PRAGMA (acc loop gwv_loop reduction (op:res))			\
+    for (i = 0; i < n; i++)						\
+      res = res op (b);							\
+									\
+    vres = (init);							\
+    for (i = 0; i < n; i++)						\
+      vres = vres op (b);						\
+									\
+    if (res != vres)							\
+      abort ();								\
+  }
+
+#define check_reduction_macro(type, op, init, b, gwv_par, gwv_loop)	\
+  {									\
+    type res, vres;							\
+    res = (init);							\
+DO_PRAGMA (acc parallel gwv_par copy (res))				\
+DO_PRAGMA (acc loop gwv_loop reduction (op:res))			\
+    for (i = 0; i < n; i++)						\
+      res = op (res, (b));						\
+									\
+    vres = (init);							\
+    for (i = 0; i < n; i++)						\
+      vres = op (vres, (b));						\
+									\
+    if (res != vres)							\
+      abort ();								\
+  }
+
+#define max(a, b) (((a) > (b)) ? (a) : (b))
+#define min(a, b) (((a) < (b)) ? (a) : (b))
+
+#endif
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/parallel-loop-1.f90 b/libgomp/testsuite/libgomp.oacc-fortran/parallel-loop-1.f90
new file mode 100644
index 0000000..4c86ada
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-fortran/parallel-loop-1.f90
@@ -0,0 +1,75 @@
+! Exercise the auto, independent, seq and tile loop clauses inside
+! parallel regions.
+
+program loops
+  integer, parameter     :: n = 20, c = 10
+  integer                :: i, a(n), b(n)
+
+  a(:) = 0
+  b(:) = 0
+
+  ! COPY
+
+  !$acc parallel copy (a)
+  !$acc loop auto
+  do i = 1, n
+     a(i) = i
+  end do
+  !$acc end parallel
+
+  do i = 1, n
+     b(i) = i
+  end do
+
+  call check (a, b, n)
+
+  ! COPYOUT
+
+  a(:) = 0
+
+  !$acc parallel copyout (a)
+  !$acc loop independent
+  do i = 1, n
+     a(i) = i
+  end do
+  !$acc end parallel
+
+  do i = 1, n
+     if (a(i) .ne. b(i)) call abort
+  end do
+  call check (a, b, n)
+
+  ! COPYIN
+
+  a(:) = 0
+
+  !$acc parallel copyout (a) copyin (b)
+  !$acc loop seq
+  do i = 1, n
+     a(i) = i
+  end do
+  !$acc end parallel
+
+  call check (a, b, n)
+
+  ! PRESENT_OR_COPY
+
+  !$acc parallel pcopy (a)
+  !$acc loop tile (*)
+  do i = 1, n
+     a(i) = i
+  end do
+  !$acc end parallel
+
+  call check (a, b, n)
+
+end program loops
+
+subroutine check (a, b, n)
+  integer :: n, a(n), b(n)
+  integer :: i
+
+  do i = 1, n
+     if (a(i) .ne. b(i)) call abort
+  end do
+end subroutine check
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/parallel-reduction.f90 b/libgomp/testsuite/libgomp.oacc-fortran/parallel-reduction.f90
new file mode 100644
index 0000000..f49ed73
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-fortran/parallel-reduction.f90
@@ -0,0 +1,39 @@
+! { dg-do run }
+! { dg-additional-options "-w" }
+
+program reduction
+  integer, parameter :: n = 10
+  integer sum
+  include "openacc_lib.h"
+
+  sum = 0
+
+  !$acc parallel reduction(+:sum) num_gangs (n) copy(sum)
+  sum = sum + 1
+  !$acc end parallel
+
+  if (acc_get_device_type () .eq. acc_device_nvidia) then
+     if (sum .ne. n) call abort
+  else
+     if (sum .ne. 1) call abort
+  end if
+
+  ! Test reductions inside subroutines
+
+  sum = 0
+  call redsub (sum, n)
+
+  if (acc_get_device_type () .eq. acc_device_nvidia) then
+     if (sum .ne. n) call abort
+  else
+     if (sum .ne. 1) call abort
+  end if
+end program reduction
+
+subroutine redsub(sum, n)
+  integer :: sum, n
+
+  !$acc parallel reduction(+:sum) num_gangs (10)  copy(sum)
+  sum = sum + 1
+  !$acc end parallel
+end subroutine redsub
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/reduction-1.f90 b/libgomp/testsuite/libgomp.oacc-fortran/reduction-1.f90
index db0a52d..e51509f 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/reduction-1.f90
+++ b/libgomp/testsuite/libgomp.oacc-fortran/reduction-1.f90
@@ -1,28 +1,55 @@
 ! { dg-do run }
+! { dg-additional-options "-w" }
 
 ! Integer reductions
 
 program reduction_1
   implicit none
 
-  integer, parameter    :: n = 10, vl = 32
-  integer               :: i, vresult, result
-  logical               :: lresult, lvresult
+  integer, parameter    :: n = 10, ng = 8, nw = 4, vl = 32
+  integer               :: i, vresult, rg, rw, rv, rc
+  logical               :: lrg, lrw, lrv, lrc, lvresult
   integer, dimension (n) :: array
 
   do i = 1, n
      array(i) = i
   end do
 
-  result = 0
+  !
+  ! '+' reductions
+  !
+
+  rg = 0
+  rw = 0
+  rv = 0
+  rc = 0
   vresult = 0
 
-  ! '+' reductions
+  !$acc parallel num_gangs(ng) copy(rg)
+  !$acc loop reduction(+:rg) gang
+  do i = 1, n
+     rg = rg + array(i)
+  end do
+  !$acc end parallel
 
-  !$acc parallel vector_length(vl) num_gangs(1) copy(result)
-  !$acc loop reduction(+:result)
+  !$acc parallel num_workers(nw) copy(rw)
+  !$acc loop reduction(+:rw) worker
   do i = 1, n
-     result = result + array(i)
+     rw = rw + array(i)
+  end do
+  !$acc end parallel
+
+  !$acc parallel vector_length(vl) copy(rv)
+  !$acc loop reduction(+:rv) vector
+  do i = 1, n
+     rv = rv + array(i)
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_gangs(ng) num_workers(nw) vector_length(vl) copy(rc)
+  !$acc loop reduction(+:rc) gang worker vector
+  do i = 1, n
+     rc = rc + array(i)
   end do
   !$acc end parallel
 
@@ -31,17 +58,46 @@ program reduction_1
      vresult = vresult + array(i)
   end do
 
-  if (result.ne.vresult) call abort
-
-  result = 0
-  vresult = 0
+  if (rg .ne. vresult) call abort
+  if (rw .ne. vresult) call abort
+  if (rv .ne. vresult) call abort
+  if (rc .ne. vresult) call abort
 
+  !
   ! '*' reductions
+  !
 
-  !$acc parallel vector_length(vl) num_gangs(1) copy(result)
-  !$acc loop reduction(*:result)
+  rg = 1
+  rw = 1
+  rv = 1
+  rc = 1
+  vresult = 1
+
+  !$acc parallel num_gangs(ng) copy(rg)
+  !$acc loop reduction(*:rg) gang
   do i = 1, n
-     result = result * array(i)
+     rg = rg * array(i)
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_workers(nw) copy(rw)
+  !$acc loop reduction(*:rw) worker
+  do i = 1, n
+     rw = rw * array(i)
+  end do
+  !$acc end parallel
+
+  !$acc parallel vector_length(vl) copy(rv)
+  !$acc loop reduction(*:rv) vector
+  do i = 1, n
+     rv = rv * array(i)
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_gangs(ng) num_workers(nw) vector_length(vl) copy(rc)
+  !$acc loop reduction(*:rc) gang worker vector
+  do i = 1, n
+     rc = rc * array(i)
   end do
   !$acc end parallel
 
@@ -50,17 +106,46 @@ program reduction_1
      vresult = vresult * array(i)
   end do
 
-  if (result.ne.vresult) call abort
+  if (rg .ne. vresult) call abort
+  if (rw .ne. vresult) call abort
+  if (rv .ne. vresult) call abort
+  if (rc .ne. vresult) call abort
+
+  !
+  ! 'max' reductions
+  !
 
-  result = 0
+  rg = 0
+  rw = 0
+  rv = 0
+  rc = 0
   vresult = 0
 
-  ! 'max' reductions
+  !$acc parallel num_gangs(ng) copy(rg)
+  !$acc loop reduction(max:rg) gang
+  do i = 1, n
+     rg = max (rg, array(i))
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_workers(nw) copy(rw)
+  !$acc loop reduction(max:rw) worker
+  do i = 1, n
+     rw = max (rw, array(i))
+  end do
+  !$acc end parallel
+
+  !$acc parallel vector_length(vl) copy(rv)
+  !$acc loop reduction(max:rv) vector
+  do i = 1, n
+     rv = max (rv, array(i))
+  end do
+  !$acc end parallel
 
-  !$acc parallel vector_length(vl) num_gangs(1) copy(result)
-  !$acc loop reduction(max:result)
+  !$acc parallel num_gangs(ng) num_workers(nw) vector_length(vl) copy(rc)
+  !$acc loop reduction(max:rc) gang worker vector
   do i = 1, n
-     result = max (result, array(i))
+     rc = max (rc, array(i))
   end do
   !$acc end parallel
 
@@ -69,17 +154,46 @@ program reduction_1
      vresult = max (vresult, array(i))
   end do
 
-  if (result.ne.vresult) call abort
-
-  result = 1
-  vresult = 1
+  if (rg .ne. vresult) call abort
+  if (rw .ne. vresult) call abort
+  if (rv .ne. vresult) call abort
+  if (rc .ne. vresult) call abort
 
+  !
   ! 'min' reductions
+  !
+
+  rg = 0
+  rw = 0
+  rv = 0
+  rc = 0
+  vresult = 0
+
+  !$acc parallel num_gangs(ng) copy(rg)
+  !$acc loop reduction(min:rg) gang
+  do i = 1, n
+     rg = min (rg, array(i))
+  end do
+  !$acc end parallel
 
-  !$acc parallel vector_length(vl) num_gangs(1) copy(result)
-  !$acc loop reduction(min:result)
+  !$acc parallel num_workers(nw) copy(rw)
+  !$acc loop reduction(min:rw) worker
   do i = 1, n
-     result = min (result, array(i))
+     rw = min (rw, array(i))
+  end do
+  !$acc end parallel
+
+  !$acc parallel vector_length(vl) copy(rv)
+  !$acc loop reduction(min:rv) vector
+  do i = 1, n
+     rv = min (rv, array(i))
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_gangs(ng) num_workers(nw) vector_length(vl) copy(rc)
+  !$acc loop reduction(min:rc) gang worker vector
+  do i = 1, n
+     rc = min (rc, array(i))
   end do
   !$acc end parallel
 
@@ -88,17 +202,46 @@ program reduction_1
      vresult = min (vresult, array(i))
   end do
 
-  if (result.ne.vresult) call abort
+  if (rg .ne. vresult) call abort
+  if (rw .ne. vresult) call abort
+  if (rv .ne. vresult) call abort
+  if (rc .ne. vresult) call abort
+
+  !
+  ! 'iand' reductions
+  !
 
-  result = 1
+  rg = 1
+  rw = 1
+  rv = 1
+  rc = 1
   vresult = 1
 
-  ! 'iand' reductions
+  !$acc parallel num_gangs(ng) copy(rg)
+  !$acc loop reduction(iand:rg) gang
+  do i = 1, n
+     rg = iand (rg, array(i))
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_workers(nw) copy(rw)
+  !$acc loop reduction(iand:rw) worker
+  do i = 1, n
+     rw = iand (rw, array(i))
+  end do
+  !$acc end parallel
+
+  !$acc parallel vector_length(vl) copy(rv)
+  !$acc loop reduction(iand:rv) vector
+  do i = 1, n
+     rv = iand (rv, array(i))
+  end do
+  !$acc end parallel
 
-  !$acc parallel vector_length(vl) num_gangs(1) copy(result)
-  !$acc loop reduction(iand:result)
+  !$acc parallel num_gangs(ng) num_workers(nw) vector_length(vl) copy(rc)
+  !$acc loop reduction(iand:rc) gang worker vector
   do i = 1, n
-     result = iand (result, array(i))
+     rc = iand (rc, array(i))
   end do
   !$acc end parallel
 
@@ -107,17 +250,46 @@ program reduction_1
      vresult = iand (vresult, array(i))
   end do
 
-  if (result.ne.vresult) call abort
-
-  result = 1
-  vresult = 1
+  if (rg .ne. vresult) call abort
+  if (rw .ne. vresult) call abort
+  if (rv .ne. vresult) call abort
+  if (rc .ne. vresult) call abort
 
+  !
   ! 'ior' reductions
+  !
+
+  rg = 0
+  rw = 0
+  rv = 0
+  rc = 0
+  vresult = 0
+
+  !$acc parallel num_gangs(ng) copy(rg)
+  !$acc loop reduction(ior:rg) gang
+  do i = 1, n
+     rg = ior (rg, array(i))
+  end do
+  !$acc end parallel
 
-  !$acc parallel vector_length(vl) num_gangs(1) copy(result)
-  !$acc loop reduction(ior:result)
+  !$acc parallel num_workers(nw) copy(rw)
+  !$acc loop reduction(ior:rw) worker
   do i = 1, n
-     result = ior (result, array(i))
+     rw = ior (rw, array(i))
+  end do
+  !$acc end parallel
+
+  !$acc parallel vector_length(vl) copy(rv)
+  !$acc loop reduction(ior:rv) vector
+  do i = 1, n
+     rv = ior (rv, array(i))
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_gangs(ng) num_workers(nw) vector_length(vl) copy(rc)
+  !$acc loop reduction(ior:rc) gang worker vector
+  do i = 1, n
+     rc = ior (rc, array(i))
   end do
   !$acc end parallel
 
@@ -126,17 +298,46 @@ program reduction_1
      vresult = ior (vresult, array(i))
   end do
 
-  if (result.ne.vresult) call abort
+  if (rg .ne. vresult) call abort
+  if (rw .ne. vresult) call abort
+  if (rv .ne. vresult) call abort
+  if (rc .ne. vresult) call abort
 
-  result = 0
+  !
+  ! 'ieor' reductions
+  !
+
+  rg = 0
+  rw = 0
+  rv = 0
+  rc = 0
   vresult = 0
 
-  ! 'ieor' reductions
+  !$acc parallel num_gangs(ng) copy(rg)
+  !$acc loop reduction(ieor:rg) gang
+  do i = 1, n
+     rg = ieor (rg, array(i))
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_workers(nw) copy(rw)
+  !$acc loop reduction(ieor:rw) worker
+  do i = 1, n
+     rw = ieor (rw, array(i))
+  end do
+  !$acc end parallel
 
-  !$acc parallel vector_length(vl) num_gangs(1) copy(result)
-  !$acc loop reduction(ieor:result)
+  !$acc parallel vector_length(vl) copy(rv)
+  !$acc loop reduction(ieor:rv) vector
   do i = 1, n
-     result = ieor (result, array(i))
+     rv = ieor (rv, array(i))
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_gangs(ng) num_workers(nw) vector_length(vl) copy(rc)
+  !$acc loop reduction(ieor:rc) gang worker vector
+  do i = 1, n
+     rc = ieor (rc, array(i))
   end do
   !$acc end parallel
 
@@ -145,17 +346,46 @@ program reduction_1
      vresult = ieor (vresult, array(i))
   end do
 
-  if (result.ne.vresult) call abort
-
-  lresult = .false.
-  lvresult = .false.
+  if (rg .ne. vresult) call abort
+  if (rw .ne. vresult) call abort
+  if (rv .ne. vresult) call abort
+  if (rc .ne. vresult) call abort
 
+  !
   ! '.and.' reductions
+  !
+
+  lrg = .true.
+  lrw = .true.
+  lrv = .true.
+  lrc = .true.
+  lvresult = .true.
+
+  !$acc parallel num_gangs(ng) copy(lrg)
+  !$acc loop reduction(.and.:lrg) gang
+  do i = 1, n
+     lrg = lrg .and. (array(i) .ge. 5)
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_workers(nw) copy(lrw)
+  !$acc loop reduction(.and.:lrw) worker
+  do i = 1, n
+     lrw = lrw .and. (array(i) .ge. 5)
+  end do
+  !$acc end parallel
+
+  !$acc parallel vector_length(vl) copy(lrv)
+  !$acc loop reduction(.and.:lrv) vector
+  do i = 1, n
+     lrv = lrv .and. (array(i) .ge. 5)
+  end do
+  !$acc end parallel
 
-  !$acc parallel vector_length(vl) num_gangs(1) copy(lresult)
-  !$acc loop reduction(.and.:lresult)
+  !$acc parallel num_gangs(ng) num_workers(nw) vector_length(vl) copy(lrc)
+  !$acc loop reduction(.and.:lrc) gang worker vector
   do i = 1, n
-     lresult = lresult .and. (array(i) .ge. 5)
+     lrc = lrc .and. (array(i) .ge. 5)
   end do
   !$acc end parallel
 
@@ -164,17 +394,46 @@ program reduction_1
      lvresult = lvresult .and. (array(i) .ge. 5)
   end do
 
-  if (result.ne.vresult) call abort
+  if (lrg .neqv. lvresult) call abort
+  if (lrw .neqv. lvresult) call abort
+  if (lrv .neqv. lvresult) call abort
+  if (lrc .neqv. lvresult) call abort
+
+  !
+  ! '.or.' reductions
+  !
 
-  lresult = .false.
+  lrg = .false.
+  lrw = .false.
+  lrv = .false.
+  lrc = .false.
   lvresult = .false.
 
-  ! '.or.' reductions
+  !$acc parallel num_gangs(ng) copy(lrg)
+  !$acc loop reduction(.or.:lrg) gang
+  do i = 1, n
+     lrg = lrg .or. (array(i) .ge. 5)
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_workers(nw) copy(lrw)
+  !$acc loop reduction(.or.:lrw) worker
+  do i = 1, n
+     lrw = lrw .or. (array(i) .ge. 5)
+  end do
+  !$acc end parallel
+
+  !$acc parallel vector_length(vl) copy(lrv)
+  !$acc loop reduction(.or.:lrv) vector
+  do i = 1, n
+     lrv = lrv .or. (array(i) .ge. 5)
+  end do
+  !$acc end parallel
 
-  !$acc parallel vector_length(vl) num_gangs(1) copy(lresult)
-  !$acc loop reduction(.or.:lresult)
+  !$acc parallel num_gangs(ng) num_workers(nw) vector_length(vl) copy(lrc)
+  !$acc loop reduction(.or.:lrc) gang worker vector
   do i = 1, n
-     lresult = lresult .or. (array(i) .ge. 5)
+     lrc = lrc .or. (array(i) .ge. 5)
   end do
   !$acc end parallel
 
@@ -183,17 +442,46 @@ program reduction_1
      lvresult = lvresult .or. (array(i) .ge. 5)
   end do
 
-  if (result.ne.vresult) call abort
-
-  lresult = .false.
-  lvresult = .false.
+  if (lrg .neqv. lvresult) call abort
+  if (lrw .neqv. lvresult) call abort
+  if (lrv .neqv. lvresult) call abort
+  if (lrc .neqv. lvresult) call abort
 
+  !
   ! '.eqv.' reductions
+  !
+
+  lrg = .true.
+  lrw = .true.
+  lrv = .true.
+  lrc = .true.
+  lvresult = .true.
+
+  !$acc parallel num_gangs(ng) copy(lrg)
+  !$acc loop reduction(.eqv.:lrg) gang
+  do i = 1, n
+     lrg = lrg .eqv. (array(i) .ge. 5)
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_workers(nw) copy(lrw)
+  !$acc loop reduction(.eqv.:lrw) worker
+  do i = 1, n
+     lrw = lrw .eqv. (array(i) .ge. 5)
+  end do
+  !$acc end parallel
+
+  !$acc parallel vector_length(vl) copy(lrv)
+  !$acc loop reduction(.eqv.:lrv) vector
+  do i = 1, n
+     lrv = lrv .eqv. (array(i) .ge. 5)
+  end do
+  !$acc end parallel
 
-  !$acc parallel vector_length(vl) num_gangs(1) copy(lresult)
-  !$acc loop reduction(.eqv.:lresult)
+  !$acc parallel num_gangs(ng) num_workers(nw) vector_length(vl) copy(lrc)
+  !$acc loop reduction(.eqv.:lrc) gang worker vector
   do i = 1, n
-     lresult = lresult .eqv. (array(i) .ge. 5)
+     lrc = lrc .eqv. (array(i) .ge. 5)
   end do
   !$acc end parallel
 
@@ -202,17 +490,46 @@ program reduction_1
      lvresult = lvresult .eqv. (array(i) .ge. 5)
   end do
 
-  if (result.ne.vresult) call abort
-
-  lresult = .false.
-  lvresult = .false.
+  if (lrg .neqv. lvresult) call abort
+  if (lrw .neqv. lvresult) call abort
+  if (lrv .neqv. lvresult) call abort
+  if (lrc .neqv. lvresult) call abort
 
+  !
   ! '.neqv.' reductions
+  !
+
+  lrg = .true.
+  lrw = .true.
+  lrv = .true.
+  lrc = .true.
+  lvresult = .true.
+
+  !$acc parallel num_gangs(ng) copy(lrg)
+  !$acc loop reduction(.neqv.:lrg) gang
+  do i = 1, n
+     lrg = lrg .neqv. (array(i) .ge. 5)
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_workers(nw) copy(lrw)
+  !$acc loop reduction(.neqv.:lrw) worker
+  do i = 1, n
+     lrw = lrw .neqv. (array(i) .ge. 5)
+  end do
+  !$acc end parallel
+
+  !$acc parallel vector_length(vl) copy(lrv)
+  !$acc loop reduction(.neqv.:lrv) vector
+  do i = 1, n
+     lrv = lrv .neqv. (array(i) .ge. 5)
+  end do
+  !$acc end parallel
 
-  !$acc parallel vector_length(vl) num_gangs(1) copy(lresult)
-  !$acc loop reduction(.neqv.:lresult)
+  !$acc parallel num_gangs(ng) num_workers(nw) vector_length(vl) copy(lrc)
+  !$acc loop reduction(.neqv.:lrc) gang worker vector
   do i = 1, n
-     lresult = lresult .neqv. (array(i) .ge. 5)
+     lrc = lrc .neqv. (array(i) .ge. 5)
   end do
   !$acc end parallel
 
@@ -221,5 +538,8 @@ program reduction_1
      lvresult = lvresult .neqv. (array(i) .ge. 5)
   end do
 
-  if (result.ne.vresult) call abort
+  if (lrg .neqv. lvresult) call abort
+  if (lrw .neqv. lvresult) call abort
+  if (lrv .neqv. lvresult) call abort
+  if (lrc .neqv. lvresult) call abort
 end program reduction_1
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/reduction-2.f90 b/libgomp/testsuite/libgomp.oacc-fortran/reduction-2.f90
index 3d99668..cd09099 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/reduction-2.f90
+++ b/libgomp/testsuite/libgomp.oacc-fortran/reduction-2.f90
@@ -5,26 +5,52 @@
 program reduction_2
   implicit none
 
-  integer, parameter    :: n = 10, vl = 32
+  integer, parameter    :: n = 10, ng = 8, nw = 4, vl = 32
   integer               :: i
-  real, parameter       :: e = .001
-  real                  :: vresult, result
-  logical               :: lresult, lvresult
-  real, dimension (n) :: array
+  real                  :: vresult, rg, rw, rv, rc
+  real, parameter       :: e = 0.001
+  logical               :: lrg, lrw, lrv, lrc, lvresult
+  real, dimension (n)   :: array
 
   do i = 1, n
      array(i) = i
   end do
 
-  result = 0
+  !
+  ! '+' reductions
+  !
+
+  rg = 0
+  rw = 0
+  rv = 0
+  rc = 0
   vresult = 0
 
-  ! '+' reductions
+  !$acc parallel num_gangs(ng) copy(rg)
+  !$acc loop reduction(+:rg) gang
+  do i = 1, n
+     rg = rg + array(i)
+  end do
+  !$acc end parallel
 
-  !$acc parallel vector_length(vl) num_gangs(1) copy(result)
-  !$acc loop reduction(+:result)
+  !$acc parallel num_workers(nw) copy(rw)
+  !$acc loop reduction(+:rw) worker
   do i = 1, n
-     result = result + array(i)
+     rw = rw + array(i)
+  end do
+  !$acc end parallel
+
+  !$acc parallel vector_length(vl) copy(rv)
+  !$acc loop reduction(+:rv) vector
+  do i = 1, n
+     rv = rv + array(i)
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_gangs(ng) num_workers(nw) vector_length(vl) copy(rc)
+  !$acc loop reduction(+:rc) gang worker vector
+  do i = 1, n
+     rc = rc + array(i)
   end do
   !$acc end parallel
 
@@ -33,17 +59,46 @@ program reduction_2
      vresult = vresult + array(i)
   end do
 
-  if (abs (result - vresult) .ge. e) call abort
+  if (abs (rg - vresult) .ge. e) call abort
+  if (abs (rw - vresult) .ge. e) call abort
+  if (abs (rv - vresult) .ge. e) call abort
+  if (abs (rc - vresult) .ge. e) call abort
+
+  !
+  ! '*' reductions
+  !
 
-  result = 1
+  rg = 1
+  rw = 1
+  rv = 1
+  rc = 1
   vresult = 1
 
-  ! '*' reductions
+  !$acc parallel num_gangs(ng) copy(rg)
+  !$acc loop reduction(*:rg) gang
+  do i = 1, n
+     rg = rg * array(i)
+  end do
+  !$acc end parallel
 
-  !$acc parallel vector_length(vl) num_gangs(1) copy(result)
-  !$acc loop reduction(*:result)
+  !$acc parallel num_workers(nw) copy(rw)
+  !$acc loop reduction(*:rw) worker
   do i = 1, n
-     result = result * array(i)
+     rw = rw * array(i)
+  end do
+  !$acc end parallel
+
+  !$acc parallel vector_length(vl) copy(rv)
+  !$acc loop reduction(*:rv) vector
+  do i = 1, n
+     rv = rv * array(i)
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_gangs(ng) num_workers(nw) vector_length(vl) copy(rc)
+  !$acc loop reduction(*:rc) gang worker vector
+  do i = 1, n
+     rc = rc * array(i)
   end do
   !$acc end parallel
 
@@ -52,17 +107,46 @@ program reduction_2
      vresult = vresult * array(i)
   end do
 
-  if (result.ne.vresult) call abort
+  if (abs (rg - vresult) .ge. e) call abort
+  if (abs (rw - vresult) .ge. e) call abort
+  if (abs (rv - vresult) .ge. e) call abort
+  if (abs (rc - vresult) .ge. e) call abort
+
+  !
+  ! 'max' reductions
+  !
 
-  result = 0
+  rg = 0
+  rw = 0
+  rv = 0
+  rc = 0
   vresult = 0
 
-  ! 'max' reductions
+  !$acc parallel num_gangs(ng) copy(rg)
+  !$acc loop reduction(max:rg) gang
+  do i = 1, n
+     rg = max (rg, array(i))
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_workers(nw) copy(rw)
+  !$acc loop reduction(max:rw) worker
+  do i = 1, n
+     rw = max (rw, array(i))
+  end do
+  !$acc end parallel
 
-  !$acc parallel vector_length(vl) num_gangs(1) copy(result)
-  !$acc loop reduction(max:result)
+  !$acc parallel vector_length(vl) copy(rv)
+  !$acc loop reduction(max:rv) vector
   do i = 1, n
-     result = max (result, array(i))
+     rv = max (rv, array(i))
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_gangs(ng) num_workers(nw) vector_length(vl) copy(rc)
+  !$acc loop reduction(max:rc) gang worker vector
+  do i = 1, n
+     rc = max (rc, array(i))
   end do
   !$acc end parallel
 
@@ -71,17 +155,46 @@ program reduction_2
      vresult = max (vresult, array(i))
   end do
 
-  if (result.ne.vresult) call abort
-
-  result = 1
-  vresult = 1
+  if (abs (rg - vresult) .ge. e) call abort
+  if (abs (rw - vresult) .ge. e) call abort
+  if (abs (rv - vresult) .ge. e) call abort
+  if (abs (rc - vresult) .ge. e) call abort
 
+  !
   ! 'min' reductions
+  !
 
-  !$acc parallel vector_length(vl) num_gangs(1) copy(result)
-  !$acc loop reduction(min:result)
+  rg = 0
+  rw = 0
+  rv = 0
+  rc = 0
+  vresult = 0
+
+  !$acc parallel num_gangs(ng) copy(rg)
+  !$acc loop reduction(min:rg) gang
+  do i = 1, n
+     rg = min (rg, array(i))
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_workers(nw) copy(rw)
+  !$acc loop reduction(min:rw) worker
+  do i = 1, n
+     rw = min (rw, array(i))
+  end do
+  !$acc end parallel
+
+  !$acc parallel vector_length(vl) copy(rv)
+  !$acc loop reduction(min:rv) vector
   do i = 1, n
-     result = min (result, array(i))
+     rv = min (rv, array(i))
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_gangs(ng) num_workers(nw) vector_length(vl) copy(rc)
+  !$acc loop reduction(min:rc) gang worker vector
+  do i = 1, n
+     rc = min (rc, array(i))
   end do
   !$acc end parallel
 
@@ -90,17 +203,46 @@ program reduction_2
      vresult = min (vresult, array(i))
   end do
 
-  if (result.ne.vresult) call abort
-
-  result = 1
-  vresult = 1
+  if (abs (rg - vresult) .ge. e) call abort
+  if (abs (rw - vresult) .ge. e) call abort
+  if (abs (rv - vresult) .ge. e) call abort
+  if (abs (rc - vresult) .ge. e) call abort
 
+  !
   ! '.and.' reductions
+  !
+
+  lrg = .true.
+  lrw = .true.
+  lrv = .true.
+  lrc = .true.
+  lvresult = .true.
+
+  !$acc parallel num_gangs(ng) copy(lrg)
+  !$acc loop reduction(.and.:lrg) gang
+  do i = 1, n
+     lrg = lrg .and. (array(i) .ge. 5)
+  end do
+  !$acc end parallel
 
-  !$acc parallel vector_length(vl) num_gangs(1) copy(lresult)
-  !$acc loop reduction(.and.:lresult)
+  !$acc parallel num_workers(nw) copy(lrw)
+  !$acc loop reduction(.and.:lrw) worker
   do i = 1, n
-     lresult = lresult .and. (array(i) .ge. 5)
+     lrw = lrw .and. (array(i) .ge. 5)
+  end do
+  !$acc end parallel
+
+  !$acc parallel vector_length(vl) copy(lrv)
+  !$acc loop reduction(.and.:lrv) vector
+  do i = 1, n
+     lrv = lrv .and. (array(i) .ge. 5)
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_gangs(ng) num_workers(nw) vector_length(vl) copy(lrc)
+  !$acc loop reduction(.and.:lrc) gang worker vector
+  do i = 1, n
+     lrc = lrc .and. (array(i) .ge. 5)
   end do
   !$acc end parallel
 
@@ -109,17 +251,46 @@ program reduction_2
      lvresult = lvresult .and. (array(i) .ge. 5)
   end do
 
-  if (result.ne.vresult) call abort
+  if (lrg .neqv. lvresult) call abort
+  if (lrw .neqv. lvresult) call abort
+  if (lrv .neqv. lvresult) call abort
+  if (lrc .neqv. lvresult) call abort
 
-  lresult = .false.
+  !
+  ! '.or.' reductions
+  !
+
+  lrg = .false.
+  lrw = .false.
+  lrv = .false.
+  lrc = .false.
   lvresult = .false.
 
-  ! '.or.' reductions
+  !$acc parallel num_gangs(ng) copy(lrg)
+  !$acc loop reduction(.or.:lrg) gang
+  do i = 1, n
+     lrg = lrg .or. (array(i) .ge. 5)
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_workers(nw) copy(lrw)
+  !$acc loop reduction(.or.:lrw) worker
+  do i = 1, n
+     lrw = lrw .or. (array(i) .ge. 5)
+  end do
+  !$acc end parallel
 
-  !$acc parallel vector_length(vl) num_gangs(1) copy(lresult)
-  !$acc loop reduction(.or.:lresult)
+  !$acc parallel vector_length(vl) copy(lrv)
+  !$acc loop reduction(.or.:lrv) vector
   do i = 1, n
-     lresult = lresult .or. (array(i) .ge. 5)
+     lrv = lrv .or. (array(i) .ge. 5)
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_gangs(ng) num_workers(nw) vector_length(vl) copy(lrc)
+  !$acc loop reduction(.or.:lrc) gang worker vector
+  do i = 1, n
+     lrc = lrc .or. (array(i) .ge. 5)
   end do
   !$acc end parallel
 
@@ -128,17 +299,46 @@ program reduction_2
      lvresult = lvresult .or. (array(i) .ge. 5)
   end do
 
-  if (result.ne.vresult) call abort
-
-  lresult = .false.
-  lvresult = .false.
+  if (lrg .neqv. lvresult) call abort
+  if (lrw .neqv. lvresult) call abort
+  if (lrv .neqv. lvresult) call abort
+  if (lrc .neqv. lvresult) call abort
 
+  !
   ! '.eqv.' reductions
+  !
+
+  lrg = .true.
+  lrw = .true.
+  lrv = .true.
+  lrc = .true.
+  lvresult = .true.
+
+  !$acc parallel num_gangs(ng) copy(lrg)
+  !$acc loop reduction(.eqv.:lrg) gang
+  do i = 1, n
+     lrg = lrg .eqv. (array(i) .ge. 5)
+  end do
+  !$acc end parallel
 
-  !$acc parallel vector_length(vl) num_gangs(1) copy(lresult)
-  !$acc loop reduction(.eqv.:lresult)
+  !$acc parallel num_workers(nw) copy(lrw)
+  !$acc loop reduction(.eqv.:lrw) worker
   do i = 1, n
-     lresult = lresult .eqv. (array(i) .ge. 5)
+     lrw = lrw .eqv. (array(i) .ge. 5)
+  end do
+  !$acc end parallel
+
+  !$acc parallel vector_length(vl) copy(lrv)
+  !$acc loop reduction(.eqv.:lrv) vector
+  do i = 1, n
+     lrv = lrv .eqv. (array(i) .ge. 5)
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_gangs(ng) num_workers(nw) vector_length(vl) copy(lrc)
+  !$acc loop reduction(.eqv.:lrc) gang worker vector
+  do i = 1, n
+     lrc = lrc .eqv. (array(i) .ge. 5)
   end do
   !$acc end parallel
 
@@ -147,17 +347,46 @@ program reduction_2
      lvresult = lvresult .eqv. (array(i) .ge. 5)
   end do
 
-  if (result.ne.vresult) call abort
-
-  lresult = .false.
-  lvresult = .false.
+  if (lrg .neqv. lvresult) call abort
+  if (lrw .neqv. lvresult) call abort
+  if (lrv .neqv. lvresult) call abort
+  if (lrc .neqv. lvresult) call abort
 
+  !
   ! '.neqv.' reductions
+  !
+
+  lrg = .true.
+  lrw = .true.
+  lrv = .true.
+  lrc = .true.
+  lvresult = .true.
+
+  !$acc parallel num_gangs(ng) copy(lrg)
+  !$acc loop reduction(.neqv.:lrg) gang
+  do i = 1, n
+     lrg = lrg .neqv. (array(i) .ge. 5)
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_workers(nw) copy(lrw)
+  !$acc loop reduction(.neqv.:lrw) worker
+  do i = 1, n
+     lrw = lrw .neqv. (array(i) .ge. 5)
+  end do
+  !$acc end parallel
+
+  !$acc parallel vector_length(vl) copy(lrv)
+  !$acc loop reduction(.neqv.:lrv) vector
+  do i = 1, n
+     lrv = lrv .neqv. (array(i) .ge. 5)
+  end do
+  !$acc end parallel
 
-  !$acc parallel vector_length(vl) num_gangs(1) copy(lresult)
-  !$acc loop reduction(.neqv.:lresult)
+  !$acc parallel num_gangs(ng) num_workers(nw) vector_length(vl) copy(lrc)
+  !$acc loop reduction(.neqv.:lrc) gang worker vector
   do i = 1, n
-     lresult = lresult .neqv. (array(i) .ge. 5)
+     lrc = lrc .neqv. (array(i) .ge. 5)
   end do
   !$acc end parallel
 
@@ -166,5 +395,8 @@ program reduction_2
      lvresult = lvresult .neqv. (array(i) .ge. 5)
   end do
 
-  if (result.ne.vresult) call abort
+  if (lrg .neqv. lvresult) call abort
+  if (lrw .neqv. lvresult) call abort
+  if (lrv .neqv. lvresult) call abort
+  if (lrc .neqv. lvresult) call abort
 end program reduction_2
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/reduction-3.f90 b/libgomp/testsuite/libgomp.oacc-fortran/reduction-3.f90
index d0b590e..a7dbf2b 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/reduction-3.f90
+++ b/libgomp/testsuite/libgomp.oacc-fortran/reduction-3.f90
@@ -5,26 +5,52 @@
 program reduction_3
   implicit none
 
-  integer, parameter    :: n = 10, vl = 32
+  integer, parameter    :: n = 10, ng = 8, nw = 4, vl = 32
   integer               :: i
-  double precision, parameter :: e = .001
-  double precision      :: vresult, result
-  logical               :: lresult, lvresult
+  double precision      :: vresult, rg, rw, rv, rc
+  double precision, parameter :: e = 0.001
+  logical               :: lrg, lrw, lrv, lrc, lvresult
   double precision, dimension (n) :: array
 
   do i = 1, n
      array(i) = i
   end do
 
-  result = 0
+  !
+  ! '+' reductions
+  !
+
+  rg = 0
+  rw = 0
+  rv = 0
+  rc = 0
   vresult = 0
 
-  ! '+' reductions
+  !$acc parallel num_gangs(ng) copy(rg)
+  !$acc loop reduction(+:rg) gang
+  do i = 1, n
+     rg = rg + array(i)
+  end do
+  !$acc end parallel
 
-  !$acc parallel vector_length(vl) num_gangs(1) copy(result)
-  !$acc loop reduction(+:result)
+  !$acc parallel num_workers(nw) copy(rw)
+  !$acc loop reduction(+:rw) worker
   do i = 1, n
-     result = result + array(i)
+     rw = rw + array(i)
+  end do
+  !$acc end parallel
+
+  !$acc parallel vector_length(vl) copy(rv)
+  !$acc loop reduction(+:rv) vector
+  do i = 1, n
+     rv = rv + array(i)
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_gangs(ng) num_workers(nw) vector_length(vl) copy(rc)
+  !$acc loop reduction(+:rc) gang worker vector
+  do i = 1, n
+     rc = rc + array(i)
   end do
   !$acc end parallel
 
@@ -33,17 +59,46 @@ program reduction_3
      vresult = vresult + array(i)
   end do
 
-  if (abs (result - vresult) .ge. e) call abort
+  if (abs (rg - vresult) .ge. e) call abort
+  if (abs (rw - vresult) .ge. e) call abort
+  if (abs (rv - vresult) .ge. e) call abort
+  if (abs (rc - vresult) .ge. e) call abort
+
+  !
+  ! '*' reductions
+  !
 
-  result = 1
+  rg = 1
+  rw = 1
+  rv = 1
+  rc = 1
   vresult = 1
 
-  ! '*' reductions
+  !$acc parallel num_gangs(ng) copy(rg)
+  !$acc loop reduction(*:rg) gang
+  do i = 1, n
+     rg = rg * array(i)
+  end do
+  !$acc end parallel
 
-  !$acc parallel vector_length(vl) num_gangs(1) copy(result)
-  !$acc loop reduction(*:result)
+  !$acc parallel num_workers(nw) copy(rw)
+  !$acc loop reduction(*:rw) worker
   do i = 1, n
-     result = result * array(i)
+     rw = rw * array(i)
+  end do
+  !$acc end parallel
+
+  !$acc parallel vector_length(vl) copy(rv)
+  !$acc loop reduction(*:rv) vector
+  do i = 1, n
+     rv = rv * array(i)
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_gangs(ng) num_workers(nw) vector_length(vl) copy(rc)
+  !$acc loop reduction(*:rc) gang worker vector
+  do i = 1, n
+     rc = rc * array(i)
   end do
   !$acc end parallel
 
@@ -52,17 +107,46 @@ program reduction_3
      vresult = vresult * array(i)
   end do
 
-  if (result.ne.vresult) call abort
+  if (abs (rg - vresult) .ge. e) call abort
+  if (abs (rw - vresult) .ge. e) call abort
+  if (abs (rv - vresult) .ge. e) call abort
+  if (abs (rc - vresult) .ge. e) call abort
+
+  !
+  ! 'max' reductions
+  !
 
-  result = 0
+  rg = 0
+  rw = 0
+  rv = 0
+  rc = 0
   vresult = 0
 
-  ! 'max' reductions
+  !$acc parallel num_gangs(ng) copy(rg)
+  !$acc loop reduction(max:rg) gang
+  do i = 1, n
+     rg = max (rg, array(i))
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_workers(nw) copy(rw)
+  !$acc loop reduction(max:rw) worker
+  do i = 1, n
+     rw = max (rw, array(i))
+  end do
+  !$acc end parallel
 
-  !$acc parallel vector_length(vl) num_gangs(1) copy(result)
-  !$acc loop reduction(max:result)
+  !$acc parallel vector_length(vl) copy(rv)
+  !$acc loop reduction(max:rv) vector
   do i = 1, n
-     result = max (result, array(i))
+     rv = max (rv, array(i))
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_gangs(ng) num_workers(nw) vector_length(vl) copy(rc)
+  !$acc loop reduction(max:rc) gang worker vector
+  do i = 1, n
+     rc = max (rc, array(i))
   end do
   !$acc end parallel
 
@@ -71,17 +155,46 @@ program reduction_3
      vresult = max (vresult, array(i))
   end do
 
-  if (result.ne.vresult) call abort
-
-  result = 1
-  vresult = 1
+  if (abs (rg - vresult) .ge. e) call abort
+  if (abs (rw - vresult) .ge. e) call abort
+  if (abs (rv - vresult) .ge. e) call abort
+  if (abs (rc - vresult) .ge. e) call abort
 
+  !
   ! 'min' reductions
+  !
 
-  !$acc parallel vector_length(vl) num_gangs(1) copy(result)
-  !$acc loop reduction(min:result)
+  rg = 0
+  rw = 0
+  rv = 0
+  rc = 0
+  vresult = 0
+
+  !$acc parallel num_gangs(ng) copy(rg)
+  !$acc loop reduction(min:rg) gang
+  do i = 1, n
+     rg = min (rg, array(i))
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_workers(nw) copy(rw)
+  !$acc loop reduction(min:rw) worker
+  do i = 1, n
+     rw = min (rw, array(i))
+  end do
+  !$acc end parallel
+
+  !$acc parallel vector_length(vl) copy(rv)
+  !$acc loop reduction(min:rv) vector
   do i = 1, n
-     result = min (result, array(i))
+     rv = min (rv, array(i))
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_gangs(ng) num_workers(nw) vector_length(vl) copy(rc)
+  !$acc loop reduction(min:rc) gang worker vector
+  do i = 1, n
+     rc = min (rc, array(i))
   end do
   !$acc end parallel
 
@@ -90,17 +203,46 @@ program reduction_3
      vresult = min (vresult, array(i))
   end do
 
-  if (result.ne.vresult) call abort
-
-  result = 1
-  vresult = 1
+  if (abs (rg - vresult) .ge. e) call abort
+  if (abs (rw - vresult) .ge. e) call abort
+  if (abs (rv - vresult) .ge. e) call abort
+  if (abs (rc - vresult) .ge. e) call abort
 
+  !
   ! '.and.' reductions
+  !
+
+  lrg = .true.
+  lrw = .true.
+  lrv = .true.
+  lrc = .true.
+  lvresult = .true.
+
+  !$acc parallel num_gangs(ng) copy(lrg)
+  !$acc loop reduction(.and.:lrg) gang
+  do i = 1, n
+     lrg = lrg .and. (array(i) .ge. 5)
+  end do
+  !$acc end parallel
 
-  !$acc parallel vector_length(vl) num_gangs(1) copy(lresult)
-  !$acc loop reduction(.and.:lresult)
+  !$acc parallel num_workers(nw) copy(lrw)
+  !$acc loop reduction(.and.:lrw) worker
   do i = 1, n
-     lresult = lresult .and. (array(i) .ge. 5)
+     lrw = lrw .and. (array(i) .ge. 5)
+  end do
+  !$acc end parallel
+
+  !$acc parallel vector_length(vl) copy(lrv)
+  !$acc loop reduction(.and.:lrv) vector
+  do i = 1, n
+     lrv = lrv .and. (array(i) .ge. 5)
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_gangs(ng) num_workers(nw) vector_length(vl) copy(lrc)
+  !$acc loop reduction(.and.:lrc) gang worker vector
+  do i = 1, n
+     lrc = lrc .and. (array(i) .ge. 5)
   end do
   !$acc end parallel
 
@@ -109,17 +251,46 @@ program reduction_3
      lvresult = lvresult .and. (array(i) .ge. 5)
   end do
 
-  if (result.ne.vresult) call abort
+  if (lrg .neqv. lvresult) call abort
+  if (lrw .neqv. lvresult) call abort
+  if (lrv .neqv. lvresult) call abort
+  if (lrc .neqv. lvresult) call abort
 
-  lresult = .false.
+  !
+  ! '.or.' reductions
+  !
+
+  lrg = .false.
+  lrw = .false.
+  lrv = .false.
+  lrc = .false.
   lvresult = .false.
 
-  ! '.or.' reductions
+  !$acc parallel num_gangs(ng) copy(lrg)
+  !$acc loop reduction(.or.:lrg) gang
+  do i = 1, n
+     lrg = lrg .or. (array(i) .ge. 5)
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_workers(nw) copy(lrw)
+  !$acc loop reduction(.or.:lrw) worker
+  do i = 1, n
+     lrw = lrw .or. (array(i) .ge. 5)
+  end do
+  !$acc end parallel
 
-  !$acc parallel vector_length(vl) num_gangs(1) copy(lresult)
-  !$acc loop reduction(.or.:lresult)
+  !$acc parallel vector_length(vl) copy(lrv)
+  !$acc loop reduction(.or.:lrv) vector
   do i = 1, n
-     lresult = lresult .or. (array(i) .ge. 5)
+     lrv = lrv .or. (array(i) .ge. 5)
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_gangs(ng) num_workers(nw) vector_length(vl) copy(lrc)
+  !$acc loop reduction(.or.:lrc) gang worker vector
+  do i = 1, n
+     lrc = lrc .or. (array(i) .ge. 5)
   end do
   !$acc end parallel
 
@@ -128,17 +299,46 @@ program reduction_3
      lvresult = lvresult .or. (array(i) .ge. 5)
   end do
 
-  if (result.ne.vresult) call abort
-
-  lresult = .false.
-  lvresult = .false.
+  if (lrg .neqv. lvresult) call abort
+  if (lrw .neqv. lvresult) call abort
+  if (lrv .neqv. lvresult) call abort
+  if (lrc .neqv. lvresult) call abort
 
+  !
   ! '.eqv.' reductions
+  !
+
+  lrg = .true.
+  lrw = .true.
+  lrv = .true.
+  lrc = .true.
+  lvresult = .true.
+
+  !$acc parallel num_gangs(ng) copy(lrg)
+  !$acc loop reduction(.eqv.:lrg) gang
+  do i = 1, n
+     lrg = lrg .eqv. (array(i) .ge. 5)
+  end do
+  !$acc end parallel
 
-  !$acc parallel vector_length(vl) num_gangs(1) copy(lresult)
-  !$acc loop reduction(.eqv.:lresult)
+  !$acc parallel num_workers(nw) copy(lrw)
+  !$acc loop reduction(.eqv.:lrw) worker
   do i = 1, n
-     lresult = lresult .eqv. (array(i) .ge. 5)
+     lrw = lrw .eqv. (array(i) .ge. 5)
+  end do
+  !$acc end parallel
+
+  !$acc parallel vector_length(vl) copy(lrv)
+  !$acc loop reduction(.eqv.:lrv) vector
+  do i = 1, n
+     lrv = lrv .eqv. (array(i) .ge. 5)
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_gangs(ng) num_workers(nw) vector_length(vl) copy(lrc)
+  !$acc loop reduction(.eqv.:lrc) gang worker vector
+  do i = 1, n
+     lrc = lrc .eqv. (array(i) .ge. 5)
   end do
   !$acc end parallel
 
@@ -147,17 +347,46 @@ program reduction_3
      lvresult = lvresult .eqv. (array(i) .ge. 5)
   end do
 
-  if (result.ne.vresult) call abort
-
-  lresult = .false.
-  lvresult = .false.
+  if (lrg .neqv. lvresult) call abort
+  if (lrw .neqv. lvresult) call abort
+  if (lrv .neqv. lvresult) call abort
+  if (lrc .neqv. lvresult) call abort
 
+  !
   ! '.neqv.' reductions
+  !
+
+  lrg = .true.
+  lrw = .true.
+  lrv = .true.
+  lrc = .true.
+  lvresult = .true.
+
+  !$acc parallel num_gangs(ng) copy(lrg)
+  !$acc loop reduction(.neqv.:lrg) gang
+  do i = 1, n
+     lrg = lrg .neqv. (array(i) .ge. 5)
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_workers(nw) copy(lrw)
+  !$acc loop reduction(.neqv.:lrw) worker
+  do i = 1, n
+     lrw = lrw .neqv. (array(i) .ge. 5)
+  end do
+  !$acc end parallel
+
+  !$acc parallel vector_length(vl) copy(lrv)
+  !$acc loop reduction(.neqv.:lrv) vector
+  do i = 1, n
+     lrv = lrv .neqv. (array(i) .ge. 5)
+  end do
+  !$acc end parallel
 
-  !$acc parallel vector_length(vl) num_gangs(1) copy(lresult)
-  !$acc loop reduction(.neqv.:lresult)
+  !$acc parallel num_gangs(ng) num_workers(nw) vector_length(vl) copy(lrc)
+  !$acc loop reduction(.neqv.:lrc) gang worker vector
   do i = 1, n
-     lresult = lresult .neqv. (array(i) .ge. 5)
+     lrc = lrc .neqv. (array(i) .ge. 5)
   end do
   !$acc end parallel
 
@@ -166,5 +395,8 @@ program reduction_3
      lvresult = lvresult .neqv. (array(i) .ge. 5)
   end do
 
-  if (result.ne.vresult) call abort
+  if (lrg .neqv. lvresult) call abort
+  if (lrw .neqv. lvresult) call abort
+  if (lrv .neqv. lvresult) call abort
+  if (lrc .neqv. lvresult) call abort
 end program reduction_3
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/reduction-4.f90 b/libgomp/testsuite/libgomp.oacc-fortran/reduction-4.f90
index 8c99fdb..c3bdaf6 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/reduction-4.f90
+++ b/libgomp/testsuite/libgomp.oacc-fortran/reduction-4.f90
@@ -5,50 +5,108 @@
 program reduction_4
   implicit none
 
-  integer, parameter    :: n = 10, vl = 32
+  integer, parameter    :: n = 10, ng = 8, nw = 4, vl = 32
   integer               :: i
-  complex               :: vresult, result
+  real                  :: vresult, rg, rw, rv, rc
   complex, dimension (n) :: array
 
   do i = 1, n
      array(i) = i
   end do
 
-  result = 0
+  !
+  ! '+' reductions
+  !
+
+  rg = 0
+  rw = 0
+  rv = 0
+  rc = 0
   vresult = 0
 
-  ! '+' reductions
+  !$acc parallel num_gangs(ng) copy(rg)
+  !$acc loop reduction(+:rg) gang
+  do i = 1, n
+     rg = rg + REAL(array(i))
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_workers(nw) copy(rw)
+  !$acc loop reduction(+:rw) worker
+  do i = 1, n
+     rw = rw + REAL(array(i))
+  end do
+  !$acc end parallel
+
+  !$acc parallel vector_length(vl) copy(rv)
+  !$acc loop reduction(+:rv) vector
+  do i = 1, n
+     rv = rv + REAL(array(i))
+  end do
+  !$acc end parallel
 
-  !$acc parallel vector_length(vl) num_gangs(1) copy(result)
-  !$acc loop reduction(+:result)
+  !$acc parallel num_gangs(ng) num_workers(nw) vector_length(vl) copy(rc)
+  !$acc loop reduction(+:rc) gang worker vector
   do i = 1, n
-     result = result + array(i)
+     rc = rc + REAL(array(i))
   end do
   !$acc end parallel
 
   ! Verify the results
   do i = 1, n
-     vresult = vresult + array(i)
+     vresult = vresult + REAL(array(i))
   end do
 
-  if (result .ne. vresult) call abort
+  if (rg .ne. vresult) call abort
+  if (rw .ne. vresult) call abort
+  if (rv .ne. vresult) call abort
+  if (rc .ne. vresult) call abort
 
-  result = 1
+  !
+  ! '*' reductions
+  !
+
+  rg = 1
+  rw = 1
+  rv = 1
+  rc = 1
   vresult = 1
 
-!  ! '*' reductions
-!
-!  !$acc parallel vector_length(vl)
-!  !$acc loop reduction(*:result)
-!  do i = 1, n
-!     result = result * array(i)
-!  end do
-!  !$acc end parallel
-!
-!  ! Verify the results
-!  do i = 1, n
-!     vresult = vresult * array(i)
-!  end do
-!
-!  if (result.ne.vresult) call abort
+  !$acc parallel num_gangs(ng) copy(rg)
+  !$acc loop reduction(*:rg) gang
+  do i = 1, n
+     rg = rg * REAL(array(i))
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_workers(nw) copy(rw)
+  !$acc loop reduction(*:rw) worker
+  do i = 1, n
+     rw = rw * REAL(array(i))
+  end do
+  !$acc end parallel
+
+  !$acc parallel vector_length(vl) copy(rv)
+  !$acc loop reduction(*:rv) vector
+  do i = 1, n
+     rv = rv * REAL(array(i))
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_gangs(ng) num_workers(nw) vector_length(vl) copy(rc)
+  !$acc loop reduction(*:rc) gang worker vector
+  do i = 1, n
+     rc = rc * REAL(array(i))
+  end do
+  !$acc end parallel
+
+  ! Verify the results
+  do i = 1, n
+     vresult = vresult * REAL(array(i))
+  end do
+
+  if (rg .ne. vresult) call abort
+  if (rw .ne. vresult) call abort
+  if (rv .ne. vresult) call abort
+  if (rc .ne. vresult) call abort
 end program reduction_4
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/reduction-5.f90 b/libgomp/testsuite/libgomp.oacc-fortran/reduction-5.f90
index ec13e4e..4210648 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/reduction-5.f90
+++ b/libgomp/testsuite/libgomp.oacc-fortran/reduction-5.f90
@@ -1,12 +1,17 @@
 ! { dg-do run }
+! { dg-additional-options "-w" }
 
 ! subroutine reduction
 
 program reduction
   integer, parameter    :: n = 40, c = 10
-  integer               :: i, vsum, sum
+  integer               :: i, vsum, gs, ws, vs, cs, ns
 
-  call redsub (sum, n, c)
+  call redsub_gang (gs, n, c)
+  call redsub_worker (ws, n, c)
+  call redsub_vector (vs, n, c)
+  call redsub_combined (cs, n, c)
+  call redsub_nested (ns, n, c)
 
   vsum = 0
 
@@ -15,21 +20,80 @@ program reduction
      vsum = vsum + c
   end do
 
-  if (sum.ne.vsum) call abort ()
+  if (gs .ne. vsum) call abort ()
+  if (ws .ne. vsum) call abort ()
+  if (vs .ne. vsum) call abort ()
+  if (cs .ne. vsum) call abort ()
+  if (ns .ne. vsum) call abort ()
 end program reduction
 
-subroutine redsub(sum, n, c)
+subroutine redsub_gang(sum, n, c)
   integer :: sum, n, c
 
-  integer :: s
-  s = 0
+  sum = 0
 
-  !$acc parallel vector_length(32) copyin (n, c) copy (s) num_gangs(1)
-  !$acc loop reduction(+:s)
+  !$acc parallel copyin (n, c) num_gangs(n) copy(sum)
+  !$acc loop reduction(+:sum) gang
   do i = 1, n
-     s = s + c
+     sum = sum + c
   end do
   !$acc end parallel
+end subroutine redsub_gang
 
-  sum = s
-end subroutine redsub
+subroutine redsub_worker(sum, n, c)
+  integer :: sum, n, c
+
+  sum = 0
+
+  !$acc parallel copyin (n, c) num_workers(4) vector_length (32) copy(sum)
+  !$acc loop reduction(+:sum) worker
+  do i = 1, n
+     sum = sum + c
+  end do
+  !$acc end parallel
+end subroutine redsub_worker
+
+subroutine redsub_vector(sum, n, c)
+  integer :: sum, n, c
+
+  sum = 0
+
+  !$acc parallel copyin (n, c) vector_length(32) copy(sum)
+  !$acc loop reduction(+:sum) vector
+  do i = 1, n
+     sum = sum + c
+  end do
+  !$acc end parallel
+end subroutine redsub_vector
+
+subroutine redsub_combined(sum, n, c)
+  integer :: sum, n, c
+
+  sum = 0
+
+  !$acc parallel num_gangs (8) num_workers (4) vector_length(32) copy(sum)
+  !$acc loop reduction(+:sum) gang worker vector
+  do i = 1, n
+     sum = sum + c
+  end do
+  !$acc end parallel
+end subroutine redsub_combined
+
+subroutine redsub_nested(sum, n, c)
+  integer :: sum, n, c
+  integer :: ii, jj
+
+  ii = n / 10;
+  jj = 10;
+  sum = 0
+
+  !$acc parallel num_gangs (8) copy(sum)
+  !$acc loop reduction(+:sum) gang
+  do i = 1, ii
+     !$acc loop reduction(+:sum) vector
+     do j = 1, jj
+        sum = sum + c
+     end do
+  end do
+  !$acc end parallel
+end subroutine redsub_nested
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/reduction-6.f90 b/libgomp/testsuite/libgomp.oacc-fortran/reduction-6.f90
index 2ff6f5f..f3ed275 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/reduction-6.f90
+++ b/libgomp/testsuite/libgomp.oacc-fortran/reduction-6.f90
@@ -1,30 +1,94 @@
 ! { dg-do run }
+! { dg-additional-options "-cpp -w" }
 
 program reduction
   implicit none
 
-  integer, parameter    :: n = 100
-  integer               :: i, s1, s2, vs1, vs2
+  integer, parameter    :: n = 100, n2 = 1000, chunksize = 10
+  integer               :: i, gs1, gs2, ws1, ws2, vs1, vs2, cs1, cs2, hs1, hs2
+  integer               :: j, red, vred
 
-  s1 = 0
-  s2 = 0
+  gs1 = 0
+  gs2 = 0
+  ws1 = 0
+  ws2 = 0
   vs1 = 0
   vs2 = 0
+  cs1 = 0
+  cs2 = 0
+  hs1 = 0
+  hs2 = 0
 
-  !$acc parallel vector_length (32) copy(s1, s2)
-  !$acc loop reduction(+:s1, s2)
+  !$acc parallel num_gangs (1000) copy(gs1, gs2)
+  !$acc loop reduction(+:gs1, gs2) gang
   do i = 1, n
-     s1 = s1 + 1
-     s2 = s2 + 2
+     gs1 = gs1 + 1
+     gs2 = gs2 + 2
   end do
   !$acc end parallel
 
-  ! Verify the results
+  !$acc parallel num_workers (4) vector_length (32) copy(ws1, ws2)
+  !$acc loop reduction(+:ws1, ws2) worker
+  do i = 1, n
+     ws1 = ws1 + 1
+     ws2 = ws2 + 2
+  end do
+  !$acc end parallel
+
+  !$acc parallel vector_length (32) copy(vs1, vs2)
+  !$acc loop reduction(+:vs1, vs2) vector
   do i = 1, n
      vs1 = vs1 + 1
      vs2 = vs2 + 2
   end do
+  !$acc end parallel
+
+  !$acc parallel num_gangs(8) num_workers(4) vector_length(32) copy(cs1, cs2)
+  !$acc loop reduction(+:cs1, cs2) gang worker vector
+  do i = 1, n
+     cs1 = cs1 + 1
+     cs2 = cs2 + 2
+  end do
+  !$acc end parallel
+
+  ! Verify the results on the host
+  do i = 1, n
+     hs1 = hs1 + 1
+     hs2 = hs2 + 2
+  end do
+
+  if (gs1 .ne. hs1) call abort ()
+  if (gs2 .ne. hs2) call abort ()
+
+  if (ws1 .ne. hs1) call abort ()
+  if (ws2 .ne. hs2) call abort ()
+
+  if (vs1 .ne. hs1) call abort ()
+  if (vs2 .ne. hs2) call abort ()
+
+  if (cs1 .ne. hs1) call abort ()
+  if (cs2 .ne. hs2) call abort ()
+
+  ! Nested reductions.
+
+  red = 0
+  vred = 0
+
+  !$acc parallel num_gangs(10) vector_length(32) copy(red)
+  !$acc loop reduction(+:red) gang
+  do i = 1, n/chunksize
+     !$acc loop reduction(+:red) vector
+     do j = 1, chunksize
+        red = red + chunksize
+     end do
+  end do
+  !$acc end parallel
+
+  do i = 1, n/chunksize
+     do j = 1, chunksize
+        vred = vred + chunksize
+     end do
+  end do
 
-  if (s1.ne.vs1) call abort ()
-  if (s2.ne.vs2) call abort ()
+  if (red .ne. vred) call abort ()
 end program reduction
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/reduction-7.f90 b/libgomp/testsuite/libgomp.oacc-fortran/reduction-7.f90
new file mode 100644
index 0000000..8ec36ad
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-fortran/reduction-7.f90
@@ -0,0 +1,88 @@
+! { dg-do run }
+! { dg-additional-options "-w" }
+
+! subroutine reduction with private and firstprivate variables
+
+program reduction
+  integer, parameter    :: n = 100
+  integer               :: i, j, vsum, cs, arr(n)
+
+  call redsub_private (cs, n, arr)
+  call redsub_bogus (cs, n)
+  call redsub_combined (cs, n, arr)
+
+  vsum = 0
+
+  ! Verify the results
+  do i = 1, n
+     vsum = i
+     do j = 1, n
+        vsum = vsum + 1;
+     end do
+     if (vsum .ne. arr(i)) call abort ()
+  end do
+end program reduction
+
+! This subroutine tests a reduction with an explicit private variable.
+
+subroutine redsub_private(sum, n, arr)
+  integer :: sum, n, arr(n)
+  integer :: i, j, v
+
+  !$acc parallel copyout (arr)
+  !$acc loop gang private (v)
+  do j = 1, n
+     v = j
+
+     !$acc loop vector reduction (+:v)
+     do i = 1, 100
+        v = v + 1
+     end do
+
+     arr(j) = v
+  end do
+  !$acc end parallel
+
+  ! verify the results
+  do i = 1, 10
+     if (arr(i) .ne. 100+i) call abort ()
+  end do
+end subroutine redsub_private
+
+
+! Bogus reduction on an implicitly firstprivate variable.  The results do
+! not survive the parallel region.  The goal here is to ensure that gfortran
+! doesn't ICE.
+
+subroutine redsub_bogus(sum, n)
+  integer :: sum, n, arr(n)
+  integer :: i
+
+  !$acc parallel
+  !$acc loop gang worker vector reduction (+:sum)
+  do i = 1, n
+     sum = sum + 1
+  end do
+  !$acc end parallel
+end subroutine redsub_bogus
+
+! This reduction involving a firstprivate variable yields legitimate results.
+
+subroutine redsub_combined(sum, n, arr)
+  integer :: sum, n, arr(n)
+  integer :: i, j
+
+  !$acc parallel copy (arr)
+  !$acc loop gang
+  do i = 1, n
+     sum = i;
+
+     !$acc loop reduction(+:sum)
+     do j = 1, n
+        sum = sum + 1
+     end do
+
+     arr(i) = sum
+  end do
+  !$acc end parallel
+end subroutine redsub_combined


* Re: openacc reference reductions
  2016-02-09 16:17   ` Cesar Philippidis
@ 2016-02-22 15:34     ` Cesar Philippidis
  2016-02-22 16:23       ` Nathan Sidwell
  2016-04-06  1:54       ` Cesar Philippidis
  0 siblings, 2 replies; 19+ messages in thread
From: Cesar Philippidis @ 2016-02-22 15:34 UTC (permalink / raw)
  To: Nathan Sidwell, gcc-patches, Jakub Jelinek, Nathan Sidwell

[-- Attachment #1: Type: text/plain, Size: 1101 bytes --]

Ping. This patch still needs a review.

Cesar

On 02/09/2016 08:17 AM, Cesar Philippidis wrote:
> On 02/09/2016 07:33 AM, Nathan Sidwell wrote:
>> While I've not looked at the rest of the patch, this bit stood out:
>>
>>> +static bool
>>> +is_oacc_parallel_reduction (tree var, omp_context *ctx)
>>> +{
>>> +  if (!is_oacc_parallel (ctx))
>>> +    return false;
>>> +
>>> +  tree clauses = gimple_omp_target_clauses (ctx->stmt);
>>> +
>>> +  /* Don't install a local copy of the decl if it used
>>> +     inside a acc parallel reduction.  */
>>
>> ^^ comment is misleading -- this routine's not installing anything
>>
>>> +  if (is_oacc_parallel (ctx))
>>
>> ^^ already checked above.
>>
>>> +    for (tree c = clauses; c; c = OMP_CLAUSE_CHAIN (c))
>>> +      if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_REDUCTION
>>> +      && OMP_CLAUSE_DECL (c) == var)
>>> +    return true;
>>> +
>>> +  return false;
>>> +}
>>> +
> 
> Thanks for catching that. Those are artifacts from when this code used
> to be located exclusively in scan_sharing_clauses. I've updated the
> patch with those changes.
> 
> Cesar
> 
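
For anyone following the review without a GCC tree at hand, here is a
minimal standalone sketch of the check discussed above, after the redundant
test and the misleading comment are dropped. It is not the omp-low.c code:
`struct clause`, `enum clause_code` and `is_parallel_reduction` are
hypothetical stand-ins for GCC's tree clause chain, OMP_CLAUSE_CODE,
OMP_CLAUSE_DECL and OMP_CLAUSE_CHAIN, and the `is_parallel` flag is assumed
to play the role of is_oacc_parallel (ctx).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the clause codes GCC stores on a tree node.  */
enum clause_code { CLAUSE_COPY, CLAUSE_REDUCTION };

/* Hypothetical stand-in for one clause in a chain of clauses.  */
struct clause
{
  enum clause_code code;  /* plays the role of OMP_CLAUSE_CODE (c)  */
  const char *decl;       /* plays the role of OMP_CLAUSE_DECL (c)  */
  struct clause *chain;   /* plays the role of OMP_CLAUSE_CHAIN (c) */
};

/* Return true if the construct is a parallel region and VAR appears in
   one of its reduction clauses.  Decls are compared by pointer identity,
   mirroring the `OMP_CLAUSE_DECL (c) == var' test in the patch.  */
static bool
is_parallel_reduction (const char *var, bool is_parallel,
                       const struct clause *clauses)
{
  assert (var != NULL);

  /* Checked once, up front; no second test is needed below.  */
  if (!is_parallel)
    return false;

  for (const struct clause *c = clauses; c; c = c->chain)
    if (c->code == CLAUSE_REDUCTION && c->decl == var)
      return true;

  return false;
}
```

A caller would build a chain such as copy(n) followed by reduction(+:sum)
and query it with the same pointer used for the reduction's decl; only that
variable, and only inside a parallel region, is reported as a reduction.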


[-- Attachment #2: trunk-reductions-20160209.diff --]
[-- Type: text/x-patch, Size: 132984 bytes --]

2016-02-09  Cesar Philippidis  <cesar@codesourcery.com>

	gcc/
	* omp-low.c (is_oacc_parallel_reduction): New function.
	(scan_sharing_clauses): Use it to prevent installing local variables
	for those used in acc parallel reductions.
	(lower_rec_input_clauses): Remove dead code.
	(lower_oacc_reductions): Add support for reference reductions.
	(lower_reduction_clauses): Remove dead code.
	(lower_omp_target): Don't remap variables appearing in acc parallel
	reductions.

	gcc/testsuite/
	* c-c++-common/goacc/reduction-1.c: Add more test coverage.
	* c-c++-common/goacc/reduction-2.c: Likewise.
	* c-c++-common/goacc/reduction-3.c: Likewise.
	* c-c++-common/goacc/reduction-4.c: Likewise.

	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/data-clauses.h: New test.
	* testsuite/libgomp.oacc-c-c++-common/loop-default-compile.c: New test.
	* testsuite/libgomp.oacc-c-c++-common/loop-default.h: New test.
	* testsuite/libgomp.oacc-c-c++-common/loop-g-1.c: Add more test
	coverage.
	* testsuite/libgomp.oacc-c-c++-common/loop-g-2.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-red-g-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-red-w-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-red-w-2.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-gang-np-1.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-gv-np-1.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-gw-np-1.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-1.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-2.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-3.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-4.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-vector-p-1.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-vector-p-2.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-worker-p-1.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-wv-p-1.c: New test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-wv-p-2.c: New test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-wv-p-3.c: New test.
	* testsuite/libgomp.oacc-c-c++-common/loop-w-1.c: Add more test
	coverage.
	* testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-1.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-2.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-3.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-4.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/par-reduction-1.c: Add more test
	coverage.
	* testsuite/libgomp.oacc-c-c++-common/par-reduction-2.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/parallel-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/parallel-dims.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/parallel-reduction.c: New test.
	* testsuite/libgomp.oacc-c-c++-common/reduction-1.c: Add more test
	coverage.
	* testsuite/libgomp.oacc-c-c++-common/reduction-2.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/reduction-3.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/reduction-4.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/reduction-5.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/reduction-6.c: New test.
	* testsuite/libgomp.oacc-c-c++-common/reduction.h: New test.
	* testsuite/libgomp.oacc-fortran/parallel-loop-1.f90: New test.
	* testsuite/libgomp.oacc-fortran/parallel-reduction.f90: New test.
	* testsuite/libgomp.oacc-fortran/reduction-1.f90: Add more test
	coverage.
	* testsuite/libgomp.oacc-fortran/reduction-2.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/reduction-3.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/reduction-4.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/reduction-5.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/reduction-6.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/reduction-7.f90: New test.


diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index d41688b..592e64b 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -308,6 +308,25 @@ is_oacc_kernels (omp_context *ctx)
 	      == GF_OMP_TARGET_KIND_OACC_KERNELS));
 }
 
+/* Return true if CTX corresponds to an oacc parallel region and if
+   VAR is used in a reduction.  */
+
+static bool
+is_oacc_parallel_reduction (tree var, omp_context *ctx)
+{
+  if (!is_oacc_parallel (ctx))
+    return false;
+
+  tree clauses = gimple_omp_target_clauses (ctx->stmt);
+
+  for (tree c = clauses; c; c = OMP_CLAUSE_CHAIN (c))
+    if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_REDUCTION
+	&& OMP_CLAUSE_DECL (c) == var)
+      return true;
+
+  return false;
+}
+
 /* If DECL is the artificial dummy VAR_DECL created for non-static
    data member privatization, return the underlying "this" parameter,
    otherwise return NULL.  */
@@ -2121,7 +2140,8 @@ scan_sharing_clauses (tree clauses, omp_context *ctx,
 		  else
 		    install_var_field (decl, true, 3, ctx,
 				       base_pointers_restrict);
-		  if (is_gimple_omp_offloaded (ctx->stmt))
+		  if (is_gimple_omp_offloaded (ctx->stmt)
+		      && !is_oacc_parallel_reduction (decl, ctx))
 		    install_var_local (decl, ctx);
 		}
 	    }
@@ -4821,7 +4841,7 @@ lower_rec_input_clauses (tree clauses, gimple_seq *ilist, gimple_seq *dlist,
 		  gimplify_assign (ptr, x, ilist);
 		}
 	    }
-	  else if (is_reference (var) && !is_oacc_parallel (ctx))
+	  else if (is_reference (var))
 	    {
 	      /* For references that are being privatized for Fortran,
 		 allocate new backing storage for the new pointer
@@ -5557,7 +5577,8 @@ lower_oacc_reductions (location_t loc, tree clauses, tree level, bool inner,
 	tree orig = OMP_CLAUSE_DECL (c);
 	tree var = maybe_lookup_decl (orig, ctx);
 	tree ref_to_res = NULL_TREE;
-	tree incoming, outgoing;
+	tree incoming, outgoing, v1, v2, v3;
+	bool is_private = false;
 
 	enum tree_code rcode = OMP_CLAUSE_REDUCTION_CODE (c);
 	if (rcode == MINUS_EXPR)
@@ -5570,7 +5591,6 @@ lower_oacc_reductions (location_t loc, tree clauses, tree level, bool inner,
 
 	if (!var)
 	  var = orig;
-	gcc_assert (!is_reference (var));
 
 	incoming = outgoing = var;
 	
@@ -5606,22 +5626,38 @@ lower_oacc_reductions (location_t loc, tree clauses, tree level, bool inner,
 		for (; cls;  cls = OMP_CLAUSE_CHAIN (cls))
 		  if (OMP_CLAUSE_CODE (cls) == OMP_CLAUSE_REDUCTION
 		      && orig == OMP_CLAUSE_DECL (cls))
-		    goto has_outer_reduction;
+		    {
+		      incoming = outgoing = lookup_decl (orig, probe);
+		      goto has_outer_reduction;
+		    }
+		  else if ((OMP_CLAUSE_CODE (cls) == OMP_CLAUSE_FIRSTPRIVATE
+			    || OMP_CLAUSE_CODE (cls) == OMP_CLAUSE_PRIVATE)
+			   && orig == OMP_CLAUSE_DECL (cls))
+		    {
+		      is_private = true;
+		      goto do_lookup;
+		    }
 	      }
 
 	  do_lookup:
 	    /* This is the outermost construct with this reduction,
 	       see if there's a mapping for it.  */
 	    if (gimple_code (outer->stmt) == GIMPLE_OMP_TARGET
-		&& maybe_lookup_field (orig, outer))
+		&& maybe_lookup_field (orig, outer) && !is_private)
 	      {
 		ref_to_res = build_receiver_ref (orig, false, outer);
 		if (is_reference (orig))
 		  ref_to_res = build_simple_mem_ref (ref_to_res);
 
+		tree type = TREE_TYPE (var);
+		if (POINTER_TYPE_P (type))
+		  type = TREE_TYPE (type);
+
 		outgoing = var;
-		incoming = omp_reduction_init_op (loc, rcode, TREE_TYPE (var));
+		incoming = omp_reduction_init_op (loc, rcode, type);
 	      }
+	    else if (ctx->outer)
+	      incoming = outgoing = lookup_decl (orig, ctx->outer);
 	    else
 	      incoming = outgoing = orig;
 	      
@@ -5631,6 +5667,37 @@ lower_oacc_reductions (location_t loc, tree clauses, tree level, bool inner,
 	if (!ref_to_res)
 	  ref_to_res = integer_zero_node;
 
+        if (is_reference (orig))
+	  {
+	    tree type = TREE_TYPE (var);
+	    const char *id = IDENTIFIER_POINTER (DECL_NAME (var));
+
+	    if (!inner)
+	      {
+		tree x = create_tmp_var (TREE_TYPE (type), id);
+		gimplify_assign (var, build_fold_addr_expr (x), fork_seq);
+	      }
+
+	    v1 = create_tmp_var (type, id);
+	    v2 = create_tmp_var (type, id);
+	    v3 = create_tmp_var (type, id);
+
+	    gimplify_assign (v1, var, fork_seq);
+	    gimplify_assign (v2, var, fork_seq);
+	    gimplify_assign (v3, var, fork_seq);
+
+	    var = build_simple_mem_ref (var);
+	    v1 = build_simple_mem_ref (v1);
+	    v2 = build_simple_mem_ref (v2);
+	    v3 = build_simple_mem_ref (v3);
+	    outgoing = build_simple_mem_ref (outgoing);
+
+	    if (TREE_CODE (incoming) != INTEGER_CST)
+	      incoming = build_simple_mem_ref (incoming);
+	  }
+	else
+	  v1 = v2 = v3 = var;
+
 	/* Determine position in reduction buffer, which may be used
 	   by target.  */
 	enum machine_mode mode = TYPE_MODE (TREE_TYPE (var));
@@ -5660,20 +5727,20 @@ lower_oacc_reductions (location_t loc, tree clauses, tree level, bool inner,
 	  = build_call_expr_internal_loc (loc, IFN_GOACC_REDUCTION,
 					  TREE_TYPE (var), 6, init_code,
 					  unshare_expr (ref_to_res),
-					  var, level, op, off);
+					  v1, level, op, off);
 	tree fini_call
 	  = build_call_expr_internal_loc (loc, IFN_GOACC_REDUCTION,
 					  TREE_TYPE (var), 6, fini_code,
 					  unshare_expr (ref_to_res),
-					  var, level, op, off);
+					  v2, level, op, off);
 	tree teardown_call
 	  = build_call_expr_internal_loc (loc, IFN_GOACC_REDUCTION,
 					  TREE_TYPE (var), 6, teardown_code,
-					  ref_to_res, var, level, op, off);
+					  ref_to_res, v3, level, op, off);
 
-	gimplify_assign (var, setup_call, &before_fork);
-	gimplify_assign (var, init_call, &after_fork);
-	gimplify_assign (var, fini_call, &before_join);
+	gimplify_assign (v1, setup_call, &before_fork);
+	gimplify_assign (v2, init_call, &after_fork);
+	gimplify_assign (v3, fini_call, &before_join);
 	gimplify_assign (outgoing, teardown_call, &after_join);
       }
 
@@ -5915,9 +5982,6 @@ lower_reduction_clauses (tree clauses, gimple_seq *stmt_seqp, omp_context *ctx)
 	}
     }
 
-  if (is_gimple_omp_oacc (ctx->stmt))
-    return;
-
   stmt = gimple_build_call (builtin_decl_explicit (BUILT_IN_GOMP_ATOMIC_START),
 			    0);
   gimple_seq_add_stmt (stmt_seqp, stmt);
@@ -15804,7 +15868,9 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 	if (!maybe_lookup_field (var, ctx))
 	  continue;
 
-	if (offloaded)
+	/* Don't remap oacc parallel reduction variables, because the
+	   intermediate result must be local to each gang.  */
+	if (offloaded && !is_oacc_parallel_reduction (var, ctx))
 	  {
 	    x = build_receiver_ref (var, true, ctx);
 	    tree new_var = lookup_decl (var, ctx);
diff --git a/gcc/testsuite/c-c++-common/goacc/reduction-1.c b/gcc/testsuite/c-c++-common/goacc/reduction-1.c
index de97125..59cb6f4 100644
--- a/gcc/testsuite/c-c++-common/goacc/reduction-1.c
+++ b/gcc/testsuite/c-c++-common/goacc/reduction-1.c
@@ -1,70 +1,66 @@
-/* { dg-require-effective-target alloca } */
 /* Integer reductions.  */
 
 #define vl 32
+#define n 1000
 
 int
 main(void)
 {
-  const int n = 1000;
   int i;
   int result, array[n];
   int lresult;
 
   /* '+' reductions.  */
 #pragma acc parallel vector_length (vl)
-#pragma acc loop reduction (+:result)
+#pragma acc loop vector reduction (+:result)
   for (i = 0; i < n; i++)
     result += array[i];
 
   /* '*' reductions.  */
 #pragma acc parallel vector_length (vl)
-#pragma acc loop reduction (*:result)
+#pragma acc loop vector reduction (*:result)
   for (i = 0; i < n; i++)
     result *= array[i];
 
-//   result = 0;
-//   vresult = 0;
-// 
-//   /* 'max' reductions.  */
-// #pragma acc parallel vector_length (vl)
-// #pragma acc loop reduction (+:result)
-//   for (i = 0; i < n; i++)
-//       result = result > array[i] ? result : array[i];
-//
-//   /* 'min' reductions.  */
-// #pragma acc parallel vector_length (vl)
-// #pragma acc loop reduction (+:result)
-//   for (i = 0; i < n; i++)
-//       result = result < array[i] ? result : array[i];
+  /* 'max' reductions.  */
+#pragma acc parallel vector_length (vl)
+#pragma acc loop vector reduction (max:result)
+  for (i = 0; i < n; i++)
+    result = result > array[i] ? result : array[i];
+
+  /* 'min' reductions.  */
+#pragma acc parallel vector_length (vl)
+#pragma acc loop vector reduction (min:result)
+  for (i = 0; i < n; i++)
+    result = result < array[i] ? result : array[i];
 
   /* '&' reductions.  */
 #pragma acc parallel vector_length (vl)
-#pragma acc loop reduction (&:result)
+#pragma acc loop vector reduction (&:result)
   for (i = 0; i < n; i++)
     result &= array[i];
 
   /* '|' reductions.  */
 #pragma acc parallel vector_length (vl)
-#pragma acc loop reduction (|:result)
+#pragma acc loop vector reduction (|:result)
   for (i = 0; i < n; i++)
     result |= array[i];
 
   /* '^' reductions.  */
 #pragma acc parallel vector_length (vl)
-#pragma acc loop reduction (^:result)
+#pragma acc loop vector reduction (^:result)
   for (i = 0; i < n; i++)
     result ^= array[i];
 
   /* '&&' reductions.  */
 #pragma acc parallel vector_length (vl)
-#pragma acc loop reduction (&&:lresult)
+#pragma acc loop vector reduction (&&:lresult)
   for (i = 0; i < n; i++)
     lresult = lresult && (result > array[i]);
 
   /* '||' reductions.  */
 #pragma acc parallel vector_length (vl)
-#pragma acc loop reduction (||:lresult)
+#pragma acc loop vector reduction (||:lresult)
   for (i = 0; i < n; i++)
     lresult = lresult || (result > array[i]);
 
diff --git a/gcc/testsuite/c-c++-common/goacc/reduction-2.c b/gcc/testsuite/c-c++-common/goacc/reduction-2.c
index 2964236..4889241 100644
--- a/gcc/testsuite/c-c++-common/goacc/reduction-2.c
+++ b/gcc/testsuite/c-c++-common/goacc/reduction-2.c
@@ -1,49 +1,48 @@
-/* { dg-require-effective-target alloca } */
 /* float reductions.  */
 
 #define vl 32
+#define n 1000
 
 int
 main(void)
 {
-  const int n = 1000;
   int i;
   float result, array[n];
   int lresult;
 
   /* '+' reductions.  */
 #pragma acc parallel vector_length (vl)
-#pragma acc loop reduction (+:result)
+#pragma acc loop vector reduction (+:result)
   for (i = 0; i < n; i++)
     result += array[i];
 
   /* '*' reductions.  */
 #pragma acc parallel vector_length (vl)
-#pragma acc loop reduction (*:result)
+#pragma acc loop vector reduction (*:result)
   for (i = 0; i < n; i++)
     result *= array[i];
 
-//   /* 'max' reductions.  */
-// #pragma acc parallel vector_length (vl)
-// #pragma acc loop reduction (+:result)
-//   for (i = 0; i < n; i++)
-//       result = result > array[i] ? result : array[i];
-// 
-//   /* 'min' reductions.  */
-// #pragma acc parallel vector_length (vl)
-// #pragma acc loop reduction (+:result)
-//   for (i = 0; i < n; i++)
-//       result = result < array[i] ? result : array[i];
+  /* 'max' reductions.  */
+#pragma acc parallel vector_length (vl)
+#pragma acc loop vector reduction (max:result)
+  for (i = 0; i < n; i++)
+    result = result > array[i] ? result : array[i];
+
+  /* 'min' reductions.  */
+#pragma acc parallel vector_length (vl)
+#pragma acc loop vector reduction (min:result)
+  for (i = 0; i < n; i++)
+    result = result < array[i] ? result : array[i];
 
   /* '&&' reductions.  */
 #pragma acc parallel vector_length (vl)
-#pragma acc loop reduction (&&:lresult)
+#pragma acc loop vector reduction (&&:lresult)
   for (i = 0; i < n; i++)
     lresult = lresult && (result > array[i]);
 
   /* '||' reductions.  */
 #pragma acc parallel vector_length (vl)
-#pragma acc loop reduction (||:lresult)
+#pragma acc loop vector reduction (||:lresult)
   for (i = 0; i < n; i++)
     lresult = lresult || (result > array[i]);
 
diff --git a/gcc/testsuite/c-c++-common/goacc/reduction-3.c b/gcc/testsuite/c-c++-common/goacc/reduction-3.c
index 34c51c2..b19224e2 100644
--- a/gcc/testsuite/c-c++-common/goacc/reduction-3.c
+++ b/gcc/testsuite/c-c++-common/goacc/reduction-3.c
@@ -1,49 +1,48 @@
-/* { dg-require-effective-target alloca } */
 /* double reductions.  */
 
 #define vl 32
+#define n 1000
 
 int
 main(void)
 {
-  const int n = 1000;
   int i;
   double result, array[n];
   int lresult;
 
   /* '+' reductions.  */
 #pragma acc parallel vector_length (vl)
-#pragma acc loop reduction (+:result)
+#pragma acc loop vector reduction (+:result)
   for (i = 0; i < n; i++)
     result += array[i];
 
   /* '*' reductions.  */
 #pragma acc parallel vector_length (vl)
-#pragma acc loop reduction (*:result)
+#pragma acc loop vector reduction (*:result)
   for (i = 0; i < n; i++)
     result *= array[i];
 
-//   /* 'max' reductions.  */
-// #pragma acc parallel vector_length (vl)
-// #pragma acc loop reduction (+:result)
-//   for (i = 0; i < n; i++)
-//       result = result > array[i] ? result : array[i];
-// 
-//   /* 'min' reductions.  */
-// #pragma acc parallel vector_length (vl)
-// #pragma acc loop reduction (+:result)
-//   for (i = 0; i < n; i++)
-//       result = result < array[i] ? result : array[i];
+  /* 'max' reductions.  */
+#pragma acc parallel vector_length (vl)
+#pragma acc loop vector reduction (max:result)
+  for (i = 0; i < n; i++)
+    result = result > array[i] ? result : array[i];
+
+  /* 'min' reductions.  */
+#pragma acc parallel vector_length (vl)
+#pragma acc loop vector reduction (min:result)
+  for (i = 0; i < n; i++)
+    result = result < array[i] ? result : array[i];
 
   /* '&&' reductions.  */
 #pragma acc parallel vector_length (vl)
-#pragma acc loop reduction (&&:lresult)
+#pragma acc loop vector reduction (&&:lresult)
   for (i = 0; i < n; i++)
     lresult = lresult && (result > array[i]);
 
   /* '||' reductions.  */
 #pragma acc parallel vector_length (vl)
-#pragma acc loop reduction (||:lresult)
+#pragma acc loop vector reduction (||:lresult)
   for (i = 0; i < n; i++)
     lresult = lresult || (result > array[i]);
 
diff --git a/gcc/testsuite/c-c++-common/goacc/reduction-4.c b/gcc/testsuite/c-c++-common/goacc/reduction-4.c
index 328c0d4..88d7f70 100644
--- a/gcc/testsuite/c-c++-common/goacc/reduction-4.c
+++ b/gcc/testsuite/c-c++-common/goacc/reduction-4.c
@@ -1,51 +1,54 @@
-/* { dg-require-effective-target alloca } */
 /* complex reductions.  */
 
 #define vl 32
+#define n 1000
 
 int
 main(void)
 {
-  const int n = 1000;
   int i;
   __complex__ double result, array[n];
   int lresult;
 
   /* '+' reductions.  */
 #pragma acc parallel vector_length (vl)
-#pragma acc loop reduction (+:result)
+#pragma acc loop vector reduction (+:result)
   for (i = 0; i < n; i++)
     result += array[i];
 
-  /* Needs support for complex multiplication.  */
-
-//   /* '*' reductions.  */
-// #pragma acc parallel vector_length (vl)
-// #pragma acc loop reduction (*:result)
-//   for (i = 0; i < n; i++)
-//     result *= array[i];
-//
-//   /* 'max' reductions.  */
-// #pragma acc parallel vector_length (vl)
-// #pragma acc loop reduction (+:result)
-//   for (i = 0; i < n; i++)
-//       result = result > array[i] ? result : array[i];
-// 
-//   /* 'min' reductions.  */
-// #pragma acc parallel vector_length (vl)
-// #pragma acc loop reduction (+:result)
-//   for (i = 0; i < n; i++)
-//       result = result < array[i] ? result : array[i];
+  /* '*' reductions.  */
+#pragma acc parallel vector_length (vl)
+#pragma acc loop vector reduction (*:result)
+  for (i = 0; i < n; i++)
+    result *= array[i];
+
+  /* 'max' reductions.  */
+#if 0
+  // error: 'result' has invalid type for 'reduction(max)'
+#pragma acc parallel vector_length (vl)
+#pragma acc loop vector reduction (max:result)
+  for (i = 0; i < n; i++)
+    result = result > array[i] ? result : array[i];
+#endif
+
+  /* 'min' reductions.  */
+#if 0
+  // error: 'result' has invalid type for 'reduction(min)'
+#pragma acc parallel vector_length (vl)
+#pragma acc loop vector reduction (min:result)
+  for (i = 0; i < n; i++)
+    result = result < array[i] ? result : array[i];
+#endif
 
   /* '&&' reductions.  */
 #pragma acc parallel vector_length (vl)
-#pragma acc loop reduction (&&:lresult)
+#pragma acc loop vector reduction (&&:lresult)
   for (i = 0; i < n; i++)
     lresult = lresult && (__real__(result) > __real__(array[i]));
 
   /* '||' reductions.  */
 #pragma acc parallel vector_length (vl)
-#pragma acc loop reduction (||:lresult)
+#pragma acc loop vector reduction (||:lresult)
   for (i = 0; i < n; i++)
     lresult = lresult || (__real__(result) > __real__(array[i]));
 
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/data-clauses.h b/libgomp/testsuite/libgomp.oacc-c-c++-common/data-clauses.h
new file mode 100644
index 0000000..8341053
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/data-clauses.h
@@ -0,0 +1,202 @@
+int i;
+
+int main(void)
+{
+  int j, v;
+
+  i = -1;
+  j = -2;
+  v = 0;
+#pragma acc EXEC_DIRECTIVE /* copyout */ present_or_copyout (v) copyin (i, j)
+  {
+    if (i != -1 || j != -2)
+      abort ();
+    i = 2;
+    j = 1;
+    if (i != 2 || j != 1)
+      abort ();
+    v = 1;
+  }
+#if ACC_MEM_SHARED
+  if (v != 1 || i != 2 || j != 1)
+    abort ();
+#else
+  if (v != 1 || i != -1 || j != -2)
+    abort ();
+#endif
+
+  i = -1;
+  j = -2;
+  v = 0;
+#pragma acc EXEC_DIRECTIVE /* copyout */ present_or_copyout (v) copyout (i, j)
+  {
+    i = 2;
+    j = 1;
+    if (i != 2 || j != 1)
+      abort ();
+    v = 1;
+  }
+  if (v != 1 || i != 2 || j != 1)
+    abort ();
+
+  i = -1;
+  j = -2;
+  v = 0;
+#pragma acc EXEC_DIRECTIVE /* copyout */ present_or_copyout (v) copy (i, j)
+  {
+    if (i != -1 || j != -2)
+      abort ();
+    i = 2;
+    j = 1;
+    if (i != 2 || j != 1)
+      abort ();
+    v = 1;
+  }
+  if (v != 1 || i != 2 || j != 1)
+    abort ();
+
+  i = -1;
+  j = -2;
+  v = 0;
+#pragma acc EXEC_DIRECTIVE /* copyout */ present_or_copyout (v) create (i, j)
+  {
+    i = 2;
+    j = 1;
+    if (i != 2 || j != 1)
+      abort ();
+    v = 1;
+  }
+#if ACC_MEM_SHARED
+  if (v != 1 || i != 2 || j != 1)
+    abort ();
+#else
+  if (v != 1 || i != -1 || j != -2)
+    abort ();
+#endif
+
+  i = -1;
+  j = -2;
+  v = 0;
+#pragma acc EXEC_DIRECTIVE /* copyout */ present_or_copyout (v) present_or_copyin (i, j)
+  {
+    if (i != -1 || j != -2)
+      abort ();
+    i = 2;
+    j = 1;
+    if (i != 2 || j != 1)
+      abort ();
+    v = 1;
+  }
+  if (v != 1)
+    abort ();
+#if ACC_MEM_SHARED
+  if (v != 1 || i != 2 || j != 1)
+    abort ();
+#else
+  if (v != 1 || i != -1 || j != -2)
+    abort ();
+#endif
+
+  i = -1;
+  j = -2;
+  v = 0;
+#pragma acc EXEC_DIRECTIVE /* copyout */ present_or_copyout (v) present_or_copyout (i, j)
+  {
+    i = 2;
+    j = 1;
+    if (i != 2 || j != 1)
+      abort ();
+    v = 1;
+  }
+  if (v != 1 || i != 2 || j != 1)
+    abort ();
+
+  i = -1;
+  j = -2;
+  v = 0;
+#pragma acc EXEC_DIRECTIVE /* copyout */ present_or_copyout (v) present_or_copy (i, j)
+  {
+    if (i != -1 || j != -2)
+      abort ();
+    i = 2;
+    j = 1;
+    if (i != 2 || j != 1)
+      abort ();
+    v = 1;
+  }
+  if (v != 1 || i != 2 || j != 1)
+    abort ();
+
+  i = -1;
+  j = -2;
+  v = 0;
+#pragma acc EXEC_DIRECTIVE /* copyout */ present_or_copyout (v) present_or_create (i, j)
+  {
+    i = 2;
+    j = 1;
+    if (i != 2 || j != 1)
+      abort ();
+    v = 1;
+  }
+  if (v != 1)
+    abort ();
+#if ACC_MEM_SHARED
+  if (v != 1 || i != 2 || j != 1)
+    abort ();
+#else
+  if (v != 1 || i != -1 || j != -2)
+    abort ();
+#endif
+
+  i = -1;
+  j = -2;
+  v = 0;
+
+#pragma acc data copyin (i, j)
+  {
+#pragma acc EXEC_DIRECTIVE /* copyout */ present_or_copyout (v) present (i, j)
+    {
+      if (i != -1 || j != -2)
+	abort ();
+      i = 2;
+      j = 1;
+      if (i != 2 || j != 1)
+	abort ();
+      v = 1;
+    }
+  }
+#if ACC_MEM_SHARED
+  if (v != 1 || i != 2 || j != 1)
+    abort ();
+#else
+  if (v != 1 || i != -1 || j != -2)
+    abort ();
+#endif
+
+  i = -1;
+  j = -2;
+  v = 0;
+
+#pragma acc data copyin(i, j)
+  {
+#pragma acc EXEC_DIRECTIVE /* copyout */ present_or_copyout (v)
+    {
+      if (i != -1 || j != -2)
+	abort ();
+      i = 2;
+      j = 1;
+      if (i != 2 || j != 1)
+	abort ();
+      v = 1;
+    }
+  }
+#if ACC_MEM_SHARED
+  if (v != 1 || i != 2 || j != 1)
+    abort ();
+#else
+  if (v != 1 || i != -1 || j != -2)
+    abort ();
+#endif
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-default-compile.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-default-compile.c
new file mode 100644
index 0000000..640d827
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-default-compile.c
@@ -0,0 +1,13 @@
+
+/* { dg-additional-options "-O2 -fopenacc-dim=16:16" } */
+
+#include "loop-default.h"
+#include <stdlib.h>
+
+int main ()
+{
+  /* Environment should be ignored.  */
+  setenv ("GOMP_OPENACC_DIM", "8:8",  1);
+  
+  return test_1 (16, 16, 32);
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-default.h b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-default.h
new file mode 100644
index 0000000..55de04b
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-default.h
@@ -0,0 +1,125 @@
+#include <openacc.h>
+#include <alloca.h>
+#include <string.h>
+#include <stdio.h>
+
+#pragma acc routine
+static int __attribute__ ((noinline)) coord ()
+{
+  int res = 0;
+
+  if (acc_on_device (acc_device_nvidia))
+    {
+      int g = 0, w = 0, v = 0;
+
+      __asm__ volatile ("mov.u32 %0,%%ctaid.x;" : "=r" (g));
+      __asm__ volatile ("mov.u32 %0,%%tid.y;" : "=r" (w));
+      __asm__ volatile ("mov.u32 %0,%%tid.x;" : "=r" (v));
+      res = (1 << 24) | (g << 16) | (w << 8) | v;
+    }
+  return res;
+}
+
+
+int check (const int *ary, int size, int gp, int wp, int vp)
+{
+  int exit = 0;
+  int ix;
+  int *gangs = (int *)alloca (gp * sizeof (int));
+  int *workers = (int *)alloca (wp * sizeof (int));
+  int *vectors = (int *)alloca (vp * sizeof (int));
+  int offloaded = 0;
+  
+  memset (gangs, 0, gp * sizeof (int));
+  memset (workers, 0, wp * sizeof (int));
+  memset (vectors, 0, vp * sizeof (int));
+  
+  for (ix = 0; ix < size; ix++)
+    {
+      int g = (ary[ix] >> 16) & 0xff;
+      int w = (ary[ix] >> 8) & 0xff;
+      int v = (ary[ix] >> 0) & 0xff;
+
+      if (g >= gp || w >= wp || v >= vp)
+	{
+	  printf ("unexpected cpu %#x used\n", ary[ix]);
+	  exit = 1;
+	}
+      else
+	{
+	  vectors[v]++;
+	  workers[w]++;
+	  gangs[g]++;
+	}
+      offloaded += ary[ix] >> 24;
+    }
+
+  if (!offloaded)
+    return 0;
+
+  if (offloaded != size)
+    {
+      printf ("offloaded %d times, expected %d\n", offloaded, size);
+      return 1;
+    }
+
+  for (ix = 0; ix < gp; ix++)
+    if (gangs[ix] != gangs[0])
+      {
+	printf ("gang %d not used %d times\n", ix, gangs[0]);
+	exit = 1;
+      }
+  
+  for (ix = 0; ix < wp; ix++)
+    if (workers[ix] != workers[0])
+      {
+	printf ("worker %d not used %d times\n", ix, workers[0]);
+	exit = 1;
+      }
+  
+  for (ix = 0; ix < vp; ix++)
+    if (vectors[ix] != vectors[0])
+      {
+	printf ("vector %d not used %d times\n", ix, vectors[0]);
+	exit = 1;
+      }
+  
+  return exit;
+}
+
+#define N (32 *32*32)
+
+int test_1 (int gp, int wp, int vp)
+{
+  int ary[N];
+  int exit = 0;
+  
+#pragma acc parallel copyout (ary)
+  {
+#pragma acc loop gang (static:1)
+    for (int ix = 0; ix < N; ix++)
+      ary[ix] = coord ();
+  }
+
+  exit |= check (ary, N, gp, 1, 1);
+
+#pragma  acc parallel copyout (ary)
+  {
+#pragma acc loop worker
+    for (int ix = 0; ix < N; ix++)
+      ary[ix] = coord ();
+  }
+
+  exit |= check (ary, N, 1, wp, 1);
+
+#pragma  acc parallel copyout (ary)
+  {
+#pragma acc loop vector
+    for (int ix = 0; ix < N; ix++)
+      ary[ix] = coord ();
+  }
+
+  exit |= check (ary, N, 1, 1, vp);
+
+  return exit;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-g-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-g-1.c
index 23c2a75..7afb89b 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-g-1.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-g-1.c
@@ -1,5 +1,5 @@
 /* { dg-do run } */
-/* { dg-additional-options "-O2" } */
+/* { dg-additional-options "-O2 -w" } */
 
 #include <stdio.h>
 
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-g-2.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-g-2.c
index 1a93db3..db83692 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-g-2.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-g-2.c
@@ -1,5 +1,5 @@
 /* { dg-do run } */
-/* { dg-additional-options "-O2" } */
+/* { dg-additional-options "-O2 -w" } */
 
 #include <stdio.h>
 
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-g-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-g-1.c
index c14bddd..129a8c8 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-g-1.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-g-1.c
@@ -1,5 +1,5 @@
 /* { dg-do run } */
-/* { dg-additional-options "-O2" } */
+/* { dg-additional-options "-O2 -w" } */
 
 #include <stdio.h>
 
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-w-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-w-1.c
index 706d0d8..fadf7d5 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-w-1.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-w-1.c
@@ -1,5 +1,5 @@
 /* { dg-do run } */
-/* { dg-additional-options "-O2" } */
+/* { dg-additional-options "-O2 -w" } */
 
 #include <stdio.h>
 
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-w-2.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-w-2.c
index a073ac8..68d3d7a 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-w-2.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-w-2.c
@@ -1,5 +1,5 @@
 /* { dg-do run } */
-/* { dg-additional-options "-O2" } */
+/* { dg-additional-options "-O2 -w" } */
 
 #include <stdio.h>
 
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gang-np-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gang-np-1.c
new file mode 100644
index 0000000..55ab3c9
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gang-np-1.c
@@ -0,0 +1,45 @@
+/* { dg-additional-options "-w" } */
+
+#include <assert.h>
+
+/* Test of reduction on loop directive (gangs, non-private reduction
+   variable).  */
+
+int
+main (int argc, char *argv[])
+{
+  int i, arr[1024], res = 0, hres = 0;
+
+  for (i = 0; i < 1024; i++)
+    arr[i] = i;
+
+  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
+		       copy(res)
+  {
+    #pragma acc loop gang reduction(+:res)
+    for (i = 0; i < 1024; i++)
+      res += arr[i];
+  }
+
+  for (i = 0; i < 1024; i++)
+    hres += arr[i];
+
+  assert (res == hres);
+
+  res = hres = 1;
+
+  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
+		       copy(res)
+  {
+    #pragma acc loop gang reduction(*:res)
+    for (i = 0; i < 12; i++)
+      res *= arr[i];
+  }
+
+  for (i = 0; i < 12; i++)
+    hres *= arr[i];
+
+  assert (res == hres);
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gv-np-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gv-np-1.c
new file mode 100644
index 0000000..d4341e9
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gv-np-1.c
@@ -0,0 +1,30 @@
+/* { dg-additional-options "-w" } */
+
+#include <assert.h>
+
+/* Test of reduction on loop directive (gangs and vectors, non-private
+   reduction variable).  */
+
+int
+main (int argc, char *argv[])
+{
+  int i, arr[1024], res = 0, hres = 0;
+
+  for (i = 0; i < 1024; i++)
+    arr[i] = i;
+
+  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
+		       copy(res)
+  {
+    #pragma acc loop gang vector reduction(+:res)
+    for (i = 0; i < 1024; i++)
+      res += arr[i];
+  }
+
+  for (i = 0; i < 1024; i++)
+    hres += arr[i];
+
+  assert (res == hres);
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gw-np-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gw-np-1.c
new file mode 100644
index 0000000..2e5668b
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gw-np-1.c
@@ -0,0 +1,30 @@
+/* { dg-additional-options "-w" } */
+
+#include <assert.h>
+
+/* Test of reduction on loop directive (gangs and workers, non-private
+   reduction variable).  */
+
+int
+main (int argc, char *argv[])
+{
+  int i, arr[1024], res = 0, hres = 0;
+
+  for (i = 0; i < 1024; i++)
+    arr[i] = i;
+
+  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
+		       copy(res)
+  {
+    #pragma acc loop gang worker reduction(+:res)
+    for (i = 0; i < 1024; i++)
+      res += arr[i];
+  }
+
+  for (i = 0; i < 1024; i++)
+    hres += arr[i];
+
+  assert (res == hres);
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-1.c
new file mode 100644
index 0000000..d610373
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-1.c
@@ -0,0 +1,28 @@
+#include <assert.h>
+
+/* Test of reduction on loop directive (gangs, workers and vectors, non-private
+   reduction variable).  */
+
+int
+main (int argc, char *argv[])
+{
+  int i, arr[1024], res = 0, hres = 0;
+
+  for (i = 0; i < 1024; i++)
+    arr[i] = i;
+
+  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
+		       copy(res)
+  {
+    #pragma acc loop gang worker vector reduction(+:res)
+    for (i = 0; i < 1024; i++)
+      res += arr[i];
+  }
+
+  for (i = 0; i < 1024; i++)
+    hres += arr[i];
+
+  assert (res == hres);
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-2.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-2.c
new file mode 100644
index 0000000..ea5c151
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-2.c
@@ -0,0 +1,34 @@
+#include <assert.h>
+
+/* Test of reduction on loop directive (gangs, workers and vectors, non-private
+   reduction variable: separate gang and worker/vector loops).  */
+
+int
+main (int argc, char *argv[])
+{
+  int i, j, arr[32768], res = 0, hres = 0;
+
+  for (i = 0; i < 32768; i++)
+    arr[i] = i;
+
+  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
+		       copy(res)
+  {
+    #pragma acc loop gang reduction(+:res)
+    for (j = 0; j < 32; j++)
+      {
+        #pragma acc loop worker vector reduction(+:res)
+        for (i = 0; i < 1024; i++)
+	  res += arr[j * 1024 + i];
+      }
+    /* "res" is non-private, and is not available until after the parallel
+       region.  */
+  }
+
+  for (i = 0; i < 32768; i++)
+    hres += arr[i];
+
+  assert (res == hres);
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-3.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-3.c
new file mode 100644
index 0000000..0056f3c
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-3.c
@@ -0,0 +1,33 @@
+#include <assert.h>
+
+/* Test of reduction on loop directive (gangs, workers and vectors, non-private
+   reduction variable: separate gang and worker/vector loops).  */
+
+int
+main (int argc, char *argv[])
+{
+  int i, j;
+  double arr[32768], res = 0, hres = 0;
+
+  for (i = 0; i < 32768; i++)
+    arr[i] = i;
+
+  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
+		       copyin(arr) copy(res)
+  {
+    #pragma acc loop gang reduction(+:res)
+    for (j = 0; j < 32; j++)
+      {
+        #pragma acc loop worker vector reduction(+:res)
+        for (i = 0; i < 1024; i++)
+	  res += arr[j * 1024 + i];
+      }
+  }
+
+  for (i = 0; i < 32768; i++)
+    hres += arr[i];
+
+  assert (res == hres);
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-4.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-4.c
new file mode 100644
index 0000000..e69d0ec
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-4.c
@@ -0,0 +1,55 @@
+#include <assert.h>
+
+/* Test of reduction on loop directive (gangs, workers and vectors, multiple
+   non-private reduction variables, float type).  */
+
+int
+main (int argc, char *argv[])
+{
+  int i, j;
+  float arr[32768];
+  float res = 0, mres = 0, hres = 0, hmres = 0;
+
+  for (i = 0; i < 32768; i++)
+    arr[i] = i;
+
+  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
+		       copy(res, mres)
+  {
+    #pragma acc loop gang reduction(+:res) reduction(max:mres)
+    for (j = 0; j < 32; j++)
+      {
+	#pragma acc loop worker vector reduction(+:res) reduction(max:mres)
+	for (i = 0; i < 1024; i++)
+	  {
+	    res += arr[j * 1024 + i];
+	    if (arr[j * 1024 + i] > mres)
+	      mres = arr[j * 1024 + i];
+	  }
+
+	#pragma acc loop worker vector reduction(+:res) reduction(max:mres)
+	for (i = 0; i < 1024; i++)
+	  {
+	    res += arr[j * 1024 + (1023 - i)];
+	    if (arr[j * 1024 + (1023 - i)] > mres)
+	      mres = arr[j * 1024 + (1023 - i)];
+	  }
+      }
+  }
+
+  for (j = 0; j < 32; j++)
+    for (i = 0; i < 1024; i++)
+      {
+        hres += arr[j * 1024 + i];
+	hres += arr[j * 1024 + (1023 - i)];
+	if (arr[j * 1024 + i] > hmres)
+	  hmres = arr[j * 1024 + i];
+	if (arr[j * 1024 + (1023 - i)] > hmres)
+	  hmres = arr[j * 1024 + (1023 - i)];
+      }
+
+  assert (res == hres);
+  assert (mres == hmres);
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-vector-p-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-vector-p-1.c
new file mode 100644
index 0000000..dd181ef
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-vector-p-1.c
@@ -0,0 +1,43 @@
+/* { dg-additional-options "-w" } */
+
+#include <assert.h>
+
+/* Test of reduction on loop directive (vectors, private reduction
+   variable).  */
+
+int
+main (int argc, char *argv[])
+{
+  int i, j, arr[1024], out[32], res = 0, hres = 0;
+
+  for (i = 0; i < 1024; i++)
+    arr[i] = i;
+
+  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
+		       private(res) copyout(out)
+  {
+    #pragma acc loop gang
+    for (j = 0; j < 32; j++)
+      {
+        res = 0;
+
+	#pragma acc loop vector reduction(+:res)
+	for (i = 0; i < 32; i++)
+	  res += arr[j * 32 + i];
+	
+	out[j] = res;
+      }
+  }
+
+  for (j = 0; j < 32; j++)
+    {
+      hres = 0;
+      
+      for (i = 0; i < 32; i++)
+	hres += arr[j * 32 + i];
+
+      assert (out[j] == hres);
+    }
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-vector-p-2.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-vector-p-2.c
new file mode 100644
index 0000000..15f0053
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-vector-p-2.c
@@ -0,0 +1,41 @@
+#include <assert.h>
+
+/* Test of reduction on loop directive (vector reduction in
+   gang-partitioned/worker-partitioned mode, private reduction variable).  */
+
+int
+main (int argc, char *argv[])
+{
+  int i, j, k;
+  double ina[1024], inb[1024], out[1024], acc;
+
+  for (j = 0; j < 32; j++)
+    for (i = 0; i < 32; i++)
+      {
+        ina[j * 32 + i] = (i == j) ? 2.0 : 0.0;
+	inb[j * 32 + i] = (double) (i + j);
+      }
+
+  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
+		       private(acc) copyin(ina, inb) copyout(out)
+  {
+    #pragma acc loop gang worker
+    for (k = 0; k < 32; k++)
+      for (j = 0; j < 32; j++)
+        {
+	  acc = 0;
+
+	  #pragma acc loop vector reduction(+:acc)
+	  for (i = 0; i < 32; i++)
+	    acc += ina[k * 32 + i] * inb[i * 32 + j];
+
+	  out[k * 32 + j] = acc;
+	}
+  }
+
+  for (j = 0; j < 32; j++)
+    for (i = 0; i < 32; i++)
+      assert (out[j * 32 + i] == (i + j) * 2);
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-worker-p-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-worker-p-1.c
new file mode 100644
index 0000000..4864acd
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-worker-p-1.c
@@ -0,0 +1,43 @@
+/* { dg-additional-options "-w" } */
+
+#include <assert.h>
+
+/* Test of reduction on loop directive (workers, private reduction
+   variable).  */
+
+int
+main (int argc, char *argv[])
+{
+  int i, j, arr[1024], out[32], res = 0, hres = 0;
+
+  for (i = 0; i < 1024; i++)
+    arr[i] = i;
+
+  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
+		       private(res) copyout(out)
+  {
+    #pragma acc loop gang
+    for (j = 0; j < 32; j++)
+      {
+        res = 0;
+
+	#pragma acc loop worker reduction(+:res)
+	for (i = 0; i < 32; i++)
+	  res += arr[j * 32 + i];
+	
+	out[j] = res;
+      }
+  }
+
+  for (j = 0; j < 32; j++)
+    {
+      hres = 0;
+      
+      for (i = 0; i < 32; i++)
+	hres += arr[j * 32 + i];
+
+      assert (out[j] == hres);
+    }
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-wv-p-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-wv-p-1.c
new file mode 100644
index 0000000..2765908
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-wv-p-1.c
@@ -0,0 +1,41 @@
+#include <assert.h>
+
+/* Test of reduction on loop directive (workers and vectors, private reduction
+   variable).  */
+
+int
+main (int argc, char *argv[])
+{
+  int i, j, arr[1024], out[32], res = 0, hres = 0;
+
+  for (i = 0; i < 1024; i++)
+    arr[i] = i;
+
+  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
+		       private(res) copyout(out)
+  {
+    #pragma acc loop gang
+    for (j = 0; j < 32; j++)
+      {
+        res = 0;
+
+	#pragma acc loop worker vector reduction(+:res)
+	for (i = 0; i < 32; i++)
+	  res += arr[j * 32 + i];
+	
+	out[j] = res;
+      }
+  }
+
+  for (j = 0; j < 32; j++)
+    {
+      hres = 0;
+      
+      for (i = 0; i < 32; i++)
+	hres += arr[j * 32 + i];
+
+      assert (out[j] == hres);
+    }
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-wv-p-2.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-wv-p-2.c
new file mode 100644
index 0000000..c30b0e7
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-wv-p-2.c
@@ -0,0 +1,45 @@
+#include <assert.h>
+
+/* Test of reduction on loop directive (workers and vectors, private reduction
+   variable).  */
+
+int
+main (int argc, char *argv[])
+{
+  int i, j, arr[32768], out[32], res = 0, hres = 0;
+
+  for (i = 0; i < 32768; i++)
+    arr[i] = i;
+
+  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
+		       private(res) copyout(out)
+  {
+    #pragma acc loop gang
+    for (j = 0; j < 32; j++)
+      {
+        res = j;
+
+	#pragma acc loop worker reduction(+:res)
+	for (i = 0; i < 1024; i++)
+	  res += arr[j * 1024 + i];
+
+	#pragma acc loop vector reduction(+:res)
+	for (i = 1023; i >= 0; i--)
+	  res += arr[j * 1024 + i];
+
+	out[j] = res;
+      }
+  }
+
+  for (j = 0; j < 32; j++)
+    {
+      hres = j;
+      
+      for (i = 0; i < 1024; i++)
+	hres += arr[j * 1024 + i] * 2;
+
+      assert (out[j] == hres);
+    }
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-wv-p-3.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-wv-p-3.c
new file mode 100644
index 0000000..b5e28fb
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-wv-p-3.c
@@ -0,0 +1,38 @@
+#include <assert.h>
+
+/* Test of reduction on loop directive (workers and vectors, private reduction
+   variable: gang-redundant mode).  */
+
+int
+main (int argc, char *argv[])
+{
+  int i, arr[1024], out[32], res = 0, hres = 0;
+
+  for (i = 0; i < 1024; i++)
+    arr[i] = i ^ 33;
+
+  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
+		       private(res) copyin(arr) copyout(out)
+  {
+    /* Private variables aren't initialized by default in OpenACC.  */
+    res = 0;
+
+    /* "res" should be available at the end of the following loop (and should
+       have the same value redundantly in each gang).  */
+    #pragma acc loop worker vector reduction(+:res)
+    for (i = 0; i < 1024; i++)
+      res += arr[i];
+    
+    #pragma acc loop gang (static: 1)
+    for (i = 0; i < 32; i++)
+      out[i] = res;
+  }
+
+  for (i = 0; i < 1024; i++)
+    hres += arr[i];
+
+  for (i = 0; i < 32; i++)
+    assert (out[i] == hres);
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-w-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-w-1.c
index 539e41d..28c6d0b 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-w-1.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-w-1.c
@@ -1,5 +1,5 @@
 /* { dg-do run } */
-/* { dg-additional-options "-O2" } */
+/* { dg-additional-options "-O2 -w" } */
 
 #include <stdio.h>
 
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-1.c
new file mode 100644
index 0000000..5e82e1d
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-1.c
@@ -0,0 +1,38 @@
+#include <assert.h>
+
+/* Test of reduction on both parallel and loop directives (worker and
+   vector-partitioned loops individually in gang-partitioned mode, int
+   type).  */
+
+int
+main (int argc, char *argv[])
+{
+  int i, j, arr[32768], res = 0, hres = 0;
+
+  for (i = 0; i < 32768; i++)
+    arr[i] = i;
+
+  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
+    reduction(+:res) copy(res)
+  {
+    #pragma acc loop gang
+    for (j = 0; j < 32; j++)
+      {
+	#pragma acc loop worker reduction(+:res)
+	for (i = 0; i < 1024; i++)
+	  res += arr[j * 1024 + i];
+
+	#pragma acc loop vector reduction(+:res)
+	for (i = 1023; i >= 0; i--)
+	  res += arr[j * 1024 + i];
+      }
+  }
+
+  for (j = 0; j < 32; j++)
+    for (i = 0; i < 1024; i++)
+      hres += arr[j * 1024 + i] * 2;
+
+  assert (res == hres);
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-2.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-2.c
new file mode 100644
index 0000000..a7a75a9
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-2.c
@@ -0,0 +1,40 @@
+#include <assert.h>
+
+/* Test of reduction on both parallel and loop directives (workers and vectors
+   in gang-partitioned mode, int type with XOR).  */
+
+int
+main (int argc, char *argv[])
+{
+  int i, j, arr[32768], res = 0, hres = 0;
+
+  for (i = 0; i < 32768; i++)
+    arr[i] = i;
+
+  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
+		       reduction(^:res)
+  {
+    #pragma acc loop gang
+    for (j = 0; j < 32; j++)
+      {
+	#pragma acc loop worker vector reduction(^:res)
+	for (i = 0; i < 1024; i++)
+	  res ^= arr[j * 1024 + i];
+
+	#pragma acc loop worker vector reduction(^:res)
+	for (i = 0; i < 1024; i++)
+	  res ^= arr[j * 1024 + (1023 - i)];
+      }
+  }
+
+  for (j = 0; j < 32; j++)
+    for (i = 0; i < 1024; i++)
+      {
+        hres ^= arr[j * 1024 + i];
+	hres ^= arr[j * 1024 + (1023 - i)];
+      }
+
+  assert (res == hres);
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-3.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-3.c
new file mode 100644
index 0000000..8d85fed
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-3.c
@@ -0,0 +1,42 @@
+#include <assert.h>
+
+/* Test of reduction on both parallel and loop directives (workers and vectors
+   together in gang-partitioned mode, float type).  */
+
+int
+main (int argc, char *argv[])
+{
+  int i, j;
+  float arr[32768];
+  float res = 0, hres = 0;
+
+  for (i = 0; i < 32768; i++)
+    arr[i] = i;
+
+  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
+    reduction(+:res) copy(res)
+  {
+    #pragma acc loop gang
+    for (j = 0; j < 32; j++)
+      {
+	#pragma acc loop worker vector reduction(+:res)
+	for (i = 0; i < 1024; i++)
+	  res += arr[j * 1024 + i];
+
+	#pragma acc loop worker vector reduction(+:res)
+	for (i = 0; i < 1024; i++)
+	  res += arr[j * 1024 + (1023 - i)];
+      }
+  }
+
+  for (j = 0; j < 32; j++)
+    for (i = 0; i < 1024; i++)
+      {
+	hres += arr[j * 1024 + i];
+	hres += arr[j * 1024 + (1023 - i)];
+      }
+
+  assert (res == hres);
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-4.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-4.c
new file mode 100644
index 0000000..1904b4a
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-4.c
@@ -0,0 +1,55 @@
+#include <assert.h>
+
+/* Test of reduction on both parallel and loop directives (workers and vectors
+   together in gang-partitioned mode, float type, multiple reductions).  */
+
+int
+main (int argc, char *argv[])
+{
+  int i, j;
+  float arr[32768];
+  float res = 0, mres = 0, hres = 0, hmres = 0;
+
+  for (i = 0; i < 32768; i++)
+    arr[i] = i;
+
+  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
+    reduction(+:res) reduction(max:mres) copy(res, mres)
+  {
+    #pragma acc loop gang
+    for (j = 0; j < 32; j++)
+      {
+	#pragma acc loop worker vector reduction(+:res) reduction(max:mres)
+	for (i = 0; i < 1024; i++)
+	  {
+	    res += arr[j * 1024 + i];
+	    if (arr[j * 1024 + i] > mres)
+	      mres = arr[j * 1024 + i];
+	  }
+
+	#pragma acc loop worker vector reduction(+:res) reduction(max:mres)
+	for (i = 0; i < 1024; i++)
+	  {
+	    res += arr[j * 1024 + (1023 - i)];
+	    if (arr[j * 1024 + (1023 - i)] > mres)
+	      mres = arr[j * 1024 + (1023 - i)];
+	  }
+      }
+  }
+
+  for (j = 0; j < 32; j++)
+    for (i = 0; i < 1024; i++)
+      {
+	hres += arr[j * 1024 + i];
+	hres += arr[j * 1024 + (1023 - i)];
+	if (arr[j * 1024 + i] > hmres)
+	  hmres = arr[j * 1024 + i];
+	if (arr[j * 1024 + (1023 - i)] > hmres)
+	  hmres = arr[j * 1024 + (1023 - i)];
+      }
+
+  assert (res == hres);
+  assert (mres == hmres);
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/par-reduction-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/par-reduction-1.c
index dceac39..a88b60f 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/par-reduction-1.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/par-reduction-1.c
@@ -1,40 +1,54 @@
+/* { dg-additional-options "-w" } */
+
 #include <assert.h>
 
+/* Test of reduction on parallel directive.  */
+
+#define ACTUAL_GANGS 256
+
 int
 main (int argc, char *argv[])
 {
-  int res, res2 = 0;
+  int res, res1 = 0, res2 = 0;
 
 #if defined(ACC_DEVICE_TYPE_host)
 # define GANGS 1
 #else
 # define GANGS 256
 #endif
-  #pragma acc parallel num_gangs(GANGS) copy(res2)
+  #pragma acc parallel num_gangs(GANGS) num_workers(32) vector_length(32) \
+    reduction(+:res1) copy(res2, res1)
   {
+    res1 += 5;
+
     #pragma acc atomic
     res2 += 5;
   }
   res = GANGS * 5;
 
+  assert (res == res1);
   assert (res == res2);
 #undef GANGS
 
-  res = res2 = 1;
+  res = res1 = res2 = 1;
 
 #if defined(ACC_DEVICE_TYPE_host)
 # define GANGS 1
 #else
 # define GANGS 8
 #endif
-  #pragma acc parallel num_gangs(GANGS) copy(res2)
+  #pragma acc parallel num_gangs(GANGS) num_workers(32) vector_length(32) \
+    reduction(*:res1) copy(res1, res2)
   {
+    res1 *= 5;
+
     #pragma acc atomic
     res2 *= 5;
   }
   for (int i = 0; i < GANGS; ++i)
     res *= 5;
 
+  assert (res == res1);
   assert (res == res2);
 #undef GANGS
 
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/par-reduction-2.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/par-reduction-2.c
index bd5715c..911b76c 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/par-reduction-2.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/par-reduction-2.c
@@ -1,18 +1,25 @@
+/* { dg-additional-options "-w" } */
+
 #include <assert.h>
 #include <openacc.h>
 
+/* Test of reduction on parallel directive (with async).  */
+
 int
 main (int argc, char *argv[])
 {
-  int res, res2 = 0;
+  int res, res1 = 0, res2 = 0;
 
 #if defined(ACC_DEVICE_TYPE_host)
 # define GANGS 1
 #else
 # define GANGS 256
 #endif
-  #pragma acc parallel num_gangs(GANGS) copy(res2) async(1)
+  #pragma acc parallel num_gangs(GANGS) num_workers(32) vector_length(32) \
+    reduction(+:res1) copy(res1, res2) async(1)
   {
+    res1 += 5;
+
     #pragma acc atomic
     res2 += 5;
   }
@@ -20,18 +27,22 @@ main (int argc, char *argv[])
 
   acc_wait (1);
 
+  assert (res == res1);
   assert (res == res2);
 #undef GANGS
 
-  res = res2 = 1;
+  res = res1 = res2 = 1;
 
 #if defined(ACC_DEVICE_TYPE_host)
 # define GANGS 1
 #else
 # define GANGS 8
 #endif
-  #pragma acc parallel num_gangs(GANGS) copy(res2) async(1)
+  #pragma acc parallel num_gangs(GANGS) num_workers(32) vector_length(32) \
+    reduction(*:res1) copy(res1, res2) async(1)
   {
+    res1 *= 5;
+
     #pragma acc atomic
     res2 *= 5;
   }
@@ -40,6 +51,7 @@ main (int argc, char *argv[])
 
   acc_wait (1);
 
+  assert (res == res1);
   assert (res == res2);
 
   return 0;
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-1.c
index fd9df33..9a411fe 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-1.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-1.c
@@ -2,205 +2,5 @@
 
 #include <stdlib.h>
 
-int i;
-
-int main(void)
-{
-  int j, v;
-
-  i = -1;
-  j = -2;
-  v = 0;
-#pragma acc parallel /* copyout */ present_or_copyout (v) copyin (i, j)
-  {
-    if (i != -1 || j != -2)
-      abort ();
-    i = 2;
-    j = 1;
-    if (i != 2 || j != 1)
-      abort ();
-    v = 1;
-  }
-#if ACC_MEM_SHARED
-  if (v != 1 || i != 2 || j != 1)
-    abort ();
-#else
-  if (v != 1 || i != -1 || j != -2)
-    abort ();
-#endif
-
-  i = -1;
-  j = -2;
-  v = 0;
-#pragma acc parallel /* copyout */ present_or_copyout (v) copyout (i, j)
-  {
-    i = 2;
-    j = 1;
-    if (i != 2 || j != 1)
-      abort ();
-    v = 1;
-  }
-  if (v != 1 || i != 2 || j != 1)
-    abort ();
-
-  i = -1;
-  j = -2;
-  v = 0;
-#pragma acc parallel /* copyout */ present_or_copyout (v) copy (i, j)
-  {
-    if (i != -1 || j != -2)
-      abort ();
-    i = 2;
-    j = 1;
-    if (i != 2 || j != 1)
-      abort ();
-    v = 1;
-  }
-  if (v != 1 || i != 2 || j != 1)
-    abort ();
-
-  i = -1;
-  j = -2;
-  v = 0;
-#pragma acc parallel /* copyout */ present_or_copyout (v) create (i, j)
-  {
-    i = 2;
-    j = 1;
-    if (i != 2 || j != 1)
-      abort ();
-    v = 1;
-  }
-#if ACC_MEM_SHARED
-  if (v != 1 || i != 2 || j != 1)
-    abort ();
-#else
-  if (v != 1 || i != -1 || j != -2)
-    abort ();
-#endif
-
-  i = -1;
-  j = -2;
-  v = 0;
-#pragma acc parallel /* copyout */ present_or_copyout (v) present_or_copyin (i, j)
-  {
-    if (i != -1 || j != -2)
-      abort ();
-    i = 2;
-    j = 1;
-    if (i != 2 || j != 1)
-      abort ();
-    v = 1;
-  }
-  if (v != 1)
-    abort ();
-#if ACC_MEM_SHARED
-  if (v != 1 || i != 2 || j != 1)
-    abort ();
-#else
-  if (v != 1 || i != -1 || j != -2)
-    abort ();
-#endif
-
-  i = -1;
-  j = -2;
-  v = 0;
-#pragma acc parallel /* copyout */ present_or_copyout (v) present_or_copyout (i, j)
-  {
-    i = 2;
-    j = 1;
-    if (i != 2 || j != 1)
-      abort ();
-    v = 1;
-  }
-  if (v != 1 || i != 2 || j != 1)
-    abort ();
-
-  i = -1;
-  j = -2;
-  v = 0;
-#pragma acc parallel /* copyout */ present_or_copyout (v) present_or_copy (i, j)
-  {
-    if (i != -1 || j != -2)
-      abort ();
-    i = 2;
-    j = 1;
-    if (i != 2 || j != 1)
-      abort ();
-    v = 1;
-  }
-  if (v != 1 || i != 2 || j != 1)
-    abort ();
-
-  i = -1;
-  j = -2;
-  v = 0;
-#pragma acc parallel /* copyout */ present_or_copyout (v) present_or_create (i, j)
-  {
-    i = 2;
-    j = 1;
-    if (i != 2 || j != 1)
-      abort ();
-    v = 1;
-  }
-  if (v != 1)
-    abort ();
-#if ACC_MEM_SHARED
-  if (v != 1 || i != 2 || j != 1)
-    abort ();
-#else
-  if (v != 1 || i != -1 || j != -2)
-    abort ();
-#endif
-
-  i = -1;
-  j = -2;
-  v = 0;
-
-#pragma acc data copyin (i, j)
-  {
-#pragma acc parallel /* copyout */ present_or_copyout (v) present (i, j)
-    {
-      if (i != -1 || j != -2)
-        abort ();
-      i = 2;
-      j = 1;
-      if (i != 2 || j != 1)
-        abort ();
-      v = 1;
-    }
-  }
-#if ACC_MEM_SHARED
-  if (v != 1 || i != 2 || j != 1)
-    abort ();
-#else
-  if (v != 1 || i != -1 || j != -2)
-    abort ();
-#endif
-
-  i = -1;
-  j = -2;
-  v = 0;
-
-#pragma acc data copyin(i, j)
-  {
-#pragma acc parallel /* copyout */ present_or_copyout (v)
-    {
-      if (i != -1 || j != -2)
-        abort ();
-      i = 2;
-      j = 1;
-      if (i != 2 || j != 1)
-        abort ();
-      v = 1;
-    }
-  }
-#if ACC_MEM_SHARED
-  if (v != 1 || i != 2 || j != 1)
-    abort ();
-#else
-  if (v != 1 || i != -1 || j != -2)
-    abort ();
-#endif
-
-  return 0;
-}
+#define EXEC_DIRECTIVE parallel
+#include "data-clauses.h"
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-dims.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-dims.c
index 39357ce..ecf78c7 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-dims.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-dims.c
@@ -5,12 +5,20 @@
 
 int main ()
 {
+  int dummy[10];
+
 #pragma acc parallel num_workers (2<<20) /* { dg-error "using num_workers" } */
   {
+#pragma acc loop worker
+    for (int i = 0; i < 10; i++)
+      dummy[i] = i;
   }
 
 #pragma acc parallel vector_length (2<<20) /* { dg-error "using vector_length" } */
   {
+#pragma acc loop vector
+    for (int i = 0; i < 10; i++)
+      dummy[i] = i;
   }
 
   return 0;
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-reduction.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-reduction.c
new file mode 100644
index 0000000..b2c60e5
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-reduction.c
@@ -0,0 +1,72 @@
+/* { dg-do run } */
+/* { dg-additional-options "-w" } */
+
+#include <stdlib.h>
+#include <openacc.h>
+
+#define N 10
+
+int
+main ()
+{
+  int s1 = 0, s2 = 0;
+  int i;
+  int dummy = 0;
+
+#pragma acc data copy (dummy)
+  {
+#pragma acc parallel num_gangs (N) reduction (+:s1) copy(s1)
+    {
+      s1++;
+    }
+  }
+
+  if (acc_get_device_type () != acc_device_nvidia)
+    {
+      if (s1 != 1)
+	abort ();
+    }
+  else
+    {
+      if (s1 != N)
+	abort ();
+    }
+
+  s1 = 0;
+  s2 = 0;
+
+#pragma acc parallel num_gangs (10) reduction (+:s1, s2) copy(s1, s2)
+  {
+    s1++;
+    s2 += N;
+  }
+
+  if (acc_get_device_type () != acc_device_nvidia)
+    {
+      if (s1 != 1)
+	abort ();
+      if (s2 != N)
+	abort ();
+    }
+  else
+    {
+      if (s1 != N)
+	abort ();
+      if (s2 != N*N)
+	abort ();
+    }
+
+  s1 = 0;
+
+#pragma acc parallel num_gangs (10) reduction (+:s1) copy(s1)
+  {
+#pragma acc loop gang reduction (+:s1)
+    for (i = 0; i < 10; i++)
+      s1++;
+  }
+
+  if (s1 != N)
+    abort ();
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-1.c
index e557931..10eb278 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-1.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-1.c
@@ -1,46 +1,59 @@
 /* { dg-do run } */
 
+/* Ignore vector_length warnings for offloaded (nvptx) targets.  */
+/* { dg-additional-options "-foffload=-w" } */
+
 /* Integer reductions.  */
 
 #include <stdlib.h>
-#include <stdbool.h>
-
-#define vl 32
-
-#define DO_PRAGMA(x) _Pragma (#x)
-
-#define check_reduction_op(type, op, init, b)	\
-  {						\
-    type res, vres;				\
-    res = (init);				\
-    DO_PRAGMA (acc parallel vector_length (vl) copy(res))	\
-DO_PRAGMA (acc loop reduction (op:res))\
-    for (i = 0; i < n; i++)			\
-      res = res op (b);				\
-						\
-    vres = (init);				\
-    for (i = 0; i < n; i++)			\
-      vres = vres op (b);			\
-						\
-    if (res != vres)				\
-      abort ();					\
-  }
+#include "reduction.h"
+
+const int ng = 8;
+const int nw = 4;
+const int vl = 32;
 
 static void
-test_reductions_int (void)
+test_reductions (void)
 {
-  const int n = 1000;
+  const int n = 100;
   int i;
   int array[n];
 
   for (i = 0; i < n; i++)
-    array[i] = i;
-
-  check_reduction_op (int, +, 0, array[i]);
-  check_reduction_op (int, *, 1, array[i]);
-  check_reduction_op (int, &, -1, array[i]);
-  check_reduction_op (int, |, 0, array[i]);
-  check_reduction_op (int, ^, 0, array[i]);
+    array[i] = i+1;
+
+  /* Gang reductions.  */
+  check_reduction_op (int, +, 0, array[i], num_gangs (ng), gang);
+  check_reduction_op (int, *, 1, array[i], num_gangs (ng), gang);
+  check_reduction_op (int, &, -1, array[i], num_gangs (ng), gang);
+  check_reduction_op (int, |, 0, array[i], num_gangs (ng), gang);
+  check_reduction_op (int, ^, 0, array[i], num_gangs (ng), gang);
+
+  /* Worker reductions.  */
+  check_reduction_op (int, +, 0, array[i], num_workers (nw), worker);
+  check_reduction_op (int, *, 1, array[i], num_workers (nw), worker);
+  check_reduction_op (int, &, -1, array[i], num_workers (nw), worker);
+  check_reduction_op (int, |, 0, array[i], num_workers (nw), worker);
+  check_reduction_op (int, ^, 0, array[i], num_workers (nw), worker);
+
+  /* Vector reductions.  */
+  check_reduction_op (int, +, 0, array[i], vector_length (vl), vector);
+  check_reduction_op (int, *, 1, array[i], vector_length (vl), vector);
+  check_reduction_op (int, &, -1, array[i], vector_length (vl), vector);
+  check_reduction_op (int, |, 0, array[i], vector_length (vl), vector);
+  check_reduction_op (int, ^, 0, array[i], vector_length (vl), vector);
+
+  /* Combined reductions.  */
+  check_reduction_op (int, +, 0, array[i], num_gangs (ng) num_workers (nw)
+		      vector_length (vl), gang worker vector);
+  check_reduction_op (int, *, 1, array[i], num_gangs (ng) num_workers (nw)
+		      vector_length (vl), gang worker vector);
+  check_reduction_op (int, &, -1, array[i], num_gangs (ng) num_workers (nw)
+		      vector_length (vl), gang worker vector);
+  check_reduction_op (int, |, 0, array[i], num_gangs (ng) num_workers (nw)
+		      vector_length (vl), gang worker vector);
+  check_reduction_op (int, ^, 0, array[i], num_gangs (ng) num_workers (nw)
+		      vector_length (vl), gang worker vector);
 }
 
 static void
@@ -55,29 +68,31 @@ test_reductions_bool (void)
     array[i] = i;
 
   cmp_val = 5;
-  check_reduction_op (bool, &&, true, (cmp_val > array[i]));
-  check_reduction_op (bool, ||, false, (cmp_val > array[i]));
-}
 
-#define check_reduction_macro(type, op, init, b)	\
-  {							\
-    type res, vres;					\
-    res = (init);					\
-DO_PRAGMA (acc parallel vector_length (vl) copy(res))\
-DO_PRAGMA (acc loop reduction (op:res))\
-    for (i = 0; i < n; i++)				\
-      res = op (res, (b));				\
-							\
-    vres = (init);					\
-    for (i = 0; i < n; i++)				\
-      vres = op (vres, (b));				\
-							\
-    if (res != vres)					\
-      abort ();						\
-  }
-
-#define max(a, b) (((a) > (b)) ? (a) : (b))
-#define min(a, b) (((a) < (b)) ? (a) : (b))
+  /* Gang reductions.  */
+  check_reduction_op (int, &&, 1, (cmp_val > array[i]), num_gangs (ng),
+		      gang);
+  check_reduction_op (int, ||, 0, (cmp_val > array[i]), num_gangs (ng),
+		      gang);
+
+  /* Worker reductions.  */
+  check_reduction_op (int, &&, 1, (cmp_val > array[i]), num_workers (nw),
+		      worker);
+  check_reduction_op (int, ||, 0, (cmp_val > array[i]), num_workers (nw),
+		      worker);
+
+  /* Vector reductions.  */
+  check_reduction_op (int, &&, 1, (cmp_val > array[i]), vector_length (vl),
+		      vector);
+  check_reduction_op (int, ||, 0, (cmp_val > array[i]), vector_length (vl),
+		      vector);
+
+  /* Combined reductions.  */
+  check_reduction_op (int, &&, 1, (cmp_val > array[i]), num_gangs (ng)
+		      num_workers (nw) vector_length (vl), gang worker vector);
+  check_reduction_op (int, ||, 0, (cmp_val > array[i]), num_gangs (ng)
+		      num_workers (nw) vector_length (vl), gang worker vector);
+}
 
 static void
 test_reductions_minmax (void)
@@ -89,14 +104,32 @@ test_reductions_minmax (void)
   for (i = 0; i < n; i++)
     array[i] = i;
 
-  check_reduction_macro (int, min, n + 1, array[i]);
-  check_reduction_macro (int, max, -1, array[i]);
+  /* Gang reductions.  */
+  check_reduction_macro (int, min, n + 1, array[i], num_gangs (ng), gang);
+  check_reduction_macro (int, max, -1, array[i], num_gangs (ng), gang);
+
+  /* Worker reductions.  */
+  check_reduction_macro (int, min, n + 1, array[i], num_workers (nw), worker);
+  check_reduction_macro (int, max, -1, array[i], num_workers (nw), worker);
+
+  /* Vector reductions.  */
+  check_reduction_macro (int, min, n + 1, array[i], vector_length (vl),
+			 vector);
+  check_reduction_macro (int, max, -1, array[i], vector_length (vl), vector);
+
+  /* Combined reductions.  */
+  check_reduction_macro (int, min, n + 1, array[i], num_gangs (ng)
+			 num_workers (nw) vector_length (vl), gang worker
+			 vector);
+  check_reduction_macro (int, max, -1, array[i], num_gangs (ng)
+			 num_workers (nw) vector_length (vl), gang worker
+			 vector);
 }
 
 int
 main (void)
 {
-  test_reductions_int ();
+  test_reductions ();
   test_reductions_bool ();
   test_reductions_minmax ();
   return 0;
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-2.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-2.c
index 8a0b0d6..7cb9497 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-2.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-2.c
@@ -1,124 +1,83 @@
 /* { dg-do run } */
 
+/* Ignore vector_length warnings for offloaded (nvptx) targets.  */
+/* { dg-additional-options "-foffload=-w" } */
+
 /* float reductions.  */
 
 #include <stdlib.h>
-#include <stdbool.h>
-#include <math.h>
+#include "reduction.h"
 
-#define vl 32
+const int ng = 8;
+const int nw = 4;
+const int vl = 32;
 
-int
-main(void)
+static void
+test_reductions (void)
 {
-  const int n = 1000;
+  const int n = 100;
   int i;
-  float vresult, result, array[n];
-  bool lvresult, lresult;
+  float array[n];
 
   for (i = 0; i < n; i++)
-    array[i] = i;
-
-  result = 0;
-  vresult = 0;
+    array[i] = i+1;
 
-  /* '+' reductions.  */
-#pragma acc parallel vector_length (vl) copy(result)
-#pragma acc loop reduction (+:result)
-  for (i = 0; i < n; i++)
-    result += array[i];
+  /* Gang reductions.  */
+  check_reduction_op (float, +, 0, array[i], num_gangs (ng), gang);
+  check_reduction_op (float, *, 1, array[i], num_gangs (ng), gang);
 
-  /* Verify the reduction.  */
-  for (i = 0; i < n; i++)
-    vresult += array[i];
-
-  if (result != vresult)
-    abort ();
-
-  result = 0;
-  vresult = 0;
-
-  /* '*' reductions.  */
-#pragma acc parallel vector_length (vl) copy(result)
-#pragma acc loop reduction (*:result)
-  for (i = 0; i < n; i++)
-    result *= array[i];
+  /* Worker reductions.  */
+  check_reduction_op (float, +, 0, array[i], num_workers (nw), worker);
+  check_reduction_op (float, *, 1, array[i], num_workers (nw), worker);
 
-  /* Verify the reduction.  */
-  for (i = 0; i < n; i++)
-    vresult *= array[i];
-
-  if (fabs(result - vresult) > .0001)
-    abort ();
-  result = 0;
-  vresult = 0;
-
-  /* 'max' reductions.  */
-#pragma acc parallel vector_length (vl) copy(result)
-#pragma acc loop reduction (max:result)
-  for (i = 0; i < n; i++)
-    result = result > array[i] ? result : array[i];
-
-  /* Verify the reduction.  */
-  for (i = 0; i < n; i++)
-    vresult = vresult > array[i] ? vresult : array[i];
-
-  if (result != vresult)
-    abort ();
-
-  result = 0;
-  vresult = 0;
-
-  /* 'min' reductions.  */
-#pragma acc parallel vector_length (vl) copy(result)
-#pragma acc loop reduction (min:result)
-  for (i = 0; i < n; i++)
-    result = result < array[i] ? result : array[i];
+  /* Vector reductions.  */
+  check_reduction_op (float, +, 0, array[i], vector_length (vl), vector);
+  check_reduction_op (float, *, 1, array[i], vector_length (vl), vector);
 
-  /* Verify the reduction.  */
-  for (i = 0; i < n; i++)
-    vresult = vresult < array[i] ? vresult : array[i];
-
-  if (result != vresult)
-    abort ();
-
-  result = 5;
-  vresult = 5;
-
-  lresult = false;
-  lvresult = false;
-
-  /* '&&' reductions.  */
-#pragma acc parallel vector_length (vl) copy(lresult)
-#pragma acc loop reduction (&&:lresult)
-  for (i = 0; i < n; i++)
-    lresult = lresult && (result > array[i]);
-
-  /* Verify the reduction.  */
-  for (i = 0; i < n; i++)
-    lvresult = lresult && (result > array[i]);
-
-  if (lresult != lvresult)
-    abort ();
-
-  result = 5;
-  vresult = 5;
-
-  lresult = false;
-  lvresult = false;
+  /* Combined reductions.  */
+  check_reduction_op (float, +, 0, array[i], num_gangs (ng) num_workers (nw)
+		      vector_length (vl), gang worker vector);
+  check_reduction_op (float, *, 1, array[i], num_gangs (ng) num_workers (nw)
+		      vector_length (vl), gang worker vector);
+}
 
-  /* '||' reductions.  */
-#pragma acc parallel vector_length (vl) copy(lresult)
-#pragma acc loop reduction (||:lresult)
-  for (i = 0; i < n; i++)
-    lresult = lresult || (result > array[i]);
+static void
+test_reductions_minmax (void)
+{
+  const int n = 1000;
+  int i;
+  float array[n];
 
-  /* Verify the reduction.  */
   for (i = 0; i < n; i++)
-    lvresult = lresult || (result > array[i]);
+    array[i] = i;
 
-  if (lresult != lvresult)
-    abort ();
+  /* Gang reductions.  */
+  check_reduction_macro (float, min, n + 1, array[i], num_gangs (ng), gang);
+  check_reduction_macro (float, max, -1, array[i], num_gangs (ng), gang);
+
+  /* Worker reductions.  */
+  check_reduction_macro (float, min, n + 1, array[i], num_workers (nw),
+			 worker);
+  check_reduction_macro (float, max, -1, array[i], num_workers (nw), worker);
+
+  /* Vector reductions.  */
+  check_reduction_macro (float, min, n + 1, array[i], vector_length (vl),
+			 vector);
+  check_reduction_macro (float, max, -1, array[i], vector_length (vl), vector);
+
+  /* Combined reductions.  */
+  check_reduction_macro (float, min, n + 1, array[i], num_gangs (ng)
+			 num_workers (nw) vector_length (vl), gang worker
+			 vector);
+  check_reduction_macro (float, max, -1, array[i], num_gangs (ng)
+			 num_workers (nw) vector_length (vl), gang worker
+			 vector);
+}
 
+int
+main (void)
+{
+  test_reductions ();
+  test_reductions_minmax ();
   return 0;
 }
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-3.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-3.c
index a233e29..1b948be 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-3.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-3.c
@@ -1,93 +1,84 @@
 /* { dg-do run } */
 
+/* Ignore vector_length warnings for offloaded (nvptx) targets.  */
+/* { dg-additional-options "-foffload=-w" } */
+
 /* double reductions.  */
 
 #include <stdlib.h>
-#include <stdbool.h>
-#include <math.h>
+#include "reduction.h"
 
-#define vl 32
+const int ng = 8;
+const int nw = 4;
+const int vl = 32;
 
-int
-main(void)
+static void
+test_reductions (void)
 {
-  const int n = 1000;
+  const int n = 10;
   int i;
-  double vresult, result, array[n];
-  bool lvresult, lresult;
-
-  for (i = 0; i < n; i++)
-    array[i] = i;
-
-  result = 0;
-  vresult = 0;
-
-  /* 'max' reductions.  */
-#pragma acc parallel vector_length (vl) copy(result)
-#pragma acc loop reduction (max:result)
-  for (i = 0; i < n; i++)
-    result = result > array[i] ? result : array[i];
-
-  /* Verify the reduction.  */
-  for (i = 0; i < n; i++)
-    vresult = vresult > array[i] ? vresult : array[i];
-
-  if (result != vresult)
-    abort ();
-
-  result = 0;
-  vresult = 0;
-
-  /* 'min' reductions.  */
-#pragma acc parallel vector_length (vl) copy(result)
-#pragma acc loop reduction (min:result)
-  for (i = 0; i < n; i++)
-    result = result < array[i] ? result : array[i];
+  double array[n];
 
-  /* Verify the reduction.  */
   for (i = 0; i < n; i++)
-    vresult = vresult < array[i] ? vresult : array[i];
+    array[i] = i+1;
 
-  if (result != vresult)
-    abort ();
+  /* Gang reductions.  */
+  check_reduction_op (double, +, 0, array[i], num_gangs (ng), gang);
+  check_reduction_op (double, *, 1, array[i], num_gangs (ng), gang);
 
-  result = 5;
-  vresult = 5;
+  /* Worker reductions.  */
+  check_reduction_op (double, +, 0, array[i], num_workers (nw), worker);
+  check_reduction_op (double, *, 1, array[i], num_workers (nw), worker);
 
-  lresult = false;
-  lvresult = false;
+  /* Vector reductions.  */
+  check_reduction_op (double, +, 0, array[i], vector_length (vl), vector);
+  check_reduction_op (double, *, 1, array[i], vector_length (vl), vector);
 
-  /* '&&' reductions.  */
-#pragma acc parallel vector_length (vl) copy(lresult)
-#pragma acc loop reduction (&&:lresult)
-  for (i = 0; i < n; i++)
-    lresult = lresult && (result > array[i]);
-
-  /* Verify the reduction.  */
-  for (i = 0; i < n; i++)
-    lvresult = lresult && (result > array[i]);
-
-  if (lresult != lvresult)
-    abort ();
-
-  result = 5;
-  vresult = 5;
-
-  lresult = false;
-  lvresult = false;
+  /* Combined reductions.  */
+  check_reduction_op (double, +, 0, array[i], num_gangs (ng)  num_workers (nw)
+		      vector_length (vl), gang worker vector);
+  check_reduction_op (double, *, 1, array[i], num_gangs (ng)  num_workers (nw)
+		      vector_length (vl), gang worker vector);
+}
 
-  /* '||' reductions.  */
-#pragma acc parallel vector_length (vl) copy(lresult)
-#pragma acc loop reduction (||:lresult)
-  for (i = 0; i < n; i++)
-    lresult = lresult || (result > array[i]);
+static void
+test_reductions_minmax (void)
+{
+  const int n = 1000;
+  int i;
+  double array[n];
 
-  /* Verify the reduction.  */
   for (i = 0; i < n; i++)
-    lvresult = lresult || (result > array[i]);
+    array[i] = i;
 
-  if (lresult != lvresult)
-    abort ();
+  /* Gang reductions.  */
+  check_reduction_macro (double, min, n + 1, array[i], num_gangs (ng), gang);
+  check_reduction_macro (double, max, -1, array[i], num_gangs (ng), gang);
+
+  /* Worker reductions.  */
+  check_reduction_macro (double, min, n + 1, array[i], num_workers (nw),
+			 worker);
+  check_reduction_macro (double, max, -1, array[i], num_workers (nw), worker);
+
+  /* Vector reductions.  */
+  check_reduction_macro (double, min, n + 1, array[i], vector_length (vl),
+			 vector);
+  check_reduction_macro (double, max, -1, array[i], vector_length (vl),
+			 vector);
+
+  /* Combined reductions.  */
+  check_reduction_macro (double, min, n + 1, array[i], num_gangs (ng)
+			 num_workers (nw) vector_length (vl), gang worker
+			 vector);
+  check_reduction_macro (double, max, -1, array[i], num_gangs (ng)
+			 num_workers (nw) vector_length (vl), gang worker
+			 vector);
+}
 
+int
+main (void)
+{
+  test_reductions ();
+  test_reductions_minmax ();
   return 0;
 }
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-4.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-4.c
index 59d49c1..79355ed 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-4.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-4.c
@@ -1,59 +1,56 @@
 /* { dg-do run { target { ! { hppa*-*-hpux* } } } } */
 
+/* Ignore vector_length warnings for offloaded (nvptx) targets.  */
+/* { dg-additional-options "-foffload=-w" } */
+
 /* complex reductions.  */
 
 #include <stdlib.h>
-#include <stdbool.h>
-#include <math.h>
 #include <complex.h>
+#include "reduction.h"
 
-#define vl 32
+const int ng = 8;
+const int nw = 4;
+const int vl = 32;
 
-int
-main(void)
+static void
+test_reductions (void)
 {
-  const int n = 1000;
+  const int n = 10;
   int i;
-  double _Complex vresult, result, array[n];
-  bool lvresult, lresult;
-
-  for (i = 0; i < n; i++)
-    array[i] = i;
-
-  result = 0;
-  vresult = 0;
-
-  /* '&&' reductions.  */
-#pragma acc parallel vector_length (vl) copy(lresult)
-#pragma acc loop reduction (&&:lresult)
-  for (i = 0; i < n; i++)
-    lresult = lresult && (creal(result) > creal(array[i]));
-
-  /* Verify the reduction.  */
-  for (i = 0; i < n; i++)
-    lvresult = lresult && (creal(result) > creal(array[i]));
+  double _Complex array[n];
 
-  if (lresult != lvresult)
-    abort ();
-
-  result = 5;
-  vresult = 5;
-
-  lresult = false;
-  lvresult = false;
-
-  /* '||' reductions.  */
-#pragma acc parallel vector_length (vl) copy(lresult)
-#pragma acc loop reduction (||:lresult)
-  for (i = 0; i < n; i++)
-    lresult = lresult || (creal(result) > creal(array[i]));
-
-  /* Verify the reduction.  */
   for (i = 0; i < n; i++)
-    lvresult = lresult || (creal(result) > creal(array[i]));
-
-  if (lresult != lvresult)
-    abort ();
+    array[i] = i+1;
+
+  /* Gang reductions.  */
+  check_reduction_op (double, +, 0, creal (array[i]), num_gangs (ng), gang);
+  check_reduction_op (double, *, 1, creal (array[i]), num_gangs (ng), gang);
+
+  /* Worker reductions.  */
+  check_reduction_op (double, +, 0, creal (array[i]), num_workers (nw),
+		      worker);
+  check_reduction_op (double, *, 1, creal (array[i]), num_workers (nw),
+		      worker);
+
+  /* Vector reductions.  */
+  check_reduction_op (double, +, 0, creal (array[i]), vector_length (vl),
+		      vector);
+  check_reduction_op (double, *, 1, creal (array[i]), vector_length (vl),
+		      vector);
+
+  /* Combined reductions.  */
+  check_reduction_op (double, +, 0, creal (array[i]), num_gangs (ng)
+			 num_workers (nw) vector_length (vl), gang worker
+			 vector);
+  check_reduction_op (double, *, 1, creal (array[i]), num_gangs (ng)
+			 num_workers (nw) vector_length (vl), gang worker
+			 vector);
+}
 
+int
+main (void)
+{
+  test_reductions ();
   return 0;
 }
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-5.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-5.c
index efe8702..46b553a 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-5.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-5.c
@@ -1,32 +1,57 @@
+/* { dg-do run } */
+/* { dg-additional-options "-w" } */
+
+/* Ignore vector_length warnings for offloaded (nvptx) targets.  */
+/* { dg-additional-options "-foffload=-w" } */
+
+/* Multiple reductions.  */
+
 #include <stdio.h>
 #include <stdlib.h>
 
+const int ng = 8;
+const int nw = 4;
+const int vl = 32;
+
+const int n = 100;
+
+#define DO_PRAGMA(x) _Pragma (#x)
+
+#define check_reduction(gwv_par, gwv_loop)		\
+  {							\
+  s1 = 2; s2 = 5;					\
+DO_PRAGMA (acc parallel gwv_par copy (s1, s2))		\
+DO_PRAGMA (acc loop gwv_loop reduction (+:s1, s2))	\
+    for (i = 0; i < n; i++)				\
+      {							\
+         s1 = s1 + 3;					\
+         s2 = s2 + 2;					\
+      }							\
+							\
+    if (s1 != v1 || s2 != v2)				\
+      abort ();						\
+  }
+
 int
 main (void)
 {
   int s1 = 2, s2 = 5, v1 = 2, v2 = 5;
-  int n = 100;
   int i;
 
-#pragma acc parallel vector_length (32) copy(s1,s2)
-#pragma acc loop reduction (+:s1, s2)
-  for (i = 0; i < n; i++)
-    {
-      s1 = s1 + 3;
-      s2 = s2 + 2;
-    }
-
   for (i = 0; i < n; i++)
     {
       v1 = v1 + 3;
       v2 = v2 + 2;
     }
-  
-  if (s1 != v1)
-    abort ();
-  
-  if (s2 != v2)
-    abort ();
-    
+
+  check_reduction (num_gangs (ng), gang);
+
+  /* Nvptx targets require a vector_length of 32 in order to allow spinlocks
+     with gangs.  */
+  check_reduction (num_workers (nw) vector_length (vl), worker);
+  check_reduction (vector_length (vl), vector);
+  check_reduction (num_gangs (ng) num_workers (nw) vector_length (vl), gang
+		   worker vector);
+
   return 0;
 }
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-6.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-6.c
new file mode 100644
index 0000000..af30b31
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-6.c
@@ -0,0 +1,36 @@
+/* { dg-do run } */
+/* { dg-additional-options "-w" } */
+
+/* Test reductions on explicitly private variables.  */
+
+#include <assert.h>
+
+int
+main ()
+{
+  int i, j, red[10];
+  int v;
+
+  for (i = 0; i < 10; i++)
+    red[i] = -1;
+
+#pragma acc parallel copyout(red)
+  {
+#pragma acc loop gang private(v)
+    for (j = 0; j < 10; j++)
+      {
+	v = j;
+
+#pragma acc loop vector reduction (+:v)
+	for (i = 0; i < 100; i++)
+	  v++;
+
+	red[j] = v;
+      }
+  }
+
+  for (i = 0; i < 10; i++)
+    assert (red[i] == i + 100);
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction.h b/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction.h
new file mode 100644
index 0000000..1b3f8d4
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction.h
@@ -0,0 +1,43 @@
+#ifndef REDUCTION_H
+#define REDUCTION_H
+
+#define DO_PRAGMA(x) _Pragma (#x)
+
+#define check_reduction_op(type, op, init, b, gwv_par, gwv_loop)	\
+  {									\
+    type res, vres;							\
+    res = (init);							\
+DO_PRAGMA (acc parallel gwv_par copy (res))				\
+DO_PRAGMA (acc loop gwv_loop reduction (op:res))			\
+    for (i = 0; i < n; i++)						\
+      res = res op (b);							\
+									\
+    vres = (init);							\
+    for (i = 0; i < n; i++)						\
+      vres = vres op (b);						\
+									\
+    if (res != vres)							\
+      abort ();								\
+  }
+
+#define check_reduction_macro(type, op, init, b, gwv_par, gwv_loop)	\
+  {									\
+    type res, vres;							\
+    res = (init);							\
+DO_PRAGMA (acc parallel gwv_par copy (res))				\
+DO_PRAGMA (acc loop gwv_loop reduction (op:res))			\
+    for (i = 0; i < n; i++)						\
+      res = op (res, (b));						\
+									\
+    vres = (init);							\
+    for (i = 0; i < n; i++)						\
+      vres = op (vres, (b));						\
+									\
+    if (res != vres)							\
+      abort ();								\
+  }
+
+#define max(a, b) (((a) > (b)) ? (a) : (b))
+#define min(a, b) (((a) < (b)) ? (a) : (b))
+
+#endif
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/parallel-loop-1.f90 b/libgomp/testsuite/libgomp.oacc-fortran/parallel-loop-1.f90
new file mode 100644
index 0000000..4c86ada
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-fortran/parallel-loop-1.f90
@@ -0,0 +1,75 @@
+! Exercise the auto, independent, seq and tile loop clauses inside
+! parallel regions.
+
+program loops
+  integer, parameter     :: n = 20, c = 10
+  integer                :: i, a(n), b(n)
+
+  a(:) = 0
+  b(:) = 0
+
+  ! COPY
+
+  !$acc parallel copy (a)
+  !$acc loop auto
+  do i = 1, n
+     a(i) = i
+  end do
+  !$acc end parallel
+
+  do i = 1, n
+     b(i) = i
+  end do
+
+  call check (a, b, n)
+
+  ! COPYOUT
+
+  a(:) = 0
+
+  !$acc parallel copyout (a)
+  !$acc loop independent
+  do i = 1, n
+     a(i) = i
+  end do
+  !$acc end parallel
+
+  do i = 1, n
+     if (a(i) .ne. b(i)) call abort
+  end do
+  call check (a, b, n)
+
+  ! COPYIN
+
+  a(:) = 0
+
+  !$acc parallel copyout (a) copyin (b)
+  !$acc loop seq
+  do i = 1, n
+     a(i) = i
+  end do
+  !$acc end parallel
+
+  call check (a, b, n)
+
+  ! PRESENT_OR_COPY
+
+  !$acc parallel pcopy (a)
+  !$acc loop tile (*)
+  do i = 1, n
+     a(i) = i
+  end do
+  !$acc end parallel
+
+  call check (a, b, n)
+
+end program loops
+
+subroutine check (a, b, n)
+  integer :: n, a(n), b(n)
+  integer :: i
+
+  do i = 1, n
+     if (a(i) .ne. b(i)) call abort
+  end do
+end subroutine check
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/parallel-reduction.f90 b/libgomp/testsuite/libgomp.oacc-fortran/parallel-reduction.f90
new file mode 100644
index 0000000..f49ed73
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-fortran/parallel-reduction.f90
@@ -0,0 +1,39 @@
+! { dg-do run }
+! { dg-additional-options "-w" }
+
+program reduction
+  integer, parameter :: n = 10
+  integer sum
+  include "openacc_lib.h"
+
+  sum = 0
+
+  !$acc parallel reduction(+:sum) num_gangs (n) copy(sum)
+  sum = sum + 1
+  !$acc end parallel
+
+  if (acc_get_device_type () .eq. acc_device_nvidia) then
+     if (sum .ne. n) call abort
+  else
+     if (sum .ne. 1) call abort
+  end if
+
+  ! Test reductions inside subroutines
+
+  sum = 0
+  call redsub (sum, n)
+
+  if (acc_get_device_type () .eq. acc_device_nvidia) then
+     if (sum .ne. n) call abort
+  else
+     if (sum .ne. 1) call abort
+  end if
+end program reduction
+
+subroutine redsub(sum, n)
+  integer :: sum, n
+
+  !$acc parallel reduction(+:sum) num_gangs (10)  copy(sum)
+  sum = sum + 1
+  !$acc end parallel
+end subroutine redsub
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/reduction-1.f90 b/libgomp/testsuite/libgomp.oacc-fortran/reduction-1.f90
index db0a52d..e51509f 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/reduction-1.f90
+++ b/libgomp/testsuite/libgomp.oacc-fortran/reduction-1.f90
@@ -1,28 +1,55 @@
 ! { dg-do run }
+! { dg-additional-options "-w" }
 
 ! Integer reductions
 
 program reduction_1
   implicit none
 
-  integer, parameter    :: n = 10, vl = 32
-  integer               :: i, vresult, result
-  logical               :: lresult, lvresult
+  integer, parameter    :: n = 10, ng = 8, nw = 4, vl = 32
+  integer               :: i, vresult, rg, rw, rv, rc
+  logical               :: lrg, lrw, lrv, lrc, lvresult
   integer, dimension (n) :: array
 
   do i = 1, n
      array(i) = i
   end do
 
-  result = 0
+  !
+  ! '+' reductions
+  !
+
+  rg = 0
+  rw = 0
+  rv = 0
+  rc = 0
   vresult = 0
 
-  ! '+' reductions
+  !$acc parallel num_gangs(ng) copy(rg)
+  !$acc loop reduction(+:rg) gang
+  do i = 1, n
+     rg = rg + array(i)
+  end do
+  !$acc end parallel
 
-  !$acc parallel vector_length(vl) num_gangs(1) copy(result)
-  !$acc loop reduction(+:result)
+  !$acc parallel num_workers(nw) copy(rw)
+  !$acc loop reduction(+:rw) worker
   do i = 1, n
-     result = result + array(i)
+     rw = rw + array(i)
+  end do
+  !$acc end parallel
+
+  !$acc parallel vector_length(vl) copy(rv)
+  !$acc loop reduction(+:rv) vector
+  do i = 1, n
+     rv = rv + array(i)
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_gangs(ng) num_workers(nw) vector_length(vl) copy(rc)
+  !$acc loop reduction(+:rc) gang worker vector
+  do i = 1, n
+     rc = rc + array(i)
   end do
   !$acc end parallel
 
@@ -31,17 +58,46 @@ program reduction_1
      vresult = vresult + array(i)
   end do
 
-  if (result.ne.vresult) call abort
-
-  result = 0
-  vresult = 0
+  if (rg .ne. vresult) call abort
+  if (rw .ne. vresult) call abort
+  if (rv .ne. vresult) call abort
+  if (rc .ne. vresult) call abort
 
+  !
   ! '*' reductions
+  !
 
-  !$acc parallel vector_length(vl) num_gangs(1) copy(result)
-  !$acc loop reduction(*:result)
+  rg = 1
+  rw = 1
+  rv = 1
+  rc = 1
+  vresult = 1
+
+  !$acc parallel num_gangs(ng) copy(rg)
+  !$acc loop reduction(*:rg) gang
   do i = 1, n
-     result = result * array(i)
+     rg = rg * array(i)
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_workers(nw) copy(rw)
+  !$acc loop reduction(*:rw) worker
+  do i = 1, n
+     rw = rw * array(i)
+  end do
+  !$acc end parallel
+
+  !$acc parallel vector_length(vl) copy(rv)
+  !$acc loop reduction(*:rv) vector
+  do i = 1, n
+     rv = rv * array(i)
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_gangs(ng) num_workers(nw) vector_length(vl) copy(rc)
+  !$acc loop reduction(*:rc) gang worker vector
+  do i = 1, n
+     rc = rc * array(i)
   end do
   !$acc end parallel
 
@@ -50,17 +106,46 @@ program reduction_1
      vresult = vresult * array(i)
   end do
 
-  if (result.ne.vresult) call abort
+  if (rg .ne. vresult) call abort
+  if (rw .ne. vresult) call abort
+  if (rv .ne. vresult) call abort
+  if (rc .ne. vresult) call abort
+
+  !
+  ! 'max' reductions
+  !
 
-  result = 0
+  rg = 0
+  rw = 0
+  rv = 0
+  rc = 0
   vresult = 0
 
-  ! 'max' reductions
+  !$acc parallel num_gangs(ng) copy(rg)
+  !$acc loop reduction(max:rg) gang
+  do i = 1, n
+     rg = max (rg, array(i))
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_workers(nw) copy(rw)
+  !$acc loop reduction(max:rw) worker
+  do i = 1, n
+     rw = max (rw, array(i))
+  end do
+  !$acc end parallel
+
+  !$acc parallel vector_length(vl) copy(rv)
+  !$acc loop reduction(max:rv) vector
+  do i = 1, n
+     rv = max (rv, array(i))
+  end do
+  !$acc end parallel
 
-  !$acc parallel vector_length(vl) num_gangs(1) copy(result)
-  !$acc loop reduction(max:result)
+  !$acc parallel num_gangs(ng) num_workers(nw) vector_length(vl) copy(rc)
+  !$acc loop reduction(max:rc) gang worker vector
   do i = 1, n
-     result = max (result, array(i))
+     rc = max (rc, array(i))
   end do
   !$acc end parallel
 
@@ -69,17 +154,46 @@ program reduction_1
      vresult = max (vresult, array(i))
   end do
 
-  if (result.ne.vresult) call abort
-
-  result = 1
-  vresult = 1
+  if (rg .ne. vresult) call abort
+  if (rw .ne. vresult) call abort
+  if (rv .ne. vresult) call abort
+  if (rc .ne. vresult) call abort
 
+  !
   ! 'min' reductions
+  !
+
+  rg = 0
+  rw = 0
+  rv = 0
+  rc = 0
+  vresult = 0
+
+  !$acc parallel num_gangs(ng) copy(rg)
+  !$acc loop reduction(min:rg) gang
+  do i = 1, n
+     rg = min (rg, array(i))
+  end do
+  !$acc end parallel
 
-  !$acc parallel vector_length(vl) num_gangs(1) copy(result)
-  !$acc loop reduction(min:result)
+  !$acc parallel num_workers(nw) copy(rw)
+  !$acc loop reduction(min:rw) worker
   do i = 1, n
-     result = min (result, array(i))
+     rw = min (rw, array(i))
+  end do
+  !$acc end parallel
+
+  !$acc parallel vector_length(vl) copy(rv)
+  !$acc loop reduction(min:rv) vector
+  do i = 1, n
+     rv = min (rv, array(i))
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_gangs(ng) num_workers(nw) vector_length(vl) copy(rc)
+  !$acc loop reduction(min:rc) gang worker vector
+  do i = 1, n
+     rc = min (rc, array(i))
   end do
   !$acc end parallel
 
@@ -88,17 +202,46 @@ program reduction_1
      vresult = min (vresult, array(i))
   end do
 
-  if (result.ne.vresult) call abort
+  if (rg .ne. vresult) call abort
+  if (rw .ne. vresult) call abort
+  if (rv .ne. vresult) call abort
+  if (rc .ne. vresult) call abort
+
+  !
+  ! 'iand' reductions
+  !
 
-  result = 1
+  rg = 1
+  rw = 1
+  rv = 1
+  rc = 1
   vresult = 1
 
-  ! 'iand' reductions
+  !$acc parallel num_gangs(ng) copy(rg)
+  !$acc loop reduction(iand:rg) gang
+  do i = 1, n
+     rg = iand (rg, array(i))
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_workers(nw) copy(rw)
+  !$acc loop reduction(iand:rw) worker
+  do i = 1, n
+     rw = iand (rw, array(i))
+  end do
+  !$acc end parallel
+
+  !$acc parallel vector_length(vl) copy(rv)
+  !$acc loop reduction(iand:rv) vector
+  do i = 1, n
+     rv = iand (rv, array(i))
+  end do
+  !$acc end parallel
 
-  !$acc parallel vector_length(vl) num_gangs(1) copy(result)
-  !$acc loop reduction(iand:result)
+  !$acc parallel num_gangs(ng) num_workers(nw) vector_length(vl) copy(rc)
+  !$acc loop reduction(iand:rc) gang worker vector
   do i = 1, n
-     result = iand (result, array(i))
+     rc = iand (rc, array(i))
   end do
   !$acc end parallel
 
@@ -107,17 +250,46 @@ program reduction_1
      vresult = iand (vresult, array(i))
   end do
 
-  if (result.ne.vresult) call abort
-
-  result = 1
-  vresult = 1
+  if (rg .ne. vresult) call abort
+  if (rw .ne. vresult) call abort
+  if (rv .ne. vresult) call abort
+  if (rc .ne. vresult) call abort
 
+  !
   ! 'ior' reductions
+  !
+
+  rg = 0
+  rw = 0
+  rv = 0
+  rc = 0
+  vresult = 0
+
+  !$acc parallel num_gangs(ng) copy(rg)
+  !$acc loop reduction(ior:rg) gang
+  do i = 1, n
+     rg = ior (rg, array(i))
+  end do
+  !$acc end parallel
 
-  !$acc parallel vector_length(vl) num_gangs(1) copy(result)
-  !$acc loop reduction(ior:result)
+  !$acc parallel num_workers(nw) copy(rw)
+  !$acc loop reduction(ior:rw) worker
   do i = 1, n
-     result = ior (result, array(i))
+     rw = ior (rw, array(i))
+  end do
+  !$acc end parallel
+
+  !$acc parallel vector_length(vl) copy(rv)
+  !$acc loop reduction(ior:rv) vector
+  do i = 1, n
+     rv = ior (rv, array(i))
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_gangs(ng) num_workers(nw) vector_length(vl) copy(rc)
+  !$acc loop reduction(ior:rc) gang worker vector
+  do i = 1, n
+     rc = ior (rc, array(i))
   end do
   !$acc end parallel
 
@@ -126,17 +298,46 @@ program reduction_1
      vresult = ior (vresult, array(i))
   end do
 
-  if (result.ne.vresult) call abort
+  if (rg .ne. vresult) call abort
+  if (rw .ne. vresult) call abort
+  if (rv .ne. vresult) call abort
+  if (rc .ne. vresult) call abort
 
-  result = 0
+  !
+  ! 'ieor' reductions
+  !
+
+  rg = 0
+  rw = 0
+  rv = 0
+  rc = 0
   vresult = 0
 
-  ! 'ieor' reductions
+  !$acc parallel num_gangs(ng) copy(rg)
+  !$acc loop reduction(ieor:rg) gang
+  do i = 1, n
+     rg = ieor (rg, array(i))
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_workers(nw) copy(rw)
+  !$acc loop reduction(ieor:rw) worker
+  do i = 1, n
+     rw = ieor (rw, array(i))
+  end do
+  !$acc end parallel
 
-  !$acc parallel vector_length(vl) num_gangs(1) copy(result)
-  !$acc loop reduction(ieor:result)
+  !$acc parallel vector_length(vl) copy(rv)
+  !$acc loop reduction(ieor:rv) vector
   do i = 1, n
-     result = ieor (result, array(i))
+     rv = ieor (rv, array(i))
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_gangs(ng) num_workers(nw) vector_length(vl) copy(rc)
+  !$acc loop reduction(ieor:rc) gang worker vector
+  do i = 1, n
+     rc = ieor (rc, array(i))
   end do
   !$acc end parallel
 
@@ -145,17 +346,46 @@ program reduction_1
      vresult = ieor (vresult, array(i))
   end do
 
-  if (result.ne.vresult) call abort
-
-  lresult = .false.
-  lvresult = .false.
+  if (rg .ne. vresult) call abort
+  if (rw .ne. vresult) call abort
+  if (rv .ne. vresult) call abort
+  if (rc .ne. vresult) call abort
 
+  !
   ! '.and.' reductions
+  !
+
+  lrg = .true.
+  lrw = .true.
+  lrv = .true.
+  lrc = .true.
+  lvresult = .true.
+
+  !$acc parallel num_gangs(ng) copy(lrg)
+  !$acc loop reduction(.and.:lrg) gang
+  do i = 1, n
+     lrg = lrg .and. (array(i) .ge. 5)
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_workers(nw) copy(lrw)
+  !$acc loop reduction(.and.:lrw) worker
+  do i = 1, n
+     lrw = lrw .and. (array(i) .ge. 5)
+  end do
+  !$acc end parallel
+
+  !$acc parallel vector_length(vl) copy(lrv)
+  !$acc loop reduction(.and.:lrv) vector
+  do i = 1, n
+     lrv = lrv .and. (array(i) .ge. 5)
+  end do
+  !$acc end parallel
 
-  !$acc parallel vector_length(vl) num_gangs(1) copy(lresult)
-  !$acc loop reduction(.and.:lresult)
+  !$acc parallel num_gangs(ng) num_workers(nw) vector_length(vl) copy(lrc)
+  !$acc loop reduction(.and.:lrc) gang worker vector
   do i = 1, n
-     lresult = lresult .and. (array(i) .ge. 5)
+     lrc = lrc .and. (array(i) .ge. 5)
   end do
   !$acc end parallel
 
@@ -164,17 +394,46 @@ program reduction_1
      lvresult = lvresult .and. (array(i) .ge. 5)
   end do
 
-  if (result.ne.vresult) call abort
+  if (lrg .neqv. lvresult) call abort
+  if (lrw .neqv. lvresult) call abort
+  if (lrv .neqv. lvresult) call abort
+  if (lrc .neqv. lvresult) call abort
+
+  !
+  ! '.or.' reductions
+  !
 
-  lresult = .false.
+  lrg = .false.
+  lrw = .false.
+  lrv = .false.
+  lrc = .false.
   lvresult = .false.
 
-  ! '.or.' reductions
+  !$acc parallel num_gangs(ng) copy(lrg)
+  !$acc loop reduction(.or.:lrg) gang
+  do i = 1, n
+     lrg = lrg .or. (array(i) .ge. 5)
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_workers(nw) copy(lrw)
+  !$acc loop reduction(.or.:lrw) worker
+  do i = 1, n
+     lrw = lrw .or. (array(i) .ge. 5)
+  end do
+  !$acc end parallel
+
+  !$acc parallel vector_length(vl) copy(lrv)
+  !$acc loop reduction(.or.:lrv) vector
+  do i = 1, n
+     lrv = lrv .or. (array(i) .ge. 5)
+  end do
+  !$acc end parallel
 
-  !$acc parallel vector_length(vl) num_gangs(1) copy(lresult)
-  !$acc loop reduction(.or.:lresult)
+  !$acc parallel num_gangs(ng) num_workers(nw) vector_length(vl) copy(lrc)
+  !$acc loop reduction(.or.:lrc) gang worker vector
   do i = 1, n
-     lresult = lresult .or. (array(i) .ge. 5)
+     lrc = lrc .or. (array(i) .ge. 5)
   end do
   !$acc end parallel
 
@@ -183,17 +442,46 @@ program reduction_1
      lvresult = lvresult .or. (array(i) .ge. 5)
   end do
 
-  if (result.ne.vresult) call abort
-
-  lresult = .false.
-  lvresult = .false.
+  if (lrg .neqv. lvresult) call abort
+  if (lrw .neqv. lvresult) call abort
+  if (lrv .neqv. lvresult) call abort
+  if (lrc .neqv. lvresult) call abort
 
+  !
   ! '.eqv.' reductions
+  !
+
+  lrg = .true.
+  lrw = .true.
+  lrv = .true.
+  lrc = .true.
+  lvresult = .true.
+
+  !$acc parallel num_gangs(ng) copy(lrg)
+  !$acc loop reduction(.eqv.:lrg) gang
+  do i = 1, n
+     lrg = lrg .eqv. (array(i) .ge. 5)
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_workers(nw) copy(lrw)
+  !$acc loop reduction(.eqv.:lrw) worker
+  do i = 1, n
+     lrw = lrw .eqv. (array(i) .ge. 5)
+  end do
+  !$acc end parallel
+
+  !$acc parallel vector_length(vl) copy(lrv)
+  !$acc loop reduction(.eqv.:lrv) vector
+  do i = 1, n
+     lrv = lrv .eqv. (array(i) .ge. 5)
+  end do
+  !$acc end parallel
 
-  !$acc parallel vector_length(vl) num_gangs(1) copy(lresult)
-  !$acc loop reduction(.eqv.:lresult)
+  !$acc parallel num_gangs(ng) num_workers(nw) vector_length(vl) copy(lrc)
+  !$acc loop reduction(.eqv.:lrc) gang worker vector
   do i = 1, n
-     lresult = lresult .eqv. (array(i) .ge. 5)
+     lrc = lrc .eqv. (array(i) .ge. 5)
   end do
   !$acc end parallel
 
@@ -202,17 +490,46 @@ program reduction_1
      lvresult = lvresult .eqv. (array(i) .ge. 5)
   end do
 
-  if (result.ne.vresult) call abort
-
-  lresult = .false.
-  lvresult = .false.
+  if (lrg .neqv. lvresult) call abort
+  if (lrw .neqv. lvresult) call abort
+  if (lrv .neqv. lvresult) call abort
+  if (lrc .neqv. lvresult) call abort
 
+  !
   ! '.neqv.' reductions
+  !
+
+  lrg = .true.
+  lrw = .true.
+  lrv = .true.
+  lrc = .true.
+  lvresult = .true.
+
+  !$acc parallel num_gangs(ng) copy(lrg)
+  !$acc loop reduction(.neqv.:lrg) gang
+  do i = 1, n
+     lrg = lrg .neqv. (array(i) .ge. 5)
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_workers(nw) copy(lrw)
+  !$acc loop reduction(.neqv.:lrw) worker
+  do i = 1, n
+     lrw = lrw .neqv. (array(i) .ge. 5)
+  end do
+  !$acc end parallel
+
+  !$acc parallel vector_length(vl) copy(lrv)
+  !$acc loop reduction(.neqv.:lrv) vector
+  do i = 1, n
+     lrv = lrv .neqv. (array(i) .ge. 5)
+  end do
+  !$acc end parallel
 
-  !$acc parallel vector_length(vl) num_gangs(1) copy(lresult)
-  !$acc loop reduction(.neqv.:lresult)
+  !$acc parallel num_gangs(ng) num_workers(nw) vector_length(vl) copy(lrc)
+  !$acc loop reduction(.neqv.:lrc) gang worker vector
   do i = 1, n
-     lresult = lresult .neqv. (array(i) .ge. 5)
+     lrc = lrc .neqv. (array(i) .ge. 5)
   end do
   !$acc end parallel
 
@@ -221,5 +538,8 @@ program reduction_1
      lvresult = lvresult .neqv. (array(i) .ge. 5)
   end do
 
-  if (result.ne.vresult) call abort
+  if (lrg .neqv. lvresult) call abort
+  if (lrw .neqv. lvresult) call abort
+  if (lrv .neqv. lvresult) call abort
+  if (lrc .neqv. lvresult) call abort
 end program reduction_1
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/reduction-2.f90 b/libgomp/testsuite/libgomp.oacc-fortran/reduction-2.f90
index 3d99668..cd09099 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/reduction-2.f90
+++ b/libgomp/testsuite/libgomp.oacc-fortran/reduction-2.f90
@@ -5,26 +5,52 @@
 program reduction_2
   implicit none
 
-  integer, parameter    :: n = 10, vl = 32
+  integer, parameter    :: n = 10, ng = 8, nw = 4, vl = 32
   integer               :: i
-  real, parameter       :: e = .001
-  real                  :: vresult, result
-  logical               :: lresult, lvresult
-  real, dimension (n) :: array
+  real                  :: vresult, rg, rw, rv, rc
+  real, parameter       :: e = 0.001
+  logical               :: lrg, lrw, lrv, lrc, lvresult
+  real, dimension (n)   :: array
 
   do i = 1, n
      array(i) = i
   end do
 
-  result = 0
+  !
+  ! '+' reductions
+  !
+
+  rg = 0
+  rw = 0
+  rv = 0
+  rc = 0
   vresult = 0
 
-  ! '+' reductions
+  !$acc parallel num_gangs(ng) copy(rg)
+  !$acc loop reduction(+:rg) gang
+  do i = 1, n
+     rg = rg + array(i)
+  end do
+  !$acc end parallel
 
-  !$acc parallel vector_length(vl) num_gangs(1) copy(result)
-  !$acc loop reduction(+:result)
+  !$acc parallel num_workers(nw) copy(rw)
+  !$acc loop reduction(+:rw) worker
   do i = 1, n
-     result = result + array(i)
+     rw = rw + array(i)
+  end do
+  !$acc end parallel
+
+  !$acc parallel vector_length(vl) copy(rv)
+  !$acc loop reduction(+:rv) vector
+  do i = 1, n
+     rv = rv + array(i)
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_gangs(ng) num_workers(nw) vector_length(vl) copy(rc)
+  !$acc loop reduction(+:rc) gang worker vector
+  do i = 1, n
+     rc = rc + array(i)
   end do
   !$acc end parallel
 
@@ -33,17 +59,46 @@ program reduction_2
      vresult = vresult + array(i)
   end do
 
-  if (abs (result - vresult) .ge. e) call abort
+  if (abs (rg - vresult) .ge. e) call abort
+  if (abs (rw - vresult) .ge. e) call abort
+  if (abs (rv - vresult) .ge. e) call abort
+  if (abs (rc - vresult) .ge. e) call abort
+
+  !
+  ! '*' reductions
+  !
 
-  result = 1
+  rg = 1
+  rw = 1
+  rv = 1
+  rc = 1
   vresult = 1
 
-  ! '*' reductions
+  !$acc parallel num_gangs(ng) copy(rg)
+  !$acc loop reduction(*:rg) gang
+  do i = 1, n
+     rg = rg * array(i)
+  end do
+  !$acc end parallel
 
-  !$acc parallel vector_length(vl) num_gangs(1) copy(result)
-  !$acc loop reduction(*:result)
+  !$acc parallel num_workers(nw) copy(rw)
+  !$acc loop reduction(*:rw) worker
   do i = 1, n
-     result = result * array(i)
+     rw = rw * array(i)
+  end do
+  !$acc end parallel
+
+  !$acc parallel vector_length(vl) copy(rv)
+  !$acc loop reduction(*:rv) vector
+  do i = 1, n
+     rv = rv * array(i)
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_gangs(ng) num_workers(nw) vector_length(vl) copy(rc)
+  !$acc loop reduction(*:rc) gang worker vector
+  do i = 1, n
+     rc = rc * array(i)
   end do
   !$acc end parallel
 
@@ -52,17 +107,46 @@ program reduction_2
      vresult = vresult * array(i)
   end do
 
-  if (result.ne.vresult) call abort
+  if (abs (rg - vresult) .ge. e) call abort
+  if (abs (rw - vresult) .ge. e) call abort
+  if (abs (rv - vresult) .ge. e) call abort
+  if (abs (rc - vresult) .ge. e) call abort
+
+  !
+  ! 'max' reductions
+  !
 
-  result = 0
+  rg = 0
+  rw = 0
+  rv = 0
+  rc = 0
   vresult = 0
 
-  ! 'max' reductions
+  !$acc parallel num_gangs(ng) copy(rg)
+  !$acc loop reduction(max:rg) gang
+  do i = 1, n
+     rg = max (rg, array(i))
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_workers(nw) copy(rw)
+  !$acc loop reduction(max:rw) worker
+  do i = 1, n
+     rw = max (rw, array(i))
+  end do
+  !$acc end parallel
 
-  !$acc parallel vector_length(vl) num_gangs(1) copy(result)
-  !$acc loop reduction(max:result)
+  !$acc parallel vector_length(vl) copy(rv)
+  !$acc loop reduction(max:rv) vector
   do i = 1, n
-     result = max (result, array(i))
+     rv = max (rv, array(i))
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_gangs(ng) num_workers(nw) vector_length(vl) copy(rc)
+  !$acc loop reduction(max:rc) gang worker vector
+  do i = 1, n
+     rc = max (rc, array(i))
   end do
   !$acc end parallel
 
@@ -71,17 +155,46 @@ program reduction_2
      vresult = max (vresult, array(i))
   end do
 
-  if (result.ne.vresult) call abort
-
-  result = 1
-  vresult = 1
+  if (abs (rg - vresult) .ge. e) call abort
+  if (abs (rw - vresult) .ge. e) call abort
+  if (abs (rv - vresult) .ge. e) call abort
+  if (abs (rc - vresult) .ge. e) call abort
 
+  !
   ! 'min' reductions
+  !
 
-  !$acc parallel vector_length(vl) num_gangs(1) copy(result)
-  !$acc loop reduction(min:result)
+  rg = 0
+  rw = 0
+  rv = 0
+  rc = 0
+  vresult = 0
+
+  !$acc parallel num_gangs(ng) copy(rg)
+  !$acc loop reduction(min:rg) gang
+  do i = 1, n
+     rg = min (rg, array(i))
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_workers(nw) copy(rw)
+  !$acc loop reduction(min:rw) worker
+  do i = 1, n
+     rw = min (rw, array(i))
+  end do
+  !$acc end parallel
+
+  !$acc parallel vector_length(vl) copy(rv)
+  !$acc loop reduction(min:rv) vector
   do i = 1, n
-     result = min (result, array(i))
+     rv = min (rv, array(i))
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_gangs(ng) num_workers(nw) vector_length(vl) copy(rc)
+  !$acc loop reduction(min:rc) gang worker vector
+  do i = 1, n
+     rc = min (rc, array(i))
   end do
   !$acc end parallel
 
@@ -90,17 +203,46 @@ program reduction_2
      vresult = min (vresult, array(i))
   end do
 
-  if (result.ne.vresult) call abort
-
-  result = 1
-  vresult = 1
+  if (abs (rg - vresult) .ge. e) call abort
+  if (abs (rw - vresult) .ge. e) call abort
+  if (abs (rv - vresult) .ge. e) call abort
+  if (abs (rc - vresult) .ge. e) call abort
 
+  !
   ! '.and.' reductions
+  !
+
+  lrg = .true.
+  lrw = .true.
+  lrv = .true.
+  lrc = .true.
+  lvresult = .true.
+
+  !$acc parallel num_gangs(ng) copy(lrg)
+  !$acc loop reduction(.and.:lrg) gang
+  do i = 1, n
+     lrg = lrg .and. (array(i) .ge. 5)
+  end do
+  !$acc end parallel
 
-  !$acc parallel vector_length(vl) num_gangs(1) copy(lresult)
-  !$acc loop reduction(.and.:lresult)
+  !$acc parallel num_workers(nw) copy(lrw)
+  !$acc loop reduction(.and.:lrw) worker
   do i = 1, n
-     lresult = lresult .and. (array(i) .ge. 5)
+     lrw = lrw .and. (array(i) .ge. 5)
+  end do
+  !$acc end parallel
+
+  !$acc parallel vector_length(vl) copy(lrv)
+  !$acc loop reduction(.and.:lrv) vector
+  do i = 1, n
+     lrv = lrv .and. (array(i) .ge. 5)
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_gangs(ng) num_workers(nw) vector_length(vl) copy(lrc)
+  !$acc loop reduction(.and.:lrc) gang worker vector
+  do i = 1, n
+     lrc = lrc .and. (array(i) .ge. 5)
   end do
   !$acc end parallel
 
@@ -109,17 +251,46 @@ program reduction_2
      lvresult = lvresult .and. (array(i) .ge. 5)
   end do
 
-  if (result.ne.vresult) call abort
+  if (lrg .neqv. lvresult) call abort
+  if (lrw .neqv. lvresult) call abort
+  if (lrv .neqv. lvresult) call abort
+  if (lrc .neqv. lvresult) call abort
 
-  lresult = .false.
+  !
+  ! '.or.' reductions
+  !
+
+  lrg = .false.
+  lrw = .false.
+  lrv = .false.
+  lrc = .false.
   lvresult = .false.
 
-  ! '.or.' reductions
+  !$acc parallel num_gangs(ng) copy(lrg)
+  !$acc loop reduction(.or.:lrg) gang
+  do i = 1, n
+     lrg = lrg .or. (array(i) .ge. 5)
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_workers(nw) copy(lrw)
+  !$acc loop reduction(.or.:lrw) worker
+  do i = 1, n
+     lrw = lrw .or. (array(i) .ge. 5)
+  end do
+  !$acc end parallel
 
-  !$acc parallel vector_length(vl) num_gangs(1) copy(lresult)
-  !$acc loop reduction(.or.:lresult)
+  !$acc parallel vector_length(vl) copy(lrv)
+  !$acc loop reduction(.or.:lrv) vector
   do i = 1, n
-     lresult = lresult .or. (array(i) .ge. 5)
+     lrv = lrv .or. (array(i) .ge. 5)
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_gangs(ng) num_workers(nw) vector_length(vl) copy(lrc)
+  !$acc loop reduction(.or.:lrc) gang worker vector
+  do i = 1, n
+     lrc = lrc .or. (array(i) .ge. 5)
   end do
   !$acc end parallel
 
@@ -128,17 +299,46 @@ program reduction_2
      lvresult = lvresult .or. (array(i) .ge. 5)
   end do
 
-  if (result.ne.vresult) call abort
-
-  lresult = .false.
-  lvresult = .false.
+  if (lrg .neqv. lvresult) call abort
+  if (lrw .neqv. lvresult) call abort
+  if (lrv .neqv. lvresult) call abort
+  if (lrc .neqv. lvresult) call abort
 
+  !
   ! '.eqv.' reductions
+  !
+
+  lrg = .true.
+  lrw = .true.
+  lrv = .true.
+  lrc = .true.
+  lvresult = .true.
+
+  !$acc parallel num_gangs(ng) copy(lrg)
+  !$acc loop reduction(.eqv.:lrg) gang
+  do i = 1, n
+     lrg = lrg .eqv. (array(i) .ge. 5)
+  end do
+  !$acc end parallel
 
-  !$acc parallel vector_length(vl) num_gangs(1) copy(lresult)
-  !$acc loop reduction(.eqv.:lresult)
+  !$acc parallel num_workers(nw) copy(lrw)
+  !$acc loop reduction(.eqv.:lrw) worker
   do i = 1, n
-     lresult = lresult .eqv. (array(i) .ge. 5)
+     lrw = lrw .eqv. (array(i) .ge. 5)
+  end do
+  !$acc end parallel
+
+  !$acc parallel vector_length(vl) copy(lrv)
+  !$acc loop reduction(.eqv.:lrv) vector
+  do i = 1, n
+     lrv = lrv .eqv. (array(i) .ge. 5)
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_gangs(ng) num_workers(nw) vector_length(vl) copy(lrc)
+  !$acc loop reduction(.eqv.:lrc) gang worker vector
+  do i = 1, n
+     lrc = lrc .eqv. (array(i) .ge. 5)
   end do
   !$acc end parallel
 
@@ -147,17 +347,46 @@ program reduction_2
      lvresult = lvresult .eqv. (array(i) .ge. 5)
   end do
 
-  if (result.ne.vresult) call abort
-
-  lresult = .false.
-  lvresult = .false.
+  if (lrg .neqv. lvresult) call abort
+  if (lrw .neqv. lvresult) call abort
+  if (lrv .neqv. lvresult) call abort
+  if (lrc .neqv. lvresult) call abort
 
+  !
   ! '.neqv.' reductions
+  !
+
+  lrg = .true.
+  lrw = .true.
+  lrv = .true.
+  lrc = .true.
+  lvresult = .true.
+
+  !$acc parallel num_gangs(ng) copy(lrg)
+  !$acc loop reduction(.neqv.:lrg) gang
+  do i = 1, n
+     lrg = lrg .neqv. (array(i) .ge. 5)
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_workers(nw) copy(lrw)
+  !$acc loop reduction(.neqv.:lrw) worker
+  do i = 1, n
+     lrw = lrw .neqv. (array(i) .ge. 5)
+  end do
+  !$acc end parallel
+
+  !$acc parallel vector_length(vl) copy(lrv)
+  !$acc loop reduction(.neqv.:lrv) vector
+  do i = 1, n
+     lrv = lrv .neqv. (array(i) .ge. 5)
+  end do
+  !$acc end parallel
 
-  !$acc parallel vector_length(vl) num_gangs(1) copy(lresult)
-  !$acc loop reduction(.neqv.:lresult)
+  !$acc parallel num_gangs(ng) num_workers(nw) vector_length(vl) copy(lrc)
+  !$acc loop reduction(.neqv.:lrc) gang worker vector
   do i = 1, n
-     lresult = lresult .neqv. (array(i) .ge. 5)
+     lrc = lrc .neqv. (array(i) .ge. 5)
   end do
   !$acc end parallel
 
@@ -166,5 +395,8 @@ program reduction_2
      lvresult = lvresult .neqv. (array(i) .ge. 5)
   end do
 
-  if (result.ne.vresult) call abort
+  if (lrg .neqv. lvresult) call abort
+  if (lrw .neqv. lvresult) call abort
+  if (lrv .neqv. lvresult) call abort
+  if (lrc .neqv. lvresult) call abort
 end program reduction_2
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/reduction-3.f90 b/libgomp/testsuite/libgomp.oacc-fortran/reduction-3.f90
index d0b590e..a7dbf2b 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/reduction-3.f90
+++ b/libgomp/testsuite/libgomp.oacc-fortran/reduction-3.f90
@@ -5,26 +5,52 @@
 program reduction_3
   implicit none
 
-  integer, parameter    :: n = 10, vl = 32
+  integer, parameter    :: n = 10, ng = 8, nw = 4, vl = 32
   integer               :: i
-  double precision, parameter :: e = .001
-  double precision      :: vresult, result
-  logical               :: lresult, lvresult
+  double precision      :: vresult, rg, rw, rv, rc
+  double precision, parameter :: e = 0.001
+  logical               :: lrg, lrw, lrv, lrc, lvresult
   double precision, dimension (n) :: array
 
   do i = 1, n
      array(i) = i
   end do
 
-  result = 0
+  !
+  ! '+' reductions
+  !
+
+  rg = 0
+  rw = 0
+  rv = 0
+  rc = 0
   vresult = 0
 
-  ! '+' reductions
+  !$acc parallel num_gangs(ng) copy(rg)
+  !$acc loop reduction(+:rg) gang
+  do i = 1, n
+     rg = rg + array(i)
+  end do
+  !$acc end parallel
 
-  !$acc parallel vector_length(vl) num_gangs(1) copy(result)
-  !$acc loop reduction(+:result)
+  !$acc parallel num_workers(nw) copy(rw)
+  !$acc loop reduction(+:rw) worker
   do i = 1, n
-     result = result + array(i)
+     rw = rw + array(i)
+  end do
+  !$acc end parallel
+
+  !$acc parallel vector_length(vl) copy(rv)
+  !$acc loop reduction(+:rv) vector
+  do i = 1, n
+     rv = rv + array(i)
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_gangs(ng) num_workers(nw) vector_length(vl) copy(rc)
+  !$acc loop reduction(+:rc) gang worker vector
+  do i = 1, n
+     rc = rc + array(i)
   end do
   !$acc end parallel
 
@@ -33,17 +59,46 @@ program reduction_3
      vresult = vresult + array(i)
   end do
 
-  if (abs (result - vresult) .ge. e) call abort
+  if (abs (rg - vresult) .ge. e) call abort
+  if (abs (rw - vresult) .ge. e) call abort
+  if (abs (rv - vresult) .ge. e) call abort
+  if (abs (rc - vresult) .ge. e) call abort
+
+  !
+  ! '*' reductions
+  !
 
-  result = 1
+  rg = 1
+  rw = 1
+  rv = 1
+  rc = 1
   vresult = 1
 
-  ! '*' reductions
+  !$acc parallel num_gangs(ng) copy(rg)
+  !$acc loop reduction(*:rg) gang
+  do i = 1, n
+     rg = rg * array(i)
+  end do
+  !$acc end parallel
 
-  !$acc parallel vector_length(vl) num_gangs(1) copy(result)
-  !$acc loop reduction(*:result)
+  !$acc parallel num_workers(nw) copy(rw)
+  !$acc loop reduction(*:rw) worker
   do i = 1, n
-     result = result * array(i)
+     rw = rw * array(i)
+  end do
+  !$acc end parallel
+
+  !$acc parallel vector_length(vl) copy(rv)
+  !$acc loop reduction(*:rv) vector
+  do i = 1, n
+     rv = rv * array(i)
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_gangs(ng) num_workers(nw) vector_length(vl) copy(rc)
+  !$acc loop reduction(*:rc) gang worker vector
+  do i = 1, n
+     rc = rc * array(i)
   end do
   !$acc end parallel
 
@@ -52,17 +107,46 @@ program reduction_3
      vresult = vresult * array(i)
   end do
 
-  if (result.ne.vresult) call abort
+  if (abs (rg - vresult) .ge. e) call abort
+  if (abs (rw - vresult) .ge. e) call abort
+  if (abs (rv - vresult) .ge. e) call abort
+  if (abs (rc - vresult) .ge. e) call abort
+
+  !
+  ! 'max' reductions
+  !
 
-  result = 0
+  rg = 0
+  rw = 0
+  rv = 0
+  rc = 0
   vresult = 0
 
-  ! 'max' reductions
+  !$acc parallel num_gangs(ng) copy(rg)
+  !$acc loop reduction(max:rg) gang
+  do i = 1, n
+     rg = max (rg, array(i))
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_workers(nw) copy(rw)
+  !$acc loop reduction(max:rw) worker
+  do i = 1, n
+     rw = max (rw, array(i))
+  end do
+  !$acc end parallel
 
-  !$acc parallel vector_length(vl) num_gangs(1) copy(result)
-  !$acc loop reduction(max:result)
+  !$acc parallel vector_length(vl) copy(rv)
+  !$acc loop reduction(max:rv) vector
   do i = 1, n
-     result = max (result, array(i))
+     rv = max (rv, array(i))
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_gangs(ng) num_workers(nw) vector_length(vl) copy(rc)
+  !$acc loop reduction(max:rc) gang worker vector
+  do i = 1, n
+     rc = max (rc, array(i))
   end do
   !$acc end parallel
 
@@ -71,17 +155,46 @@ program reduction_3
      vresult = max (vresult, array(i))
   end do
 
-  if (result.ne.vresult) call abort
-
-  result = 1
-  vresult = 1
+  if (abs (rg - vresult) .ge. e) call abort
+  if (abs (rw - vresult) .ge. e) call abort
+  if (abs (rv - vresult) .ge. e) call abort
+  if (abs (rc - vresult) .ge. e) call abort
 
+  !
   ! 'min' reductions
+  !
 
-  !$acc parallel vector_length(vl) num_gangs(1) copy(result)
-  !$acc loop reduction(min:result)
+  rg = 0
+  rw = 0
+  rv = 0
+  rc = 0
+  vresult = 0
+
+  !$acc parallel num_gangs(ng) copy(rg)
+  !$acc loop reduction(min:rg) gang
+  do i = 1, n
+     rg = min (rg, array(i))
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_workers(nw) copy(rw)
+  !$acc loop reduction(min:rw) worker
+  do i = 1, n
+     rw = min (rw, array(i))
+  end do
+  !$acc end parallel
+
+  !$acc parallel vector_length(vl) copy(rv)
+  !$acc loop reduction(min:rv) vector
   do i = 1, n
-     result = min (result, array(i))
+     rv = min (rv, array(i))
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_gangs(ng) num_workers(nw) vector_length(vl) copy(rc)
+  !$acc loop reduction(min:rc) gang worker vector
+  do i = 1, n
+     rc = min (rc, array(i))
   end do
   !$acc end parallel
 
@@ -90,17 +203,46 @@ program reduction_3
      vresult = min (vresult, array(i))
   end do
 
-  if (result.ne.vresult) call abort
-
-  result = 1
-  vresult = 1
+  if (abs (rg - vresult) .ge. e) call abort
+  if (abs (rw - vresult) .ge. e) call abort
+  if (abs (rv - vresult) .ge. e) call abort
+  if (abs (rc - vresult) .ge. e) call abort
 
+  !
   ! '.and.' reductions
+  !
+
+  lrg = .true.
+  lrw = .true.
+  lrv = .true.
+  lrc = .true.
+  lvresult = .true.
+
+  !$acc parallel num_gangs(ng) copy(lrg)
+  !$acc loop reduction(.and.:lrg) gang
+  do i = 1, n
+     lrg = lrg .and. (array(i) .ge. 5)
+  end do
+  !$acc end parallel
 
-  !$acc parallel vector_length(vl) num_gangs(1) copy(lresult)
-  !$acc loop reduction(.and.:lresult)
+  !$acc parallel num_workers(nw) copy(lrw)
+  !$acc loop reduction(.and.:lrw) worker
   do i = 1, n
-     lresult = lresult .and. (array(i) .ge. 5)
+     lrw = lrw .and. (array(i) .ge. 5)
+  end do
+  !$acc end parallel
+
+  !$acc parallel vector_length(vl) copy(lrv)
+  !$acc loop reduction(.and.:lrv) vector
+  do i = 1, n
+     lrv = lrv .and. (array(i) .ge. 5)
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_gangs(ng) num_workers(nw) vector_length(vl) copy(lrc)
+  !$acc loop reduction(.and.:lrc) gang worker vector
+  do i = 1, n
+     lrc = lrc .and. (array(i) .ge. 5)
   end do
   !$acc end parallel
 
@@ -109,17 +251,46 @@ program reduction_3
      lvresult = lvresult .and. (array(i) .ge. 5)
   end do
 
-  if (result.ne.vresult) call abort
+  if (lrg .neqv. lvresult) call abort
+  if (lrw .neqv. lvresult) call abort
+  if (lrv .neqv. lvresult) call abort
+  if (lrc .neqv. lvresult) call abort
 
-  lresult = .false.
+  !
+  ! '.or.' reductions
+  !
+
+  lrg = .false.
+  lrw = .false.
+  lrv = .false.
+  lrc = .false.
   lvresult = .false.
 
-  ! '.or.' reductions
+  !$acc parallel num_gangs(ng) copy(lrg)
+  !$acc loop reduction(.or.:lrg) gang
+  do i = 1, n
+     lrg = lrg .or. (array(i) .ge. 5)
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_workers(nw) copy(lrw)
+  !$acc loop reduction(.or.:lrw) worker
+  do i = 1, n
+     lrw = lrw .or. (array(i) .ge. 5)
+  end do
+  !$acc end parallel
 
-  !$acc parallel vector_length(vl) num_gangs(1) copy(lresult)
-  !$acc loop reduction(.or.:lresult)
+  !$acc parallel vector_length(vl) copy(lrv)
+  !$acc loop reduction(.or.:lrv) vector
   do i = 1, n
-     lresult = lresult .or. (array(i) .ge. 5)
+     lrv = lrv .or. (array(i) .ge. 5)
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_gangs(ng) num_workers(nw) vector_length(vl) copy(lrc)
+  !$acc loop reduction(.or.:lrc) gang worker vector
+  do i = 1, n
+     lrc = lrc .or. (array(i) .ge. 5)
   end do
   !$acc end parallel
 
@@ -128,17 +299,46 @@ program reduction_3
      lvresult = lvresult .or. (array(i) .ge. 5)
   end do
 
-  if (result.ne.vresult) call abort
-
-  lresult = .false.
-  lvresult = .false.
+  if (lrg .neqv. lvresult) call abort
+  if (lrw .neqv. lvresult) call abort
+  if (lrv .neqv. lvresult) call abort
+  if (lrc .neqv. lvresult) call abort
 
+  !
   ! '.eqv.' reductions
+  !
+
+  lrg = .true.
+  lrw = .true.
+  lrv = .true.
+  lrc = .true.
+  lvresult = .true.
+
+  !$acc parallel num_gangs(ng) copy(lrg)
+  !$acc loop reduction(.eqv.:lrg) gang
+  do i = 1, n
+     lrg = lrg .eqv. (array(i) .ge. 5)
+  end do
+  !$acc end parallel
 
-  !$acc parallel vector_length(vl) num_gangs(1) copy(lresult)
-  !$acc loop reduction(.eqv.:lresult)
+  !$acc parallel num_workers(nw) copy(lrw)
+  !$acc loop reduction(.eqv.:lrw) worker
   do i = 1, n
-     lresult = lresult .eqv. (array(i) .ge. 5)
+     lrw = lrw .eqv. (array(i) .ge. 5)
+  end do
+  !$acc end parallel
+
+  !$acc parallel vector_length(vl) copy(lrv)
+  !$acc loop reduction(.eqv.:lrv) vector
+  do i = 1, n
+     lrv = lrv .eqv. (array(i) .ge. 5)
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_gangs(ng) num_workers(nw) vector_length(vl) copy(lrc)
+  !$acc loop reduction(.eqv.:lrc) gang worker vector
+  do i = 1, n
+     lrc = lrc .eqv. (array(i) .ge. 5)
   end do
   !$acc end parallel
 
@@ -147,17 +347,46 @@ program reduction_3
      lvresult = lvresult .eqv. (array(i) .ge. 5)
   end do
 
-  if (result.ne.vresult) call abort
-
-  lresult = .false.
-  lvresult = .false.
+  if (lrg .neqv. lvresult) call abort
+  if (lrw .neqv. lvresult) call abort
+  if (lrv .neqv. lvresult) call abort
+  if (lrc .neqv. lvresult) call abort
 
+  !
   ! '.neqv.' reductions
+  !
+
+  lrg = .true.
+  lrw = .true.
+  lrv = .true.
+  lrc = .true.
+  lvresult = .true.
+
+  !$acc parallel num_gangs(ng) copy(lrg)
+  !$acc loop reduction(.neqv.:lrg) gang
+  do i = 1, n
+     lrg = lrg .neqv. (array(i) .ge. 5)
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_workers(nw) copy(lrw)
+  !$acc loop reduction(.neqv.:lrw) worker
+  do i = 1, n
+     lrw = lrw .neqv. (array(i) .ge. 5)
+  end do
+  !$acc end parallel
+
+  !$acc parallel vector_length(vl) copy(lrv)
+  !$acc loop reduction(.neqv.:lrv) vector
+  do i = 1, n
+     lrv = lrv .neqv. (array(i) .ge. 5)
+  end do
+  !$acc end parallel
 
-  !$acc parallel vector_length(vl) num_gangs(1) copy(lresult)
-  !$acc loop reduction(.neqv.:lresult)
+  !$acc parallel num_gangs(ng) num_workers(nw) vector_length(vl) copy(lrc)
+  !$acc loop reduction(.neqv.:lrc) gang worker vector
   do i = 1, n
-     lresult = lresult .neqv. (array(i) .ge. 5)
+     lrc = lrc .neqv. (array(i) .ge. 5)
   end do
   !$acc end parallel
 
@@ -166,5 +395,8 @@ program reduction_3
      lvresult = lvresult .neqv. (array(i) .ge. 5)
   end do
 
-  if (result.ne.vresult) call abort
+  if (lrg .neqv. lvresult) call abort
+  if (lrw .neqv. lvresult) call abort
+  if (lrv .neqv. lvresult) call abort
+  if (lrc .neqv. lvresult) call abort
 end program reduction_3
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/reduction-4.f90 b/libgomp/testsuite/libgomp.oacc-fortran/reduction-4.f90
index 8c99fdb..c3bdaf6 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/reduction-4.f90
+++ b/libgomp/testsuite/libgomp.oacc-fortran/reduction-4.f90
@@ -5,50 +5,108 @@
 program reduction_4
   implicit none
 
-  integer, parameter    :: n = 10, vl = 32
+  integer, parameter    :: n = 10, ng = 8, nw = 4, vl = 32
   integer               :: i
-  complex               :: vresult, result
+  real                  :: vresult, rg, rw, rv, rc
   complex, dimension (n) :: array
 
   do i = 1, n
      array(i) = i
   end do
 
-  result = 0
+  !
+  ! '+' reductions
+  !
+
+  rg = 0
+  rw = 0
+  rv = 0
+  rc = 0
   vresult = 0
 
-  ! '+' reductions
+  !$acc parallel num_gangs(ng) copy(rg)
+  !$acc loop reduction(+:rg) gang
+  do i = 1, n
+     rg = rg + REAL(array(i))
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_workers(nw) copy(rw)
+  !$acc loop reduction(+:rw) worker
+  do i = 1, n
+     rw = rw + REAL(array(i))
+  end do
+  !$acc end parallel
+
+  !$acc parallel vector_length(vl) copy(rv)
+  !$acc loop reduction(+:rv) vector
+  do i = 1, n
+     rv = rv + REAL(array(i))
+  end do
+  !$acc end parallel
 
-  !$acc parallel vector_length(vl) num_gangs(1) copy(result)
-  !$acc loop reduction(+:result)
+  !$acc parallel num_gangs(ng) num_workers(nw) vector_length(vl) copy(rc)
+  !$acc loop reduction(+:rc) gang worker vector
   do i = 1, n
-     result = result + array(i)
+     rc = rc + REAL(array(i))
   end do
   !$acc end parallel
 
   ! Verify the results
   do i = 1, n
-     vresult = vresult + array(i)
+     vresult = vresult + REAL(array(i))
   end do
 
-  if (result .ne. vresult) call abort
+  if (rg .ne. vresult) call abort
+  if (rw .ne. vresult) call abort
+  if (rv .ne. vresult) call abort
+  if (rc .ne. vresult) call abort
 
-  result = 1
+  !
+  ! '*' reductions
+  !
+
+  rg = 1
+  rw = 1
+  rv = 1
+  rc = 1
   vresult = 1
 
-!  ! '*' reductions
-!
-!  !$acc parallel vector_length(vl)
-!  !$acc loop reduction(*:result)
-!  do i = 1, n
-!     result = result * array(i)
-!  end do
-!  !$acc end parallel
-!
-!  ! Verify the results
-!  do i = 1, n
-!     vresult = vresult * array(i)
-!  end do
-!
-!  if (result.ne.vresult) call abort
+  !$acc parallel num_gangs(ng) copy(rg)
+  !$acc loop reduction(*:rg) gang
+  do i = 1, n
+     rg = rg * REAL(array(i))
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_workers(nw) copy(rw)
+  !$acc loop reduction(*:rw) worker
+  do i = 1, n
+     rw = rw * REAL(array(i))
+  end do
+  !$acc end parallel
+
+  !$acc parallel vector_length(vl) copy(rv)
+  !$acc loop reduction(*:rv) vector
+  do i = 1, n
+     rv = rv * REAL(array(i))
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_gangs(ng) num_workers(nw) vector_length(vl) copy(rc)
+  !$acc loop reduction(*:rc) gang worker vector
+  do i = 1, n
+     rc = rc * REAL(array(i))
+  end do
+  !$acc end parallel
+
+  ! Verify the results
+  do i = 1, n
+     vresult = vresult * REAL(array(i))
+  end do
+
+  if (rg .ne. vresult) call abort
+  if (rw .ne. vresult) call abort
+  if (rv .ne. vresult) call abort
+  if (rc .ne. vresult) call abort
 end program reduction_4
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/reduction-5.f90 b/libgomp/testsuite/libgomp.oacc-fortran/reduction-5.f90
index ec13e4e..4210648 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/reduction-5.f90
+++ b/libgomp/testsuite/libgomp.oacc-fortran/reduction-5.f90
@@ -1,12 +1,17 @@
 ! { dg-do run }
+! { dg-additional-options "-w" }
 
 ! subroutine reduction
 
 program reduction
   integer, parameter    :: n = 40, c = 10
-  integer               :: i, vsum, sum
+  integer               :: i, vsum, gs, ws, vs, cs, ns
 
-  call redsub (sum, n, c)
+  call redsub_gang (gs, n, c)
+  call redsub_worker (ws, n, c)
+  call redsub_vector (vs, n, c)
+  call redsub_combined (cs, n, c)
+  call redsub_nested (ns, n, c)
 
   vsum = 0
 
@@ -15,21 +20,80 @@ program reduction
      vsum = vsum + c
   end do
 
-  if (sum.ne.vsum) call abort ()
+  if (gs .ne. vsum) call abort ()
+  if (ws .ne. vsum) call abort ()
+  if (vs .ne. vsum) call abort ()
+  if (cs .ne. vsum) call abort ()
+  if (ns .ne. vsum) call abort ()
 end program reduction
 
-subroutine redsub(sum, n, c)
+subroutine redsub_gang(sum, n, c)
   integer :: sum, n, c
 
-  integer :: s
-  s = 0
+  sum = 0
 
-  !$acc parallel vector_length(32) copyin (n, c) copy (s) num_gangs(1)
-  !$acc loop reduction(+:s)
+  !$acc parallel copyin (n, c) num_gangs(n) copy(sum)
+  !$acc loop reduction(+:sum) gang
   do i = 1, n
-     s = s + c
+     sum = sum + c
   end do
   !$acc end parallel
+end subroutine redsub_gang
 
-  sum = s
-end subroutine redsub
+subroutine redsub_worker(sum, n, c)
+  integer :: sum, n, c
+
+  sum = 0
+
+  !$acc parallel copyin (n, c) num_workers(4) vector_length (32) copy(sum)
+  !$acc loop reduction(+:sum) worker
+  do i = 1, n
+     sum = sum + c
+  end do
+  !$acc end parallel
+end subroutine redsub_worker
+
+subroutine redsub_vector(sum, n, c)
+  integer :: sum, n, c
+
+  sum = 0
+
+  !$acc parallel copyin (n, c) vector_length(32) copy(sum)
+  !$acc loop reduction(+:sum) vector
+  do i = 1, n
+     sum = sum + c
+  end do
+  !$acc end parallel
+end subroutine redsub_vector
+
+subroutine redsub_combined(sum, n, c)
+  integer :: sum, n, c
+
+  sum = 0
+
+  !$acc parallel num_gangs (8) num_workers (4) vector_length(32) copy(sum)
+  !$acc loop reduction(+:sum) gang worker vector
+  do i = 1, n
+     sum = sum + c
+  end do
+  !$acc end parallel
+end subroutine redsub_combined
+
+subroutine redsub_nested(sum, n, c)
+  integer :: sum, n, c
+  integer :: ii, jj
+
+  ii = n / 10;
+  jj = 10;
+  sum = 0
+
+  !$acc parallel num_gangs (8) copy(sum)
+  !$acc loop reduction(+:sum) gang
+  do i = 1, ii
+     !$acc loop reduction(+:sum) vector
+     do j = 1, jj
+        sum = sum + c
+     end do
+  end do
+  !$acc end parallel
+end subroutine redsub_nested
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/reduction-6.f90 b/libgomp/testsuite/libgomp.oacc-fortran/reduction-6.f90
index 2ff6f5f..f3ed275 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/reduction-6.f90
+++ b/libgomp/testsuite/libgomp.oacc-fortran/reduction-6.f90
@@ -1,30 +1,94 @@
 ! { dg-do run }
+! { dg-additional-options "-cpp -w" }
 
 program reduction
   implicit none
 
-  integer, parameter    :: n = 100
-  integer               :: i, s1, s2, vs1, vs2
+  integer, parameter    :: n = 100, n2 = 1000, chunksize = 10
+  integer               :: i, gs1, gs2, ws1, ws2, vs1, vs2, cs1, cs2, hs1, hs2
+  integer               :: j, red, vred
 
-  s1 = 0
-  s2 = 0
+  gs1 = 0
+  gs2 = 0
+  ws1 = 0
+  ws2 = 0
   vs1 = 0
   vs2 = 0
+  cs1 = 0
+  cs2 = 0
+  hs1 = 0
+  hs2 = 0
 
-  !$acc parallel vector_length (32) copy(s1, s2)
-  !$acc loop reduction(+:s1, s2)
+  !$acc parallel num_gangs (1000) copy(gs1, gs2)
+  !$acc loop reduction(+:gs1, gs2) gang
   do i = 1, n
-     s1 = s1 + 1
-     s2 = s2 + 2
+     gs1 = gs1 + 1
+     gs2 = gs2 + 2
   end do
   !$acc end parallel
 
-  ! Verify the results
+  !$acc parallel num_workers (4) vector_length (32) copy(ws1, ws2)
+  !$acc loop reduction(+:ws1, ws2) worker
+  do i = 1, n
+     ws1 = ws1 + 1
+     ws2 = ws2 + 2
+  end do
+  !$acc end parallel
+
+  !$acc parallel vector_length (32) copy(vs1, vs2)
+  !$acc loop reduction(+:vs1, vs2) vector
   do i = 1, n
      vs1 = vs1 + 1
      vs2 = vs2 + 2
   end do
+  !$acc end parallel
+
+  !$acc parallel num_gangs(8) num_workers(4) vector_length(32) copy(cs1, cs2)
+  !$acc loop reduction(+:cs1, cs2) gang worker vector
+  do i = 1, n
+     cs1 = cs1 + 1
+     cs2 = cs2 + 2
+  end do
+  !$acc end parallel
+
+  ! Verify the results on the host
+  do i = 1, n
+     hs1 = hs1 + 1
+     hs2 = hs2 + 2
+  end do
+
+  if (gs1 .ne. hs1) call abort ()
+  if (gs2 .ne. hs2) call abort ()
+
+  if (ws1 .ne. hs1) call abort ()
+  if (ws2 .ne. hs2) call abort ()
+
+  if (vs1 .ne. hs1) call abort ()
+  if (vs2 .ne. hs2) call abort ()
+
+  if (cs1 .ne. hs1) call abort ()
+  if (cs2 .ne. hs2) call abort ()
+
+  ! Nested reductions.
+
+  red = 0
+  vred = 0
+
+  !$acc parallel num_gangs(10) vector_length(32) copy(red)
+  !$acc loop reduction(+:red) gang
+  do i = 1, n/chunksize
+     !$acc loop reduction(+:red) vector
+     do j = 1, chunksize
+        red = red + chunksize
+     end do
+  end do
+  !$acc end parallel
+
+  do i = 1, n/chunksize
+     do j = 1, chunksize
+        vred = vred + chunksize
+     end do
+  end do
 
-  if (s1.ne.vs1) call abort ()
-  if (s2.ne.vs2) call abort ()
+  if (red .ne. vred) call abort ()
 end program reduction
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/reduction-7.f90 b/libgomp/testsuite/libgomp.oacc-fortran/reduction-7.f90
new file mode 100644
index 0000000..8ec36ad
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-fortran/reduction-7.f90
@@ -0,0 +1,88 @@
+! { dg-do run }
+! { dg-additional-options "-w" }
+
+! subroutine reduction with private and firstprivate variables
+
+program reduction
+  integer, parameter    :: n = 100
+  integer               :: i, j, vsum, cs, arr(n)
+
+  call redsub_private (cs, n, arr)
+  call redsub_bogus (cs, n)
+  call redsub_combined (cs, n, arr)
+
+  vsum = 0
+
+  ! Verify the results
+  do i = 1, n
+     vsum = i
+     do j = 1, n
+        vsum = vsum + 1;
+     end do
+     if (vsum .ne. arr(i)) call abort ()
+  end do
+end program reduction
+
+! This subroutine tests a reduction with an explicit private variable.
+
+subroutine redsub_private(sum, n, arr)
+  integer :: sum, n, arr(n)
+  integer :: i, j, v
+
+  !$acc parallel copyout (arr)
+  !$acc loop gang private (v)
+  do j = 1, n
+     v = j
+
+     !$acc loop vector reduction (+:v)
+     do i = 1, 100
+        v = v + 1
+     end do
+
+     arr(j) = v
+  end do
+  !$acc end parallel
+
+  ! verify the results
+  do i = 1, 10
+     if (arr(i) .ne. 100+i) call abort ()
+  end do
+end subroutine redsub_private
+
+
+! Bogus reduction on an implicitly firstprivate variable.  The results do
+! survive the parallel region.  The goal here is to ensure that gfortran
+! doesn't ICE.
+
+subroutine redsub_bogus(sum, n)
+  integer :: sum, n, arr(n)
+  integer :: i
+
+  !$acc parallel
+  !$acc loop gang worker vector reduction (+:sum)
+  do i = 1, n
+     sum = sum + 1
+  end do
+  !$acc end parallel
+end subroutine redsub_bogus
+
+! This reduction involving a firstprivate variable yields legitimate results.
+
+subroutine redsub_combined(sum, n, arr)
+  integer :: sum, n, arr(n)
+  integer :: i, j
+
+  !$acc parallel copy (arr)
+  !$acc loop gang
+  do i = 1, n
+     sum = i;
+
+     !$acc loop reduction(+:sum)
+     do j = 1, n
+        sum = sum + 1
+     end do
+
+     arr(i) = sum
+  end do
+  !$acc end parallel
+end subroutine redsub_combined


* Re: openacc reference reductions
  2016-02-22 15:34     ` Cesar Philippidis
@ 2016-02-22 16:23       ` Nathan Sidwell
  2016-04-06  1:54       ` Cesar Philippidis
  1 sibling, 0 replies; 19+ messages in thread
From: Nathan Sidwell @ 2016-02-22 16:23 UTC (permalink / raw)
  To: Cesar Philippidis, gcc-patches, Jakub Jelinek, Nathan Sidwell

On 02/22/16 10:34, Cesar Philippidis wrote:
> Ping. This patch still needs a review.

I'm not sure who has the approval ball.  Jakub?


nathan


* Re: openacc reference reductions
  2016-02-22 15:34     ` Cesar Philippidis
  2016-02-22 16:23       ` Nathan Sidwell
@ 2016-04-06  1:54       ` Cesar Philippidis
  2016-04-06 14:23         ` Jakub Jelinek
  1 sibling, 1 reply; 19+ messages in thread
From: Cesar Philippidis @ 2016-04-06  1:54 UTC (permalink / raw)
  To: gcc-patches, Jakub Jelinek, Nathan Sidwell

[-- Attachment #1: Type: text/plain, Size: 1667 bytes --]

On 02/22/2016 07:34 AM, Cesar Philippidis wrote:
> Ping. This patch still needs a review.

Ping. I've attached a rebased version of this patch. The omp-low.c bits
haven't changed, but the test cases have, since Thomas has been merging
some of them from trunk. I separated the omp-low.c changes because the
test cases are relatively large.

This patch fixes PR70533 and PR70535. Basically, it teaches
lower_oacc_reductions how to cope with reference-typed variables
(PR70533) and stops lower_omp_target from remapping reduction variables
that appear on parallel constructs (PR70535).
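
For illustration only (this is not part of the patch or its test suite),
here is a minimal sketch of a reference-typed reduction. It is a
hypothetical C++ analogue of the Fortran redsub_gang test, where the
dummy argument is passed by reference. The name verify_redsub and the
num_gangs value are assumptions made for the example. Built without
-fopenacc the pragmas are ignored and the loop runs on the host, so the
check still holds.

```cpp
#include <cassert>

// Hypothetical C++ analogue of the Fortran redsub_gang test: SUM is a
// reference, so the reduction clause names a reference-typed variable.
void redsub_gang (int &sum, int n, int c)
{
  sum = 0;

#pragma acc parallel num_gangs (8) copy (sum)
#pragma acc loop reduction (+:sum) gang
  for (int i = 0; i < n; i++)
    sum += c;
}

// Compare the (possibly offloaded) reduction against a plain host loop.
// The helper name is an assumption made for this sketch.
int verify_redsub (int n, int c)
{
  int gs = -1;
  redsub_gang (gs, n, c);

  int vsum = 0;
  for (int i = 0; i < n; i++)
    vsum += c;

  assert (gs == vsum);
  return gs;
}
```

Calling verify_redsub (40, 10) mirrors the n = 40, c = 10 parameters
used by reduction-5.f90.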

Is it OK for trunk?

Cesar

> On 02/09/2016 08:17 AM, Cesar Philippidis wrote:
>> On 02/09/2016 07:33 AM, Nathan Sidwell wrote:
>>> While I've not looked at the rest of the patch, this bit stood out:
>>>
>>>> +static bool
>>>> +is_oacc_parallel_reduction (tree var, omp_context *ctx)
>>>> +{
>>>> +  if (!is_oacc_parallel (ctx))
>>>> +    return false;
>>>> +
>>>> +  tree clauses = gimple_omp_target_clauses (ctx->stmt);
>>>> +
>>>> +  /* Don't install a local copy of the decl if it used
>>>> +     inside a acc parallel reduction.  */
>>>
>>> ^^ comment is misleading -- this routine's not installing anything
>>>
>>>> +  if (is_oacc_parallel (ctx))
>>>
>>> ^^ already checked above.
>>>
>>>> +    for (tree c = clauses; c; c = OMP_CLAUSE_CHAIN (c))
>>>> +      if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_REDUCTION
>>>> +      && OMP_CLAUSE_DECL (c) == var)
>>>> +    return true;
>>>> +
>>>> +  return false;
>>>> +}
>>>> +
>>
>> Thanks for catching that. Those are artifacts from when this code used
>> to be located exclusively in scan_sharing_clauses. I've updated the
>> patch with those changes.
>>
>> Cesar
>>
> 


[-- Attachment #2: pr70533-omplow.diff --]
[-- Type: text/x-patch, Size: 9744 bytes --]

2016-04-05  Cesar Philippidis  <cesar@codesourcery.com>

	gcc/
	* omp-low.c (is_oacc_parallel_reduction): New function.
	(scan_sharing_clauses): Use it to prevent installing local variables
	for those used in acc parallel reductions.
	(lower_rec_input_clauses): Remove dead code.
	(lower_oacc_reductions): Add support for reference reductions.
	(lower_reduction_clauses): Remove dead code.
	(lower_omp_target): Don't remap variables appearing in acc parallel
	reductions.

	gcc/testsuite/
	* gfortran.dg/goacc/reduction-promotions.f90: Add more coverage.

	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-gang-np-1.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-gv-np-1.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-gw-np-1.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-1.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-2.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-3.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-4.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-vector-p-1.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-vector-p-2.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-worker-p-1.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-wv-p-1.c: New test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-wv-p-2.c: New test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-wv-p-3.c: New test.
	* testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-1.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-2.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-3.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-4.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/par-reduction-1.c: Adjust test.
	* testsuite/libgomp.oacc-c-c++-common/par-reduction-2.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/parallel-dims.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/parallel-reduction.c: New test.
	* testsuite/libgomp.oacc-c-c++-common/reduction-1.c: Adjust test.
	* testsuite/libgomp.oacc-c-c++-common/reduction-2.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/reduction-3.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/reduction-4.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/reduction-5.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/reduction-6.c: New test.
	* testsuite/libgomp.oacc-c-c++-common/reduction.h: New test.
	* testsuite/libgomp.oacc-fortran/parallel-reduction.f90: New test.
	* testsuite/libgomp.oacc-fortran/reduction-1.f90: Adjust test.
	* testsuite/libgomp.oacc-fortran/reduction-2.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/reduction-3.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/reduction-4.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/reduction-5.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/reduction-6.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/reduction-7.f90: New test.


diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 3fd6eb3..fa2d318 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -309,6 +309,25 @@ is_oacc_kernels (omp_context *ctx)
 	      == GF_OMP_TARGET_KIND_OACC_KERNELS));
 }
 
+/* Return true if CTX corresponds to an oacc parallel region and if
+   VAR is used in a reduction.  */
+
+static bool
+is_oacc_parallel_reduction (tree var, omp_context *ctx)
+{
+  if (!is_oacc_parallel (ctx))
+    return false;
+
+  tree clauses = gimple_omp_target_clauses (ctx->stmt);
+
+  for (tree c = clauses; c; c = OMP_CLAUSE_CHAIN (c))
+    if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_REDUCTION
+	&& OMP_CLAUSE_DECL (c) == var)
+      return true;
+
+  return false;
+}
+
 /* If DECL is the artificial dummy VAR_DECL created for non-static
    data member privatization, return the underlying "this" parameter,
    otherwise return NULL.  */
@@ -2122,7 +2141,8 @@ scan_sharing_clauses (tree clauses, omp_context *ctx,
 		  else
 		    install_var_field (decl, true, 3, ctx,
 				       base_pointers_restrict);
-		  if (is_gimple_omp_offloaded (ctx->stmt))
+		  if (is_gimple_omp_offloaded (ctx->stmt)
+		      && !is_oacc_parallel_reduction (decl, ctx))
 		    install_var_local (decl, ctx);
 		}
 	    }
@@ -4837,7 +4857,7 @@ lower_rec_input_clauses (tree clauses, gimple_seq *ilist, gimple_seq *dlist,
 		  gimplify_assign (ptr, x, ilist);
 		}
 	    }
-	  else if (is_reference (var) && !is_oacc_parallel (ctx))
+	  else if (is_reference (var))
 	    {
 	      /* For references that are being privatized for Fortran,
 		 allocate new backing storage for the new pointer
@@ -5573,7 +5593,8 @@ lower_oacc_reductions (location_t loc, tree clauses, tree level, bool inner,
 	tree orig = OMP_CLAUSE_DECL (c);
 	tree var = maybe_lookup_decl (orig, ctx);
 	tree ref_to_res = NULL_TREE;
-	tree incoming, outgoing;
+	tree incoming, outgoing, v1, v2, v3;
+	bool is_private = false;
 
 	enum tree_code rcode = OMP_CLAUSE_REDUCTION_CODE (c);
 	if (rcode == MINUS_EXPR)
@@ -5586,7 +5607,6 @@ lower_oacc_reductions (location_t loc, tree clauses, tree level, bool inner,
 
 	if (!var)
 	  var = orig;
-	gcc_assert (!is_reference (var));
 
 	incoming = outgoing = var;
 	
@@ -5622,22 +5642,38 @@ lower_oacc_reductions (location_t loc, tree clauses, tree level, bool inner,
 		for (; cls;  cls = OMP_CLAUSE_CHAIN (cls))
 		  if (OMP_CLAUSE_CODE (cls) == OMP_CLAUSE_REDUCTION
 		      && orig == OMP_CLAUSE_DECL (cls))
-		    goto has_outer_reduction;
+		    {
+		      incoming = outgoing = lookup_decl (orig, probe);
+		      goto has_outer_reduction;
+		    }
+		  else if ((OMP_CLAUSE_CODE (cls) == OMP_CLAUSE_FIRSTPRIVATE
+			    || OMP_CLAUSE_CODE (cls) == OMP_CLAUSE_PRIVATE)
+			   && orig == OMP_CLAUSE_DECL (cls))
+		    {
+		      is_private = true;
+		      goto do_lookup;
+		    }
 	      }
 
 	  do_lookup:
 	    /* This is the outermost construct with this reduction,
 	       see if there's a mapping for it.  */
 	    if (gimple_code (outer->stmt) == GIMPLE_OMP_TARGET
-		&& maybe_lookup_field (orig, outer))
+		&& maybe_lookup_field (orig, outer) && !is_private)
 	      {
 		ref_to_res = build_receiver_ref (orig, false, outer);
 		if (is_reference (orig))
 		  ref_to_res = build_simple_mem_ref (ref_to_res);
 
+		tree type = TREE_TYPE (var);
+		if (POINTER_TYPE_P (type))
+		  type = TREE_TYPE (type);
+
 		outgoing = var;
-		incoming = omp_reduction_init_op (loc, rcode, TREE_TYPE (var));
+		incoming = omp_reduction_init_op (loc, rcode, type);
 	      }
+	    else if (ctx->outer)
+	      incoming = outgoing = lookup_decl (orig, ctx->outer);
 	    else
 	      incoming = outgoing = orig;
 	      
@@ -5647,6 +5683,37 @@ lower_oacc_reductions (location_t loc, tree clauses, tree level, bool inner,
 	if (!ref_to_res)
 	  ref_to_res = integer_zero_node;
 
+        if (is_reference (orig))
+	  {
+	    tree type = TREE_TYPE (var);
+	    const char *id = IDENTIFIER_POINTER (DECL_NAME (var));
+
+	    if (!inner)
+	      {
+		tree x = create_tmp_var (TREE_TYPE (type), id);
+		gimplify_assign (var, build_fold_addr_expr (x), fork_seq);
+	      }
+
+	    v1 = create_tmp_var (type, id);
+	    v2 = create_tmp_var (type, id);
+	    v3 = create_tmp_var (type, id);
+
+	    gimplify_assign (v1, var, fork_seq);
+	    gimplify_assign (v2, var, fork_seq);
+	    gimplify_assign (v3, var, fork_seq);
+
+	    var = build_simple_mem_ref (var);
+	    v1 = build_simple_mem_ref (v1);
+	    v2 = build_simple_mem_ref (v2);
+	    v3 = build_simple_mem_ref (v3);
+	    outgoing = build_simple_mem_ref (outgoing);
+
+	    if (TREE_CODE (incoming) != INTEGER_CST)
+	      incoming = build_simple_mem_ref (incoming);
+	  }
+	else
+	  v1 = v2 = v3 = var;
+
 	/* Determine position in reduction buffer, which may be used
 	   by target.  */
 	enum machine_mode mode = TYPE_MODE (TREE_TYPE (var));
@@ -5676,20 +5743,20 @@ lower_oacc_reductions (location_t loc, tree clauses, tree level, bool inner,
 	  = build_call_expr_internal_loc (loc, IFN_GOACC_REDUCTION,
 					  TREE_TYPE (var), 6, init_code,
 					  unshare_expr (ref_to_res),
-					  var, level, op, off);
+					  v1, level, op, off);
 	tree fini_call
 	  = build_call_expr_internal_loc (loc, IFN_GOACC_REDUCTION,
 					  TREE_TYPE (var), 6, fini_code,
 					  unshare_expr (ref_to_res),
-					  var, level, op, off);
+					  v2, level, op, off);
 	tree teardown_call
 	  = build_call_expr_internal_loc (loc, IFN_GOACC_REDUCTION,
 					  TREE_TYPE (var), 6, teardown_code,
-					  ref_to_res, var, level, op, off);
+					  ref_to_res, v3, level, op, off);
 
-	gimplify_assign (var, setup_call, &before_fork);
-	gimplify_assign (var, init_call, &after_fork);
-	gimplify_assign (var, fini_call, &before_join);
+	gimplify_assign (v1, setup_call, &before_fork);
+	gimplify_assign (v2, init_call, &after_fork);
+	gimplify_assign (v3, fini_call, &before_join);
 	gimplify_assign (outgoing, teardown_call, &after_join);
       }
 
@@ -5931,9 +5998,6 @@ lower_reduction_clauses (tree clauses, gimple_seq *stmt_seqp, omp_context *ctx)
 	}
     }
 
-  if (is_gimple_omp_oacc (ctx->stmt))
-    return;
-
   stmt = gimple_build_call (builtin_decl_explicit (BUILT_IN_GOMP_ATOMIC_START),
 			    0);
   gimple_seq_add_stmt (stmt_seqp, stmt);
@@ -15820,7 +15884,9 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 	if (!maybe_lookup_field (var, ctx))
 	  continue;
 
-	if (offloaded)
+	/* Don't remap oacc parallel reduction variables, because the
+	   intermediate result must be local to each gang.  */
+	if (offloaded && !is_oacc_parallel_reduction (var, ctx))
 	  {
 	    x = build_receiver_ref (var, true, ctx);
 	    tree new_var = lookup_decl (var, ctx);

[-- Attachment #3: pr70533-tests.diff.gz --]
[-- Type: application/gzip, Size: 9838 bytes --]


* Re: openacc reference reductions
  2016-04-06  1:54       ` Cesar Philippidis
@ 2016-04-06 14:23         ` Jakub Jelinek
  2016-04-06 20:21           ` Cesar Philippidis
  0 siblings, 1 reply; 19+ messages in thread
From: Jakub Jelinek @ 2016-04-06 14:23 UTC (permalink / raw)
  To: Cesar Philippidis; +Cc: gcc-patches, Nathan Sidwell

On Tue, Apr 05, 2016 at 06:53:47PM -0700, Cesar Philippidis wrote:
> --- a/gcc/omp-low.c
> +++ b/gcc/omp-low.c
> @@ -309,6 +309,25 @@ is_oacc_kernels (omp_context *ctx)
>  	      == GF_OMP_TARGET_KIND_OACC_KERNELS));
>  }
>  
> +/* Return true if CTX corresponds to an oacc parallel region and if
> +   VAR is used in a reduction.  */
> +
> +static bool
> +is_oacc_parallel_reduction (tree var, omp_context *ctx)
> +{
> +  if (!is_oacc_parallel (ctx))
> +    return false;
> +
> +  tree clauses = gimple_omp_target_clauses (ctx->stmt);
> +
> +  for (tree c = clauses; c; c = OMP_CLAUSE_CHAIN (c))
> +    if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_REDUCTION
> +	&& OMP_CLAUSE_DECL (c) == var)
> +      return true;
> +
> +  return false;
> +}
> +
>  /* If DECL is the artificial dummy VAR_DECL created for non-static
>     data member privatization, return the underlying "this" parameter,
>     otherwise return NULL.  */
> @@ -2122,7 +2141,8 @@ scan_sharing_clauses (tree clauses, omp_context *ctx,
>  		  else
>  		    install_var_field (decl, true, 3, ctx,
>  				       base_pointers_restrict);
> -		  if (is_gimple_omp_offloaded (ctx->stmt))
> +		  if (is_gimple_omp_offloaded (ctx->stmt)
> +		      && !is_oacc_parallel_reduction (decl, ctx))
>  		    install_var_local (decl, ctx);
>  		}
>  	    }

The above is O(n^2) in the number of clauses on the construct.
Perhaps better to define some OMP_CLAUSE_MAP_IN_REDUCTION macro (e.g. the
TREE_PRIVATE bit is unused on OMP_CLAUSE_MAP right now), make sure to set it
(e.g. during gimplification, where you can see all GOVD_* flags for a
particular decl), and then use this flag here?


	Jakub
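
[Editorial sketch, not part of the original message.]  The suggestion above
amounts to replacing a per-variable rescan of the clause list with a bit that
is computed once and then tested in constant time.  A minimal model of that
idea follows.  Every name in it (Clause, ClauseKind, mark_maps_in_reduction,
needs_local_copy) is hypothetical and only stands in for GCC's clause chain
and the proposed OMP_CLAUSE_MAP_IN_REDUCTION bit; it is not GCC's API.

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Toy stand-ins for a clause chain; none of these names are GCC API.
enum class ClauseKind { Map, Reduction, Private };

struct Clause {
  ClauseKind kind;
  std::string decl;   // stands in for OMP_CLAUSE_DECL
  bool in_reduction;  // stands in for the proposed flag bit
};

// One-time marking pass: collect the reduction decls, then flag every
// map clause that names one of them.  Two linear walks, no nested scan.
void mark_maps_in_reduction(std::vector<Clause> &clauses) {
  std::unordered_set<std::string> reduced;
  for (const Clause &c : clauses)
    if (c.kind == ClauseKind::Reduction)
      reduced.insert(c.decl);
  for (Clause &c : clauses)
    if (c.kind == ClauseKind::Map && reduced.count(c.decl) != 0)
      c.in_reduction = true;
}

// Later consumers test the cached bit instead of rescanning the list.
bool needs_local_copy(const Clause &c) {
  return c.kind == ClauseKind::Map && !c.in_reduction;
}
```

In this model the marking pass plays the role of gimplification, and
needs_local_copy plays the role of the check guarding install_var_local.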


* Re: openacc reference reductions
  2016-04-06 14:23         ` Jakub Jelinek
@ 2016-04-06 20:21           ` Cesar Philippidis
  2016-04-07  9:57             ` Jakub Jelinek
  0 siblings, 1 reply; 19+ messages in thread
From: Cesar Philippidis @ 2016-04-06 20:21 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: gcc-patches, Nathan Sidwell

[-- Attachment #1: Type: text/plain, Size: 2210 bytes --]

On 04/06/2016 07:23 AM, Jakub Jelinek wrote:
> On Tue, Apr 05, 2016 at 06:53:47PM -0700, Cesar Philippidis wrote:
>> --- a/gcc/omp-low.c
>> +++ b/gcc/omp-low.c
>> @@ -309,6 +309,25 @@ is_oacc_kernels (omp_context *ctx)
>>  	      == GF_OMP_TARGET_KIND_OACC_KERNELS));
>>  }
>>  
>> +/* Return true if CTX corresponds to an oacc parallel region and if
>> +   VAR is used in a reduction.  */
>> +
>> +static bool
>> +is_oacc_parallel_reduction (tree var, omp_context *ctx)
>> +{
>> +  if (!is_oacc_parallel (ctx))
>> +    return false;
>> +
>> +  tree clauses = gimple_omp_target_clauses (ctx->stmt);
>> +
>> +  for (tree c = clauses; c; c = OMP_CLAUSE_CHAIN (c))
>> +    if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_REDUCTION
>> +	&& OMP_CLAUSE_DECL (c) == var)
>> +      return true;
>> +
>> +  return false;
>> +}
>> +
>>  /* If DECL is the artificial dummy VAR_DECL created for non-static
>>     data member privatization, return the underlying "this" parameter,
>>     otherwise return NULL.  */
>> @@ -2122,7 +2141,8 @@ scan_sharing_clauses (tree clauses, omp_context *ctx,
>>  		  else
>>  		    install_var_field (decl, true, 3, ctx,
>>  				       base_pointers_restrict);
>> -		  if (is_gimple_omp_offloaded (ctx->stmt))
>> +		  if (is_gimple_omp_offloaded (ctx->stmt)
>> +		      && !is_oacc_parallel_reduction (decl, ctx))
>>  		    install_var_local (decl, ctx);
>>  		}
>>  	    }
> 
> The above is O(n^2) in the number of clauses on the construct.
> Perhaps better to define some OMP_CLAUSE_MAP_IN_REDUCTION macro (e.g. the
> TREE_PRIVATE bit is unused on OMP_CLAUSE_MAP right now), make sure to set it
> (e.g. during gimplification, where you can see all GOVD_* flags for a
> particular decl), and then use this flag here?

That's a good idea. I went ahead and combined this patch with the data
map reduction fix for PR70289 that I posted on Monday,
<https://gcc.gnu.org/ml/gcc-patches/2016-04/msg00202.html>, because I'm
already scanning for parallel reduction data clauses in there. As you
suggested, I introduced an OMP_CLAUSE_MAP_IN_REDUCTION macro to the data
clauses associated with acc parallel reductions.

Is this patch OK for trunk? It fixes PR70289, PR70348, PR70373, PR70533,
PR70535 and PR70537.

Cesar

[-- Attachment #2: pr70533-20160406.diff.gz --]
[-- Type: application/gzip, Size: 14614 bytes --]

[-- Attachment #3: pr70533-70289.diff --]
[-- Type: text/x-patch, Size: 15857 bytes --]

2016-04-06  Cesar Philippidis  <cesar@codesourcery.com>

	PR lto/70289
	gcc/
	* gimplify.c (gimplify_adjust_acc_parallel_reductions): New function.
	(gimplify_omp_workshare): Call it.  Add new data clauses for acc
	parallel reductions as needed.
	* omp-low.c (is_oacc_parallel_reduction): New function.
	(scan_sharing_clauses): Use it to prevent installing local variables
	for those used in acc parallel reductions.
	(lower_rec_input_clauses): Remove dead code.
	(lower_oacc_reductions): Add support for reference reductions.
	(lower_reduction_clauses): Remove dead code.
	(lower_omp_target): Don't remap variables appearing in acc parallel
	reductions.
	* gcc/tree.h (OMP_CLAUSE_MAP_IN_REDUCTION): New macro.

	gcc/testsuite/
	* c-c++-common/goacc/reduction-5.c: New test.
	* c-c++-common/goacc/reduction-promotions.c: New test.
	* gfortran.dg/goacc/reduction-3.f95: New test.
	* gfortran.dg/goacc/reduction-promotions.f90: New test.

	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-gang-np-1.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-gw-np-1.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-1.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-2.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-3.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-4.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-vector-p-1.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-vector-p-2.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-worker-p-1.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-wv-p-1.c: New test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-wv-p-2.c: New test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-wv-p-3.c: New test.
	* testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-1.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-2.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-3.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-4.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/par-reduction-1.c: Increase
	test coverage.
	* testsuite/libgomp.oacc-c-c++-common/par-reduction-2.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/parallel-dims.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/parallel-reduction.c: New test.
	* testsuite/libgomp.oacc-c-c++-common/pr70289.c: New test.
	* testsuite/libgomp.oacc-c-c++-common/pr70373.c: New test.
	* testsuite/libgomp.oacc-c-c++-common/reduction-1.c: Increate test
	coverage.
	* testsuite/libgomp.oacc-c-c++-common/reduction-2.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/reduction-3.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/reduction-4.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/reduction-5.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/reduction-6.c: New test.
	* testsuite/libgomp.oacc-c-c++-common/reduction.h: New test.
	* testsuite/libgomp.oacc-fortran/parallel-reduction.f90: New test.
	* testsuite/libgomp.oacc-fortran/pr70289.f90: New test.
	* testsuite/libgomp.oacc-fortran/reduction-1.f90: Increate test
	coverage.
	* testsuite/libgomp.oacc-fortran/reduction-2.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/reduction-3.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/reduction-4.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/reduction-5.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/reduction-6.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/reduction-7.f90: New test.


diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index b9757db..056e88c 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -9484,6 +9484,123 @@ optimize_target_teams (tree target, gimple_seq *pre_p)
   OMP_TARGET_CLAUSES (target) = c;
 }
 
+/* OpenACC parallel reductions need a present_or_copy clause to ensure
+   that the original variable used in the reduction gets updated on
+   the host.  This function scans CLAUSES for reductions and adds or
+   adjusts the data clauses as necessary.  Any incompatible data clause
+   will be reported as a warning and promoted to present_or_copy.  Any
+   private reduction will be treated as an error.  This function
+   returns a list of new present_or_copy data clauses.  */
+
+static tree
+gimplify_adjust_acc_parallel_reductions (tree *clauses)
+{
+  tree c, list = NULL_TREE;
+  hash_set<tree> *reduction_decls, *pointer_decls;
+  reduction_decls = new hash_set<tree>;
+  pointer_decls = new hash_set<tree>;
+
+  /* Scan 1: Construct a hash set with all of the reduction decls.  */
+  for (c = *clauses; c; c = OMP_CLAUSE_CHAIN (c))
+    {
+      if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_REDUCTION)
+	{
+	  reduction_decls->add (OMP_CLAUSE_DECL (c));
+	  if (POINTER_TYPE_P (TREE_TYPE (OMP_CLAUSE_DECL (c))))
+	    pointer_decls->add (OMP_CLAUSE_DECL (c));
+	}
+    }
+
+  if (reduction_decls->elements () == 0)
+    goto cleanup;
+
+  /* Scan 2: Adjust the data clause for each reduction.  */
+  for (c = *clauses; c; c = OMP_CLAUSE_CHAIN (c))
+    {
+      int kind = -1;
+      tree decl;
+
+      switch (OMP_CLAUSE_CODE (c))
+	{
+	case OMP_CLAUSE_MAP:
+	  kind = OMP_CLAUSE_MAP_KIND (c);
+	case OMP_CLAUSE_PRIVATE:
+	case OMP_CLAUSE_FIRSTPRIVATE:
+	  decl = OMP_CLAUSE_DECL (c);
+
+	  if (!DECL_P (decl))
+	    decl = TREE_OPERAND (decl, 0);
+	  gcc_assert (DECL_P (decl));
+
+	  /* Reference variables always have a GOMP_MAP_POINTER.  Mark
+	     that clause as IN_REDUCTION, and ignore it.  */
+	  if (POINTER_TYPE_P (TREE_TYPE (decl))
+	      && kind == GOMP_MAP_POINTER
+	      && pointer_decls->contains (decl))
+	    {
+	      OMP_CLAUSE_MAP_IN_REDUCTION (c) = 1;
+	      break;
+	    }
+
+	  if (!reduction_decls->contains (decl))
+	    break;
+
+	  if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP)
+	    {
+	      if (!pointer_decls->contains (decl))
+		OMP_CLAUSE_MAP_IN_REDUCTION(c) = 1;
+
+	      if (!((kind & GOMP_MAP_TOFROM) == GOMP_MAP_TOFROM
+		    || kind == GOMP_MAP_FORCE_PRESENT))
+		{
+		  warning_at (OMP_CLAUSE_LOCATION (c), 0, "incompatible data "
+			      "clause with reduction on %qE; promoting to "
+			      "present_or_copy", DECL_NAME (decl));
+
+		  OMP_CLAUSE_CODE (c) = OMP_CLAUSE_MAP;
+		  OMP_CLAUSE_SET_MAP_KIND (c, GOMP_MAP_TOFROM);
+		}
+	      reduction_decls->remove (decl);
+	      break;
+	    }
+
+	  if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_PRIVATE
+	      || OMP_CLAUSE_CODE (c) == OMP_CLAUSE_FIRSTPRIVATE)
+	    {
+	      error_at (OMP_CLAUSE_LOCATION (c), "invalid private reduction "
+			  "on %qE", DECL_NAME (decl));
+	      reduction_decls->remove (decl);
+	    }
+	default:;
+	}
+    }
+
+  if (reduction_decls->elements () == 0)
+    goto cleanup;
+  
+  /* Scan 3: Add a present_or_copy clause for any reduction variable which
+     doens't have a data clause already.  */
+  for (hash_set<tree>::iterator iter = reduction_decls->begin ();
+       iter != reduction_decls->end (); ++iter)
+    {
+      tree decl = *iter;
+
+      tree nc = build_omp_clause (UNKNOWN_LOCATION, OMP_CLAUSE_MAP);
+      OMP_CLAUSE_SET_MAP_KIND (nc, GOMP_MAP_TOFROM);
+      OMP_CLAUSE_DECL (nc) = decl;
+      if (!POINTER_TYPE_P (TREE_TYPE (decl)))
+	OMP_CLAUSE_MAP_IN_REDUCTION (nc) = 1;
+      TREE_CHAIN (nc) = list;
+      list = nc;
+    }
+
+ cleanup:
+  delete reduction_decls;
+  delete pointer_decls;
+
+  return list;
+}
+
 /* Gimplify the gross structure of several OMP constructs.  */
 
 static void
@@ -9491,6 +9608,7 @@ gimplify_omp_workshare (tree *expr_p, gimple_seq *pre_p)
 {
   tree expr = *expr_p;
   gimple *stmt;
+  tree acc_reductions = NULL_TREE;
   gimple_seq body = NULL;
   enum omp_region_type ort;
 
@@ -9508,6 +9626,8 @@ gimplify_omp_workshare (tree *expr_p, gimple_seq *pre_p)
       break;
     case OACC_PARALLEL:
       ort = ORT_ACC_PARALLEL;
+      acc_reductions
+	= gimplify_adjust_acc_parallel_reductions (&OMP_CLAUSES (expr));
       break;
     case OACC_DATA:
       ort = ORT_ACC_DATA;
@@ -9606,6 +9726,48 @@ gimplify_omp_workshare (tree *expr_p, gimple_seq *pre_p)
 
   gimplify_seq_add_stmt (pre_p, stmt);
   *expr_p = NULL_TREE;
+
+  /* Finalize any parallel acc reductions.  */
+  if (acc_reductions)
+    {
+      tree c, nc, t;
+      tree clauses = NULL_TREE;
+
+      c = nc = acc_reductions;
+
+      while (c)
+	{
+	  nc = OMP_CLAUSE_CHAIN (c);
+	  OMP_CLAUSE_CHAIN (c) = NULL_TREE;
+	  lang_hooks.decls.omp_finish_clause (c, pre_p);
+
+	  /* Find the last data clause introduced by omp_finish_decls,
+	     marking any pointer data maps as IN_REDUCTION.  */
+	  for (t = c; t; t = TREE_CHAIN (t))
+	    {
+	      if (POINTER_TYPE_P (TREE_TYPE (OMP_CLAUSE_DECL (t))))
+		OMP_CLAUSE_MAP_IN_REDUCTION (t) = 1;
+
+	      if (TREE_CHAIN (t) == NULL_TREE)
+		break;
+	    }
+
+	  /* Update the chain of clauses.  */
+	  TREE_CHAIN (t) = clauses;
+	  clauses = c;
+
+	  c = nc;
+	}
+
+      /* Update the list of clauses in the gimple stmt.  */
+      for (t = gimple_omp_target_clauses (stmt); OMP_CLAUSE_CHAIN (t);
+	   t = OMP_CLAUSE_CHAIN (t))
+	;
+
+      OMP_CLAUSE_CHAIN (t) = clauses;
+    }
+
+  return;
 }
 
 /* Gimplify the gross structure of OpenACC enter/exit data, update, and OpenMP
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 3fd6eb3..5a6f9d3 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -2122,7 +2122,8 @@ scan_sharing_clauses (tree clauses, omp_context *ctx,
 		  else
 		    install_var_field (decl, true, 3, ctx,
 				       base_pointers_restrict);
-		  if (is_gimple_omp_offloaded (ctx->stmt))
+		  if (is_gimple_omp_offloaded (ctx->stmt)
+		      && !OMP_CLAUSE_MAP_IN_REDUCTION (c))
 		    install_var_local (decl, ctx);
 		}
 	    }
@@ -4837,7 +4838,7 @@ lower_rec_input_clauses (tree clauses, gimple_seq *ilist, gimple_seq *dlist,
 		  gimplify_assign (ptr, x, ilist);
 		}
 	    }
-	  else if (is_reference (var) && !is_oacc_parallel (ctx))
+	  else if (is_reference (var))
 	    {
 	      /* For references that are being privatized for Fortran,
 		 allocate new backing storage for the new pointer
@@ -5573,7 +5574,8 @@ lower_oacc_reductions (location_t loc, tree clauses, tree level, bool inner,
 	tree orig = OMP_CLAUSE_DECL (c);
 	tree var = maybe_lookup_decl (orig, ctx);
 	tree ref_to_res = NULL_TREE;
-	tree incoming, outgoing;
+	tree incoming, outgoing, v1, v2, v3;
+	bool is_private = false;
 
 	enum tree_code rcode = OMP_CLAUSE_REDUCTION_CODE (c);
 	if (rcode == MINUS_EXPR)
@@ -5586,7 +5588,6 @@ lower_oacc_reductions (location_t loc, tree clauses, tree level, bool inner,
 
 	if (!var)
 	  var = orig;
-	gcc_assert (!is_reference (var));
 
 	incoming = outgoing = var;
 	
@@ -5622,22 +5623,38 @@ lower_oacc_reductions (location_t loc, tree clauses, tree level, bool inner,
 		for (; cls;  cls = OMP_CLAUSE_CHAIN (cls))
 		  if (OMP_CLAUSE_CODE (cls) == OMP_CLAUSE_REDUCTION
 		      && orig == OMP_CLAUSE_DECL (cls))
-		    goto has_outer_reduction;
+		    {
+		      incoming = outgoing = lookup_decl (orig, probe);
+		      goto has_outer_reduction;
+		    }
+		  else if ((OMP_CLAUSE_CODE (cls) == OMP_CLAUSE_FIRSTPRIVATE
+			    || OMP_CLAUSE_CODE (cls) == OMP_CLAUSE_PRIVATE)
+			   && orig == OMP_CLAUSE_DECL (cls))
+		    {
+		      is_private = true;
+		      goto do_lookup;
+		    }
 	      }
 
 	  do_lookup:
 	    /* This is the outermost construct with this reduction,
 	       see if there's a mapping for it.  */
 	    if (gimple_code (outer->stmt) == GIMPLE_OMP_TARGET
-		&& maybe_lookup_field (orig, outer))
+		&& maybe_lookup_field (orig, outer) && !is_private)
 	      {
 		ref_to_res = build_receiver_ref (orig, false, outer);
 		if (is_reference (orig))
 		  ref_to_res = build_simple_mem_ref (ref_to_res);
 
+		tree type = TREE_TYPE (var);
+		if (POINTER_TYPE_P (type))
+		  type = TREE_TYPE (type);
+
 		outgoing = var;
-		incoming = omp_reduction_init_op (loc, rcode, TREE_TYPE (var));
+		incoming = omp_reduction_init_op (loc, rcode, type);
 	      }
+	    else if (ctx->outer)
+	      incoming = outgoing = lookup_decl (orig, ctx->outer);
 	    else
 	      incoming = outgoing = orig;
 	      
@@ -5647,6 +5664,37 @@ lower_oacc_reductions (location_t loc, tree clauses, tree level, bool inner,
 	if (!ref_to_res)
 	  ref_to_res = integer_zero_node;
 
+        if (is_reference (orig))
+	  {
+	    tree type = TREE_TYPE (var);
+	    const char *id = IDENTIFIER_POINTER (DECL_NAME (var));
+
+	    if (!inner)
+	      {
+		tree x = create_tmp_var (TREE_TYPE (type), id);
+		gimplify_assign (var, build_fold_addr_expr (x), fork_seq);
+	      }
+
+	    v1 = create_tmp_var (type, id);
+	    v2 = create_tmp_var (type, id);
+	    v3 = create_tmp_var (type, id);
+
+	    gimplify_assign (v1, var, fork_seq);
+	    gimplify_assign (v2, var, fork_seq);
+	    gimplify_assign (v3, var, fork_seq);
+
+	    var = build_simple_mem_ref (var);
+	    v1 = build_simple_mem_ref (v1);
+	    v2 = build_simple_mem_ref (v2);
+	    v3 = build_simple_mem_ref (v3);
+	    outgoing = build_simple_mem_ref (outgoing);
+
+	    if (TREE_CODE (incoming) != INTEGER_CST)
+	      incoming = build_simple_mem_ref (incoming);
+	  }
+	else
+	  v1 = v2 = v3 = var;
+
 	/* Determine position in reduction buffer, which may be used
 	   by target.  */
 	enum machine_mode mode = TYPE_MODE (TREE_TYPE (var));
@@ -5676,20 +5724,20 @@ lower_oacc_reductions (location_t loc, tree clauses, tree level, bool inner,
 	  = build_call_expr_internal_loc (loc, IFN_GOACC_REDUCTION,
 					  TREE_TYPE (var), 6, init_code,
 					  unshare_expr (ref_to_res),
-					  var, level, op, off);
+					  v1, level, op, off);
 	tree fini_call
 	  = build_call_expr_internal_loc (loc, IFN_GOACC_REDUCTION,
 					  TREE_TYPE (var), 6, fini_code,
 					  unshare_expr (ref_to_res),
-					  var, level, op, off);
+					  v2, level, op, off);
 	tree teardown_call
 	  = build_call_expr_internal_loc (loc, IFN_GOACC_REDUCTION,
 					  TREE_TYPE (var), 6, teardown_code,
-					  ref_to_res, var, level, op, off);
+					  ref_to_res, v3, level, op, off);
 
-	gimplify_assign (var, setup_call, &before_fork);
-	gimplify_assign (var, init_call, &after_fork);
-	gimplify_assign (var, fini_call, &before_join);
+	gimplify_assign (v1, setup_call, &before_fork);
+	gimplify_assign (v2, init_call, &after_fork);
+	gimplify_assign (v3, fini_call, &before_join);
 	gimplify_assign (outgoing, teardown_call, &after_join);
       }
 
@@ -5931,9 +5979,6 @@ lower_reduction_clauses (tree clauses, gimple_seq *stmt_seqp, omp_context *ctx)
 	}
     }
 
-  if (is_gimple_omp_oacc (ctx->stmt))
-    return;
-
   stmt = gimple_build_call (builtin_decl_explicit (BUILT_IN_GOMP_ATOMIC_START),
 			    0);
   gimple_seq_add_stmt (stmt_seqp, stmt);
@@ -15820,7 +15865,10 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 	if (!maybe_lookup_field (var, ctx))
 	  continue;
 
-	if (offloaded)
+	/* Don't remap oacc parallel reduction variables, because the
+	   intermediate result must be local to each gang.  */
+	if (offloaded && !(OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP
+			   && OMP_CLAUSE_MAP_IN_REDUCTION(c)))
 	  {
 	    x = build_receiver_ref (var, true, ctx);
 	    tree new_var = lookup_decl (var, ctx);
diff --git a/gcc/tree.h b/gcc/tree.h
index 544a6a1..945e7e4 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -1532,6 +1532,9 @@ extern void protected_set_expr_location (tree, location_t);
    treatment if OMP_CLAUSE_SIZE is zero.  */
 #define OMP_CLAUSE_MAP_MAYBE_ZERO_LENGTH_ARRAY_SECTION(NODE) \
   TREE_PROTECTED (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_MAP))
+/* Nonzero if this map clause is for an ACC parallel reduction variable.  */
+#define OMP_CLAUSE_MAP_IN_REDUCTION(NODE) \
+  TREE_PRIVATE (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_MAP))
 
 #define OMP_CLAUSE_PROC_BIND_KIND(NODE) \
   (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_PROC_BIND)->omp_clause.subcode.proc_bind_kind)


* Re: openacc reference reductions
  2016-04-06 20:21           ` Cesar Philippidis
@ 2016-04-07  9:57             ` Jakub Jelinek
  2016-04-08  4:34               ` Cesar Philippidis
  0 siblings, 1 reply; 19+ messages in thread
From: Jakub Jelinek @ 2016-04-07  9:57 UTC (permalink / raw)
  To: Cesar Philippidis; +Cc: gcc-patches, Nathan Sidwell

On Wed, Apr 06, 2016 at 01:21:30PM -0700, Cesar Philippidis wrote:
> That's a good idea. I went ahead and combined this patch with the data
> map reduction fix for PR70289 that I posted on Monday,
> <https://gcc.gnu.org/ml/gcc-patches/2016-04/msg00202.html>, because I'm
> already scanning for parallel reduction data clauses in there. As you
> suggested, I introduced an OMP_CLAUSE_MAP_IN_REDUCTION macro to the data
> clauses associated with acc parallel reductions.
> 
> Is this patch OK for trunk? It fixes PR70289, PR70348, PR70373, PR70533,
> PR70535 and PR70537.

> 2016-04-06  Cesar Philippidis  <cesar@codesourcery.com>
> 
> 	PR lto/70289

Then please use
	PR lto/70289
	PR ipa/70348
	PR tree-optimization/70373
	PR middle-end/70533
	PR middle-end/70535
PR70537 looks like a typo to me; did you mean some other PR?

> 	gcc/
> 	* gimplify.c (gimplify_adjust_acc_parallel_reductions): New function.
...
> 	* gcc/tree.h (OMP_CLAUSE_MAP_IN_REDUCTION): New macro.

No gcc/ prefix please.

> 	* testsuite/libgomp.oacc-c-c++-common/reduction-1.c: Increate test
> 	coverage.

Increase?

> 	* testsuite/libgomp.oacc-c-c++-common/reduction-2.c: Likewise.
> 	* testsuite/libgomp.oacc-c-c++-common/reduction-3.c: Likewise.
> 	* testsuite/libgomp.oacc-c-c++-common/reduction-4.c: Likewise.
> 	* testsuite/libgomp.oacc-c-c++-common/reduction-5.c: Likewise.
> 	* testsuite/libgomp.oacc-c-c++-common/reduction-6.c: New test.
> 	* testsuite/libgomp.oacc-c-c++-common/reduction.h: New test.
> 	* testsuite/libgomp.oacc-fortran/parallel-reduction.f90: New test.
> 	* testsuite/libgomp.oacc-fortran/pr70289.f90: New test.
> 	* testsuite/libgomp.oacc-fortran/reduction-1.f90: Increate test

Ditto.

> +	  if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_PRIVATE
> +	      || OMP_CLAUSE_CODE (c) == OMP_CLAUSE_FIRSTPRIVATE)
> +	    {
> +	      error_at (OMP_CLAUSE_LOCATION (c), "invalid private reduction "
> +			  "on %qE", DECL_NAME (decl));

This looks wrongly formatted: the "on %qE" string should be aligned below
OMP_CLAUSE_LOCATION.

> +  /* Scan 3: Add a present_or_copy clause for any reduction variable which
> +     doens't have a data clause already.  */

doesn't

> +  for (hash_set<tree>::iterator iter = reduction_decls->begin ();
> +       iter != reduction_decls->end (); ++iter)
> +    {
> +      tree decl = *iter;
> +
> +      tree nc = build_omp_clause (UNKNOWN_LOCATION, OMP_CLAUSE_MAP);
> +      OMP_CLAUSE_SET_MAP_KIND (nc, GOMP_MAP_TOFROM);
> +      OMP_CLAUSE_DECL (nc) = decl;
> +      if (!POINTER_TYPE_P (TREE_TYPE (decl)))
> +	OMP_CLAUSE_MAP_IN_REDUCTION (nc) = 1;
> +      TREE_CHAIN (nc) = list;
> +      list = nc;
> +    }
> +
> + cleanup:
> +  delete reduction_decls;
> +  delete pointer_decls;
> +
> +  return list;
> +}

But more importantly, do you really have to do this separately?
I admit I haven't stepped through your testcases in the debugger, so
correct me if I'm missing something:
I mean, gimplify_scan_omp_clauses should omp_add_variable for the
OMP_CLAUSE_REDUCTION with
GOVD_REDUCTION | GOVD_SEEN | GOVD_EXPLICIT
and OMP_CLAUSE_MAP with GOVD_MAP | GOVD_EXPLICIT or so, similarly
GOVD_PRIVATE and/or GOVD_FIRSTPRIVATE flags from
OMP_CLAUSE_PRIVATE/OMP_CLAUSE_FIRSTPRIVATE.
So I believe you should have all the info you need in
(gimplify_adjust_omp_clauses) <case OMP_CLAUSE_REDUCTION>,
you have the CODE of the construct this is on (so check OACC_PARALLEL
or whatever you need), and you should be able to check if there is
explicit map/private/firstprivate clause together with OMP_CLAUSE_REDUCTION
or not, and then you can add the extra implicit clause and set
OMP_CLAUSE_MAP_IN_REDUCTION on it as appropriate.

	Jakub
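
[Editorial sketch, not part of the original message.]  The point above is
that a single per-variable flag word already records which clause kinds
mention a decl, so the reduction case can decide what to do from one lookup
rather than from a separate pre-pass.  A minimal model, assuming made-up
flag bits that only imitate the GOVD_* idea (the names F_*, Action and
classify_reduction are hypothetical, and the bit values are not GCC's):

```cpp
#include <map>
#include <string>

// Hypothetical flag bits modelled on the GOVD_* idea; values are made up.
enum : unsigned {
  F_REDUCTION = 1u << 0,
  F_MAP = 1u << 1,
  F_PRIVATE = 1u << 2,
  F_FIRSTPRIVATE = 1u << 3
};

enum class Action { None, AddImplicitMap, UseExistingMap, RejectPrivate };

// Decide how to treat DECL from its accumulated flags alone.
Action classify_reduction(const std::map<std::string, unsigned> &vars,
                          const std::string &decl) {
  auto it = vars.find(decl);
  if (it == vars.end() || (it->second & F_REDUCTION) == 0)
    return Action::None;            // not a reduction variable
  unsigned flags = it->second;
  if (flags & (F_PRIVATE | F_FIRSTPRIVATE))
    return Action::RejectPrivate;   // private reduction: diagnose
  if (flags & F_MAP)
    return Action::UseExistingMap;  // explicit data clause already present
  return Action::AddImplicitMap;    // no data clause: add one implicitly
}
```

Here AddImplicitMap corresponds to adding the extra implicit clause and
marking it, and UseExistingMap to marking (and possibly promoting) the
explicit one.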


* Re: openacc reference reductions
  2016-04-07  9:57             ` Jakub Jelinek
@ 2016-04-08  4:34               ` Cesar Philippidis
  2016-04-08  7:41                 ` Jakub Jelinek
  0 siblings, 1 reply; 19+ messages in thread
From: Cesar Philippidis @ 2016-04-08  4:34 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: gcc-patches, Nathan Sidwell

[-- Attachment #1: Type: text/plain, Size: 1047 bytes --]

On 04/07/2016 02:56 AM, Jakub Jelinek wrote:

> But more importantly, do you really have to do this separately?
> I admit I haven't stepped through your testcases in the debugger, so
> correct me if I'm missing something:
> I mean, gimplify_scan_omp_clauses should omp_add_variable for the
> OMP_CLAUSE_REDUCTION with
> GOVD_REDUCTION | GOVD_SEEN | GOVD_EXPLICIT
> and OMP_CLAUSE_MAP with GOVD_MAP | GOVD_EXPLICIT or so, similarly
> GOVD_PRIVATE and/or GOVD_FIRSTPRIVATE flags from
> OMP_CLAUSE_PRIVATE/OMP_CLAUSE_FIRSTPRIVATE.
> So I believe you should have all the info you need in
> (gimplify_adjust_omp_clauses) <case OMP_CLAUSE_REDUCTION>,
> you have the CODE of the construct this is on (so check OACC_PARALLEL
> or whatever you need), and you should be able to check if there is
> explicit map/private/firstprivate clause together with OMP_CLAUSE_REDUCTION
> or not, and then you can add the extra implicit clause and set
> OMP_CLAUSE_MAP_IN_REDUCTION on it as appropriate.

This new patch does that now. Is this patch OK for trunk?

Cesar

[-- Attachment #2: pr70533-20160407-gcc.diff --]
[-- Type: text/x-patch, Size: 14487 bytes --]

2016-04-07  Cesar Philippidis  <cesar@codesourcery.com>

	PR lto/70289
	PR ipa/70348
	PR tree-optimization/70373
	PR middle-end/70533
	PR middle-end/70534
	PR middle-end/70535

	gcc/
	* gimplify.c (omp_add_variable): Ignore seen reductions in
	ORT_ACC_PARALLEL regions.
	(gimplify_scan_omp_clauses): Adjust data clauses for acc parallel
	reductions as necessary.
	(gimplify_adjust_omp_clauses): Likewise, but add instead of adjust.
	* omp-low.c (is_oacc_parallel_reduction): New function.
	(scan_sharing_clauses): Use it to prevent installing local variables
	for those used in acc parallel reductions.
	(lower_rec_input_clauses): Remove dead code.
	(lower_oacc_reductions): Add support for reference reductions.
	(lower_reduction_clauses): Remove dead code.
	(lower_omp_target): Don't remap variables appearing in acc parallel
	reductions.
	* tree.h (OMP_CLAUSE_MAP_IN_REDUCTION): New macro.

	gcc/testsuite/
	* c-c++-common/goacc/reduction-5.c: New test.
	* c-c++-common/goacc/reduction-promotions.c: New test.
	* gfortran.dg/goacc/reduction-3.f95: New test.
	* gfortran.dg/goacc/reduction-promotions.f90: New test.

	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-gang-np-1.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-gw-np-1.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-1.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-2.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-3.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-4.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-vector-p-1.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-vector-p-2.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-worker-p-1.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-wv-p-1.c: New test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-wv-p-2.c: New test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-wv-p-3.c: New test.
	* testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-1.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-2.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-3.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-4.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/par-reduction-1.c: Add test
	coverage.
	* testsuite/libgomp.oacc-c-c++-common/par-reduction-2.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/parallel-dims.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/parallel-reduction.c: New test.
	* testsuite/libgomp.oacc-c-c++-common/pr70289.c: New test.
	* testsuite/libgomp.oacc-c-c++-common/pr70373.c: New test.
	* testsuite/libgomp.oacc-c-c++-common/reduction-1.c: Add test
	coverage.
	* testsuite/libgomp.oacc-c-c++-common/reduction-2.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/reduction-3.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/reduction-4.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/reduction-5.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/reduction-6.c: New test.
	* testsuite/libgomp.oacc-c-c++-common/reduction.h: New test.
	* testsuite/libgomp.oacc-fortran/parallel-reduction.f90: New test.
	* testsuite/libgomp.oacc-fortran/pr70289.f90: New test.
	* testsuite/libgomp.oacc-fortran/reduction-1.f90: Add test coverage.
	* testsuite/libgomp.oacc-fortran/reduction-2.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/reduction-3.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/reduction-4.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/reduction-5.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/reduction-6.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/reduction-7.f90: New test.


diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index 9c0119e..f53f5f3 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -5802,7 +5802,8 @@ omp_add_variable (struct gimplify_omp_ctx *ctx, tree decl, unsigned int flags)
     flags |= GOVD_SEEN;
 
   n = splay_tree_lookup (ctx->variables, (splay_tree_key)decl);
-  if (n != NULL && (n->value & GOVD_DATA_SHARE_CLASS) != 0)
+  if (n != NULL && (n->value & GOVD_DATA_SHARE_CLASS) != 0
+      && ctx->region_type != ORT_ACC_PARALLEL)
     {
       /* We shouldn't be re-adding the decl with the same data
 	 sharing class.  */
@@ -6557,6 +6558,24 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq *pre_p,
 	break;
       }
 
+/* OpenACC parallel reductions need a present_or_copy clause to ensure
+   that the original variable used in the reduction gets updated on
+   the host.  Scan the list of clauses for reduction so that any existing
+   data clause can be adjusted if necessary.  */
+  if (region_type == ORT_ACC_PARALLEL)
+    {
+      for (c = *list_p; c; c = OMP_CLAUSE_CHAIN (c))
+	{
+	  tree decl = NULL_TREE;
+
+	  if (OMP_CLAUSE_CODE (c) != OMP_CLAUSE_REDUCTION)
+	    continue;
+
+	  decl = OMP_CLAUSE_DECL (c);
+	  omp_add_variable (ctx, decl, GOVD_REDUCTION);
+	}
+    }
+
   while ((c = *list_p) != NULL)
     {
       bool remove = false;
@@ -6808,6 +6827,31 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq *pre_p,
 
 	case OMP_CLAUSE_MAP:
 	  decl = OMP_CLAUSE_DECL (c);
+	  if (region_type == ORT_ACC_PARALLEL)
+	    {
+	      tree t = DECL_P (decl) ? decl : TREE_OPERAND (decl, 0);
+	      splay_tree_node n = NULL;
+
+	      if (DECL_P (t))
+		n = splay_tree_lookup (ctx->variables, (splay_tree_key)t);
+
+	      if (n)
+		{
+		  int kind = OMP_CLAUSE_MAP_KIND (c);
+
+		  OMP_CLAUSE_MAP_IN_REDUCTION(c) = 1;
+		  if ((kind & GOMP_MAP_TOFROM) != GOMP_MAP_TOFROM
+		      && kind != GOMP_MAP_FORCE_PRESENT
+		      && kind != GOMP_MAP_POINTER)
+		    {
+		      warning_at (OMP_CLAUSE_LOCATION (c), 0,
+				  "incompatible data clause with reduction "
+				  "on %qE; promoting to present_or_copy",
+				  DECL_NAME (t));
+		      OMP_CLAUSE_SET_MAP_KIND (c, GOMP_MAP_TOFROM);
+		    }
+		}
+	    }
 	  if (error_operand_p (decl))
 	    remove = true;
 	  switch (code)
@@ -7269,6 +7313,16 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq *pre_p,
 		    omp_notice_variable (outer_ctx, t, true);
 		}
 	    }
+	  /* Error on private reductions in OpenACC parallel directives.  */
+	  if (region_type == ORT_ACC_PARALLEL
+	      && (flags & (GOVD_PRIVATE | GOVD_FIRSTPRIVATE)))
+	    {
+	      splay_tree_node n = splay_tree_lookup (ctx->variables,
+						     (splay_tree_key)decl);
+	      if (n != NULL && (n->value & GOVD_REDUCTION))
+		error_at (OMP_CLAUSE_LOCATION (c), "invalid private reduction "
+			  "on %qE", DECL_NAME (decl));
+	    }
 	  omp_add_variable (ctx, decl, flags);
 	  if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_REDUCTION
 	      && OMP_CLAUSE_REDUCTION_PLACEHOLDER (c))
@@ -8118,6 +8172,29 @@ gimplify_adjust_omp_clauses (gimple_seq *pre_p, gimple_seq body, tree *list_p,
 
 	case OMP_CLAUSE_REDUCTION:
 	  decl = OMP_CLAUSE_DECL (c);
+	  /* OpenACC reductions need a present_or_copy data clause.
+	     Add one if necessary.  */
+	  if (ctx->region_type == ORT_ACC_PARALLEL)
+	    {
+	      n = splay_tree_lookup (ctx->variables, (splay_tree_key)decl);
+	      if ((n->value & (GOVD_MAP | GOVD_PRIVATE | GOVD_FIRSTPRIVATE))
+		  == 0)
+		{
+		  tree next = OMP_CLAUSE_CHAIN (c);
+		  tree nc = build_omp_clause (UNKNOWN_LOCATION, OMP_CLAUSE_MAP);
+		  OMP_CLAUSE_SET_MAP_KIND (nc, GOMP_MAP_TOFROM);
+		  OMP_CLAUSE_DECL (nc) = decl;
+		  OMP_CLAUSE_CHAIN (c) = nc;
+		  lang_hooks.decls.omp_finish_clause (nc, pre_p);
+		  for (; nc; nc = OMP_CLAUSE_CHAIN (nc))
+		    {
+		      OMP_CLAUSE_MAP_IN_REDUCTION (nc) = 1;
+		      if (OMP_CLAUSE_CHAIN (nc) == NULL)
+			break;
+		    }
+		  OMP_CLAUSE_CHAIN (nc) = next;
+		}
+	    }
 	  if (DECL_P (decl)
 	      && omp_shared_to_firstprivate_optimizable_decl_p (decl))
 	    omp_mark_stores (gimplify_omp_ctxp->outer_context, decl);
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 979926d..ed47853 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -2122,7 +2122,8 @@ scan_sharing_clauses (tree clauses, omp_context *ctx,
 		  else
 		    install_var_field (decl, true, 3, ctx,
 				       base_pointers_restrict);
-		  if (is_gimple_omp_offloaded (ctx->stmt))
+		  if (is_gimple_omp_offloaded (ctx->stmt)
+		      && !OMP_CLAUSE_MAP_IN_REDUCTION (c))
 		    install_var_local (decl, ctx);
 		}
 	    }
@@ -4839,7 +4840,7 @@ lower_rec_input_clauses (tree clauses, gimple_seq *ilist, gimple_seq *dlist,
 		  gimplify_assign (ptr, x, ilist);
 		}
 	    }
-	  else if (is_reference (var) && !is_oacc_parallel (ctx))
+	  else if (is_reference (var))
 	    {
 	      /* For references that are being privatized for Fortran,
 		 allocate new backing storage for the new pointer
@@ -5575,7 +5576,8 @@ lower_oacc_reductions (location_t loc, tree clauses, tree level, bool inner,
 	tree orig = OMP_CLAUSE_DECL (c);
 	tree var = maybe_lookup_decl (orig, ctx);
 	tree ref_to_res = NULL_TREE;
-	tree incoming, outgoing;
+	tree incoming, outgoing, v1, v2, v3;
+	bool is_private = false;
 
 	enum tree_code rcode = OMP_CLAUSE_REDUCTION_CODE (c);
 	if (rcode == MINUS_EXPR)
@@ -5588,7 +5590,6 @@ lower_oacc_reductions (location_t loc, tree clauses, tree level, bool inner,
 
 	if (!var)
 	  var = orig;
-	gcc_assert (!is_reference (var));
 
 	incoming = outgoing = var;
 	
@@ -5624,22 +5625,38 @@ lower_oacc_reductions (location_t loc, tree clauses, tree level, bool inner,
 		for (; cls;  cls = OMP_CLAUSE_CHAIN (cls))
 		  if (OMP_CLAUSE_CODE (cls) == OMP_CLAUSE_REDUCTION
 		      && orig == OMP_CLAUSE_DECL (cls))
-		    goto has_outer_reduction;
+		    {
+		      incoming = outgoing = lookup_decl (orig, probe);
+		      goto has_outer_reduction;
+		    }
+		  else if ((OMP_CLAUSE_CODE (cls) == OMP_CLAUSE_FIRSTPRIVATE
+			    || OMP_CLAUSE_CODE (cls) == OMP_CLAUSE_PRIVATE)
+			   && orig == OMP_CLAUSE_DECL (cls))
+		    {
+		      is_private = true;
+		      goto do_lookup;
+		    }
 	      }
 
 	  do_lookup:
 	    /* This is the outermost construct with this reduction,
 	       see if there's a mapping for it.  */
 	    if (gimple_code (outer->stmt) == GIMPLE_OMP_TARGET
-		&& maybe_lookup_field (orig, outer))
+		&& maybe_lookup_field (orig, outer) && !is_private)
 	      {
 		ref_to_res = build_receiver_ref (orig, false, outer);
 		if (is_reference (orig))
 		  ref_to_res = build_simple_mem_ref (ref_to_res);
 
+		tree type = TREE_TYPE (var);
+		if (POINTER_TYPE_P (type))
+		  type = TREE_TYPE (type);
+
 		outgoing = var;
-		incoming = omp_reduction_init_op (loc, rcode, TREE_TYPE (var));
+		incoming = omp_reduction_init_op (loc, rcode, type);
 	      }
+	    else if (ctx->outer)
+	      incoming = outgoing = lookup_decl (orig, ctx->outer);
 	    else
 	      incoming = outgoing = orig;
 	      
@@ -5649,6 +5666,37 @@ lower_oacc_reductions (location_t loc, tree clauses, tree level, bool inner,
 	if (!ref_to_res)
 	  ref_to_res = integer_zero_node;
 
+        if (is_reference (orig))
+	  {
+	    tree type = TREE_TYPE (var);
+	    const char *id = IDENTIFIER_POINTER (DECL_NAME (var));
+
+	    if (!inner)
+	      {
+		tree x = create_tmp_var (TREE_TYPE (type), id);
+		gimplify_assign (var, build_fold_addr_expr (x), fork_seq);
+	      }
+
+	    v1 = create_tmp_var (type, id);
+	    v2 = create_tmp_var (type, id);
+	    v3 = create_tmp_var (type, id);
+
+	    gimplify_assign (v1, var, fork_seq);
+	    gimplify_assign (v2, var, fork_seq);
+	    gimplify_assign (v3, var, fork_seq);
+
+	    var = build_simple_mem_ref (var);
+	    v1 = build_simple_mem_ref (v1);
+	    v2 = build_simple_mem_ref (v2);
+	    v3 = build_simple_mem_ref (v3);
+	    outgoing = build_simple_mem_ref (outgoing);
+
+	    if (TREE_CODE (incoming) != INTEGER_CST)
+	      incoming = build_simple_mem_ref (incoming);
+	  }
+	else
+	  v1 = v2 = v3 = var;
+
 	/* Determine position in reduction buffer, which may be used
 	   by target.  */
 	enum machine_mode mode = TYPE_MODE (TREE_TYPE (var));
@@ -5678,20 +5726,20 @@ lower_oacc_reductions (location_t loc, tree clauses, tree level, bool inner,
 	  = build_call_expr_internal_loc (loc, IFN_GOACC_REDUCTION,
 					  TREE_TYPE (var), 6, init_code,
 					  unshare_expr (ref_to_res),
-					  var, level, op, off);
+					  v1, level, op, off);
 	tree fini_call
 	  = build_call_expr_internal_loc (loc, IFN_GOACC_REDUCTION,
 					  TREE_TYPE (var), 6, fini_code,
 					  unshare_expr (ref_to_res),
-					  var, level, op, off);
+					  v2, level, op, off);
 	tree teardown_call
 	  = build_call_expr_internal_loc (loc, IFN_GOACC_REDUCTION,
 					  TREE_TYPE (var), 6, teardown_code,
-					  ref_to_res, var, level, op, off);
+					  ref_to_res, v3, level, op, off);
 
-	gimplify_assign (var, setup_call, &before_fork);
-	gimplify_assign (var, init_call, &after_fork);
-	gimplify_assign (var, fini_call, &before_join);
+	gimplify_assign (v1, setup_call, &before_fork);
+	gimplify_assign (v2, init_call, &after_fork);
+	gimplify_assign (v3, fini_call, &before_join);
 	gimplify_assign (outgoing, teardown_call, &after_join);
       }
 
@@ -5933,9 +5981,6 @@ lower_reduction_clauses (tree clauses, gimple_seq *stmt_seqp, omp_context *ctx)
 	}
     }
 
-  if (is_gimple_omp_oacc (ctx->stmt))
-    return;
-
   stmt = gimple_build_call (builtin_decl_explicit (BUILT_IN_GOMP_ATOMIC_START),
 			    0);
   gimple_seq_add_stmt (stmt_seqp, stmt);
@@ -15829,7 +15874,10 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 	if (!maybe_lookup_field (var, ctx))
 	  continue;
 
-	if (offloaded)
+	/* Don't remap oacc parallel reduction variables, because the
+	   intermediate result must be local to each gang.  */
+	if (offloaded && !(OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP
+			   && OMP_CLAUSE_MAP_IN_REDUCTION(c)))
 	  {
 	    x = build_receiver_ref (var, true, ctx);
 	    tree new_var = lookup_decl (var, ctx);
diff --git a/gcc/tree.h b/gcc/tree.h
index fa70596..87e7563 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -1536,6 +1536,9 @@ extern void protected_set_expr_location (tree, location_t);
    treatment if OMP_CLAUSE_SIZE is zero.  */
 #define OMP_CLAUSE_MAP_MAYBE_ZERO_LENGTH_ARRAY_SECTION(NODE) \
   TREE_PROTECTED (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_MAP))
+/* Nonzero if this map clause is for an ACC parallel reduction variable.  */
+#define OMP_CLAUSE_MAP_IN_REDUCTION(NODE) \
+  TREE_PRIVATE (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_MAP))
 
 #define OMP_CLAUSE_PROC_BIND_KIND(NODE) \
   (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_PROC_BIND)->omp_clause.subcode.proc_bind_kind)

[-- Attachment #3: pr70533-20160407-full.diff.gz --]
[-- Type: application/gzip, Size: 14321 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: openacc reference reductions
  2016-04-08  4:34               ` Cesar Philippidis
@ 2016-04-08  7:41                 ` Jakub Jelinek
  2016-04-08  7:44                   ` Jakub Jelinek
                                     ` (2 more replies)
  0 siblings, 3 replies; 19+ messages in thread
From: Jakub Jelinek @ 2016-04-08  7:41 UTC (permalink / raw)
  To: Cesar Philippidis; +Cc: gcc-patches, Nathan Sidwell

On Thu, Apr 07, 2016 at 09:34:43PM -0700, Cesar Philippidis wrote:
> --- a/gcc/gimplify.c
> +++ b/gcc/gimplify.c
> @@ -5802,7 +5802,8 @@ omp_add_variable (struct gimplify_omp_ctx *ctx, tree decl, unsigned int flags)
>      flags |= GOVD_SEEN;
>  
>    n = splay_tree_lookup (ctx->variables, (splay_tree_key)decl);
> -  if (n != NULL && (n->value & GOVD_DATA_SHARE_CLASS) != 0)
> +  if (n != NULL && (n->value & GOVD_DATA_SHARE_CLASS) != 0
> +      && ctx->region_type != ORT_ACC_PARALLEL)
>      {
>        /* We shouldn't be re-adding the decl with the same data
>  	 sharing class.  */

Why?

> @@ -6557,6 +6558,24 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq *pre_p,
>  	break;
>        }
>  
> +/* OpenACC parallel reductions need a present_or_copy clause to ensure
> +   that the original variable used in the reduction gets updated on
> +   the host.  Scan the list of clauses for reduction so that any existing
> +   data clause can be adjusted if necessary.  */
> +  if (region_type == ORT_ACC_PARALLEL)
> +    {
> +      for (c = *list_p; c; c = OMP_CLAUSE_CHAIN (c))
> +	{
> +	  tree decl = NULL_TREE;
> +
> +	  if (OMP_CLAUSE_CODE (c) != OMP_CLAUSE_REDUCTION)
> +	    continue;
> +
> +	  decl = OMP_CLAUSE_DECL (c);
> +	  omp_add_variable (ctx, decl, GOVD_REDUCTION);
> +	}
> +    }
> +

And this looks also wrong, why?
If I try under the debugger 3 cases:
void f1 (int sum)
{
  #pragma acc parallel reduction(+:sum) present_or_copy(sum)
    ;
}
void f2 (int sum)
{
  #pragma acc parallel present_or_copy(sum)
    ;
}
void f3 (int sum)
{
  #pragma acc parallel reduction(+:sum)
    ;
}
then I see the loop that starts with the while below doing the right thing
already.  In the first case you end up with
GOVD_SEEN | GOVD_EXPLICIT | GOVD_REDUCTION | GOVD_MAP
in the second with
GOVD_SEEN | GOVD_EXPLICIT | GOVD_MAP
and third one with
GOVD_SEEN | GOVD_EXPLICIT | GOVD_REDUCTION

That is where you IMHO should stop on the gimplify_scan_omp_clauses side,
so modify neither omp_add_variable nor gimplify_scan_omp_clauses
at all, and do everything else in gimplify_adjust_omp_clauses.
That function walks the explicit clauses and has all the info gathered
during gimplify_scan_omp_clauses available in the splay tree.
So, you can do all the checking there.  Say, on OMP_CLAUSE_REDUCTION
for ORT_ACC_PARALLEL, check whether the flags include GOVD_PRIVATE
or GOVD_FIRSTPRIVATE; if yes, complain.  Also check whether GOVD_MAP is
included; if not, add the extra OMP_CLAUSE_MAP tofrom.
And, on OMP_CLAUSE_MAP, check whether GOVD_REDUCTION is set on
ORT_ACC_PARALLEL; if yes, check that it is tofrom and complain otherwise.
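
As a rough illustration only (not part of any patch in this thread), the
flag checks described above could be modelled like this.  The SK_* names
and bit values are hypothetical stand-ins for the GOVD_* flags in
gimplify.c, and the real code would look the flags up in ctx->variables
rather than take them as a parameter:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for GCC's GOVD_* bits.  The real definitions
   live in gimplify.c and use different values.  */
enum
{
  SK_GOVD_SEEN = 1,
  SK_GOVD_EXPLICIT = 2,
  SK_GOVD_MAP = 4,
  SK_GOVD_PRIVATE = 8,
  SK_GOVD_FIRSTPRIVATE = 16,
  SK_GOVD_REDUCTION = 32
};

enum sk_action
{
  SK_NOTHING,		/* A map clause already exists.  */
  SK_ERROR_PRIVATE,	/* Reduction decl is also private/firstprivate.  */
  SK_ADD_TOFROM_MAP	/* No map clause yet: add an implicit tofrom.  */
};

/* Decide what to do when visiting a reduction clause on an acc
   parallel, given the flags recorded for its decl during scanning.  */
enum sk_action
sk_check_reduction (unsigned flags)
{
  assert (flags & SK_GOVD_REDUCTION);
  if (flags & (SK_GOVD_PRIVATE | SK_GOVD_FIRSTPRIVATE))
    return SK_ERROR_PRIVATE;
  if ((flags & SK_GOVD_MAP) == 0)
    return SK_ADD_TOFROM_MAP;
  return SK_NOTHING;
}

/* Decide whether a map clause on an acc parallel deserves a
   diagnostic: only when its decl is also a reduction variable and the
   map kind is not tofrom.  */
bool
sk_map_needs_diagnostic (unsigned flags, bool map_is_tofrom)
{
  return (flags & SK_GOVD_REDUCTION) != 0 && !map_is_tofrom;
}
```

With the three test functions above, f1's flags would give SK_NOTHING and
f3's would give SK_ADD_TOFROM_MAP; f2 has no reduction clause, so only
the map-side check applies to it.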

>    while ((c = *list_p) != NULL)
>      {
>        bool remove = false;
> @@ -6808,6 +6827,31 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq *pre_p,
>  
>  	case OMP_CLAUSE_MAP:
>  	  decl = OMP_CLAUSE_DECL (c);
> +	  if (region_type == ORT_ACC_PARALLEL)
> +	    {
> +	      tree t = DECL_P (decl) ? decl : TREE_OPERAND (decl, 0);
> +	      splay_tree_node n = NULL;
> +
> +	      if (DECL_P (t))
> +		n = splay_tree_lookup (ctx->variables, (splay_tree_key)t);
> +
> +	      if (n)
> +		{
> +		  int kind = OMP_CLAUSE_MAP_KIND (c);
> +
> +		  OMP_CLAUSE_MAP_IN_REDUCTION(c) = 1;
> +		  if ((kind & GOMP_MAP_TOFROM) != GOMP_MAP_TOFROM
> +		      && kind != GOMP_MAP_FORCE_PRESENT
> +		      && kind != GOMP_MAP_POINTER)
> +		    {
> +		      warning_at (OMP_CLAUSE_LOCATION (c), 0,
> +				  "incompatible data clause with reduction "
> +				  "on %qE; promoting to present_or_copy",
> +				  DECL_NAME (t));
> +		      OMP_CLAUSE_SET_MAP_KIND (c, GOMP_MAP_TOFROM);
> +		    }
> +		}
> +	    }
>  	  if (error_operand_p (decl))
>  	    remove = true;
>  	  switch (code)

So the above is also wrong IMNSHO.

	Jakub

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: openacc reference reductions
  2016-04-08  7:41                 ` Jakub Jelinek
@ 2016-04-08  7:44                   ` Jakub Jelinek
  2016-04-08 14:14                   ` Nathan Sidwell
  2016-04-08 14:35                   ` Cesar Philippidis
  2 siblings, 0 replies; 19+ messages in thread
From: Jakub Jelinek @ 2016-04-08  7:44 UTC (permalink / raw)
  To: Cesar Philippidis; +Cc: gcc-patches, Nathan Sidwell

On Fri, Apr 08, 2016 at 09:40:54AM +0200, Jakub Jelinek wrote:
> So, you can do all the checking there.  Say on OMP_CLAUSE_REDUCTION
> for the ORT_ACC_PARALLEL check the flags if they include GOVD_PRIVATE
> or GOVD_FIRSTPRIVATE, if yes, complain.  Also check if GOVD_MAP is included,

Though,
void f1 (int sum)
{
  #pragma acc parallel reduction(+:sum) firstprivate(sum)
    ;
}
void f2 (int sum)
{
  #pragma acc parallel reduction(+:sum) private(sum)
    ;
}
is already rejected in the FE, so not sure why you want to deal with that.

	Jakub

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: openacc reference reductions
  2016-04-08  7:41                 ` Jakub Jelinek
  2016-04-08  7:44                   ` Jakub Jelinek
@ 2016-04-08 14:14                   ` Nathan Sidwell
  2016-04-08 14:21                     ` Jakub Jelinek
  2016-04-08 14:46                     ` Cesar Philippidis
  2016-04-08 14:35                   ` Cesar Philippidis
  2 siblings, 2 replies; 19+ messages in thread
From: Nathan Sidwell @ 2016-04-08 14:14 UTC (permalink / raw)
  To: Jakub Jelinek, Cesar Philippidis; +Cc: gcc-patches

On 04/08/16 00:40, Jakub Jelinek wrote:

>> +/* OpenACC parallel reductions need a present_or_copy clause to ensure
>> +   that the original variable used in the reduction gets updated on
>> +   the host.  Scan the list of clauses for reduction so that any existing
>> +   data clause can be adjusted if necessary.  */
>> +  if (region_type == ORT_ACC_PARALLEL)
>> +    {
>> +      for (c = *list_p; c; c = OMP_CLAUSE_CHAIN (c))
>> +	{
>> +	  tree decl = NULL_TREE;
>> +
>> +	  if (OMP_CLAUSE_CODE (c) != OMP_CLAUSE_REDUCTION)
>> +	    continue;
>> +
>> +	  decl = OMP_CLAUSE_DECL (c);
>> +	  omp_add_variable (ctx, decl, GOVD_REDUCTION);
>> +	}
>> +    }
>> +
>
> And this looks also wrong, why?

And besides, won't it break
#pragma acc parallel firstprivate (x)  reduction(+:x)
{...}

which the openacc guys at GTC indicated they didn't want to disallow (without 
further thinking).

It seems to me the interaction of reduction and data copying on a parallel is
still not sufficiently well defined.

nathan

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: openacc reference reductions
  2016-04-08 14:14                   ` Nathan Sidwell
@ 2016-04-08 14:21                     ` Jakub Jelinek
  2016-04-08 14:46                     ` Cesar Philippidis
  1 sibling, 0 replies; 19+ messages in thread
From: Jakub Jelinek @ 2016-04-08 14:21 UTC (permalink / raw)
  To: Nathan Sidwell; +Cc: Cesar Philippidis, gcc-patches

On Fri, Apr 08, 2016 at 07:14:13AM -0700, Nathan Sidwell wrote:
> On 04/08/16 00:40, Jakub Jelinek wrote:
> 
> >>+/* OpenACC parallel reductions need a present_or_copy clause to ensure
> >>+   that the original variable used in the reduction gets updated on
> >>+   the host.  Scan the list of clauses for reduction so that any existing
> >>+   data clause can be adjusted if necessary.  */
> >>+  if (region_type == ORT_ACC_PARALLEL)
> >>+    {
> >>+      for (c = *list_p; c; c = OMP_CLAUSE_CHAIN (c))
> >>+	{
> >>+	  tree decl = NULL_TREE;
> >>+
> >>+	  if (OMP_CLAUSE_CODE (c) != OMP_CLAUSE_REDUCTION)
> >>+	    continue;
> >>+
> >>+	  decl = OMP_CLAUSE_DECL (c);
> >>+	  omp_add_variable (ctx, decl, GOVD_REDUCTION);
> >>+	}
> >>+    }
> >>+
> >
> >And this looks also wrong, why?
> 
> And besides, won't it break
> #pragma acc parallel firstprivate (x)  reduction(+:x)
> {...}
> 
> which the openacc guys at GTC indicated they didn't want to disallow
> (without further thinking).

Well, given that this patchset is supposed to fix one of the last few P1
blockers, I think we shouldn't be changing the FEs now to allow the above,
just get working what is already allowed, and defer the rest for GCC 7.

Yes, acc reductions are just very weird and to me they look insufficiently
well defined.  In OpenMP, reduction is a data sharing clause like
private/firstprivate etc., so one can use either private, or firstprivate,
or reduction, but not multiple of them on the same decl; the exception is
that firstprivate and lastprivate are allowed on the same decl.  Furthermore,
reduction is not allowed at all on the target construct; it can only appear on
teams, parallel, for, simd, sections constructs, so the data sharing or
mapping on target is orthogonal to whether you reduce on the inner constructs
or not.
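
A minimal sketch of the OpenMP rule as stated above, for illustration
only: the SK_* clause bits and sk_omp_sharing_ok are hypothetical names,
and the model covers just the four clauses mentioned here, not the full
set of data-sharing clauses in the OpenMP specification.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bitmask of data-sharing clauses naming one decl on a
   single construct.  */
enum
{
  SK_PRIVATE = 1,
  SK_FIRSTPRIVATE = 2,
  SK_LASTPRIVATE = 4,
  SK_REDUCTION = 8,
  SK_ALL = 15
};

/* Return true if the combination is acceptable under the rule above:
   at most one data-sharing clause per decl, except that firstprivate
   and lastprivate may appear together.  */
bool
sk_omp_sharing_ok (unsigned clauses)
{
  assert ((clauses & ~(unsigned) SK_ALL) == 0);
  if (clauses == (SK_FIRSTPRIVATE | SK_LASTPRIVATE))
    return true;
  /* Zero or exactly one bit set.  */
  return (clauses & (clauses - 1)) == 0;
}
```

Under this model, reduction together with firstprivate on the same decl
is rejected, which is the combination the OpenACC discussion above is
about.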

	Jakub

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: openacc reference reductions
  2016-04-08  7:41                 ` Jakub Jelinek
  2016-04-08  7:44                   ` Jakub Jelinek
  2016-04-08 14:14                   ` Nathan Sidwell
@ 2016-04-08 14:35                   ` Cesar Philippidis
  2016-04-08 15:30                     ` Jakub Jelinek
  2 siblings, 1 reply; 19+ messages in thread
From: Cesar Philippidis @ 2016-04-08 14:35 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: gcc-patches, Nathan Sidwell

[-- Attachment #1: Type: text/plain, Size: 3801 bytes --]

On 04/08/2016 12:40 AM, Jakub Jelinek wrote:
> On Thu, Apr 07, 2016 at 09:34:43PM -0700, Cesar Philippidis wrote:
>> --- a/gcc/gimplify.c
>> +++ b/gcc/gimplify.c
>> @@ -5802,7 +5802,8 @@ omp_add_variable (struct gimplify_omp_ctx *ctx, tree decl, unsigned int flags)
>>      flags |= GOVD_SEEN;
>>  
>>    n = splay_tree_lookup (ctx->variables, (splay_tree_key)decl);
>> -  if (n != NULL && (n->value & GOVD_DATA_SHARE_CLASS) != 0)
>> +  if (n != NULL && (n->value & GOVD_DATA_SHARE_CLASS) != 0
>> +      && ctx->region_type != ORT_ACC_PARALLEL)
>>      {
>>        /* We shouldn't be re-adding the decl with the same data
>>  	 sharing class.  */
> 
> Why?

Because I was initially trying to be clever and do everything in
gimplify_scan_omp_clauses. I removed this in the attached patch.

>> @@ -6557,6 +6558,24 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq *pre_p,
>>  	break;
>>        }
>>  
>> +/* OpenACC parallel reductions need a present_or_copy clause to ensure
>> +   that the original variable used in the reduction gets updated on
>> +   the host.  Scan the list of clauses for reduction so that any existing
>> +   data clause can be adjusted if necessary.  */
>> +  if (region_type == ORT_ACC_PARALLEL)
>> +    {
>> +      for (c = *list_p; c; c = OMP_CLAUSE_CHAIN (c))
>> +	{
>> +	  tree decl = NULL_TREE;
>> +
>> +	  if (OMP_CLAUSE_CODE (c) != OMP_CLAUSE_REDUCTION)
>> +	    continue;
>> +
>> +	  decl = OMP_CLAUSE_DECL (c);
>> +	  omp_add_variable (ctx, decl, GOVD_REDUCTION);
>> +	}
>> +    }
>> +
> 
> And this looks also wrong, why?
> If I try under the debugger 3 cases:
> void f1 (int sum)
> {
>   #pragma acc parallel reduction(+:sum) present_or_copy(sum)
>     ;
> }
> void f2 (int sum)
> {
>   #pragma acc parallel present_or_copy(sum)
>     ;
> }
> void f3 (int sum)
> {
>   #pragma acc parallel reduction(+:sum)
>     ;
> }
> then I see the loop that starts with the while below doing the right thing
> already.  In the first case you end up with
> GOVD_SEEN | GOVD_EXPLICIT | GOVD_REDUCTION | GOVD_MAP
> in the second with
> GOVD_SEEN | GOVD_EXPLICIT | GOVD_MAP
> and third one with
> GOVD_SEEN | GOVD_EXPLICIT | GOVD_REDUCTION
> 
> That is where you IMHO should stop at the gimplify_scan_omp_clauses side,
> so don't modify neither omp_add_variable nor gimplify_scan_omp_clauses
> at all, and do everything else in gimplify_adjust_omp_clauses.
> That function walks the explicit clauses and has all the info gathered
> during gimplify_scan_omp_clauses available in the splay tree.
> So, you can do all the checking there.  Say on OMP_CLAUSE_REDUCTION
> for the ORT_ACC_PARALLEL check the flags if they include GOVD_PRIVATE
> or GOVD_FIRSTPRIVATE, if yes, complain.  Also check if GOVD_MAP is included,
> if not, add the extra OMP_CLAUSE_MAP tofrom.
> And, on OMP_CLAUSE_MAP, check if GOVD_REDUCTION is set on ORT_ACC_PARALLEL,
> and if yes, check if it is tofrom and complain otherwise.

Yeah, that does simplify things quite a bit. This patch still needs to
finish testing. Is it OK for trunk if the test results come back clean?

On 04/08/2016 12:43 AM, Jakub Jelinek wrote:
> On Fri, Apr 08, 2016 at 09:40:54AM +0200, Jakub Jelinek wrote:
>> So, you can do all the checking there.  Say on OMP_CLAUSE_REDUCTION
>> for the ORT_ACC_PARALLEL check the flags if they include GOVD_PRIVATE
>> or GOVD_FIRSTPRIVATE, if yes, complain.  Also check if GOVD_MAP is included,
>
> Though,
> void f1 (int sum)
> {
>   #pragma acc parallel reduction(+:sum) firstprivate(sum)
>     ;
> }
> void f2 (int sum)
> {
>   #pragma acc parallel reduction(+:sum) private(sum)
>     ;
> }
> is already rejected in the FE, so not sure why you want to deal with that.

The FEs are a little inconsistent, and I didn't want to make this patch that
invasive. Can the FE changes wait until GCC 7?

Cesar

[-- Attachment #2: pr70533-20160408-gcc.diff --]
[-- Type: text/x-patch, Size: 9991 bytes --]

2016-04-08  Cesar Philippidis  <cesar@codesourcery.com>

	PR lto/70289
	PR ipa/70348
	PR tree-optimization/70373
	PR middle-end/70533
	PR middle-end/70534
	PR middle-end/70535

	* gimplify.c (gimplify_adjust_omp_clauses): Add or adjust data
	clauses for acc parallel reductions as necessary.  Error on those
	that are private.
	* omp-low.c (is_oacc_parallel_reduction): New function.
	(scan_sharing_clauses): Use it to prevent installing local variables
	for those used in acc parallel reductions.
	(lower_rec_input_clauses): Remove dead code.
	(lower_oacc_reductions): Add support for reference reductions.
	(lower_reduction_clauses): Remove dead code.
	(lower_omp_target): Don't remap variables appearing in acc parallel
	reductions.
	* tree.h (OMP_CLAUSE_MAP_IN_REDUCTION): New macro.

diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index 9c0119e..e376cde 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -7987,6 +7987,34 @@ gimplify_adjust_omp_clauses (gimple_seq *pre_p, gimple_seq body, tree *list_p,
 	      break;
 	    }
 	  decl = OMP_CLAUSE_DECL (c);
+	  /* Data clauses associated with acc parallel reductions must be
+	     compatible with present_or_copy.  Warn and adjust the clause
+	     if that is not the case.  */
+	  if (ctx->region_type == ORT_ACC_PARALLEL)
+	    {
+	      tree t = DECL_P (decl) ? decl : TREE_OPERAND (decl, 0);
+	      n = NULL;
+
+	      if (DECL_P (t))
+		n = splay_tree_lookup (ctx->variables, (splay_tree_key)t);
+
+	      if (n && (n->value & GOVD_REDUCTION))
+		{
+		  int kind = OMP_CLAUSE_MAP_KIND (c);
+
+		  OMP_CLAUSE_MAP_IN_REDUCTION(c) = 1;
+		  if ((kind & GOMP_MAP_TOFROM) != GOMP_MAP_TOFROM
+		      && kind != GOMP_MAP_FORCE_PRESENT
+		      && kind != GOMP_MAP_POINTER)
+		    {
+		      warning_at (OMP_CLAUSE_LOCATION (c), 0,
+				  "incompatible data clause with reduction "
+				  "on %qE; promoting to present_or_copy",
+				  DECL_NAME (t));
+		      OMP_CLAUSE_SET_MAP_KIND (c, GOMP_MAP_TOFROM);
+		    }
+		}
+	    }
 	  if (!DECL_P (decl))
 	    {
 	      if ((ctx->region_type & ORT_TARGET) != 0
@@ -8118,6 +8146,34 @@ gimplify_adjust_omp_clauses (gimple_seq *pre_p, gimple_seq body, tree *list_p,
 
 	case OMP_CLAUSE_REDUCTION:
 	  decl = OMP_CLAUSE_DECL (c);
+	  /* OpenACC reductions need a present_or_copy data clause.
+	     Add one if necessary.  Error if the reduction is private.  */
+	  if (ctx->region_type == ORT_ACC_PARALLEL)
+	    {
+	      n = splay_tree_lookup (ctx->variables, (splay_tree_key)decl);
+	      if (n->value & (GOVD_PRIVATE | GOVD_FIRSTPRIVATE))
+		{
+		  error_at (OMP_CLAUSE_LOCATION (c), "invalid private "
+			    "reduction on %qE", DECL_NAME (decl));
+		}
+	      else if ((n->value & GOVD_MAP) == 0)
+		{
+		  tree next = OMP_CLAUSE_CHAIN (c);
+		  tree nc = build_omp_clause (UNKNOWN_LOCATION, OMP_CLAUSE_MAP);
+		  OMP_CLAUSE_SET_MAP_KIND (nc, GOMP_MAP_TOFROM);
+		  OMP_CLAUSE_DECL (nc) = decl;
+		  OMP_CLAUSE_CHAIN (c) = nc;
+		  lang_hooks.decls.omp_finish_clause (nc, pre_p);
+		  for (; nc; nc = OMP_CLAUSE_CHAIN (nc))
+		    {
+		      OMP_CLAUSE_MAP_IN_REDUCTION (nc) = 1;
+		      if (OMP_CLAUSE_CHAIN (nc) == NULL)
+			break;
+		    }
+		  OMP_CLAUSE_CHAIN (nc) = next;
+		  n->value |= GOVD_MAP;
+		}
+	    }
 	  if (DECL_P (decl)
 	      && omp_shared_to_firstprivate_optimizable_decl_p (decl))
 	    omp_mark_stores (gimplify_omp_ctxp->outer_context, decl);
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 979926d..ed47853 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -2122,7 +2122,8 @@ scan_sharing_clauses (tree clauses, omp_context *ctx,
 		  else
 		    install_var_field (decl, true, 3, ctx,
 				       base_pointers_restrict);
-		  if (is_gimple_omp_offloaded (ctx->stmt))
+		  if (is_gimple_omp_offloaded (ctx->stmt)
+		      && !OMP_CLAUSE_MAP_IN_REDUCTION (c))
 		    install_var_local (decl, ctx);
 		}
 	    }
@@ -4839,7 +4840,7 @@ lower_rec_input_clauses (tree clauses, gimple_seq *ilist, gimple_seq *dlist,
 		  gimplify_assign (ptr, x, ilist);
 		}
 	    }
-	  else if (is_reference (var) && !is_oacc_parallel (ctx))
+	  else if (is_reference (var))
 	    {
 	      /* For references that are being privatized for Fortran,
 		 allocate new backing storage for the new pointer
@@ -5575,7 +5576,8 @@ lower_oacc_reductions (location_t loc, tree clauses, tree level, bool inner,
 	tree orig = OMP_CLAUSE_DECL (c);
 	tree var = maybe_lookup_decl (orig, ctx);
 	tree ref_to_res = NULL_TREE;
-	tree incoming, outgoing;
+	tree incoming, outgoing, v1, v2, v3;
+	bool is_private = false;
 
 	enum tree_code rcode = OMP_CLAUSE_REDUCTION_CODE (c);
 	if (rcode == MINUS_EXPR)
@@ -5588,7 +5590,6 @@ lower_oacc_reductions (location_t loc, tree clauses, tree level, bool inner,
 
 	if (!var)
 	  var = orig;
-	gcc_assert (!is_reference (var));
 
 	incoming = outgoing = var;
 	
@@ -5624,22 +5625,38 @@ lower_oacc_reductions (location_t loc, tree clauses, tree level, bool inner,
 		for (; cls;  cls = OMP_CLAUSE_CHAIN (cls))
 		  if (OMP_CLAUSE_CODE (cls) == OMP_CLAUSE_REDUCTION
 		      && orig == OMP_CLAUSE_DECL (cls))
-		    goto has_outer_reduction;
+		    {
+		      incoming = outgoing = lookup_decl (orig, probe);
+		      goto has_outer_reduction;
+		    }
+		  else if ((OMP_CLAUSE_CODE (cls) == OMP_CLAUSE_FIRSTPRIVATE
+			    || OMP_CLAUSE_CODE (cls) == OMP_CLAUSE_PRIVATE)
+			   && orig == OMP_CLAUSE_DECL (cls))
+		    {
+		      is_private = true;
+		      goto do_lookup;
+		    }
 	      }
 
 	  do_lookup:
 	    /* This is the outermost construct with this reduction,
 	       see if there's a mapping for it.  */
 	    if (gimple_code (outer->stmt) == GIMPLE_OMP_TARGET
-		&& maybe_lookup_field (orig, outer))
+		&& maybe_lookup_field (orig, outer) && !is_private)
 	      {
 		ref_to_res = build_receiver_ref (orig, false, outer);
 		if (is_reference (orig))
 		  ref_to_res = build_simple_mem_ref (ref_to_res);
 
+		tree type = TREE_TYPE (var);
+		if (POINTER_TYPE_P (type))
+		  type = TREE_TYPE (type);
+
 		outgoing = var;
-		incoming = omp_reduction_init_op (loc, rcode, TREE_TYPE (var));
+		incoming = omp_reduction_init_op (loc, rcode, type);
 	      }
+	    else if (ctx->outer)
+	      incoming = outgoing = lookup_decl (orig, ctx->outer);
 	    else
 	      incoming = outgoing = orig;
 	      
@@ -5649,6 +5666,37 @@ lower_oacc_reductions (location_t loc, tree clauses, tree level, bool inner,
 	if (!ref_to_res)
 	  ref_to_res = integer_zero_node;
 
+        if (is_reference (orig))
+	  {
+	    tree type = TREE_TYPE (var);
+	    const char *id = IDENTIFIER_POINTER (DECL_NAME (var));
+
+	    if (!inner)
+	      {
+		tree x = create_tmp_var (TREE_TYPE (type), id);
+		gimplify_assign (var, build_fold_addr_expr (x), fork_seq);
+	      }
+
+	    v1 = create_tmp_var (type, id);
+	    v2 = create_tmp_var (type, id);
+	    v3 = create_tmp_var (type, id);
+
+	    gimplify_assign (v1, var, fork_seq);
+	    gimplify_assign (v2, var, fork_seq);
+	    gimplify_assign (v3, var, fork_seq);
+
+	    var = build_simple_mem_ref (var);
+	    v1 = build_simple_mem_ref (v1);
+	    v2 = build_simple_mem_ref (v2);
+	    v3 = build_simple_mem_ref (v3);
+	    outgoing = build_simple_mem_ref (outgoing);
+
+	    if (TREE_CODE (incoming) != INTEGER_CST)
+	      incoming = build_simple_mem_ref (incoming);
+	  }
+	else
+	  v1 = v2 = v3 = var;
+
 	/* Determine position in reduction buffer, which may be used
 	   by target.  */
 	enum machine_mode mode = TYPE_MODE (TREE_TYPE (var));
@@ -5678,20 +5726,20 @@ lower_oacc_reductions (location_t loc, tree clauses, tree level, bool inner,
 	  = build_call_expr_internal_loc (loc, IFN_GOACC_REDUCTION,
 					  TREE_TYPE (var), 6, init_code,
 					  unshare_expr (ref_to_res),
-					  var, level, op, off);
+					  v1, level, op, off);
 	tree fini_call
 	  = build_call_expr_internal_loc (loc, IFN_GOACC_REDUCTION,
 					  TREE_TYPE (var), 6, fini_code,
 					  unshare_expr (ref_to_res),
-					  var, level, op, off);
+					  v2, level, op, off);
 	tree teardown_call
 	  = build_call_expr_internal_loc (loc, IFN_GOACC_REDUCTION,
 					  TREE_TYPE (var), 6, teardown_code,
-					  ref_to_res, var, level, op, off);
+					  ref_to_res, v3, level, op, off);
 
-	gimplify_assign (var, setup_call, &before_fork);
-	gimplify_assign (var, init_call, &after_fork);
-	gimplify_assign (var, fini_call, &before_join);
+	gimplify_assign (v1, setup_call, &before_fork);
+	gimplify_assign (v2, init_call, &after_fork);
+	gimplify_assign (v3, fini_call, &before_join);
 	gimplify_assign (outgoing, teardown_call, &after_join);
       }
 
@@ -5933,9 +5981,6 @@ lower_reduction_clauses (tree clauses, gimple_seq *stmt_seqp, omp_context *ctx)
 	}
     }
 
-  if (is_gimple_omp_oacc (ctx->stmt))
-    return;
-
   stmt = gimple_build_call (builtin_decl_explicit (BUILT_IN_GOMP_ATOMIC_START),
 			    0);
   gimple_seq_add_stmt (stmt_seqp, stmt);
@@ -15829,7 +15874,10 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 	if (!maybe_lookup_field (var, ctx))
 	  continue;
 
-	if (offloaded)
+	/* Don't remap oacc parallel reduction variables, because the
+	   intermediate result must be local to each gang.  */
+	if (offloaded && !(OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP
+			   && OMP_CLAUSE_MAP_IN_REDUCTION(c)))
 	  {
 	    x = build_receiver_ref (var, true, ctx);
 	    tree new_var = lookup_decl (var, ctx);
diff --git a/gcc/tree.h b/gcc/tree.h
index fa70596..87e7563 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -1536,6 +1536,9 @@ extern void protected_set_expr_location (tree, location_t);
    treatment if OMP_CLAUSE_SIZE is zero.  */
 #define OMP_CLAUSE_MAP_MAYBE_ZERO_LENGTH_ARRAY_SECTION(NODE) \
   TREE_PROTECTED (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_MAP))
+/* Nonzero if this map clause is for an ACC parallel reduction variable.  */
+#define OMP_CLAUSE_MAP_IN_REDUCTION(NODE) \
+  TREE_PRIVATE (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_MAP))
 
 #define OMP_CLAUSE_PROC_BIND_KIND(NODE) \
   (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_PROC_BIND)->omp_clause.subcode.proc_bind_kind)


* Re: openacc reference reductions
  2016-04-08 14:14                   ` Nathan Sidwell
  2016-04-08 14:21                     ` Jakub Jelinek
@ 2016-04-08 14:46                     ` Cesar Philippidis
  2016-04-08 14:49                       ` Nathan Sidwell
  1 sibling, 1 reply; 19+ messages in thread
From: Cesar Philippidis @ 2016-04-08 14:46 UTC (permalink / raw)
  To: Nathan Sidwell, Jakub Jelinek; +Cc: gcc-patches

On 04/08/2016 07:14 AM, Nathan Sidwell wrote:
> On 04/08/16 00:40, Jakub Jelinek wrote:
> 
>>> +/* OpenACC parallel reductions need a present_or_copy clause to ensure
>>> +   that the original variable used in the reduction gets updated on
>>> +   the host.  Scan the list of clauses for reduction so that any
>>> existing
>>> +   data clause can be adjusted if necessary.  */
>>> +  if (region_type == ORT_ACC_PARALLEL)
>>> +    {
>>> +      for (c = *list_p; c; c = OMP_CLAUSE_CHAIN (c))
>>> +    {
>>> +      tree decl = NULL_TREE;
>>> +
>>> +      if (OMP_CLAUSE_CODE (c) != OMP_CLAUSE_REDUCTION)
>>> +        continue;
>>> +
>>> +      decl = OMP_CLAUSE_DECL (c);
>>> +      omp_add_variable (ctx, decl, GOVD_REDUCTION);
>>> +    }
>>> +    }
>>> +
>>
>> And this looks also wrong, why?
> 
> And besides, won't it break
> #pragma acc parallel firstprivate (x)  reduction(+:x)
> {...}

This would become an error. But we'd still allow stuff like

 #pragma acc parallel firstprivate(x)
 #pragma acc loop reduction(+:x)

> which the openacc guys at GTC indicated they didn't want to disallow
> (without further thinking).

I'm not sure what the purpose of a firstprivate parallel reduction is
anyway. By definition, the final result of a parallel reduction isn't
available until the end of the parallel region. And if the value is
firstprivate, the final result can never be utilized. That seems like a
candidate for an error to me.
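
To make the distinction concrete, here is a minimal sketch in C. The
function name sum_parallel_reduction is made up for illustration and is
not taken from the testsuite. It spells out copy (x) next to the
reduction, which is the behaviour being proposed as implicit, so that the
host sees the final value once the region ends. Built without -fopenacc
the pragma is ignored and the loop simply runs sequentially, producing
the same sum.

```c
#include <assert.h>

/* Sum 0 .. n-1 with a reduction on a combined parallel loop.
   The explicit copy (x) makes the reduced value visible on the host
   after the region; a firstprivate (x) here would discard it.  */
int
sum_parallel_reduction (int n)
{
  int x = 0;

  assert (n >= 0);

#pragma acc parallel loop copy (x) reduction (+:x)
  for (int i = 0; i < n; i++)
    x += i;

  return x;
}
```

Calling sum_parallel_reduction (100) adds the integers 0 through 99. With
firstprivate (x) in place of copy (x), the reduced value would stay on the
device side, which is why that combination looks like an error candidate.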

> It seems to me the interaction of reduction and data copying at a
> parallel is still  not sufficiently well defined.

Tom and I were pushing to make parallel reductions imply copy (or
present_or_copy in openacc 2.0). Did that fall through?

Cesar


* Re: openacc reference reductions
  2016-04-08 14:46                     ` Cesar Philippidis
@ 2016-04-08 14:49                       ` Nathan Sidwell
  0 siblings, 0 replies; 19+ messages in thread
From: Nathan Sidwell @ 2016-04-08 14:49 UTC (permalink / raw)
  To: Cesar Philippidis, Jakub Jelinek; +Cc: gcc-patches

On 04/08/16 07:46, Cesar Philippidis wrote:

> I'm not sure what the purpose of a firstprivate parallel reduction is
> anyway.

I'm not sure either.


> Tom and I were pushing to make parallel reductions imply copy (or
> present_or_copy in openacc 2.0). Did that fall through?

It doesn't seem to have reached a conclusion.  One question is whether it
forces POC (present_or_copy) or only defaults to POC (which could be overridden).
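
The two options can be sketched as follows. This is only an illustration:
the enum data_kind and both function names are hypothetical and do not
correspond to GCC's gomp_map_kind or to any existing helper.

```c
#include <assert.h>

/* Hypothetical data-clause kinds, for illustration only.  */
enum data_kind
{
  DK_NONE,		/* No data clause given by the user.  */
  DK_COPYIN,
  DK_COPYOUT,
  DK_PRESENT_OR_COPY
};

/* "Default" policy: present_or_copy is used only when the user gave
   no data clause; an explicit clause wins.  */
enum data_kind
default_poc (enum data_kind user)
{
  return user == DK_NONE ? DK_PRESENT_OR_COPY : user;
}

/* "Force" policy: the reduction always gets present_or_copy,
   whatever the user wrote.  */
enum data_kind
force_poc (enum data_kind user)
{
  (void) user;
  return DK_PRESENT_OR_COPY;
}
```

Under the default policy an explicit copyin on the reduction variable
would be honoured; under the force policy it would be replaced.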

nathan


* Re: openacc reference reductions
  2016-04-08 14:35                   ` Cesar Philippidis
@ 2016-04-08 15:30                     ` Jakub Jelinek
  0 siblings, 0 replies; 19+ messages in thread
From: Jakub Jelinek @ 2016-04-08 15:30 UTC (permalink / raw)
  To: Cesar Philippidis; +Cc: gcc-patches, Nathan Sidwell

On Fri, Apr 08, 2016 at 07:35:35AM -0700, Cesar Philippidis wrote:
> The FEs are a little inconsistent, and I didn't want to make this patch that
> invasive. Can the FE changes wait to gcc7?

Sure.

> 2016-04-08  Cesar Philippidis  <cesar@codesourcery.com>
> 
> 	PR lto/70289
> 	PR ipa/70348
> 	PR tree-optimization/70373
> 	PR middle-end/70533
> 	PR middle-end/70534
> 	PR middle-end/70535
> 

There should be no empty line between the PR lines and the * gimplify.c (... line.
> --- a/gcc/gimplify.c
> +++ b/gcc/gimplify.c
> @@ -7987,6 +7987,34 @@ gimplify_adjust_omp_clauses (gimple_seq *pre_p, gimple_seq body, tree *list_p,
>  	      break;
>  	    }
>  	  decl = OMP_CLAUSE_DECL (c);
> +	  /* Data clauses associated with acc parallel reductions must be
> +	     compatible with present_or_copy.  Warn and adjust the clause
> +	     if that is not the case.  */
> +	  if (ctx->region_type == ORT_ACC_PARALLEL)
> +	    {
> +	      tree t = DECL_P (decl) ? decl : TREE_OPERAND (decl, 0);
> +	      n = NULL;
> +
> +	      if (DECL_P (t))
> +		n = splay_tree_lookup (ctx->variables, (splay_tree_key)t);

There should be space before t.
> +
> +	      if (n && (n->value & GOVD_REDUCTION))
> +		{
> +		  int kind = OMP_CLAUSE_MAP_KIND (c);

Use gomp_map_kind or enum gomp_map_kind instead of int?

> +
> +		  OMP_CLAUSE_MAP_IN_REDUCTION(c) = 1;

Space before (.
> +		  if ((kind & GOMP_MAP_TOFROM) != GOMP_MAP_TOFROM
> +		      && kind != GOMP_MAP_FORCE_PRESENT
> +		      && kind != GOMP_MAP_POINTER)
> +		    {
> +		      warning_at (OMP_CLAUSE_LOCATION (c), 0,
> +				  "incompatible data clause with reduction "
> +				  "on %qE; promoting to present_or_copy",
> +				  DECL_NAME (t));
> +		      OMP_CLAUSE_SET_MAP_KIND (c, GOMP_MAP_TOFROM);
> +		    }
> +		}
> +	    }
>  	  if (!DECL_P (decl))
>  	    {
>  	      if ((ctx->region_type & ORT_TARGET) != 0
> @@ -8118,6 +8146,34 @@ gimplify_adjust_omp_clauses (gimple_seq *pre_p, gimple_seq body, tree *list_p,
>  
>  	case OMP_CLAUSE_REDUCTION:
>  	  decl = OMP_CLAUSE_DECL (c);
> +	  /* OpenACC reductions need a present_or_copy data clause.
> +	     Add one if necessary.  Error if the reduction is private.  */
> +	  if (ctx->region_type == ORT_ACC_PARALLEL)
> +	    {
> +	      n = splay_tree_lookup (ctx->variables, (splay_tree_key)decl);

Missing space.

> +	      if (n->value & (GOVD_PRIVATE | GOVD_FIRSTPRIVATE))
> +		{
> +		  error_at (OMP_CLAUSE_LOCATION (c), "invalid private "
> +			    "reduction on %qE", DECL_NAME (decl));
> +		}

Please avoid {}s around a single statement.  It is also better not to break
the message across multiple lines in this case, so
		error_at (OMP_CLAUSE_LOCATION (c),
			  "invalid private reduction on %qE",
			  DECL_NAME (decl));
is more readable.

> +	      else if ((n->value & GOVD_MAP) == 0)
> +		{
> +		  tree next = OMP_CLAUSE_CHAIN (c);
> +		  tree nc = build_omp_clause (UNKNOWN_LOCATION, OMP_CLAUSE_MAP);

Too long line, please wrap.

> +		  OMP_CLAUSE_SET_MAP_KIND (nc, GOMP_MAP_TOFROM);
> +		  OMP_CLAUSE_DECL (nc) = decl;
> +		  OMP_CLAUSE_CHAIN (c) = nc;
> +		  lang_hooks.decls.omp_finish_clause (nc, pre_p);
> +		  for (; nc; nc = OMP_CLAUSE_CHAIN (nc))
> +		    {
> +		      OMP_CLAUSE_MAP_IN_REDUCTION (nc) = 1;
> +		      if (OMP_CLAUSE_CHAIN (nc) == NULL)
> +			break;
> +		    }

Then the nc; condition doesn't make sense.  Perhaps then
		  while (1)
		    {
		      OMP_CLAUSE_MAP_IN_REDUCTION (nc) = 1;
		      if (OMP_CLAUSE_CHAIN (nc) == NULL)
			break;
		      nc = OMP_CLAUSE_CHAIN (nc);
		    }
or
		  for (; ; nc = OMP_CLAUSE_CHAIN (nc))
		    {
		      OMP_CLAUSE_MAP_IN_REDUCTION (nc) = 1;
		      if (OMP_CLAUSE_CHAIN (nc) == NULL)
			break;
		    }
?
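
As a standalone illustration of the second form, here is a minimal
sketch using a plain linked list. The struct clause type and the
mark_chain function are hypothetical stand-ins for the clause chain and
are not GCC's tree API. The point is that the loop needs no condition,
because it exits from inside the body once it reaches the last node.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a clause chain; not GCC's tree type.  */
struct clause
{
  int in_reduction;
  struct clause *chain;
};

/* Mark every node from NC onward and return the last node.
   NC must not be NULL.  */
struct clause *
mark_chain (struct clause *nc)
{
  assert (nc != NULL);

  for (; ; nc = nc->chain)
    {
      nc->in_reduction = 1;
      if (nc->chain == NULL)
	break;
    }

  return nc;
}
```

Because the break happens before the increment, nc still points at the
last node when the loop ends, rather than at NULL.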

> @@ -5624,22 +5625,38 @@ lower_oacc_reductions (location_t loc, tree clauses, tree level, bool inner,
> +		  else if ((OMP_CLAUSE_CODE (cls) == OMP_CLAUSE_FIRSTPRIVATE
> +			    || OMP_CLAUSE_CODE (cls) == OMP_CLAUSE_PRIVATE)
> +			   && orig == OMP_CLAUSE_DECL (cls))
> +		    {
> +		      is_private = true;
> +		      goto do_lookup;
> +		    }

Isn't this case rejected by the gimplifier?

> @@ -15829,7 +15874,10 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx)
>  	if (!maybe_lookup_field (var, ctx))
>  	  continue;
>  
> -	if (offloaded)
> +	/* Don't remap oacc parallel reduction variables, because the
> +	   intermediate result must be local to each gang.  */
> +	if (offloaded && !(OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP
> +			   && OMP_CLAUSE_MAP_IN_REDUCTION(c)))

Missing space after OMP_CLAUSE_MAP_IN_REDUCTION (before the opening parenthesis).

Ok for trunk with those changes, provided the lower_oacc_reductions is_private
handling is still needed; if it is not needed, please clean that up.

	Jakub


* [OpenACC] Don't compile libgomp testcases with '-w' (was: openacc reference reductions)
  2016-02-09 15:14 openacc reference reductions Cesar Philippidis
  2016-02-09 15:33 ` Nathan Sidwell
@ 2021-04-26 10:35 ` Thomas Schwinge
  1 sibling, 0 replies; 19+ messages in thread
From: Thomas Schwinge @ 2021-04-26 10:35 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 760 bytes --]

Hi!

On 2016-02-09T07:14:31-0800, Cesar Philippidis <cesar@codesourcery.com> wrote:
> This patch [...]

... as eventually committed in r234840 (commit
c42cfb5ca3b02756705485e013fa9107aaf28acd "re PR lto/70289 ([openacc] ICE
in input_varpool_node)") added 'dg-additional-options "-w"' for a bunch
of testcases.  We'd like to actually catch compiler diagnostics (and
currently there aren't any), so I've pushed "[OpenACC] Don't compile
libgomp testcases with '-w'" to master branch in commit
7c640779bf042ceb05fe50260307af88fed75407, see attached.


Grüße
 Thomas


-----------------
Mentor Graphics (Deutschland) GmbH, Arnulfstrasse 201, 80634 München Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Frank Thürauf

[-- Attachment #2: 0001-OpenACC-Don-t-compile-libgomp-testcases-with-w.patch --]
[-- Type: text/x-diff, Size: 6076 bytes --]

From 7c640779bf042ceb05fe50260307af88fed75407 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge <thomas@codesourcery.com>
Date: Fri, 16 Apr 2021 16:08:19 +0200
Subject: [PATCH] [OpenACC] Don't compile libgomp testcases with '-w'

We'd like to actually catch compiler diagnostics (and currently there aren't
any).

	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/par-reduction-1.c: Don't
	compile with '-w'.
	* testsuite/libgomp.oacc-c-c++-common/par-reduction-2.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/parallel-reduction.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/reduction-5.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/reduction-6.c: Likewise.
	* testsuite/libgomp.oacc-fortran/parallel-reduction.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/reduction-1.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/reduction-5.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/reduction-6.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/reduction-7.f90: Likewise.
---
 libgomp/testsuite/libgomp.oacc-c-c++-common/par-reduction-1.c   | 2 --
 libgomp/testsuite/libgomp.oacc-c-c++-common/par-reduction-2.c   | 2 --
 .../testsuite/libgomp.oacc-c-c++-common/parallel-reduction.c    | 1 -
 libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-5.c       | 1 -
 libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-6.c       | 1 -
 libgomp/testsuite/libgomp.oacc-fortran/parallel-reduction.f90   | 1 -
 libgomp/testsuite/libgomp.oacc-fortran/reduction-1.f90          | 1 -
 libgomp/testsuite/libgomp.oacc-fortran/reduction-5.f90          | 1 -
 libgomp/testsuite/libgomp.oacc-fortran/reduction-6.f90          | 2 +-
 libgomp/testsuite/libgomp.oacc-fortran/reduction-7.f90          | 1 -
 10 files changed, 1 insertion(+), 12 deletions(-)

diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/par-reduction-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/par-reduction-1.c
index a88b60f39f0..0c1ecc68e9d 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/par-reduction-1.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/par-reduction-1.c
@@ -1,5 +1,3 @@
-/* { dg-additional-options "-w" } */
-
 #include <assert.h>
 
 /* Test of reduction on parallel directive.  */
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/par-reduction-2.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/par-reduction-2.c
index 5ab52e97009..e3ed281610d 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/par-reduction-2.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/par-reduction-2.c
@@ -1,8 +1,6 @@
 /* Test of reduction on parallel directive (with async).  */
 /* See also Fortran variants in "../libgomp.oacc-fortran/par-reduction-2*".  */
 
-/* { dg-additional-options "-w" } */
-
 #include <assert.h>
 #include <openacc.h>
 
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-reduction.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-reduction.c
index 077571f29f0..80745816fff 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-reduction.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-reduction.c
@@ -1,5 +1,4 @@
 /* { dg-do run } */
-/* { dg-additional-options "-w" } */
 
 #include <stdlib.h>
 #include <openacc.h>
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-5.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-5.c
index 6f5d29316a0..1cec9f152c4 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-5.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-5.c
@@ -1,5 +1,4 @@
 /* { dg-do run } */
-/* { dg-additional-options "-w" } */
 
 /* Multiple reductions.  */
 
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-6.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-6.c
index af30b31bd34..82a6ade40f9 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-6.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-6.c
@@ -1,5 +1,4 @@
 /* { dg-do run } */
-/* { dg-additional-options "-w" } */
 
 /* Test reductions on explicitly private variables.  */
 
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/parallel-reduction.f90 b/libgomp/testsuite/libgomp.oacc-fortran/parallel-reduction.f90
index 487cfc436e2..b926f4fe898 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/parallel-reduction.f90
+++ b/libgomp/testsuite/libgomp.oacc-fortran/parallel-reduction.f90
@@ -1,5 +1,4 @@
 ! { dg-do run }
-! { dg-additional-options "-w" }
 
 program reduction
   implicit none
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/reduction-1.f90 b/libgomp/testsuite/libgomp.oacc-fortran/reduction-1.f90
index 764affdbd18..40c1a86ab92 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/reduction-1.f90
+++ b/libgomp/testsuite/libgomp.oacc-fortran/reduction-1.f90
@@ -1,5 +1,4 @@
 ! { dg-do run }
-! { dg-additional-options "-w" }
 
 ! Integer reductions
 
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/reduction-5.f90 b/libgomp/testsuite/libgomp.oacc-fortran/reduction-5.f90
index 833d0e4b9fd..a529e22d9a2 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/reduction-5.f90
+++ b/libgomp/testsuite/libgomp.oacc-fortran/reduction-5.f90
@@ -1,5 +1,4 @@
 ! { dg-do run }
-! { dg-additional-options "-w" }
 
 ! subroutine reduction
 
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/reduction-6.f90 b/libgomp/testsuite/libgomp.oacc-fortran/reduction-6.f90
index e76867a313f..249ed2edcbc 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/reduction-6.f90
+++ b/libgomp/testsuite/libgomp.oacc-fortran/reduction-6.f90
@@ -1,5 +1,5 @@
 ! { dg-do run }
-! { dg-additional-options "-cpp -w" }
+! { dg-additional-options "-cpp" }
 
 program reduction
   implicit none
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/reduction-7.f90 b/libgomp/testsuite/libgomp.oacc-fortran/reduction-7.f90
index a7d6dd8a900..8cffac93a22 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/reduction-7.f90
+++ b/libgomp/testsuite/libgomp.oacc-fortran/reduction-7.f90
@@ -1,5 +1,4 @@
 ! { dg-do run }
-! { dg-additional-options "-w" }
 
 ! subroutine reduction with private and firstprivate variables
 
-- 
2.30.2



Thread overview: 19+ messages
2016-02-09 15:14 openacc reference reductions Cesar Philippidis
2016-02-09 15:33 ` Nathan Sidwell
2016-02-09 16:17   ` Cesar Philippidis
2016-02-22 15:34     ` Cesar Philippidis
2016-02-22 16:23       ` Nathan Sidwell
2016-04-06  1:54       ` Cesar Philippidis
2016-04-06 14:23         ` Jakub Jelinek
2016-04-06 20:21           ` Cesar Philippidis
2016-04-07  9:57             ` Jakub Jelinek
2016-04-08  4:34               ` Cesar Philippidis
2016-04-08  7:41                 ` Jakub Jelinek
2016-04-08  7:44                   ` Jakub Jelinek
2016-04-08 14:14                   ` Nathan Sidwell
2016-04-08 14:21                     ` Jakub Jelinek
2016-04-08 14:46                     ` Cesar Philippidis
2016-04-08 14:49                       ` Nathan Sidwell
2016-04-08 14:35                   ` Cesar Philippidis
2016-04-08 15:30                     ` Jakub Jelinek
2021-04-26 10:35 ` [OpenACC] Don't compile libgomp testcases with '-w' (was: openacc reference reductions) Thomas Schwinge
