public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH, OpenACC 2.7] Implement reductions for arrays and structs
@ 2024-01-02 15:21 Chung-Lin Tang
  2024-01-10 11:33 ` Julian Brown
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Chung-Lin Tang @ 2024-01-02 15:21 UTC (permalink / raw)
  To: gcc-patches, Thomas Schwinge, Andrew Stubbs, Julian Brown; +Cc: Catherine Moore

[-- Attachment #1: Type: text/plain, Size: 4884 bytes --]

Hi Thomas, Andrew,
this patch implements reductions for arrays and structs for OpenACC. Following the pattern for OpenACC reductions, this is mostly in the respective NVPTX/GCN backends' *_goacc_reduction_setup/init/fini/teardown hooks, particularly in the fini part, and [nvptx/gcn]_reduction_update routines. The code is mostly similar between the two targets, with mostly the lack of vector mode handling in GCN.

To Julian, there is a patch to the middle-end neutering, a hack actually, that detects SSA_NAMEs used in reduction array MEM_REFs, and avoids single->parallel copying (by moving those definitions before BUILT_IN_GOACC_SINGLE_COPY_START). This appears to work because reductions do their own initializing of the private copy.

As we discussed in our internal calls, the real proper way is to create the private array in a more appropriate stage, but that is too long a shot for now. The changes here are needed at least for some -O0 cases (when under optimization, propagation of the private copies' local address eliminate the SSA_NAME and things actually just work in that case). So please bear with this hack.

I believe the new added libgomp testcases should be fairly complete. Though note that one case of reduction of * for double arrays has been commented out for now, for there appears to be a (presumably) unrelated issue causing this case to fail (maybe has to do with the loop-based atomic form used by both NVPTX/GCN). Maybe should XFAIL instead of comment out. Will do this in next iteration.

Thanks,
Chung-Lin

2024-01-02  Chung-Lin Tang  <cltang@codesourcery.com>

	gcc/c/ChangeLog:
	* c-parser.cc (c_parser_omp_clause_reduction): Adjustments for
	OpenACC-specific cases.
	* c-typeck.cc (c_oacc_reduction_defined_type_p): New function.
	(c_oacc_reduction_code_name): Likewise.
	(c_finish_omp_clauses): Handle OpenACC cases using new functions.

	gcc/cp/ChangeLog:
	* parser.cc (cp_parser_omp_clause_reduction): Adjustments for
	OpenACC-specific cases.
	* semantics.cc (cp_oacc_reduction_defined_type_p): New function.
	(cp_oacc_reduction_code_name): Likewise.
	(finish_omp_reduction_clause): Handle OpenACC cases using new functions.

	gcc/ChangeLog:
	* config/gcn/gcn-tree.cc (gcn_reduction_update): Additions for
	handling ARRAY_TYPE and RECORD_TYPE reductions.
	(gcn_goacc_reduction_setup): Likewise.
	(gcn_goacc_reduction_init): Likewise.
	(gcn_goacc_reduction_fini): Likewise.
	(gcn_goacc_reduction_teardown): Likewise.

	* config/nvptx/nvptx.cc (nvptx_gen_shuffle): Properly generate
	V2SI shuffle using vec_extract op.
	(nvptx_get_shared_red_addr): Adjust type/alignment calculations to
	use TYPE_SIZE/ALIGN_UNIT instead of machine mode based.
	(nvptx_reduction_update): Additions for handling ARRAY_TYPE and
	RECORD_TYPE reductions.
	(nvptx_goacc_reduction_setup): Likewise.
	(nvptx_goacc_reduction_init): Likewise.
	(nvptx_goacc_reduction_fini): Likewise.
	(nvptx_goacc_reduction_teardown): Likewise.

	* omp-low.cc (scan_sharing_clauses): Adjust ARRAY_REF pointer type
	building to use decl type, rather than generic ptr_type_node.
	(omp_reduction_init_op): Add ARRAY_TYPE and RECORD_TYPE init op
	construction.
	(lower_oacc_reductions): Add code to teardown/recover array access
	MEM_REF in OMP_CLAUSE_DECL, to accomodate for lookup requirements.
	Adjust type/alignment calculations to use TYPE_SIZE/ALIGN_UNIT
	instead of machine mode based.

	* omp-oacc-neuter-broadcast.cc (worker_single_copy):
	Add 'hash_set<tree> *array_reduction_base_vars' parameter.
	Add xxx.

	(neuter_worker_single): Add 'hash_set<tree> *array_reduction_base_vars'
	parameter. Adjust recursive calls to self and worker_single_copy.
	(oacc_do_neutering): Add 'hash_set<tree> *array_reduction_base_vars'
	parameter. Adjust call to neuter_worker_single.
	(execute_omp_oacc_neuter_broadcast): Add local
	'hash_set<tree> array_reduction_base_vars' declaration. Collect MEM_REF
	base-pointer SSA_NAMEs of arrays into array_reduction_base_vars. Add
	'&array_reduction_base_vars' argument to call of oacc_do_neutering.

	* omp-offload.cc (default_goacc_reduction): Add unshare_expr.

	gcc/testsuite/ChangeLog:
	* c-c++-common/goacc/reduction-9.c: New test.
	* c-c++-common/goacc/reduction-10.c: New test.
	* c-c++-common/goacc/reduction-11.c: New test.
	* c-c++-common/goacc/reduction-12.c: New test.
	* c-c++-common/goacc/reduction-13.c: New test.

	libgomp/ChangeLog:
	* testsuite/libgomp.oacc-c-c++-common/reduction.h
	(check_reduction_array_xx): New macro.
	(operator_apply): Likewise.
	(check_reduction_array_op): Likewise.
	(check_reduction_arraysec_op): Likewise.
	(function_apply): Likewise.
	(check_reduction_array_macro): Likewise.
	(check_reduction_arraysec_macro): Likewise.
	(check_reduction_xxx_xx_all): Likewise.
	* testsuite/libgomp.oacc-c-c++-common/reduction-arrays-1.c: New test.
	* testsuite/libgomp.oacc-c-c++-common/reduction-structs-1.c: New test.

[-- Attachment #2: openacc-2.7b-array-struct-reduction.patch --]
[-- Type: text/plain, Size: 54684 bytes --]

diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
index ed92caca814..d13231bc053 100644
--- a/gcc/c/c-parser.cc
+++ b/gcc/c/c-parser.cc
@@ -17201,13 +17201,21 @@ c_parser_omp_clause_reduction (c_parser *parser, enum omp_clause_code kind,
 		code = MAX_EXPR;
 		break;
 	      }
+	    if (!is_omp)
+	      goto name_error;
 	    reduc_id = c_parser_peek_token (parser)->value;
 	    break;
 	  }
 	default:
-	  c_parser_error (parser,
-			  "expected %<+%>, %<*%>, %<-%>, %<&%>, "
-			  "%<^%>, %<|%>, %<&&%>, %<||%> or identifier");
+	name_error:
+	  if (is_omp)
+	    c_parser_error (parser,
+			    "expected %<+%>, %<*%>, %<-%>, %<&%>, "
+			    "%<^%>, %<|%>, %<&&%>, %<||%> or identifier");
+	  else
+	    c_parser_error (parser,
+			    "expected %<+%>, %<*%>, %<-%>, %<&%>, "
+			    "%<^%>, %<|%>, %<&&%>, %<||%>, %<min%> or %<max%>");
 	  c_parser_skip_until_found (parser, CPP_CLOSE_PAREN, 0);
 	  return list;
 	}
@@ -17220,6 +17228,11 @@ c_parser_omp_clause_reduction (c_parser *parser, enum omp_clause_code kind,
 	  nl = c_parser_omp_variable_list (parser, clause_loc, kind, list);
 	  for (c = nl; c != list; c = OMP_CLAUSE_CHAIN (c))
 	    {
+	      OMP_CLAUSE_REDUCTION_CODE (c) = code;
+	      /* OpenACC does not require anything below.  */
+	      if (!is_omp)
+		continue;
+
 	      tree d = OMP_CLAUSE_DECL (c), type;
 	      if (TREE_CODE (d) != TREE_LIST)
 		type = TREE_TYPE (d);
@@ -17241,7 +17254,6 @@ c_parser_omp_clause_reduction (c_parser *parser, enum omp_clause_code kind,
 		}
 	      while (TREE_CODE (type) == ARRAY_TYPE)
 		type = TREE_TYPE (type);
-	      OMP_CLAUSE_REDUCTION_CODE (c) = code;
 	      if (task)
 		OMP_CLAUSE_REDUCTION_TASK (c) = 1;
 	      else if (inscan)
diff --git a/gcc/c/c-typeck.cc b/gcc/c/c-typeck.cc
index 2d9139d09d2..3c3bcb5f8f9 100644
--- a/gcc/c/c-typeck.cc
+++ b/gcc/c/c-typeck.cc
@@ -14604,6 +14604,68 @@ c_oacc_check_attachments (tree c)
   return false;
 }
 
+static bool
+c_oacc_reduction_defined_type_p (enum tree_code reduction_code, tree t)
+{
+  if (TREE_CODE (t) == INTEGER_TYPE)
+    return true;
+
+  if (FLOAT_TYPE_P (t) || TREE_CODE (t) == COMPLEX_TYPE)
+    switch (reduction_code)
+      {
+      case PLUS_EXPR:
+      case MULT_EXPR:
+      case MINUS_EXPR:
+      case TRUTH_ANDIF_EXPR:
+      case TRUTH_ORIF_EXPR:
+	return true;
+      case MIN_EXPR:
+      case MAX_EXPR:
+	return TREE_CODE (t) != COMPLEX_TYPE;
+      case BIT_AND_EXPR:
+      case BIT_XOR_EXPR:
+      case BIT_IOR_EXPR:
+	return false;
+      default:
+	gcc_unreachable ();
+      }
+
+  if (TREE_CODE (t) == ARRAY_TYPE)
+    return c_oacc_reduction_defined_type_p (reduction_code, TREE_TYPE (t));
+
+  if (TREE_CODE (t) == RECORD_TYPE)
+    {
+      for (tree fld = TYPE_FIELDS (t); fld; fld = TREE_CHAIN (fld))
+	if (TREE_CODE (fld) == FIELD_DECL
+	    && !c_oacc_reduction_defined_type_p (reduction_code,
+						 TREE_TYPE (fld)))
+	  return false;
+      return true;
+    }
+
+  return false;
+}
+
+static const char *
+c_oacc_reduction_code_name (enum tree_code reduction_code)
+{
+  switch (reduction_code)
+    {
+    case PLUS_EXPR: return "+";
+    case MULT_EXPR: return "*";
+    case MINUS_EXPR: return "-";
+    case TRUTH_ANDIF_EXPR: return "&&";
+    case TRUTH_ORIF_EXPR: return "||";
+    case MIN_EXPR: return "min";
+    case MAX_EXPR: return "max";
+    case BIT_AND_EXPR: return "&";
+    case BIT_XOR_EXPR: return "^";
+    case BIT_IOR_EXPR: return "|";
+    default:
+      gcc_unreachable ();
+    }
+}
+
 /* For all elements of CLAUSES, validate them against their constraints.
    Remove any elements from the list that are invalid.  */
 
@@ -14794,9 +14856,22 @@ c_finish_omp_clauses (tree clauses, enum c_omp_region_type ort)
 		  break;
 		}
 	    }
-	  if (OMP_CLAUSE_REDUCTION_PLACEHOLDER (c) == NULL_TREE
-	      && (FLOAT_TYPE_P (type)
-		  || TREE_CODE (type) == COMPLEX_TYPE))
+	  if (ort == C_ORT_ACC)
+	    {
+	      enum tree_code r_code = OMP_CLAUSE_REDUCTION_CODE (c);
+	      if (!c_oacc_reduction_defined_type_p (r_code, TREE_TYPE (t)))
+		{
+		  const char *r_name = c_oacc_reduction_code_name (r_code);
+		  error_at (OMP_CLAUSE_LOCATION (c),
+			    "%qE has invalid type for %<reduction(%s)%>",
+			    t, r_name);
+		  remove = true;
+		  break;
+		}
+	    }
+	  else if (OMP_CLAUSE_REDUCTION_PLACEHOLDER (c) == NULL_TREE
+		   && (FLOAT_TYPE_P (type)
+		       || TREE_CODE (type) == COMPLEX_TYPE))
 	    {
 	      enum tree_code r_code = OMP_CLAUSE_REDUCTION_CODE (c);
 	      const char *r_name = NULL;
diff --git a/gcc/config/gcn/gcn-tree.cc b/gcc/config/gcn/gcn-tree.cc
index c99c1767659..55cca4b1b81 100644
--- a/gcc/config/gcn/gcn-tree.cc
+++ b/gcc/config/gcn/gcn-tree.cc
@@ -296,6 +296,105 @@ gcn_reduction_update (location_t loc, gimple_stmt_iterator *gsi,
   tree type = TREE_TYPE (var);
   tree size = TYPE_SIZE (type);
 
+  if (!VAR_P (ptr))
+    {
+      tree t = make_ssa_name (TREE_TYPE (ptr));
+      gimple_seq seq = NULL;
+      gimplify_assign (t, ptr, &seq);
+      gsi_insert_seq_before (gsi, seq, GSI_SAME_STMT);
+      ptr = t;
+    }
+
+  if (TREE_CODE (type) == ARRAY_TYPE)
+    {
+      gimple *g;
+      gimple_seq seq = NULL;
+      tree array_type = TREE_TYPE (var);
+      tree array_elem_type = TREE_TYPE (array_type);
+      tree max_index = TYPE_MAX_VALUE (TYPE_DOMAIN (array_type));
+
+      tree init_index = make_ssa_name (TREE_TYPE (max_index));
+      tree loop_index = make_ssa_name (TREE_TYPE (max_index));
+      tree update_index = make_ssa_name (TREE_TYPE (max_index));
+
+      g = gimple_build_assign (init_index,
+			       build_int_cst (TREE_TYPE (init_index), 0));
+      gimple_seq_add_stmt (&seq, g);
+      gimple *init_end = gimple_seq_last (seq);
+      gsi_insert_seq_before (gsi, seq, GSI_SAME_STMT);
+
+      basic_block init_bb = gsi_bb (*gsi);
+      edge init_edge = split_block (init_bb, init_end);
+      basic_block loop_bb = init_edge->dest;
+      /* Reset the iterator.  */
+      *gsi = gsi_for_stmt (gsi_stmt (*gsi));
+
+      seq = NULL;
+      g = gimple_build_assign (update_index, PLUS_EXPR, loop_index,
+			       build_int_cst (TREE_TYPE (loop_index), 1));
+      gimple_seq_add_stmt (&seq, g);
+
+      g = gimple_build_cond (LE_EXPR, update_index, max_index, NULL, NULL);
+      gimple_seq_add_stmt (&seq, g);
+      gsi_insert_seq_before (gsi, seq, GSI_SAME_STMT);
+
+      edge post_edge = split_block (loop_bb, g);
+      basic_block post_bb = post_edge->dest;
+      loop_bb = post_edge->src;
+      /* Reset the iterator.  */
+      *gsi = gsi_for_stmt (gsi_stmt (*gsi));
+
+      /* Place where we insert reduction code below.  */
+      gimple_stmt_iterator reduction_code_gsi = gsi_start_bb (loop_bb);
+
+      post_edge->flags ^= EDGE_FALSE_VALUE | EDGE_FALLTHRU;
+      post_edge->probability = profile_probability::even ();
+      edge loop_edge = make_edge (loop_bb, loop_bb, EDGE_TRUE_VALUE);
+      loop_edge->probability = profile_probability::even ();
+      set_immediate_dominator (CDI_DOMINATORS, loop_bb, init_bb);
+      set_immediate_dominator (CDI_DOMINATORS, post_bb, loop_bb);
+
+      gphi *phi = create_phi_node (loop_index, loop_bb);
+      add_phi_arg (phi, init_index, init_edge, loc);
+      add_phi_arg (phi, update_index, loop_edge, loc);
+
+      tree var_aref = build4 (ARRAY_REF, array_elem_type,
+			      var, loop_index, NULL_TREE, NULL_TREE);
+
+      tree red_array = build_simple_mem_ref (ptr);
+      tree red_array_type = TREE_TYPE (red_array);
+      tree red_array_elem_type
+	= build_qualified_type (TREE_TYPE (red_array_type),
+				TYPE_QUALS (red_array_type));
+      tree ptr_aref = build4 (ARRAY_REF, red_array_elem_type,
+			      red_array, loop_index,
+			      NULL_TREE, NULL_TREE);
+
+      gcn_reduction_update (loc, &reduction_code_gsi,
+			    build_fold_addr_expr (ptr_aref),
+			    var_aref, op);
+      return build_simple_mem_ref (ptr);
+    }
+  else if (TREE_CODE (type) == RECORD_TYPE)
+    {
+      for (tree fld = TYPE_FIELDS (type); fld; fld = TREE_CHAIN (fld))
+	if (TREE_CODE (fld) == FIELD_DECL)
+	  {
+	    tree var_fld_ref = build3 (COMPONENT_REF, TREE_TYPE (fld),
+				       var, fld, NULL);
+	    tree ptr_ref = build_simple_mem_ref (ptr);
+	    tree ptr_fld_type
+	      = build_qualified_type (TREE_TYPE (fld),
+				      TYPE_QUALS (TREE_TYPE (ptr_ref)));
+	    tree ptr_fld_ref = build3 (COMPONENT_REF, ptr_fld_type,
+				       ptr_ref, fld, NULL);
+	    gcn_reduction_update (loc, gsi,
+				  build_fold_addr_expr (ptr_fld_ref),
+				  var_fld_ref, op);
+	  }
+      return build_simple_mem_ref (ptr);
+    }
+
   if (size == TYPE_SIZE (unsigned_type_node)
       || size == TYPE_SIZE (long_long_unsigned_type_node))
     return gcn_lockless_update (loc, gsi, ptr, var, op);
@@ -359,11 +458,14 @@ gcn_goacc_reduction_setup (gcall *call)
       gimplify_assign (decl, var, &seq);
     }
 
-  if (lhs)
+  if (lhs
+      && TREE_CODE (TREE_TYPE (var)) != ARRAY_TYPE
+      && TREE_CODE (TREE_TYPE (var)) != RECORD_TYPE)
     gimplify_assign (lhs, var, &seq);
 
   pop_gimplify_context (NULL);
-  gsi_replace_with_seq (&gsi, seq, true);
+  gsi_insert_seq_before (&gsi, seq, GSI_SAME_STMT);
+  gsi_remove (&gsi, true);
 }
 
 /* Expand IFN_GOACC_REDUCTION_INIT.  */
@@ -395,7 +497,8 @@ gcn_goacc_reduction_init (gcall *call)
     gimplify_assign (lhs, init, &seq);
 
   pop_gimplify_context (NULL);
-  gsi_replace_with_seq (&gsi, seq, true);
+  gsi_insert_seq_before (&gsi, seq, GSI_SAME_STMT);
+  gsi_remove (&gsi, true);
 }
 
 /* Expand IFN_GOACC_REDUCTION_FINI.  */
@@ -439,11 +542,13 @@ gcn_goacc_reduction_fini (gcall *call)
       r = gcn_reduction_update (gimple_location (call), &gsi, accum, var, op);
     }
 
-  if (lhs)
+  if (lhs
+      && TREE_CODE (TREE_TYPE (r)) != ARRAY_TYPE
+      && TREE_CODE (TREE_TYPE (r)) != RECORD_TYPE)
     gimplify_assign (lhs, r, &seq);
   pop_gimplify_context (NULL);
-
-  gsi_replace_with_seq (&gsi, seq, true);
+  gsi_insert_seq_before (&gsi, seq, GSI_SAME_STMT);
+  gsi_remove (&gsi, true);
 }
 
 /* Expand IFN_GOACC_REDUCTION_TEARDOWN.  */
@@ -483,8 +588,8 @@ gcn_goacc_reduction_teardown (gcall *call)
     gimplify_assign (lhs, unshare_expr (var), &seq);
 
   pop_gimplify_context (NULL);
-
-  gsi_replace_with_seq (&gsi, seq, true);
+  gsi_insert_seq_before (&gsi, seq, GSI_SAME_STMT);
+  gsi_remove (&gsi, true);
 }
 
 /* Implement TARGET_GOACC_REDUCTION.
diff --git a/gcc/config/nvptx/nvptx.cc b/gcc/config/nvptx/nvptx.cc
index 3fb1deb70fd..ee242c37d25 100644
--- a/gcc/config/nvptx/nvptx.cc
+++ b/gcc/config/nvptx/nvptx.cc
@@ -2029,19 +2029,15 @@ nvptx_gen_shuffle (rtx dst, rtx src, rtx idx, nvptx_shuffle_kind kind)
       break;
     case E_V2SImode:
       {
-	rtx src0 = gen_rtx_SUBREG (SImode, src, 0);
-	rtx src1 = gen_rtx_SUBREG (SImode, src, 4);
-	rtx dst0 = gen_rtx_SUBREG (SImode, dst, 0);
-	rtx dst1 = gen_rtx_SUBREG (SImode, dst, 4);
 	rtx tmp0 = gen_reg_rtx (SImode);
 	rtx tmp1 = gen_reg_rtx (SImode);
 	start_sequence ();
-	emit_insn (gen_movsi (tmp0, src0));
-	emit_insn (gen_movsi (tmp1, src1));
+	emit_insn (gen_vec_extractv2sisi (tmp0, src, GEN_INT (0)));
+	emit_insn (gen_vec_extractv2sisi (tmp1, src, GEN_INT (1)));
 	emit_insn (nvptx_gen_shuffle (tmp0, tmp0, idx, kind));
 	emit_insn (nvptx_gen_shuffle (tmp1, tmp1, idx, kind));
-	emit_insn (gen_movsi (dst0, tmp0));
-	emit_insn (gen_movsi (dst1, tmp1));
+	emit_insn (gen_vec_setv2si (dst, tmp0, GEN_INT (0)));
+	emit_insn (gen_vec_setv2si (dst, tmp1, GEN_INT (1)));
 	res = get_insns ();
 	end_sequence ();
       }
@@ -6708,11 +6704,9 @@ nvptx_get_shared_red_addr (tree type, tree offset, bool vector)
   enum nvptx_builtins addr_dim = NVPTX_BUILTIN_WORKER_ADDR;
   if (vector)
     addr_dim = NVPTX_BUILTIN_VECTOR_ADDR;
-  machine_mode mode = TYPE_MODE (type);
   tree fndecl = nvptx_builtin_decl (addr_dim, true);
-  tree size = build_int_cst (unsigned_type_node, GET_MODE_SIZE (mode));
-  tree align = build_int_cst (unsigned_type_node,
-			      GET_MODE_ALIGNMENT (mode) / BITS_PER_UNIT);
+  tree size = TYPE_SIZE_UNIT (type);
+  tree align = build_int_cst (unsigned_type_node, TYPE_ALIGN_UNIT (type));
   tree call = build_call_expr (fndecl, 3, offset, size, align);
 
   return fold_convert (build_pointer_type (type), call);
@@ -7029,6 +7023,105 @@ nvptx_reduction_update (location_t loc, gimple_stmt_iterator *gsi,
   tree type = TREE_TYPE (var);
   tree size = TYPE_SIZE (type);
 
+  if (!VAR_P (ptr))
+    {
+      tree t = make_ssa_name (TREE_TYPE (ptr));
+      gimple_seq seq = NULL;
+      gimplify_assign (t, ptr, &seq);
+      gsi_insert_seq_before (gsi, seq, GSI_SAME_STMT);
+      ptr = t;
+    }
+
+  if (TREE_CODE (type) == ARRAY_TYPE)
+    {
+      gimple *g;
+      gimple_seq seq = NULL;
+      tree array_type = TREE_TYPE (var);
+      tree array_elem_type = TREE_TYPE (array_type);
+      tree max_index = TYPE_MAX_VALUE (TYPE_DOMAIN (array_type));
+
+      tree init_index = make_ssa_name (TREE_TYPE (max_index));
+      tree loop_index = make_ssa_name (TREE_TYPE (max_index));
+      tree update_index = make_ssa_name (TREE_TYPE (max_index));
+
+      g = gimple_build_assign (init_index,
+			       build_int_cst (TREE_TYPE (init_index), 0));
+      gimple_seq_add_stmt (&seq, g);
+      gimple *init_end = gimple_seq_last (seq);
+      gsi_insert_seq_before (gsi, seq, GSI_SAME_STMT);
+
+      basic_block init_bb = gsi_bb (*gsi);
+      edge init_edge = split_block (init_bb, init_end);
+      basic_block loop_bb = init_edge->dest;
+      /* Reset the iterator.  */
+      *gsi = gsi_for_stmt (gsi_stmt (*gsi));
+
+      seq = NULL;
+      g = gimple_build_assign (update_index, PLUS_EXPR, loop_index,
+			       build_int_cst (TREE_TYPE (loop_index), 1));
+      gimple_seq_add_stmt (&seq, g);
+
+      g = gimple_build_cond (LE_EXPR, update_index, max_index, NULL, NULL);
+      gimple_seq_add_stmt (&seq, g);
+      gsi_insert_seq_before (gsi, seq, GSI_SAME_STMT);
+
+      edge post_edge = split_block (loop_bb, g);
+      basic_block post_bb = post_edge->dest;
+      loop_bb = post_edge->src;
+      /* Reset the iterator.  */
+      *gsi = gsi_for_stmt (gsi_stmt (*gsi));
+
+      /* Place where we insert reduction code below.  */
+      gimple_stmt_iterator reduction_code_gsi = gsi_start_bb (loop_bb);
+
+      post_edge->flags ^= EDGE_FALSE_VALUE | EDGE_FALLTHRU;
+      post_edge->probability = profile_probability::even ();
+      edge loop_edge = make_edge (loop_bb, loop_bb, EDGE_TRUE_VALUE);
+      loop_edge->probability = profile_probability::even ();
+      set_immediate_dominator (CDI_DOMINATORS, loop_bb, init_bb);
+      set_immediate_dominator (CDI_DOMINATORS, post_bb, loop_bb);
+
+      gphi *phi = create_phi_node (loop_index, loop_bb);
+      add_phi_arg (phi, init_index, init_edge, loc);
+      add_phi_arg (phi, update_index, loop_edge, loc);
+
+      tree var_aref = build4 (ARRAY_REF, array_elem_type,
+			      var, loop_index, NULL_TREE, NULL_TREE);
+
+      tree red_array = build_simple_mem_ref (ptr);
+      tree red_array_type = TREE_TYPE (red_array);
+      tree red_array_elem_type
+	= build_qualified_type (TREE_TYPE (red_array_type),
+				TYPE_QUALS (red_array_type));
+      tree ptr_aref = build4 (ARRAY_REF, red_array_elem_type,
+			      red_array, loop_index,
+			      NULL_TREE, NULL_TREE);
+
+      nvptx_reduction_update (loc, &reduction_code_gsi,
+			      build_fold_addr_expr (ptr_aref),
+			      var_aref, op, level);
+      return build_simple_mem_ref (ptr);
+    }
+  else if (TREE_CODE (type) == RECORD_TYPE)
+    {
+      for (tree fld = TYPE_FIELDS (type); fld; fld = TREE_CHAIN (fld))
+	if (TREE_CODE (fld) == FIELD_DECL)
+	  {
+	    tree var_fld_ref = build3 (COMPONENT_REF, TREE_TYPE (fld),
+				       var, fld, NULL);
+	    tree ptr_ref = build_simple_mem_ref (ptr);
+	    tree ptr_fld_type
+	      = build_qualified_type (TREE_TYPE (fld),
+				      TYPE_QUALS (TREE_TYPE (ptr_ref)));
+	    tree ptr_fld_ref = build3 (COMPONENT_REF, ptr_fld_type,
+				       ptr_ref, fld, NULL);
+	    nvptx_reduction_update (loc, gsi,
+				    build_fold_addr_expr (ptr_fld_ref),
+				    var_fld_ref, op, level);
+	  }
+      return build_simple_mem_ref (ptr);
+    }
+
   if (size == TYPE_SIZE (unsigned_type_node)
       || size == TYPE_SIZE (long_long_unsigned_type_node))
     return nvptx_lockless_update (loc, gsi, ptr, var, op);
@@ -7059,7 +7152,10 @@ nvptx_goacc_reduction_setup (gcall *call, offload_attrs *oa)
     }
   
   if (level == GOMP_DIM_WORKER
-      || (level == GOMP_DIM_VECTOR && oa->vector_length > PTX_WARP_SIZE))
+      || (level == GOMP_DIM_VECTOR
+	  && (oa->vector_length > PTX_WARP_SIZE
+	      || TREE_CODE (TREE_TYPE (var)) == ARRAY_TYPE
+	      || TREE_CODE (TREE_TYPE (var)) == RECORD_TYPE)))
     {
       /* Store incoming value to worker reduction buffer.  */
       tree offset = gimple_call_arg (call, 5);
@@ -7073,11 +7169,14 @@ nvptx_goacc_reduction_setup (gcall *call, offload_attrs *oa)
       gimplify_assign (ref, var, &seq);
     }
 
-  if (lhs)
+  if (lhs
+      && TREE_CODE (TREE_TYPE (var)) != ARRAY_TYPE
+      && TREE_CODE (TREE_TYPE (var)) != RECORD_TYPE)
     gimplify_assign (lhs, var, &seq);
 
   pop_gimplify_context (NULL);
-  gsi_replace_with_seq (&gsi, seq, true);
+  gsi_insert_seq_before (&gsi, seq, GSI_SAME_STMT);
+  gsi_remove (&gsi, true);
 }
 
 /* NVPTX implementation of GOACC_REDUCTION_INIT. */
@@ -7097,7 +7196,9 @@ nvptx_goacc_reduction_init (gcall *call, offload_attrs *oa)
   
   push_gimplify_context (true);
 
-  if (level == GOMP_DIM_VECTOR && oa->vector_length == PTX_WARP_SIZE)
+  if (level == GOMP_DIM_VECTOR && oa->vector_length == PTX_WARP_SIZE
+      && TREE_CODE (TREE_TYPE (var)) != ARRAY_TYPE
+      && TREE_CODE (TREE_TYPE (var)) != RECORD_TYPE)
     {
       /* Initialize vector-non-zeroes to INIT_VAL (OP).  */
       tree tid = make_ssa_name (integer_type_node);
@@ -7162,7 +7263,8 @@ nvptx_goacc_reduction_init (gcall *call, offload_attrs *oa)
     }
 
   pop_gimplify_context (NULL);
-  gsi_replace_with_seq (&gsi, seq, true);
+  gsi_insert_seq_before (&gsi, seq, GSI_SAME_STMT);
+  gsi_remove (&gsi, true);
 }
 
 /* NVPTX implementation of GOACC_REDUCTION_FINI.  */
@@ -7182,7 +7284,9 @@ nvptx_goacc_reduction_fini (gcall *call, offload_attrs *oa)
 
   push_gimplify_context (true);
 
-  if (level == GOMP_DIM_VECTOR && oa->vector_length == PTX_WARP_SIZE)
+  if (level == GOMP_DIM_VECTOR && oa->vector_length == PTX_WARP_SIZE
+      && TREE_CODE (TREE_TYPE (var)) != ARRAY_TYPE
+      && TREE_CODE (TREE_TYPE (var)) != RECORD_TYPE)
     {
       /* Emit binary shuffle tree.  TODO. Emit this as an actual loop,
 	 but that requires a method of emitting a unified jump at the
@@ -7229,11 +7333,14 @@ nvptx_goacc_reduction_fini (gcall *call, offload_attrs *oa)
 	}
     }
 
-  if (lhs)
+  if (lhs
+      && TREE_CODE (TREE_TYPE (r)) != ARRAY_TYPE
+      && TREE_CODE (TREE_TYPE (r)) != RECORD_TYPE)
     gimplify_assign (lhs, r, &seq);
-  pop_gimplify_context (NULL);
 
-  gsi_replace_with_seq (&gsi, seq, true);
+  pop_gimplify_context (NULL);
+  gsi_insert_seq_before (&gsi, seq, GSI_SAME_STMT);
+  gsi_remove (&gsi, true);
 }
 
 /* NVPTX implementation of GOACC_REDUCTION_TEARDOWN.  */
@@ -7249,7 +7356,10 @@ nvptx_goacc_reduction_teardown (gcall *call, offload_attrs *oa)
   
   push_gimplify_context (true);
   if (level == GOMP_DIM_WORKER
-      || (level == GOMP_DIM_VECTOR && oa->vector_length > PTX_WARP_SIZE))
+      || (level == GOMP_DIM_VECTOR
+	  && (oa->vector_length > PTX_WARP_SIZE
+	      || TREE_CODE (TREE_TYPE (var)) == ARRAY_TYPE
+	      || TREE_CODE (TREE_TYPE (var)) == RECORD_TYPE)))
     {
       /* Read the worker reduction buffer.  */
       tree offset = gimple_call_arg (call, 5);
@@ -7272,11 +7382,11 @@ nvptx_goacc_reduction_teardown (gcall *call, offload_attrs *oa)
     }
 
   if (lhs)
-    gimplify_assign (lhs, var, &seq);
+    gimplify_assign (lhs, unshare_expr (var), &seq);
   
   pop_gimplify_context (NULL);
-
-  gsi_replace_with_seq (&gsi, seq, true);
+  gsi_insert_seq_before (&gsi, seq, GSI_SAME_STMT);
+  gsi_remove (&gsi, true);
 }
 
 /* NVPTX reduction expander.  */
diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index 379aeb56b15..a5e67bd3d68 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -39620,6 +39620,12 @@ cp_parser_omp_clause_reduction (cp_parser *parser, enum omp_clause_code kind,
 	    code = TRUTH_ANDIF_EXPR;
 	  else if (id == ovl_op_identifier (false, TRUTH_ORIF_EXPR))
 	    code = TRUTH_ORIF_EXPR;
+	  if (code == ERROR_MARK && !is_omp)
+	    {
+	      cp_parser_error (parser, "expected %<+%>, %<*%>, %<-%>, %<&%>, "
+			       "%<^%>, %<|%>, %<&&%>, %<||%>, %<min%> or %<max%>");
+	      goto resync_fail;
+	    }
 	  id = omp_reduction_id (code, id, NULL_TREE);
 	  tree scope = parser->scope;
 	  if (scope)
@@ -39647,6 +39653,10 @@ cp_parser_omp_clause_reduction (cp_parser *parser, enum omp_clause_code kind,
   for (c = nlist; c != list; c = OMP_CLAUSE_CHAIN (c))
     {
       OMP_CLAUSE_REDUCTION_CODE (c) = code;
+      /* OpenACC does not require anything below.  */
+      if (!is_omp)
+	continue;
+
       if (task)
 	OMP_CLAUSE_REDUCTION_TASK (c) = 1;
       else if (inscan)
diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
index e6dba29ee81..d02d53fd508 100644
--- a/gcc/cp/semantics.cc
+++ b/gcc/cp/semantics.cc
@@ -6409,6 +6409,69 @@ cp_check_omp_declare_reduction (tree udr)
   return true;
 }
 
+
+static bool
+cp_oacc_reduction_defined_type_p (enum tree_code reduction_code, tree t)
+{
+  if (TREE_CODE (t) == INTEGER_TYPE)
+    return true;
+
+  if (FLOAT_TYPE_P (t) || TREE_CODE (t) == COMPLEX_TYPE)
+    switch (reduction_code)
+      {
+      case PLUS_EXPR:
+      case MULT_EXPR:
+      case MINUS_EXPR:
+      case TRUTH_ANDIF_EXPR:
+      case TRUTH_ORIF_EXPR:
+	return true;
+      case MIN_EXPR:
+      case MAX_EXPR:
+	return TREE_CODE (t) != COMPLEX_TYPE;
+      case BIT_AND_EXPR:
+      case BIT_XOR_EXPR:
+      case BIT_IOR_EXPR:
+	return false;
+      default:
+	gcc_unreachable ();
+      }
+
+  if (TREE_CODE (t) == ARRAY_TYPE)
+    return cp_oacc_reduction_defined_type_p (reduction_code, TREE_TYPE (t));
+
+  if (TREE_CODE (t) == RECORD_TYPE)
+    {
+      for (tree fld = TYPE_FIELDS (t); fld; fld = TREE_CHAIN (fld))
+	if (TREE_CODE (fld) == FIELD_DECL
+	    && !cp_oacc_reduction_defined_type_p (reduction_code,
+						  TREE_TYPE (fld)))
+	  return false;
+      return true;
+    }
+
+  return false;
+}
+
+static const char *
+cp_oacc_reduction_code_name (enum tree_code reduction_code)
+{
+  switch (reduction_code)
+    {
+    case PLUS_EXPR: return "+";
+    case MULT_EXPR: return "*";
+    case MINUS_EXPR: return "-";
+    case TRUTH_ANDIF_EXPR: return "&&";
+    case TRUTH_ORIF_EXPR: return "||";
+    case MIN_EXPR: return "min";
+    case MAX_EXPR: return "max";
+    case BIT_AND_EXPR: return "&";
+    case BIT_XOR_EXPR: return "^";
+    case BIT_IOR_EXPR: return "|";
+    default:
+      gcc_unreachable ();
+    }
+}
+
 /* Helper function of finish_omp_clauses.  Clone STMT as if we were making
    an inline call.  But, remap
    the OMP_DECL1 VAR_DECL (omp_out resp. omp_orig) to PLACEHOLDER
@@ -6453,7 +6516,8 @@ find_omp_placeholder_r (tree *tp, int *, void *data)
    Return true if there is some error and the clause should be removed.  */
 
 static bool
-finish_omp_reduction_clause (tree c, bool *need_default_ctor, bool *need_dtor)
+finish_omp_reduction_clause (tree c, bool *need_default_ctor, bool *need_dtor,
+			     enum c_omp_region_type ort)
 {
   tree t = OMP_CLAUSE_DECL (c);
   bool predefined = false;
@@ -6554,6 +6618,20 @@ finish_omp_reduction_clause (tree c, bool *need_default_ctor, bool *need_dtor)
       return false;
     }
 
+  if (ort == C_ORT_ACC)
+    {
+      enum tree_code r_code = OMP_CLAUSE_REDUCTION_CODE (c);
+      if (!cp_oacc_reduction_defined_type_p (r_code, TREE_TYPE (t)))
+	{
+	  const char *r_name = cp_oacc_reduction_code_name (r_code);
+	  error_at (OMP_CLAUSE_LOCATION (c),
+		    "%qE has invalid type for %<reduction(%s)%>",
+		    t, r_name);
+	  return true;
+	}
+      return false;
+    }
+
   tree id = OMP_CLAUSE_REDUCTION_PLACEHOLDER (c);
 
   type = TYPE_MAIN_VARIANT (type);
@@ -9250,7 +9328,7 @@ finish_omp_clauses (tree clauses, enum c_omp_region_type ort)
 	      && !VAR_P (t) && TREE_CODE (t) != PARM_DECL)
 	    break;
 	  if (finish_omp_reduction_clause (c, &need_default_ctor,
-					   &need_dtor))
+					   &need_dtor, ort))
 	    remove = true;
 	  else
 	    t = OMP_CLAUSE_DECL (c);
diff --git a/gcc/omp-low.cc b/gcc/omp-low.cc
index b2dc5ed931e..749fae4e7a6 100644
--- a/gcc/omp-low.cc
+++ b/gcc/omp-low.cc
@@ -1712,10 +1712,14 @@ scan_sharing_clauses (tree clauses, omp_context *ctx)
 		    }
 		  gcc_assert (!splay_tree_lookup (ctx->field_map,
 						  (splay_tree_key) decl));
+		  tree ptr_type = ptr_type_node;
+		  if (TREE_CODE (decl) == ARRAY_REF)
+		    ptr_type
+		      = build_pointer_type (TREE_TYPE (TREE_OPERAND (decl, 0)));
 		  tree field
 		    = build_decl (OMP_CLAUSE_LOCATION (c),
-				  FIELD_DECL, NULL_TREE, ptr_type_node);
-		  SET_DECL_ALIGN (field, TYPE_ALIGN (ptr_type_node));
+				  FIELD_DECL, NULL_TREE, ptr_type);
+		  SET_DECL_ALIGN (field, TYPE_ALIGN (ptr_type));
 		  insert_field_into_struct (ctx->record_type, field);
 		  splay_tree_insert (ctx->field_map, (splay_tree_key) decl,
 				     (splay_tree_value) field);
@@ -4420,6 +4424,27 @@ maybe_lookup_decl_in_outer_ctx (tree decl, omp_context *ctx)
 tree
 omp_reduction_init_op (location_t loc, enum tree_code op, tree type)
 {
+  if (TREE_CODE (type) == ARRAY_TYPE)
+    {
+      vec<constructor_elt, va_gc> *v = NULL;
+      HOST_WIDE_INT min = tree_to_shwi (TYPE_MIN_VALUE (TYPE_DOMAIN (type)));
+      HOST_WIDE_INT max = tree_to_shwi (TYPE_MAX_VALUE (TYPE_DOMAIN (type)));
+      tree t = omp_reduction_init_op (loc, op, TREE_TYPE (type));
+      for (HOST_WIDE_INT i = min; i <= max; i++)
+	CONSTRUCTOR_APPEND_ELT (v, size_int (i), t);
+      return build_constructor (type, v);
+    }
+  else if (TREE_CODE (type) == RECORD_TYPE)
+    {
+      vec<constructor_elt, va_gc> *v = NULL;
+      for (tree fld = TYPE_FIELDS (type); fld; fld = TREE_CHAIN (fld))
+	if (TREE_CODE (fld) == FIELD_DECL)
+	  CONSTRUCTOR_APPEND_ELT (v, fld,
+				  omp_reduction_init_op (loc, op,
+							 TREE_TYPE (fld)));
+      return build_constructor (type, v);
+    }
+
   switch (op)
     {
     case PLUS_EXPR:
@@ -7406,6 +7431,21 @@ lower_oacc_reductions (location_t loc, tree clauses, tree level, bool inner,
 	gcc_checking_assert (!is_oacc_kernels_decomposed_part (ctx));
 
 	tree orig = OMP_CLAUSE_DECL (c);
+	tree addr = NULL_TREE;
+	if (TREE_CODE (orig) == MEM_REF)
+	  {
+	    /* Peel away MEM_REF to get at base array VAR_DECL.  */
+	    addr = TREE_OPERAND (orig, 0);
+	    if (TREE_CODE (addr) == POINTER_PLUS_EXPR)
+	      addr = TREE_OPERAND (addr, 0);
+	    if (TREE_CODE (addr) == ADDR_EXPR)
+	      addr = TREE_OPERAND (addr, 0);
+	    else if (INDIRECT_REF_P (addr))
+	      addr = TREE_OPERAND (addr, 0);
+	    orig = addr;
+	    gcc_assert (!is_variable_sized (addr));
+	  }
+
 	tree var = maybe_lookup_decl (orig, ctx);
 	tree ref_to_res = NULL_TREE;
 	tree incoming, outgoing, v1, v2, v3;
@@ -7476,6 +7516,18 @@ lower_oacc_reductions (location_t loc, tree clauses, tree level, bool inner,
 	  do_lookup:
 	    /* This is the outermost construct with this reduction,
 	       see if there's a mapping for it.  */
+	    if (TREE_CODE (TREE_TYPE (orig)) == ARRAY_TYPE
+		&& gimple_code (outer->stmt) == GIMPLE_OMP_TARGET)
+	      /* Recover original MEM_REF in OMP_CLAUSE_DECL from array
+		 VAR_DECL discovered above. This is due to field lookup
+		 key based on whole MEM_REF earlier during scanning.  */
+	      for (tree c = gimple_omp_target_clauses (outer->stmt); c;
+		   c = OMP_CLAUSE_CHAIN (c))
+		if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP
+		    && TREE_CODE (OMP_CLAUSE_DECL (c)) == ARRAY_REF
+		    && TREE_OPERAND (OMP_CLAUSE_DECL (c), 0) == orig)
+		  orig = OMP_CLAUSE_DECL (c);
+
 	    if (gimple_code (outer->stmt) == GIMPLE_OMP_TARGET
 		&& maybe_lookup_field (orig, outer) && !is_private)
 	      {
@@ -7547,10 +7599,10 @@ lower_oacc_reductions (location_t loc, tree clauses, tree level, bool inner,
 	   variable-sized type.  */
 	fixed_size_mode mode
 	  = as_a <fixed_size_mode> (TYPE_MODE (TREE_TYPE (var)));
-	unsigned align = GET_MODE_ALIGNMENT (mode) /  BITS_PER_UNIT;
+	unsigned align = TYPE_ALIGN_UNIT (TREE_TYPE (var));
 	offset = (offset + align - 1) & ~(align - 1);
 	tree off = build_int_cst (sizetype, offset);
-	offset += GET_MODE_SIZE (mode);
+	offset += tree_to_uhwi (TYPE_SIZE_UNIT (TREE_TYPE (var)));
 
 	if (!init_code)
 	  {
diff --git a/gcc/omp-oacc-neuter-broadcast.cc b/gcc/omp-oacc-neuter-broadcast.cc
index 779dc6b1afb..5527509f270 100644
--- a/gcc/omp-oacc-neuter-broadcast.cc
+++ b/gcc/omp-oacc-neuter-broadcast.cc
@@ -991,7 +991,8 @@ worker_single_copy (basic_block from, basic_block to,
 		    hash_set<tree> *worker_partitioned_uses,
 		    tree record_type, record_field_map_t *record_field_map,
 		    unsigned HOST_WIDE_INT placement,
-		    bool isolate_broadcasts, bool has_gang_private_write)
+		    bool isolate_broadcasts, bool has_gang_private_write,
+		    hash_set<tree> *array_reduction_base_vars)
 {
   /* If we only have virtual defs, we'll have no record type, but we still want
      to emit single_copy_start and (particularly) single_copy_end to act as
@@ -1015,6 +1016,37 @@ worker_single_copy (basic_block from, basic_block to,
   edge e = split_block (to, gsi_stmt (gsi));
   basic_block barrier_block = e->dest;
 
+  gimple_seq local_asgns = NULL;
+
+  /* For accesses of variables used in array reductions, instead of
+     propagating the value for the main thread to all other worker threads
+     (which doesn't make sense as a reduction private var), move the defs
+     of such SSA_NAMEs to before the copy block and leave them alone (each
+     thread should access their own local copy).  */
+  for (gimple_stmt_iterator i = gsi_after_labels (from); !gsi_end_p (i);)
+    {
+      gimple *stmt = gsi_stmt (i);
+      if (gimple_assign_single_p (stmt)
+	  && def_escapes_block->contains (gimple_assign_lhs (stmt))
+	  && TREE_CODE (gimple_assign_lhs (stmt)) == SSA_NAME)
+	{
+	  tree lhs = gimple_assign_lhs (stmt);
+	  tree rhs = gimple_assign_rhs1 (stmt);
+	  if (TREE_CODE (rhs) == ADDR_EXPR)
+	    {
+	      rhs = TREE_OPERAND (rhs, 0);
+	      if (local_var_based_p (rhs)
+		  && array_reduction_base_vars->contains (lhs))
+		{
+		  gsi_remove (&i, false);
+		  gimple_seq_add_stmt (&local_asgns, stmt);
+		  continue;
+		}
+	    }
+	}
+      gsi_next (&i);
+    }
+
   gimple_stmt_iterator start = gsi_after_labels (from);
 
   tree decl = builtin_decl_explicit (BUILT_IN_GOACC_SINGLE_COPY_START);
@@ -1029,6 +1061,9 @@ worker_single_copy (basic_block from, basic_block to,
   gsi_insert_before (&start, call, GSI_NEW_STMT);
   update_stmt (call);
 
+  if (local_asgns)
+    gsi_insert_seq_before (&start, local_asgns, GSI_SAME_STMT);
+
   /* The shared-memory range for this block overflowed.  Add a barrier before
      the GOACC_single_copy_start call.  */
   if (isolate_broadcasts)
@@ -1128,6 +1163,22 @@ worker_single_copy (basic_block from, basic_block to,
 	  if (gimple_nop_p (def_stmt))
 	    continue;
 
+	  /* For accesses of variables used in array reductions, skip creating
+	     the barrier phi. Each thread runs same def_stmt to access
+	     local variable, there is no main/worker divide here.  */
+	  if (gimple_assign_single_p (def_stmt))
+	    {
+	      tree lhs = gimple_assign_lhs (def_stmt);
+	      tree rhs = gimple_assign_rhs1 (def_stmt);
+	      if (TREE_CODE (rhs) == ADDR_EXPR)
+		{
+		  rhs = TREE_OPERAND (rhs, 0);
+		  if (local_var_based_p (rhs)
+		      && array_reduction_base_vars->contains (lhs))
+		    continue;
+		}
+	    }
+
 	  /* The barrier phi takes one result from the actual work of the
 	     block we're neutering, and the other result is constant zero of
 	     the same type.  */
@@ -1248,7 +1299,8 @@ neuter_worker_single (parallel_g *par, unsigned outer_mask,
 		      hash_set<tree> *partitioned_var_uses,
 		      record_field_map_t *record_field_map,
 		      blk_offset_map_t *blk_offset_map,
-		      bitmap writes_gang_private)
+		      bitmap writes_gang_private,
+		      hash_set<tree> *array_reduction_base_vars)
 {
   unsigned mask = outer_mask | par->mask;
 
@@ -1398,7 +1450,8 @@ neuter_worker_single (parallel_g *par, unsigned outer_mask,
 				  &worker_partitioned_uses, record_type,
 				  record_field_map,
 				  offset, !range_allocated,
-				  has_gang_private_write);
+				  has_gang_private_write,
+				  array_reduction_base_vars);
 	    }
 	  else
 	    worker_single_simple (block, block, &def_escapes_block);
@@ -1436,11 +1489,13 @@ neuter_worker_single (parallel_g *par, unsigned outer_mask,
   if (par->inner)
     neuter_worker_single (par->inner, mask, worker_single, vector_single,
 			  prop_set, partitioned_var_uses, record_field_map,
-			  blk_offset_map, writes_gang_private);
+			  blk_offset_map, writes_gang_private,
+			  array_reduction_base_vars);
   if (par->next)
     neuter_worker_single (par->next, outer_mask, worker_single, vector_single,
 			  prop_set, partitioned_var_uses, record_field_map,
-			  blk_offset_map, writes_gang_private);
+			  blk_offset_map, writes_gang_private,
+			  array_reduction_base_vars);
 }
 
 static void
@@ -1587,7 +1642,8 @@ merge_ranges (splay_tree accum, splay_tree sp)
 
 static void
 oacc_do_neutering (unsigned HOST_WIDE_INT bounds_lo,
-		   unsigned HOST_WIDE_INT bounds_hi)
+		   unsigned HOST_WIDE_INT bounds_hi,
+		   hash_set<tree> *array_reduction_base_vars)
 {
   bb_stmt_map_t bb_stmt_map;
   auto_bitmap worker_single, vector_single;
@@ -1792,7 +1848,8 @@ oacc_do_neutering (unsigned HOST_WIDE_INT bounds_lo,
 
   neuter_worker_single (par, mask, worker_single, vector_single, &prop_set,
 			&partitioned_var_uses, &record_field_map,
-			&blk_offset_map, writes_gang_private);
+			&blk_offset_map, writes_gang_private,
+			array_reduction_base_vars);
 
   record_field_map.empty ();
 
@@ -1831,6 +1888,9 @@ execute_omp_oacc_neuter_broadcast ()
       private_size[i] = 0;
     }
 
+  /* Set of base variables referencing arrays used in array reductions.  */
+  hash_set<tree> array_reduction_base_vars;
+
   /* Calculate shared memory size required for reduction variables and
      gang-private memory for this offloaded function.  */
   basic_block bb;
@@ -1869,6 +1929,15 @@ execute_omp_oacc_neuter_broadcast ()
 			   + tree_to_uhwi (TYPE_SIZE_UNIT (var_type)));
 		      reduction_size[level]
 			= MAX (reduction_size[level], limit);
+
+		      tree lhs = gimple_get_lhs (call);
+		      if (TREE_CODE (lhs) == MEM_REF
+			  && TREE_CODE (TREE_OPERAND (lhs, 0)) == SSA_NAME
+			  && TREE_CODE (TREE_TYPE (lhs)) == ARRAY_TYPE)
+			{
+			  tree addr = TREE_OPERAND (lhs, 0);
+			  array_reduction_base_vars.add (addr);
+			}
 		    }
 		}
 	      break;
@@ -1917,7 +1986,7 @@ execute_omp_oacc_neuter_broadcast ()
 
   /* Perform worker partitioning unless we know 'num_workers(1)'.  */
   if (dims[GOMP_DIM_WORKER] != 1)
-    oacc_do_neutering (bounds_lo, bounds_hi);
+    oacc_do_neutering (bounds_lo, bounds_hi, &array_reduction_base_vars);
 
   return 0;
 }
diff --git a/gcc/omp-offload.cc b/gcc/omp-offload.cc
index 1d6dfef74fc..c3eab8c240a 100644
--- a/gcc/omp-offload.cc
+++ b/gcc/omp-offload.cc
@@ -1819,7 +1819,7 @@ default_goacc_reduction (gcall *call)
 
   /* Copy VAR to LHS, if there is an LHS.  */
   if (lhs)
-    gimple_seq_add_stmt (&seq, gimple_build_assign (lhs, var));
+    gimple_seq_add_stmt (&seq, gimple_build_assign (lhs, unshare_expr (var)));
 
   gsi_replace_with_seq (&gsi, seq, true);
 }
diff --git a/gcc/testsuite/c-c++-common/goacc/reduction-10.c b/gcc/testsuite/c-c++-common/goacc/reduction-10.c
new file mode 100644
index 00000000000..3716e6f3c49
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/reduction-10.c
@@ -0,0 +1,60 @@
+/* { dg-additional-options "-fdump-tree-gimple" } */
+/* float array reductions.  */
+
+#define n 1000
+
+int
+main(void)
+{
+  int i, j;
+  float result[n], array[n];
+  int lresult[n];
+
+  /* '+' reductions.  */
+#pragma acc parallel
+#pragma acc loop gang worker vector reduction (+:result)
+  for (i = 0; i < n; i++)
+    for (j = 0; j < n; j++)
+      result[j] += array[i];
+
+  /* '*' reductions.  */
+#pragma acc parallel
+#pragma acc loop gang worker vector reduction (*:result)
+  for (i = 0; i < n; i++)
+    for (j = 0; j < n; j++)
+      result[j] *= array[i];
+
+  /* 'max' reductions.  */
+#pragma acc parallel
+#pragma acc loop gang worker vector reduction (max:result)
+  for (i = 0; i < n; i++)
+    for (j = 0; j < n; j++)
+      result[j] = result[j] > array[i] ? result[j] : array[i];
+
+  /* 'min' reductions.  */
+#pragma acc parallel
+#pragma acc loop gang worker vector reduction (min:result)
+  for (i = 0; i < n; i++)
+    for (j = 0; j < n; j++)
+      result[j] = result[j] < array[i] ? result[j] : array[i];
+
+  /* '&&' reductions.  */
+#pragma acc parallel
+#pragma acc loop gang worker vector reduction (&&:lresult)
+  for (i = 0; i < n; i++)
+    for (j = 0; j < n; j++)
+      lresult[j] = lresult[j] && (result[j] > array[i]);
+
+  /* '||' reductions.  */
+#pragma acc parallel
+#pragma acc loop gang worker vector reduction (||:lresult)
+  for (i = 0; i < n; i++)
+    for (j = 0; j < n; j++)
+      lresult[j] = lresult[j] || (result[j] > array[i]);
+
+  return 0;
+}
+
+/* Check that default copy maps are generated for loop reductions.  */
+/* { dg-final { scan-tree-dump-times "map\\(tofrom:result \\\[len: \[0-9\]+\\\]\\)" 6 "gimple" } } */
+/* { dg-final { scan-tree-dump-times "map\\(tofrom:lresult \\\[len: \[0-9\]+\\\]\\)" 2 "gimple" } } */
diff --git a/gcc/testsuite/c-c++-common/goacc/reduction-11.c b/gcc/testsuite/c-c++-common/goacc/reduction-11.c
new file mode 100644
index 00000000000..3e3af1a27ed
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/reduction-11.c
@@ -0,0 +1,60 @@
+/* { dg-additional-options "-fdump-tree-gimple" } */
+/* double array reductions.  */
+
+#define n 1000
+
+int
+main(void)
+{
+  int i, j;
+  double result[n], array[n];
+  int lresult[n];
+
+  /* '+' reductions.  */
+#pragma acc parallel
+#pragma acc loop gang worker vector reduction (+:result)
+  for (i = 0; i < n; i++)
+    for (j = 0; j < n; j++)
+      result[j] += array[i];
+
+  /* '*' reductions.  */
+#pragma acc parallel
+#pragma acc loop gang worker vector reduction (*:result)
+  for (i = 0; i < n; i++)
+    for (j = 0; j < n; j++)
+      result[j] *= array[i];
+
+  /* 'max' reductions.  */
+#pragma acc parallel
+#pragma acc loop gang worker vector reduction (max:result)
+  for (i = 0; i < n; i++)
+    for (j = 0; j < n; j++)
+      result[j] = result[j] > array[i] ? result[j] : array[i];
+
+  /* 'min' reductions.  */
+#pragma acc parallel
+#pragma acc loop gang worker vector reduction (min:result)
+  for (i = 0; i < n; i++)
+    for (j = 0; j < n; j++)
+      result[j] = result[j] < array[i] ? result[j] : array[i];
+
+  /* '&&' reductions.  */
+#pragma acc parallel
+#pragma acc loop gang worker vector reduction (&&:lresult)
+  for (i = 0; i < n; i++)
+    for (j = 0; j < n; j++)
+      lresult[j] = lresult[j] && (result[j] > array[i]);
+
+  /* '||' reductions.  */
+#pragma acc parallel
+#pragma acc loop gang worker vector reduction (||:lresult)
+  for (i = 0; i < n; i++)
+    for (j = 0; j < n; j++)
+      lresult[j] = lresult[j] || (result[j] > array[i]);
+
+  return 0;
+}
+
+/* Check that default copy maps are generated for loop reductions.  */
+/* { dg-final { scan-tree-dump-times "map\\(tofrom:result \\\[len: \[0-9\]+\\\]\\)" 6 "gimple" } } */
+/* { dg-final { scan-tree-dump-times "map\\(tofrom:lresult \\\[len: \[0-9\]+\\\]\\)" 2 "gimple" } } */
diff --git a/gcc/testsuite/c-c++-common/goacc/reduction-12.c b/gcc/testsuite/c-c++-common/goacc/reduction-12.c
new file mode 100644
index 00000000000..39571abfa1b
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/reduction-12.c
@@ -0,0 +1,46 @@
+/* { dg-additional-options "-fdump-tree-gimple" } */
+/* complex array reductions.  */
+
+#define n 1000
+
+int
+main(void)
+{
+  int i, j;
+  __complex__ double result[n], array[n];
+  int lresult[n];
+
+  /* '+' reductions.  */
+#pragma acc parallel
+#pragma acc loop gang worker vector reduction (+:result)
+  for (i = 0; i < n; i++)
+    for (j = 0; j < n; j++)
+      result[j] += array[i];
+
+  /* '*' reductions.  */
+#pragma acc parallel
+#pragma acc loop gang worker vector reduction (*:result)
+  for (i = 0; i < n; i++)
+    for (j = 0; j < n; j++)
+      result[j] *= array[i];
+
+  /* '&&' reductions.  */
+#pragma acc parallel
+#pragma acc loop gang worker vector reduction (&&:lresult)
+  for (i = 0; i < n; i++)
+    for (j = 0; j < n; j++)
+      lresult[j] = lresult[j] && (__real__(result[j]) > __real__(array[i]));
+
+  /* '||' reductions.  */
+#pragma acc parallel
+#pragma acc loop gang worker vector reduction (||:lresult[j])
+  for (i = 0; i < n; i++)
+    for (j = 0; j < n; j++)
+      lresult[j] = lresult[j] || (__real__(result[j]) > __real__(array[i]));
+
+  return 0;
+}
+
+/* Check that default copy maps are generated for loop reductions.  */
+/* { dg-final { scan-tree-dump-times "map\\(tofrom:result \\\[len: \[0-9\]+\\\]\\)" 4 "gimple" } } */
+/* { dg-final { scan-tree-dump-times "map\\(tofrom:lresult \\\[len: \[0-9\]+\\\]\\)" 2 "gimple" } } */
diff --git a/gcc/testsuite/c-c++-common/goacc/reduction-13.c b/gcc/testsuite/c-c++-common/goacc/reduction-13.c
new file mode 100644
index 00000000000..1d241bba18d
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/reduction-13.c
@@ -0,0 +1,51 @@
+/* { dg-additional-options "-fdump-tree-gimple" } */
+/* struct reductions.  */
+
+typedef struct { int x, y; } int_pair;
+typedef struct { float m, n; } flt_pair;
+typedef struct
+{
+  int i;
+  double d;
+  float f;
+  int a[4];
+  int_pair ip;
+  flt_pair fp;
+} rectype;
+
+#define n 1000
+
+int
+main(void)
+{
+  int i;
+  rectype result, array[n];
+
+  /* '+' reductions.  */
+#pragma acc parallel
+#pragma acc loop gang worker vector reduction (+:result)
+  for (i = 0; i < n; i++)
+    {
+      result.i += array[i].i;
+      result.f += array[i].f;
+      result.ip.x += array[i].ip.x;
+      result.ip.y += array[i].ip.y;
+    }
+
+  /* '*' reductions.  */
+#pragma acc parallel
+#pragma acc loop gang worker vector reduction (*:result)
+  for (i = 0; i < n; i++)
+    {
+      result.i *= array[i].i;
+      result.f *= array[i].f;
+      result.ip.x *= array[i].ip.x;
+      result.ip.y *= array[i].ip.y;
+    }
+
+  return 0;
+}
+
+/* Check that default copy maps are generated for loop reductions.  */
+/* { dg-final { scan-tree-dump-times "map\\(tofrom:result \\\[len: \[0-9\]+\\\]\\)" 2 "gimple" } } */
+
diff --git a/gcc/testsuite/c-c++-common/goacc/reduction-9.c b/gcc/testsuite/c-c++-common/goacc/reduction-9.c
new file mode 100644
index 00000000000..04be548814c
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/reduction-9.c
@@ -0,0 +1,81 @@
+/* { dg-additional-options "-fdump-tree-gimple" } */
+/* Integer array reductions.  */
+
+#define n 1000
+
+int
+main(void)
+{
+  int i, j;
+  int result[n], array[n];
+  int lresult[n];
+
+  /* '+' reductions.  */
+#pragma acc parallel
+#pragma acc loop gang worker vector reduction (+:result)
+  for (i = 0; i < n; i++)
+    for (j = 0; j < n; j++)
+      result[j] += array[i];
+
+  /* '*' reductions.  */
+#pragma acc parallel
+#pragma acc loop gang worker vector reduction (*:result)
+  for (i = 0; i < n; i++)
+    for (j = 0; j < n; j++)
+      result[j] *= array[i];
+
+  /* 'max' reductions.  */
+#pragma acc parallel
+#pragma acc loop gang worker vector reduction (max:result)
+  for (i = 0; i < n; i++)
+    for (j = 0; j < n; j++)
+      result[j] = result[j] > array[i] ? result[j] : array[i];
+
+  /* 'min' reductions.  */
+#pragma acc parallel
+#pragma acc loop gang worker vector reduction (min:result)
+  for (i = 0; i < n; i++)
+    for (j = 0; j < n; j++)
+      result[j] = result[j] < array[i] ? result[j] : array[i];
+
+  /* '&' reductions.  */
+#pragma acc parallel
+#pragma acc loop gang worker vector reduction (&:result)
+  for (i = 0; i < n; i++)
+    for (j = 0; j < n; j++)
+      result[j] &= array[i];
+
+  /* '|' reductions.  */
+#pragma acc parallel
+#pragma acc loop gang worker vector reduction (|:result)
+  for (i = 0; i < n; i++)
+    for (j = 0; j < n; j++)
+      result[j] |= array[i];
+
+  /* '^' reductions.  */
+#pragma acc parallel
+#pragma acc loop gang worker vector reduction (^:result)
+  for (i = 0; i < n; i++)
+    for (j = 0; j < n; j++)
+      result[j] ^= array[i];
+
+  /* '&&' reductions.  */
+#pragma acc parallel
+#pragma acc loop gang worker vector reduction (&&:lresult)
+  for (i = 0; i < n; i++)
+    for (j = 0; j < n; j++)
+      lresult[j] = lresult[j] && (result[j] > array[i]);
+
+  /* '||' reductions.  */
+#pragma acc parallel
+#pragma acc loop gang worker vector reduction (||:lresult)
+  for (i = 0; i < n; i++)
+    for (j = 0; j < n; j++)
+      lresult[j] = lresult[j] || (result[j] > array[i]);
+
+  return 0;
+}
+
+/* Check that default copy maps are generated for loop reductions.  */
+/* { dg-final { scan-tree-dump-times "map\\(tofrom:result \\\[len: \[0-9\]+\\\]\\)" 9 "gimple" } } */
+/* { dg-final { scan-tree-dump-times "map\\(tofrom:lresult \\\[len: \[0-9\]+\\\]\\)" 2 "gimple" } } */
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-arrays-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-arrays-1.c
new file mode 100644
index 00000000000..6f1b86a32a7
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-arrays-1.c
@@ -0,0 +1,69 @@
+/* { dg-do run } */
+
+/* Array reductions.  */
+
+#include <stdlib.h>
+#include "reduction.h"
+
+#define ng 8
+#define nw 4
+#define vl 32
+
+#define N 10
+
+#define check_reduction_array_op_all(type, opr, init, b)	\
+  check_reduction_xxx_xx_all(array, op, type, opr, init, b)
+#define check_reduction_arraysec_op_all(type, opr, init, b)	\
+  check_reduction_xxx_xx_all(arraysec, op, type, opr, init, b)
+#define check_reduction_array_macro_all(type, opr, init, b)	\
+  check_reduction_xxx_xx_all(array, macro, type, opr, init, b)
+#define check_reduction_arraysec_macro_all(type, opr, init, b)	\
+  check_reduction_xxx_xx_all(arraysec, macro, type, opr, init, b)
+    
+int
+main (void)
+{
+  const int n = 100;
+  int ints[n];
+  float flts[n];
+  double dbls[n];
+  int cmp_val = 5;
+
+  for (int i = 0; i < n; i++)
+    {
+      ints[i] = i + 1;
+      flts[i] = i + 1;
+      dbls[i] = i + 1;
+    }
+
+  check_reduction_array_op_all (int, +, 0, ints[i]);
+  check_reduction_array_op_all (int, *, 1, ints[i]);
+  check_reduction_array_op_all (int, &, -1, ints[i]);
+  check_reduction_array_op_all (int, |, 0, ints[i]);
+  check_reduction_array_op_all (int, ^, 0, ints[i]);
+  check_reduction_array_op_all (int, &&, 1, (cmp_val > ints[i]));
+  check_reduction_array_op_all (int, ||, 0, (cmp_val > ints[i]));
+  check_reduction_array_macro_all (int, min, n + 1, ints[i]);
+  check_reduction_array_macro_all (int, max, -1, ints[i]);
+
+  check_reduction_array_op_all (float, +, 0, flts[i]);
+  check_reduction_array_op_all (float, *, 1, flts[i]);
+  check_reduction_array_macro_all (float, min, n + 1, flts[i]);
+  check_reduction_array_macro_all (float, max, -1, flts[i]);
+
+  check_reduction_arraysec_op_all (int, +, 0, ints[i]);
+  check_reduction_arraysec_op_all (float, *, 1, flts[i]);
+  check_reduction_arraysec_macro_all (double, min, n + 1, dbls[i]);
+  check_reduction_arraysec_macro_all (double, max, -1, dbls[i]);
+
+  check_reduction_array_op_all (double, +, 0, dbls[i]);
+#if 0
+  /* Currently fails due to unclear issue, presumably unrelated to reduction
+     mechanics. Avoiding for now.  */
+  check_reduction_array_op_all (double, *, 1.0, dbls[i]);
+#endif
+  check_reduction_array_macro_all (double, min, n + 1, dbls[i]);
+  check_reduction_array_macro_all (double, max, -1, dbls[i]);
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-structs-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-structs-1.c
new file mode 100644
index 00000000000..22216ff3008
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-structs-1.c
@@ -0,0 +1,121 @@
+/* { dg-do run } */
+
+/* Struct reductions.  */
+
+#include <stdlib.h>
+#include "reduction.h"
+
+#define ng 8
+#define nw 4
+#define vl 32
+
+#define N 10
+
+typedef struct { int x, y; } int_pair;
+typedef struct { float m, n; } flt_pair;
+typedef struct
+{
+  int i;
+  double d;
+  float f;
+  int a[N];
+  int_pair ip;
+  flt_pair fp;
+} rectype;
+
+static void
+init_struct (rectype *rec, int val)
+{
+  rec->i = val;
+  rec->d = (double) val;
+  rec->f = (float) val;
+  for (int i = 0; i < N; i++)
+    rec->a[i] = val;
+  rec->ip.x = val;
+  rec->ip.y = val;
+  rec->fp.m = (float) val;
+  rec->fp.n = (float) val;
+}
+
+static int
+struct_eq (rectype *a, rectype *b)
+{
+  if (a->i != b->i || a->d != b->d
+      || a->f != b->f
+      || a->ip.x != b->ip.x
+      || a->ip.y != b->ip.y
+      || a->fp.m != b->fp.m
+      || a->fp.n != b->fp.n)
+    return 0;
+
+  for (int i = 0; i < N; i++)
+    if (a->a[i] != b->a[i])
+      return 0;
+  return 1;
+}
+
+#define check_reduction_struct_xx(type, op, init, b, gwv_par, gwv_loop, apply) \
+  {									\
+    type res, vres;							\
+    init_struct (&res, init);						\
+    DO_PRAGMA (acc parallel gwv_par copy(res))				\
+    DO_PRAGMA (acc loop gwv_loop reduction (op:res))			\
+    for (int i = 0; i < n; i++)						\
+      {									\
+	res.i = apply (op, res.i, b);					\
+	res.d = apply (op, res.d, b);					\
+	res.f = apply (op, res.f, b);					\
+	for (int j = 0; j < N; j++)					\
+	  res.a[j] = apply (op, res.a[j], b);				\
+	res.ip.x = apply (op, res.ip.x, b);				\
+	res.ip.y = apply (op, res.ip.y, b);				\
+	res.fp.m = apply (op, res.fp.m, b);				\
+	res.fp.n = apply (op, res.fp.n, b);				\
+      }									\
+									\
+    init_struct (&vres, init);						\
+    for (int i = 0; i < n; i++)						\
+      {									\
+        vres.i = apply (op, vres.i, b);					\
+	vres.d = apply (op, vres.d, b);					\
+	vres.f = apply (op, vres.f, b);					\
+	for (int j = 0; j < N; j++)					\
+	  vres.a[j] = apply (op, vres.a[j], b);				\
+	vres.ip.x = apply (op, vres.ip.x, b);				\
+	vres.ip.y = apply (op, vres.ip.y, b);				\
+	vres.fp.m = apply (op, vres.fp.m, b);				\
+	vres.fp.n = apply (op, vres.fp.n, b);				\
+      }									\
+									\
+    if (!struct_eq (&res, &vres))					\
+      __builtin_abort ();						\
+  }
+
+#define operator_apply(op, a, b) (a op b)
+#define check_reduction_struct_op(type, op, init, b, gwv_par, gwv_loop)	\
+  check_reduction_struct_xx(type, op, init, b, gwv_par, gwv_loop, operator_apply)
+
+#define function_apply(op, a, b) (op (a, b))
+#define check_reduction_struct_macro(type, op, init, b, gwv_par, gwv_loop) \
+  check_reduction_struct_xx(type, op, init, b, gwv_par, gwv_loop, function_apply)
+
+#define check_reduction_struct_op_all(type, opr, init, b)	\
+  check_reduction_xxx_xx_all (struct, op, type, opr, init, b)
+#define check_reduction_struct_macro_all(type, opr, init, b)		\
+  check_reduction_xxx_xx_all (struct, macro, type, opr, init, b)
+
+int
+main (void)
+{
+  const int n = 10;
+  int ints[n];
+
+  for (int i = 0; i < n; i++)
+    ints[i] = i + 1;
+
+  check_reduction_struct_op_all (rectype, +, 0, ints[i]);
+  check_reduction_struct_op_all (rectype, *, 1, ints[i]);
+  check_reduction_struct_macro_all (rectype, min, n + 1, ints[i]);
+  check_reduction_struct_macro_all (rectype, max, -1, ints[i]);
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction.h b/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction.h
index 1b3f8d45ace..c928578eeea 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction.h
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction.h
@@ -37,6 +37,58 @@ DO_PRAGMA (acc loop gwv_loop reduction (op:res))			\
       abort ();								\
   }
 
+#define check_reduction_array_xx(type, var, var_in_clause, op, init, b, \
+				 gwv_par, gwv_loop, apply)		\
+  {									\
+   type var[N], var ## _check[N];					\
+   for (int i = 0; i < N; i++)						\
+     var[i] = var ## _check[i] = (init);				\
+   DO_PRAGMA (acc parallel gwv_par copy (var_in_clause))		\
+   DO_PRAGMA (acc loop gwv_loop reduction (op: var_in_clause))		\
+   for (int i = 0; i < n; i++)						\
+     for (int j = 0; j < N; j++)					\
+       var[j] = apply (op, var[j], (b));				\
+									\
+   for (int i = 0; i < n; i++)						\
+     for (int j = 0; j < N; j++)					\
+       var ## _check[j] = apply (op, var ## _check[j], (b));		\
+									\
+   for (int j = 0; j < N; j++)						\
+     if (var[j] != var ## _check[j])					\
+       abort ();							\
+  }
+
+#define operator_apply(op, a, b) (a op b)
+#define check_reduction_array_op(type, op, init, b, gwv_par, gwv_loop)	\
+  check_reduction_array_xx (type, v, v, op, init, b, gwv_par, gwv_loop,	\
+			    operator_apply)
+#define check_reduction_arraysec_op(type, op, init, b, gwv_par, gwv_loop) \
+  check_reduction_array_xx (type, v, v[:N], op, init, b, gwv_par, gwv_loop, \
+			    operator_apply)
+
+
+#define function_apply(op, a, b) (op (a, b))
+#define check_reduction_array_macro(type, op, init, b, gwv_par, gwv_loop)\
+  check_reduction_array_xx (type, v, v, op, init, b, gwv_par, gwv_loop,	\
+			    function_apply)
+#define check_reduction_arraysec_macro(type, op, init, b, gwv_par, gwv_loop)\
+  check_reduction_array_xx (type, v, v[:N], op, init, b, gwv_par, gwv_loop, \
+			    function_apply)
+
+#define check_reduction_xxx_xx_all(tclass, form, type, op, init, b)	\
+  check_reduction_ ## tclass ## _ ## form (type, op, init, b, num_gangs (ng), gang);	\
+  check_reduction_ ## tclass ## _ ## form (type, op, init, b, num_workers (nw), worker); \
+  check_reduction_ ## tclass ## _ ## form (type, op, init, b, vector_length (vl), vector); \
+  check_reduction_ ## tclass ## _ ## form (type, op, init, b,			\
+					   num_gangs (ng) num_workers (nw), gang worker); \
+  check_reduction_ ## tclass ## _ ## form (type, op, init, b,			\
+					   num_gangs (ng) vector_length (vl), gang vector); \
+  check_reduction_ ## tclass ## _ ## form (type, op, init, b,			\
+					   num_workers (nw) vector_length (vl), worker vector); \
+  check_reduction_ ## tclass ## _ ## form (type, op, init, b, \
+					   num_gangs (ng) num_workers (nw) vector_length (vl), \
+					   gang worker vector);
+
 #define max(a, b) (((a) > (b)) ? (a) : (b))
 #define min(a, b) (((a) < (b)) ? (a) : (b))
 

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH, OpenACC 2.7] Implement reductions for arrays and structs
  2024-01-02 15:21 [PATCH, OpenACC 2.7] Implement reductions for arrays and structs Chung-Lin Tang
@ 2024-01-10 11:33 ` Julian Brown
  2024-02-08 14:47 ` [PATCH, OpenACC 2.7] struct/array reductions for Fortran Chung-Lin Tang
  2024-03-13 17:05 ` [PATCH, OpenACC 2.7] Implement reductions for arrays and structs Tobias Burnus
  2 siblings, 0 replies; 6+ messages in thread
From: Julian Brown @ 2024-01-10 11:33 UTC (permalink / raw)
  To: Chung-Lin Tang
  Cc: gcc-patches, Thomas Schwinge, Andrew Stubbs, Catherine Moore

On Tue, 2 Jan 2024 23:21:21 +0800
Chung-Lin Tang <cltang@pllab.cs.nthu.edu.tw> wrote:

> To Julian, there is a patch to the middle-end neutering, a hack
> actually, that detects SSA_NAMEs used in reduction array MEM_REFs,
> and avoids single->parallel copying (by moving those definitions
> before BUILT_IN_GOACC_SINGLE_COPY_START). This appears to work
> because reductions do their own initializing of the private copy.

It looks OK to me I think (bearing in mind your following paragraph, of
course!). I wonder though if maybe non-SSA (i.e. addressable) variables
need to be handled also, i.e. parts like this:

+  /* For accesses of variables used in array reductions, instead of
+     propagating the value for the main thread to all other worker threads
+     (which doesn't make sense as a reduction private var), move the defs
+     of such SSA_NAMEs to before the copy block and leave them alone (each
+     thread should access their own local copy).  */
+  for (gimple_stmt_iterator i = gsi_after_labels (from); !gsi_end_p (i);)
+    {
+      gimple *stmt = gsi_stmt (i);
+      if (gimple_assign_single_p (stmt)
+	  && def_escapes_block->contains (gimple_assign_lhs (stmt))
+	  && TREE_CODE (gimple_assign_lhs (stmt)) == SSA_NAME)

are only handling SSA-converted variables. But maybe that's OK?

> As we discussed in our internal calls, the real proper way is to
> create the private array in a more appropriate stage, but that is too
> long a shot for now. The changes here are needed at least for some
> -O0 cases (when under optimization, propagation of the private
> copies' local address eliminate the SSA_NAME and things actually just
> work in that case). So please bear with this hack.

HTH,

Julian

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH, OpenACC 2.7] struct/array reductions for Fortran
@ 2024-02-08 14:47 ` Chung-Lin Tang
  2024-03-13 18:59   ` Tobias Burnus
  2024-03-18 16:39   ` Thomas Schwinge
  0 siblings, 2 replies; 6+ messages in thread
From: Chung-Lin Tang @ 2024-02-08 14:47 UTC (permalink / raw)
  To: gcc-patches, gfortran, Tobias Burnus, Thomas Schwinge

[-- Attachment #1: Type: text/plain, Size: 2157 bytes --]

Hi Tobias, Thomas,
this patch adds support for Fortran to use arrays and struct(record) types in OpenACC reductions.

There is still some shortcomings in the current state, mainly that only explicit-shaped arrays can be used (like its C counterpart). Anything else is currently a bit more complicated in the middle-end, since the existing reduction code creates an "init-op" (literal of initial values) which can't be done when say TYPE_MAX_VALUE (TYPE_DOMAIN (array_type)) is not a tree constant. I think we'll be on the hook to solve this later, but I think the current state is okay to submit.

Tested without regressions on mainline (on top of first struct/array reduction patch[1])

Thanks,
Chung-Lin

[1] https://gcc.gnu.org/pipermail/gcc-patches/2024-January/641669.html

2024-02-08  Chung-Lin Tang  <cltang@baylibre.com>

gcc/fortran/ChangeLog:
	* openmp.cc (oacc_reduction_defined_type_p): New function.
	(resolve_omp_clauses): Adjust OpenACC array reduction error case. Use
	oacc_reduction_defined_type_p for OpenACC.
	* trans-openmp.cc (gfc_trans_omp_array_reduction_or_udr):
	Add 'bool openacc' parameter, adjust part of function to be !openacc
	only.
	(gfc_trans_omp_reduction_list): Add 'bool openacc' parameter, pass to
	calls to gfc_trans_omp_array_reduction_or_udr.
	(gfc_trans_omp_clauses): Add 'openacc' argument to calls to
	gfc_trans_omp_reduction_list.
	(gfc_trans_omp_do): Pass 'op == EXEC_OACC_LOOP' as 'bool openacc'
	parameter in call to gfc_trans_omp_clauses.

gcc/ChangeLog:
	* omp-low.cc (omp_reduction_init_op): Add checking if reduced array
	has constant bounds.
	(lower_oacc_reductions): Add handling of error_mark_node.

gcc/testsuite/ChangeLog:
	* gfortran.dg/goacc/array-reduction.f90: Adjust testcase.
	* gfortran.dg/goacc/reduction.f95: Likewise.

libgomp/ChangeLog:
	* libgomp/testsuite/libgomp.oacc-fortran/reduction-9.f90: New testcase.
	* libgomp/testsuite/libgomp.oacc-fortran/reduction-10.f90: Likewise.
	* libgomp/testsuite/libgomp.oacc-fortran/reduction-11.f90: Likewise.
	* libgomp/testsuite/libgomp.oacc-fortran/reduction-12.f90: Likewise.
	* libgomp/testsuite/libgomp.oacc-fortran/reduction-13.f90: Likewise.

[-- Attachment #2: openacc-2.7b-struct-array-reduction-fortran.patch --]
[-- Type: text/plain, Size: 57199 bytes --]

diff --git a/gcc/fortran/openmp.cc b/gcc/fortran/openmp.cc
index 0af80d54fad..4bba9e666d6 100644
--- a/gcc/fortran/openmp.cc
+++ b/gcc/fortran/openmp.cc
@@ -7047,6 +7047,72 @@ oacc_is_loop (gfc_code *code)
 	 || code->op == EXEC_OACC_LOOP;
 }
 
+static bool
+oacc_reduction_defined_type_p (enum gfc_omp_reduction_op rop, gfc_typespec *ts)
+{
+  if (rop == OMP_REDUCTION_USER || rop == OMP_REDUCTION_NONE)
+    return false;
+
+  if (ts->type == BT_INTEGER)
+    switch (rop)
+      {
+      case OMP_REDUCTION_AND:
+      case OMP_REDUCTION_OR:
+      case OMP_REDUCTION_EQV:
+      case OMP_REDUCTION_NEQV:
+	return false;
+      default:
+	return true;
+      }
+
+  if (ts->type == BT_LOGICAL)
+    switch (rop)
+      {
+      case OMP_REDUCTION_AND:
+      case OMP_REDUCTION_OR:
+      case OMP_REDUCTION_EQV:
+      case OMP_REDUCTION_NEQV:
+	return true;
+      default:
+	return false;
+      }
+
+  if (ts->type == BT_REAL || ts->type == BT_COMPLEX)
+    switch (rop)
+      {
+      case OMP_REDUCTION_PLUS:
+      case OMP_REDUCTION_TIMES:
+      case OMP_REDUCTION_MINUS:
+	return true;
+
+      case OMP_REDUCTION_AND:
+      case OMP_REDUCTION_OR:
+      case OMP_REDUCTION_EQV:
+      case OMP_REDUCTION_NEQV:
+	return false;
+
+      case OMP_REDUCTION_MAX:
+      case OMP_REDUCTION_MIN:
+	return ts->type != BT_COMPLEX;
+      case OMP_REDUCTION_IAND:
+      case OMP_REDUCTION_IOR:
+      case OMP_REDUCTION_IEOR:
+	return false;
+      default:
+	gcc_unreachable ();
+      }
+
+  if (ts->type == BT_DERIVED)
+    {
+      for (gfc_component *p = ts->u.derived->components; p; p = p->next)
+	if (!oacc_reduction_defined_type_p (rop, &p->ts))
+	  return false;
+      return true;
+    }
+
+  return false;
+}
+
 static void
 resolve_scalar_int_expr (gfc_expr *expr, const char *clause)
 {
@@ -8137,13 +8203,15 @@ resolve_omp_clauses (gfc_code *code, gfc_omp_clauses *omp_clauses,
 	  else
 	    n->sym->mark = 1;
 
-	  /* OpenACC does not support reductions on arrays.  */
-	  if (n->sym->as)
+	  /* OpenACC current only supports array reductions on explicit-shape
+	     arrays.  */
+	  if ((n->sym->as && n->sym->as->type != AS_EXPLICIT)
+	      || n->sym->attr.codimension)
 	    gfc_error ("Array %qs is not permitted in reduction at %L",
 		       n->sym->name, &n->where);
 	}
     }
-  
+
   for (n = omp_clauses->lists[OMP_LIST_TO]; n; n = n->next)
     n->sym->mark = 0;
   for (n = omp_clauses->lists[OMP_LIST_FROM]; n; n = n->next)
@@ -8797,39 +8865,46 @@ resolve_omp_clauses (gfc_code *code, gfc_omp_clauses *omp_clauses,
 		  case OMP_LIST_IN_REDUCTION:
 		  case OMP_LIST_TASK_REDUCTION:
 		  case OMP_LIST_REDUCTION_INSCAN:
-		    switch (n->u.reduction_op)
+		    if (openacc)
 		      {
-		      case OMP_REDUCTION_PLUS:
-		      case OMP_REDUCTION_TIMES:
-		      case OMP_REDUCTION_MINUS:
-			if (!gfc_numeric_ts (&n->sym->ts))
+			if (!oacc_reduction_defined_type_p (n->u.reduction_op,
+							    &n->sym->ts))
 			  bad = true;
-			break;
-		      case OMP_REDUCTION_AND:
-		      case OMP_REDUCTION_OR:
-		      case OMP_REDUCTION_EQV:
-		      case OMP_REDUCTION_NEQV:
-			if (n->sym->ts.type != BT_LOGICAL)
-			  bad = true;
-			break;
-		      case OMP_REDUCTION_MAX:
-		      case OMP_REDUCTION_MIN:
-			if (n->sym->ts.type != BT_INTEGER
-			    && n->sym->ts.type != BT_REAL)
-			  bad = true;
-			break;
-		      case OMP_REDUCTION_IAND:
-		      case OMP_REDUCTION_IOR:
-		      case OMP_REDUCTION_IEOR:
-			if (n->sym->ts.type != BT_INTEGER)
-			  bad = true;
-			break;
-		      case OMP_REDUCTION_USER:
-			bad = true;
-			break;
-		      default:
-			break;
 		      }
+		    else
+		      switch (n->u.reduction_op)
+			{
+			case OMP_REDUCTION_PLUS:
+			case OMP_REDUCTION_TIMES:
+			case OMP_REDUCTION_MINUS:
+			  if (!gfc_numeric_ts (&n->sym->ts))
+			    bad = true;
+			  break;
+			case OMP_REDUCTION_AND:
+			case OMP_REDUCTION_OR:
+			case OMP_REDUCTION_EQV:
+			case OMP_REDUCTION_NEQV:
+			  if (n->sym->ts.type != BT_LOGICAL)
+			    bad = true;
+			  break;
+			case OMP_REDUCTION_MAX:
+			case OMP_REDUCTION_MIN:
+			  if (n->sym->ts.type != BT_INTEGER
+			      && n->sym->ts.type != BT_REAL)
+			    bad = true;
+			  break;
+			case OMP_REDUCTION_IAND:
+			case OMP_REDUCTION_IOR:
+			case OMP_REDUCTION_IEOR:
+			  if (n->sym->ts.type != BT_INTEGER)
+			    bad = true;
+			  break;
+			case OMP_REDUCTION_USER:
+			  bad = true;
+			  break;
+			default:
+			  break;
+			}
 		    if (!bad)
 		      n->u2.udr = NULL;
 		    else
diff --git a/gcc/fortran/trans-openmp.cc b/gcc/fortran/trans-openmp.cc
index 9599521b97c..29ad880a30c 100644
--- a/gcc/fortran/trans-openmp.cc
+++ b/gcc/fortran/trans-openmp.cc
@@ -1996,7 +1996,8 @@ omp_udr_find_orig (gfc_expr **e, int *walk_subtrees ATTRIBUTE_UNUSED,
 }
 
 static void
-gfc_trans_omp_array_reduction_or_udr (tree c, gfc_omp_namelist *n, locus where)
+gfc_trans_omp_array_reduction_or_udr (tree c, gfc_omp_namelist *n, locus where,
+				      bool openacc)
 {
   gfc_symbol *sym = n->sym;
   gfc_symtree *root1 = NULL, *root2 = NULL, *root3 = NULL, *root4 = NULL;
@@ -2251,21 +2252,24 @@ gfc_trans_omp_array_reduction_or_udr (tree c, gfc_omp_namelist *n, locus where)
     poplevel (0, 0);
   OMP_CLAUSE_REDUCTION_INIT (c) = stmt;
 
-  /* Create the merge statement list.  */
-  pushlevel ();
-  if (e4)
-    stmt = gfc_trans_assignment (e3, e4, false, true);
-  else
-    stmt = gfc_trans_call (n->u2.udr->combiner, false,
-			   NULL_TREE, NULL_TREE, false);
-  if (TREE_CODE (stmt) != BIND_EXPR)
-    stmt = build3_v (BIND_EXPR, NULL, stmt, poplevel (1, 0));
-  else
-    poplevel (0, 0);
-  OMP_CLAUSE_REDUCTION_MERGE (c) = stmt;
+  if (!openacc)
+    {
+      /* Create the merge statement list.  */
+      pushlevel ();
+      if (e4)
+	stmt = gfc_trans_assignment (e3, e4, false, true);
+      else
+	stmt = gfc_trans_call (n->u2.udr->combiner, false,
+			       NULL_TREE, NULL_TREE, false);
+      if (TREE_CODE (stmt) != BIND_EXPR)
+	stmt = build3_v (BIND_EXPR, NULL, stmt, poplevel (1, 0));
+      else
+	poplevel (0, 0);
+      OMP_CLAUSE_REDUCTION_MERGE (c) = stmt;
 
-  /* And stick the placeholder VAR_DECL into the clause as well.  */
-  OMP_CLAUSE_REDUCTION_PLACEHOLDER (c) = outer_decl;
+      /* And stick the placeholder VAR_DECL into the clause as well.  */
+      OMP_CLAUSE_REDUCTION_PLACEHOLDER (c) = outer_decl;
+    }
 
   gfc_current_locus = old_loc;
 
@@ -2296,7 +2300,7 @@ gfc_trans_omp_array_reduction_or_udr (tree c, gfc_omp_namelist *n, locus where)
 
 static tree
 gfc_trans_omp_reduction_list (int kind, gfc_omp_namelist *namelist, tree list,
-			      locus where, bool mark_addressable)
+			      locus where, bool mark_addressable, bool openacc)
 {
   omp_clause_code clause = OMP_CLAUSE_REDUCTION;
   switch (kind)
@@ -2376,7 +2380,8 @@ gfc_trans_omp_reduction_list (int kind, gfc_omp_namelist *namelist, tree list,
 	    if (namelist->sym->attr.dimension
 		|| namelist->u.reduction_op == OMP_REDUCTION_USER
 		|| namelist->sym->attr.allocatable)
-	      gfc_trans_omp_array_reduction_or_udr (node, namelist, where);
+	      gfc_trans_omp_array_reduction_or_udr (node, namelist, where,
+						    openacc);
 	    list = gfc_trans_add_clause (node, list);
 	  }
       }
@@ -2715,7 +2720,8 @@ gfc_trans_omp_clauses (stmtblock_t *block, gfc_omp_clauses *clauses,
 	  /* An OpenACC async clause indicates the need to set reduction
 	     arguments addressable, to allow asynchronous copy-out.  */
 	  omp_clauses = gfc_trans_omp_reduction_list (list, n, omp_clauses,
-						      where, clauses->async);
+						      where, clauses->async,
+						      openacc);
 	  break;
 	case OMP_LIST_PRIVATE:
 	  clause_code = OMP_CLAUSE_PRIVATE;
@@ -5757,7 +5763,8 @@ gfc_trans_omp_do (gfc_code *code, gfc_exec_op op, stmtblock_t *pblock,
      on the simd construct and DO's clauses are translated elsewhere.  */
   do_clauses->sched_simd = false;
 
-  omp_clauses = gfc_trans_omp_clauses (pblock, do_clauses, code->loc);
+  omp_clauses = gfc_trans_omp_clauses (pblock, do_clauses, code->loc, false,
+				       op == EXEC_OACC_LOOP);
 
   for (i = 0; i < collapse; i++)
     {
diff --git a/gcc/omp-low.cc b/gcc/omp-low.cc
index f3a056df8f2..4bbf30627c3 100644
--- a/gcc/omp-low.cc
+++ b/gcc/omp-low.cc
@@ -4426,9 +4426,16 @@ omp_reduction_init_op (location_t loc, enum tree_code op, tree type)
 {
   if (TREE_CODE (type) == ARRAY_TYPE)
     {
+      tree min_tree = TYPE_MIN_VALUE (TYPE_DOMAIN (type));
+      tree max_tree = TYPE_MAX_VALUE (TYPE_DOMAIN (type));
+      if (!TREE_CONSTANT (min_tree) || !TREE_CONSTANT (max_tree))
+	{
+	  error_at (loc, "array in reduction must be of constant size");
+	  return error_mark_node;
+	}
       vec<constructor_elt, va_gc> *v = NULL;
-      HOST_WIDE_INT min = tree_to_shwi (TYPE_MIN_VALUE (TYPE_DOMAIN (type)));
-      HOST_WIDE_INT max = tree_to_shwi (TYPE_MAX_VALUE (TYPE_DOMAIN (type)));
+      HOST_WIDE_INT min = tree_to_shwi (min_tree);
+      HOST_WIDE_INT max = tree_to_shwi (max_tree);
       tree t = omp_reduction_init_op (loc, op, TREE_TYPE (type));
       for (HOST_WIDE_INT i = min; i <= max; i++)
 	CONSTRUCTOR_APPEND_ELT (v, size_int (i), t);
@@ -7559,6 +7566,9 @@ lower_oacc_reductions (location_t loc, tree clauses, tree level, bool inner,
 	  has_outer_reduction:;
 	  }
 
+	if (incoming == error_mark_node)
+	  continue;
+
 	if (!ref_to_res)
 	  ref_to_res = integer_zero_node;
 
diff --git a/gcc/testsuite/gfortran.dg/goacc/array-reduction.f90 b/gcc/testsuite/gfortran.dg/goacc/array-reduction.f90
index d71c400a5bf..f9a3b43e7f3 100644
--- a/gcc/testsuite/gfortran.dg/goacc/array-reduction.f90
+++ b/gcc/testsuite/gfortran.dg/goacc/array-reduction.f90
@@ -1,74 +1,80 @@
+! { dg-do compile }
+! { dg-additional-options "-fdump-tree-gimple" }
+
 program test
   implicit none
   integer a(10), i
 
   a(:) = 0
-  
+
   ! Array reductions.
-  
-  !$acc parallel reduction (+:a) ! { dg-error "Array 'a' is not permitted in reduction" }
+
+  !$acc parallel reduction (+:a)
   do i = 1, 10
      a = a + 1
   end do
   !$acc end parallel
 
   !$acc parallel
-  !$acc loop reduction (+:a) ! { dg-error "Array 'a' is not permitted in reduction" }
+  !$acc loop reduction (+:a)
   do i = 1, 10
      a = a + 1
   end do
   !$acc end parallel
 
   !$acc kernels
-  !$acc loop reduction (+:a) ! { dg-error "Array 'a' is not permitted in reduction" }
+  !$acc loop reduction (+:a)
   do i = 1, 10
      a = a + 1
   end do
   !$acc end kernels
 
   ! Subarray reductions.
-  
-  !$acc parallel reduction (+:a(1:5)) ! { dg-error "Array 'a' is not permitted in reduction" }
+
+  !$acc parallel reduction (+:a(1:5))
   do i = 1, 10
      a = a + 1
   end do
   !$acc end parallel
 
   !$acc parallel
-  !$acc loop reduction (+:a(1:5)) ! { dg-error "Array 'a' is not permitted in reduction" }
+  !$acc loop reduction (+:a(1:5))
   do i = 1, 10
      a = a + 1
   end do
   !$acc end parallel
 
   !$acc kernels
-  !$acc loop reduction (+:a(1:5)) ! { dg-error "Array 'a' is not permitted in reduction" }
+  !$acc loop reduction (+:a(1:5))
   do i = 1, 10
      a = a + 1
   end do
   !$acc end kernels
 
   ! Reductions on array elements.
-  
-  !$acc parallel reduction (+:a(1)) ! { dg-error "Array 'a' is not permitted in reduction" }
+
+  !$acc parallel reduction (+:a(1))
   do i = 1, 10
      a(1) = a(1) + 1
   end do
   !$acc end parallel
 
   !$acc parallel
-  !$acc loop reduction (+:a(1)) ! { dg-error "Array 'a' is not permitted in reduction" }
+  !$acc loop reduction (+:a(1))
   do i = 1, 10
      a(1) = a(1) + 1
   end do
   !$acc end parallel
 
   !$acc kernels
-  !$acc loop reduction (+:a(1)) ! { dg-error "Array 'a' is not permitted in reduction" }
+  !$acc loop reduction (+:a(1))
   do i = 1, 10
      a(1) = a(1) + 1
   end do
   !$acc end kernels
-  
+
   print *, a
 end program test
+
+! { dg-final { scan-tree-dump-times "(?n)#pragma acc loop private\\(i\\) reduction\\(\\+:a\\)" 6 "gimple" } }
+! { dg-final { scan-tree-dump-times "(?n)#pragma omp target oacc_parallel reduction\\(\\+:a\\) map\\(tofrom:a \\\[len: \[0-9\]+\\\]\\)" 3 "gimple" } }
diff --git a/gcc/testsuite/gfortran.dg/goacc/reduction.f95 b/gcc/testsuite/gfortran.dg/goacc/reduction.f95
index a13574b150c..c425f00d87f 100644
--- a/gcc/testsuite/gfortran.dg/goacc/reduction.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/reduction.f95
@@ -72,9 +72,9 @@ common /blk/ i1
 !$acc end parallel
 !$acc parallel reduction (-:a1)		! { dg-error "OMP DECLARE REDUCTION - not found for type CHARACTER" }
 !$acc end parallel
-!$acc parallel reduction (+:t1)		! { dg-error "OMP DECLARE REDUCTION \\+ not found for type TYPE" }
+!$acc parallel reduction (+:t1)
 !$acc end parallel
-!$acc parallel reduction (*:ta1)	! { dg-error "OMP DECLARE REDUCTION \\* not found for type TYPE" }
+!$acc parallel reduction (*:ta1)
 !$acc end parallel
 !$acc parallel reduction (.and.:i3)	! { dg-error "OMP DECLARE REDUCTION \\.and\\. not found for type INTEGER" }
 !$acc end parallel
@@ -108,9 +108,9 @@ common /blk/ i1
 !$acc end parallel
 !$acc parallel reduction (max:a1)	! { dg-error "OMP DECLARE REDUCTION max not found for type CHARACTER" }
 !$acc end parallel
-!$acc parallel reduction (min:t1)	! { dg-error "OMP DECLARE REDUCTION min not found for type TYPE" }
+!$acc parallel reduction (min:t1)
 !$acc end parallel
-!$acc parallel reduction (max:ta1)	! { dg-error "OMP DECLARE REDUCTION max not found for type TYPE" }
+!$acc parallel reduction (max:ta1)
 !$acc end parallel
 !$acc parallel reduction (iand:r1)	! { dg-error "OMP DECLARE REDUCTION iand not found for type REAL" }
 !$acc end parallel
@@ -130,32 +130,12 @@ common /blk/ i1
 !$acc end parallel
 !$acc parallel reduction (ior:a1)	! { dg-error "OMP DECLARE REDUCTION ior not found for type CHARACTER" }
 !$acc end parallel
-!$acc parallel reduction (ieor:t1)	! { dg-error "OMP DECLARE REDUCTION ieor not found for type TYPE" }
+!$acc parallel reduction (ieor:t1)
 !$acc end parallel
-!$acc parallel reduction (iand:ta1)	! { dg-error "OMP DECLARE REDUCTION iand not found for type TYPE" }
+!$acc parallel reduction (iand:ta1)
 !$acc end parallel
 
 end subroutine
 
-! { dg-error "Array 'ia2' is not permitted in reduction" "" { target "*-*-*" } 27 }
-! { dg-error "Array 'ra1' is not permitted in reduction" "" { target "*-*-*" } 29 }
-! { dg-error "Array 'ca1' is not permitted in reduction" "" { target "*-*-*" } 31 }
-! { dg-error "Array 'da1' is not permitted in reduction" "" { target "*-*-*" } 33 }
-! { dg-error "Array 'la1' is not permitted in reduction" "" { target "*-*-*" } 35 }
 ! { dg-error "Array 'aa1' is not permitted in reduction" "" { target "*-*-*" } 65 }
 ! { dg-error "Array 'ia1' is not permitted in reduction" "" { target "*-*-*" } 67 }
-! { dg-error "Array 'la1' is not permitted in reduction" "" { target "*-*-*" } 71 }
-! { dg-error "Array 'ta1' is not permitted in reduction" "" { target "*-*-*" } 77 }
-! { dg-error "Array 'ia2' is not permitted in reduction" "" { target "*-*-*" } 81 }
-! { dg-error "Array 'ra1' is not permitted in reduction" "" { target "*-*-*" } 85 }
-! { dg-error "Array 'da1' is not permitted in reduction" "" { target "*-*-*" } 89 }
-! { dg-error "Array 'ca1' is not permitted in reduction" "" { target "*-*-*" } 93 }
-! { dg-error "Array 'ta1' is not permitted in reduction" "" { target "*-*-*" } 99 }
-! { dg-error "Array 'ca1' is not permitted in reduction" "" { target "*-*-*" } 103 }
-! { dg-error "Array 'la1' is not permitted in reduction" "" { target "*-*-*" } 107 }
-! { dg-error "Array 'ta1' is not permitted in reduction" "" { target "*-*-*" } 113 }
-! { dg-error "Array 'ra1' is not permitted in reduction" "" { target "*-*-*" } 117 }
-! { dg-error "Array 'da1' is not permitted in reduction" "" { target "*-*-*" } 121 }
-! { dg-error "Array 'ca1' is not permitted in reduction" "" { target "*-*-*" } 125 }
-! { dg-error "Array 'la1' is not permitted in reduction" "" { target "*-*-*" } 129 }
-! { dg-error "Array 'ta1' is not permitted in reduction" "" { target "*-*-*" } 135 }
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/reduction-10.f90 b/libgomp/testsuite/libgomp.oacc-fortran/reduction-10.f90
new file mode 100644
index 00000000000..506dfaf29f6
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-fortran/reduction-10.f90
@@ -0,0 +1,483 @@
+! { dg-do run }
+
+! real array reductions
+
+program reduction_10
+  implicit none
+
+  integer, parameter     :: n = 10, ng = 8, nw = 4, vl = 32
+  integer                :: i, j
+  real, dimension (n) :: vresult, rg, rw, rv, rc
+  logical, dimension (n) :: lrg, lrw, lrv, lrc, lvresult
+  real, dimension (n) :: array
+
+  do i = 1, n
+     array(i) = i
+  end do
+
+  !
+  ! '+' reductions
+  !
+
+  rg = 0
+  rw = 0
+  rv = 0
+  rc = 0
+  vresult = 0
+
+  !$acc parallel num_gangs(ng) copy(rg)
+  !$acc loop reduction(+:rg) gang
+  do i = 1, n
+    do j = 1, n
+      rg(j) = rg(j) + array(i)
+    end do
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_workers(nw) copy(rw)
+  !$acc loop reduction(+:rw) worker
+  do i = 1, n
+    do j = 1, n
+      rw(j) = rw(j) + array(i)
+    end do
+  end do
+  !$acc end parallel
+
+  !$acc parallel vector_length(vl) copy(rv)
+  !$acc loop reduction(+:rv) vector
+  do i = 1, n
+    do j = 1, n
+      rv(j) = rv(j) + array(i)
+    end do
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_gangs(ng) num_workers(nw) vector_length(vl) copy(rc)
+  !$acc loop reduction(+:rc) gang worker vector
+  do i = 1, n
+    do j = 1, n
+      rc(j) = rc(j) + array(i)
+    end do
+  end do
+  !$acc end parallel
+
+  ! Verify the results
+  do i = 1, n
+    do j = 1, n
+      vresult(j) = vresult(j) + array(i)
+    end do
+  end do
+
+  if (count (rg .ne. vresult) .ne. 0) STOP 1
+  if (count (rw .ne. vresult) .ne. 0) STOP 2
+  if (count (rv .ne. vresult) .ne. 0) STOP 3
+  if (count (rc .ne. vresult) .ne. 0) STOP 4
+
+  !
+  ! '*' reductions
+  !
+
+  rg = 1
+  rw = 1
+  rv = 1
+  rc = 1
+  vresult = 1
+
+  !$acc parallel num_gangs(ng) copy(rg)
+  !$acc loop reduction(*:rg) gang
+  do i = 1, n
+    do j = 1, n
+      rg(j) = rg(j) * array(i)
+    end do
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_workers(nw) copy(rw)
+  !$acc loop reduction(*:rw) worker
+  do i = 1, n
+    do j = 1, n
+      rw(j) = rw(j) * array(i)
+    end do
+  end do
+  !$acc end parallel
+
+  !$acc parallel vector_length(vl) copy(rv)
+  !$acc loop reduction(*:rv) vector
+  do i = 1, n
+    do j = 1, n
+      rv(j) = rv(j) * array(i)
+    end do
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_gangs(ng) num_workers(nw) vector_length(vl) copy(rc)
+  !$acc loop reduction(*:rc) gang worker vector
+  do i = 1, n
+    do j = 1, n
+      rc(j) = rc(j) * array(i)
+    end do
+  end do
+  !$acc end parallel
+
+  ! Verify the results
+  do i = 1, n
+    do j = 1, n
+      vresult(j) = vresult(j) * array(i)
+    end do
+  end do
+
+  if (count (rg .ne. vresult) .ne. 0) STOP 5
+  if (count (rw .ne. vresult) .ne. 0) STOP 6
+  if (count (rv .ne. vresult) .ne. 0) STOP 7
+  if (count (rc .ne. vresult) .ne. 0) STOP 8
+
+  !
+  ! 'max' reductions
+  !
+
+  rg = 0
+  rw = 0
+  rv = 0
+  rc = 0
+  vresult = 0
+
+  !$acc parallel num_gangs(ng) copy(rg)
+  !$acc loop reduction(max:rg) gang
+  do i = 1, n
+    do j = 1, n
+      rg(j) = max (rg(j), array(i))
+    end do
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_workers(nw) copy(rw)
+  !$acc loop reduction(max:rw) worker
+  do i = 1, n
+    do j = 1, n
+      rw(j) = max (rw(j), array(i))
+    end do
+  end do
+  !$acc end parallel
+
+  !$acc parallel vector_length(vl) copy(rv)
+  !$acc loop reduction(max:rv) vector
+  do i = 1, n
+    do j = 1, n
+      rv(j) = max (rv(j), array(i))
+    end do
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_gangs(ng) num_workers(nw) vector_length(vl) copy(rc)
+  !$acc loop reduction(max:rc) gang worker vector
+  do i = 1, n
+    do j = 1, n
+      rc(j) = max (rc(j), array(i))
+    end do
+  end do
+  !$acc end parallel
+
+  ! Verify the results
+  do i = 1, n
+    do j = 1, n
+      vresult(j) = max (vresult(j), array(i))
+    end do
+  end do
+
+  if (count (rg .ne. vresult) .ne. 0) STOP 9
+  if (count (rw .ne. vresult) .ne. 0) STOP 10
+  if (count (rv .ne. vresult) .ne. 0) STOP 11
+  if (count (rc .ne. vresult) .ne. 0) STOP 12
+
+  !
+  ! 'min' reductions
+  !
+
+  rg = 0
+  rw = 0
+  rv = 0
+  rc = 0
+  vresult = 0
+
+  !$acc parallel num_gangs(ng) copy(rg)
+  !$acc loop reduction(min:rg) gang
+  do i = 1, n
+    do j = 1, n
+      rg(j) = min (rg(j), array(i))
+    end do
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_workers(nw) copy(rw)
+  !$acc loop reduction(min:rw) worker
+  do i = 1, n
+    do j = 1, n
+      rw(j) = min (rw(j), array(i))
+    end do
+  end do
+  !$acc end parallel
+
+  !$acc parallel vector_length(vl) copy(rv)
+  !$acc loop reduction(min:rv) vector
+  do i = 1, n
+    do j = 1, n
+      rv(j) = min (rv(j), array(i))
+    end do
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_gangs(ng) num_workers(nw) vector_length(vl) copy(rc)
+  !$acc loop reduction(min:rc) gang worker vector
+  do i = 1, n
+    do j = 1, n
+      rc(j) = min (rc(j), array(i))
+    end do
+  end do
+  !$acc end parallel
+
+  ! Verify the results
+  do i = 1, n
+    do j = 1, n
+      vresult(j) = min (vresult(j), array(i))
+    end do
+  end do
+
+  if (count (rg .ne. vresult) .ne. 0) STOP 13
+  if (count (rw .ne. vresult) .ne. 0) STOP 14
+  if (count (rv .ne. vresult) .ne. 0) STOP 15
+  if (count (rc .ne. vresult) .ne. 0) STOP 16
+
+  !
+  ! '.and.' reductions
+  !
+
+  lrg = .true.
+  lrw = .true.
+  lrv = .true.
+  lrc = .true.
+  lvresult = .true.
+
+  !$acc parallel num_gangs(ng) copy(lrg)
+  !$acc loop reduction(.and.:lrg) gang
+  do i = 1, n
+    do j = 1, n
+      lrg(j) = lrg(j) .and. (array(i) .ge. 5)
+    end do
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_workers(nw) copy(lrw)
+  !$acc loop reduction(.and.:lrw) worker
+  do i = 1, n
+    do j = 1, n
+      lrw(j) = lrw(j) .and. (array(i) .ge. 5)
+    end do
+  end do
+  !$acc end parallel
+
+  !$acc parallel vector_length(vl) copy(lrv)
+  !$acc loop reduction(.and.:lrv) vector
+  do i = 1, n
+    do j = 1, n
+      lrv(j) = lrv(j) .and. (array(i) .ge. 5)
+    end do
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_gangs(ng) num_workers(nw) vector_length(vl) copy(lrc)
+  !$acc loop reduction(.and.:lrc) gang worker vector
+  do i = 1, n
+    do j = 1, n
+      lrc(j) = lrc(j) .and. (array(i) .ge. 5)
+    end do
+  end do
+  !$acc end parallel
+
+  ! Verify the results
+  do i = 1, n
+    do j = 1, n
+      lvresult(j) = lvresult(j) .and. (array(i) .ge. 5)
+    end do
+  end do
+
+  if (count (lrg .neqv. lvresult) .ne. 0) STOP 17
+  if (count (lrw .neqv. lvresult) .ne. 0) STOP 18
+  if (count (lrv .neqv. lvresult) .ne. 0) STOP 19
+  if (count (lrc .neqv. lvresult) .ne. 0) STOP 20
+
+  !
+  ! '.or.' reductions
+  !
+
+  lrg = .true.
+  lrw = .true.
+  lrv = .true.
+  lrc = .true.
+  lvresult = .true.
+
+  !$acc parallel num_gangs(ng) copy(lrg)
+  !$acc loop reduction(.or.:lrg) gang
+  do i = 1, n
+    do j = 1, n
+      lrg(j) = lrg(j) .or. (array(i) .ge. 5)
+    end do
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_workers(nw) copy(lrw)
+  !$acc loop reduction(.or.:lrw) worker
+  do i = 1, n
+    do j = 1, n
+      lrw(j) = lrw(j) .or. (array(i) .ge. 5)
+    end do
+  end do
+  !$acc end parallel
+
+  !$acc parallel vector_length(vl) copy(lrv)
+  !$acc loop reduction(.or.:lrv) vector
+  do i = 1, n
+    do j = 1, n
+      lrv(j) = lrv(j) .or. (array(i) .ge. 5)
+    end do
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_gangs(ng) num_workers(nw) vector_length(vl) copy(lrc)
+  !$acc loop reduction(.or.:lrc) gang worker vector
+  do i = 1, n
+    do j = 1, n
+      lrc(j) = lrc(j) .or. (array(i) .ge. 5)
+    end do
+  end do
+  !$acc end parallel
+
+  ! Verify the results
+  do i = 1, n
+    do j = 1, n
+      lvresult(j) = lvresult(j) .or. (array(i) .ge. 5)
+    end do
+  end do
+
+  if (count (lrg .neqv. lvresult) .ne. 0) STOP 21
+  if (count (lrw .neqv. lvresult) .ne. 0) STOP 22
+  if (count (lrv .neqv. lvresult) .ne. 0) STOP 23
+  if (count (lrc .neqv. lvresult) .ne. 0) STOP 24
+
+  !
+  ! '.eqv.' reductions
+  !
+
+  lrg = .true.
+  lrw = .true.
+  lrv = .true.
+  lrc = .true.
+  lvresult = .true.
+
+  !$acc parallel num_gangs(ng) copy(lrg)
+  !$acc loop reduction(.eqv.:lrg) gang
+  do i = 1, n
+    do j = 1, n
+      lrg(j) = lrg(j) .eqv. (array(i) .ge. 5)
+    end do
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_workers(nw) copy(lrw)
+  !$acc loop reduction(.eqv.:lrw) worker
+  do i = 1, n
+    do j = 1, n
+      lrw(j) = lrw(j) .eqv. (array(i) .ge. 5)
+    end do
+  end do
+  !$acc end parallel
+
+  !$acc parallel vector_length(vl) copy(lrv)
+  !$acc loop reduction(.eqv.:lrv) vector
+  do i = 1, n
+    do j = 1, n
+      lrv(j) = lrv(j) .eqv. (array(i) .ge. 5)
+    end do
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_gangs(ng) num_workers(nw) vector_length(vl) copy(lrc)
+  !$acc loop reduction(.eqv.:lrc) gang worker vector
+  do i = 1, n
+    do j = 1, n
+      lrc(j) = lrc(j) .eqv. (array(i) .ge. 5)
+    end do
+  end do
+  !$acc end parallel
+
+  ! Verify the results
+  do i = 1, n
+    do j = 1, n
+      lvresult(j) = lvresult(j) .eqv. (array(i) .ge. 5)
+    end do
+  end do
+
+  if (count (lrg .neqv. lvresult) .ne. 0) STOP 25
+  if (count (lrw .neqv. lvresult) .ne. 0) STOP 26
+  if (count (lrv .neqv. lvresult) .ne. 0) STOP 27
+  if (count (lrc .neqv. lvresult) .ne. 0) STOP 28
+
+  !
+  ! '.neqv.' reductions
+  !
+
+  lrg = .true.
+  lrw = .true.
+  lrv = .true.
+  lrc = .true.
+  lvresult = .true.
+
+  !$acc parallel num_gangs(ng) copy(lrg)
+  !$acc loop reduction(.neqv.:lrg) gang
+  do i = 1, n
+    do j = 1, n
+      lrg(j) = lrg(j) .neqv. (array(i) .ge. 5)
+    end do
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_workers(nw) copy(lrw)
+  !$acc loop reduction(.neqv.:lrw) worker
+  do i = 1, n
+    do j = 1, n
+      lrw(j) = lrw(j) .neqv. (array(i) .ge. 5)
+    end do
+  end do
+  !$acc end parallel
+
+  !$acc parallel vector_length(vl) copy(lrv)
+  !$acc loop reduction(.neqv.:lrv) vector
+  do i = 1, n
+    do j = 1, n
+      lrv(j) = lrv(j) .neqv. (array(i) .ge. 5)
+    end do
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_gangs(ng) num_workers(nw) vector_length(vl) copy(lrc)
+  !$acc loop reduction(.neqv.:lrc) gang worker vector
+  do i = 1, n
+    do j = 1, n
+      lrc(j) = lrc(j) .neqv. (array(i) .ge. 5)
+    end do
+  end do
+  !$acc end parallel
+
+  ! Verify the results
+  do i = 1, n
+    do j = 1, n
+      lvresult(j) = lvresult(j) .neqv. (array(i) .ge. 5)
+    end do
+  end do
+
+  if (count (lrg .neqv. lvresult) .ne. 0) STOP 29
+  if (count (lrw .neqv. lvresult) .ne. 0) STOP 30
+  if (count (lrv .neqv. lvresult) .ne. 0) STOP 31
+  if (count (lrc .neqv. lvresult) .ne. 0) STOP 32
+
+end program reduction_10
+
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/reduction-11.f90 b/libgomp/testsuite/libgomp.oacc-fortran/reduction-11.f90
new file mode 100644
index 00000000000..4bec1c797cd
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-fortran/reduction-11.f90
@@ -0,0 +1,483 @@
+! { dg-do run }
+
+! double precision array reductions
+
+program reduction_11
+  implicit none
+
+  integer, parameter     :: n = 10, ng = 8, nw = 4, vl = 32
+  integer                :: i, j
+  double precision, dimension (n) :: vresult, rg, rw, rv, rc
+  logical, dimension (n) :: lrg, lrw, lrv, lrc, lvresult
+  double precision, dimension (n) :: array
+
+  do i = 1, n
+     array(i) = i
+  end do
+
+  !
+  ! '+' reductions
+  !
+
+  rg = 0
+  rw = 0
+  rv = 0
+  rc = 0
+  vresult = 0
+
+  !$acc parallel num_gangs(ng) copy(rg)
+  !$acc loop reduction(+:rg) gang
+  do i = 1, n
+    do j = 1, n
+      rg(j) = rg(j) + array(i)
+    end do
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_workers(nw) copy(rw)
+  !$acc loop reduction(+:rw) worker
+  do i = 1, n
+    do j = 1, n
+      rw(j) = rw(j) + array(i)
+    end do
+  end do
+  !$acc end parallel
+
+  !$acc parallel vector_length(vl) copy(rv)
+  !$acc loop reduction(+:rv) vector
+  do i = 1, n
+    do j = 1, n
+      rv(j) = rv(j) + array(i)
+    end do
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_gangs(ng) num_workers(nw) vector_length(vl) copy(rc)
+  !$acc loop reduction(+:rc) gang worker vector
+  do i = 1, n
+    do j = 1, n
+      rc(j) = rc(j) + array(i)
+    end do
+  end do
+  !$acc end parallel
+
+  ! Verify the results
+  do i = 1, n
+    do j = 1, n
+      vresult(j) = vresult(j) + array(i)
+    end do
+  end do
+
+  if (count (rg .ne. vresult) .ne. 0) STOP 1
+  if (count (rw .ne. vresult) .ne. 0) STOP 2
+  if (count (rv .ne. vresult) .ne. 0) STOP 3
+  if (count (rc .ne. vresult) .ne. 0) STOP 4
+
+  !
+  ! '*' reductions
+  !
+
+  rg = 1
+  rw = 1
+  rv = 1
+  rc = 1
+  vresult = 1
+
+  !$acc parallel num_gangs(ng) copy(rg)
+  !$acc loop reduction(*:rg) gang
+  do i = 1, n
+    do j = 1, n
+      rg(j) = rg(j) * array(i)
+    end do
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_workers(nw) copy(rw)
+  !$acc loop reduction(*:rw) worker
+  do i = 1, n
+    do j = 1, n
+      rw(j) = rw(j) * array(i)
+    end do
+  end do
+  !$acc end parallel
+
+  !$acc parallel vector_length(vl) copy(rv)
+  !$acc loop reduction(*:rv) vector
+  do i = 1, n
+    do j = 1, n
+      rv(j) = rv(j) * array(i)
+    end do
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_gangs(ng) num_workers(nw) vector_length(vl) copy(rc)
+  !$acc loop reduction(*:rc) gang worker vector
+  do i = 1, n
+    do j = 1, n
+      rc(j) = rc(j) * array(i)
+    end do
+  end do
+  !$acc end parallel
+
+  ! Verify the results
+  do i = 1, n
+    do j = 1, n
+      vresult(j) = vresult(j) * array(i)
+    end do
+  end do
+
+  if (count (rg .ne. vresult) .ne. 0) STOP 5
+  if (count (rw .ne. vresult) .ne. 0) STOP 6
+  if (count (rv .ne. vresult) .ne. 0) STOP 7
+  if (count (rc .ne. vresult) .ne. 0) STOP 8
+
+  !
+  ! 'max' reductions
+  !
+
+  rg = 0
+  rw = 0
+  rv = 0
+  rc = 0
+  vresult = 0
+
+  !$acc parallel num_gangs(ng) copy(rg)
+  !$acc loop reduction(max:rg) gang
+  do i = 1, n
+    do j = 1, n
+      rg(j) = max (rg(j), array(i))
+    end do
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_workers(nw) copy(rw)
+  !$acc loop reduction(max:rw) worker
+  do i = 1, n
+    do j = 1, n
+      rw(j) = max (rw(j), array(i))
+    end do
+  end do
+  !$acc end parallel
+
+  !$acc parallel vector_length(vl) copy(rv)
+  !$acc loop reduction(max:rv) vector
+  do i = 1, n
+    do j = 1, n
+      rv(j) = max (rv(j), array(i))
+    end do
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_gangs(ng) num_workers(nw) vector_length(vl) copy(rc)
+  !$acc loop reduction(max:rc) gang worker vector
+  do i = 1, n
+    do j = 1, n
+      rc(j) = max (rc(j), array(i))
+    end do
+  end do
+  !$acc end parallel
+
+  ! Verify the results
+  do i = 1, n
+    do j = 1, n
+      vresult(j) = max (vresult(j), array(i))
+    end do
+  end do
+
+  if (count (rg .ne. vresult) .ne. 0) STOP 9
+  if (count (rw .ne. vresult) .ne. 0) STOP 10
+  if (count (rv .ne. vresult) .ne. 0) STOP 11
+  if (count (rc .ne. vresult) .ne. 0) STOP 12
+
+  !
+  ! 'min' reductions
+  !
+
+  rg = 0
+  rw = 0
+  rv = 0
+  rc = 0
+  vresult = 0
+
+  !$acc parallel num_gangs(ng) copy(rg)
+  !$acc loop reduction(min:rg) gang
+  do i = 1, n
+    do j = 1, n
+      rg(j) = min (rg(j), array(i))
+    end do
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_workers(nw) copy(rw)
+  !$acc loop reduction(min:rw) worker
+  do i = 1, n
+    do j = 1, n
+      rw(j) = min (rw(j), array(i))
+    end do
+  end do
+  !$acc end parallel
+
+  !$acc parallel vector_length(vl) copy(rv)
+  !$acc loop reduction(min:rv) vector
+  do i = 1, n
+    do j = 1, n
+      rv(j) = min (rv(j), array(i))
+    end do
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_gangs(ng) num_workers(nw) vector_length(vl) copy(rc)
+  !$acc loop reduction(min:rc) gang worker vector
+  do i = 1, n
+    do j = 1, n
+      rc(j) = min (rc(j), array(i))
+    end do
+  end do
+  !$acc end parallel
+
+  ! Verify the results
+  do i = 1, n
+    do j = 1, n
+      vresult(j) = min (vresult(j), array(i))
+    end do
+  end do
+
+  if (count (rg .ne. vresult) .ne. 0) STOP 13
+  if (count (rw .ne. vresult) .ne. 0) STOP 14
+  if (count (rv .ne. vresult) .ne. 0) STOP 15
+  if (count (rc .ne. vresult) .ne. 0) STOP 16
+
+  !
+  ! '.and.' reductions
+  !
+
+  lrg = .true.
+  lrw = .true.
+  lrv = .true.
+  lrc = .true.
+  lvresult = .true.
+
+  !$acc parallel num_gangs(ng) copy(lrg)
+  !$acc loop reduction(.and.:lrg) gang
+  do i = 1, n
+    do j = 1, n
+      lrg(j) = lrg(j) .and. (array(i) .ge. 5)
+    end do
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_workers(nw) copy(lrw)
+  !$acc loop reduction(.and.:lrw) worker
+  do i = 1, n
+    do j = 1, n
+      lrw(j) = lrw(j) .and. (array(i) .ge. 5)
+    end do
+  end do
+  !$acc end parallel
+
+  !$acc parallel vector_length(vl) copy(lrv)
+  !$acc loop reduction(.and.:lrv) vector
+  do i = 1, n
+    do j = 1, n
+      lrv(j) = lrv(j) .and. (array(i) .ge. 5)
+    end do
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_gangs(ng) num_workers(nw) vector_length(vl) copy(lrc)
+  !$acc loop reduction(.and.:lrc) gang worker vector
+  do i = 1, n
+    do j = 1, n
+      lrc(j) = lrc(j) .and. (array(i) .ge. 5)
+    end do
+  end do
+  !$acc end parallel
+
+  ! Verify the results
+  do i = 1, n
+    do j = 1, n
+      lvresult(j) = lvresult(j) .and. (array(i) .ge. 5)
+    end do
+  end do
+
+  if (count (lrg .neqv. lvresult) .ne. 0) STOP 17
+  if (count (lrw .neqv. lvresult) .ne. 0) STOP 18
+  if (count (lrv .neqv. lvresult) .ne. 0) STOP 19
+  if (count (lrc .neqv. lvresult) .ne. 0) STOP 20
+
+  !
+  ! '.or.' reductions
+  !
+
+  lrg = .true.
+  lrw = .true.
+  lrv = .true.
+  lrc = .true.
+  lvresult = .true.
+
+  !$acc parallel num_gangs(ng) copy(lrg)
+  !$acc loop reduction(.or.:lrg) gang
+  do i = 1, n
+    do j = 1, n
+      lrg(j) = lrg(j) .or. (array(i) .ge. 5)
+    end do
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_workers(nw) copy(lrw)
+  !$acc loop reduction(.or.:lrw) worker
+  do i = 1, n
+    do j = 1, n
+      lrw(j) = lrw(j) .or. (array(i) .ge. 5)
+    end do
+  end do
+  !$acc end parallel
+
+  !$acc parallel vector_length(vl) copy(lrv)
+  !$acc loop reduction(.or.:lrv) vector
+  do i = 1, n
+    do j = 1, n
+      lrv(j) = lrv(j) .or. (array(i) .ge. 5)
+    end do
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_gangs(ng) num_workers(nw) vector_length(vl) copy(lrc)
+  !$acc loop reduction(.or.:lrc) gang worker vector
+  do i = 1, n
+    do j = 1, n
+      lrc(j) = lrc(j) .or. (array(i) .ge. 5)
+    end do
+  end do
+  !$acc end parallel
+
+  ! Verify the results
+  do i = 1, n
+    do j = 1, n
+      lvresult(j) = lvresult(j) .or. (array(i) .ge. 5)
+    end do
+  end do
+
+  if (count (lrg .neqv. lvresult) .ne. 0) STOP 21
+  if (count (lrw .neqv. lvresult) .ne. 0) STOP 22
+  if (count (lrv .neqv. lvresult) .ne. 0) STOP 23
+  if (count (lrc .neqv. lvresult) .ne. 0) STOP 24
+
+  !
+  ! '.eqv.' reductions
+  !
+
+  lrg = .true.
+  lrw = .true.
+  lrv = .true.
+  lrc = .true.
+  lvresult = .true.
+
+  !$acc parallel num_gangs(ng) copy(lrg)
+  !$acc loop reduction(.eqv.:lrg) gang
+  do i = 1, n
+    do j = 1, n
+      lrg(j) = lrg(j) .eqv. (array(i) .ge. 5)
+    end do
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_workers(nw) copy(lrw)
+  !$acc loop reduction(.eqv.:lrw) worker
+  do i = 1, n
+    do j = 1, n
+      lrw(j) = lrw(j) .eqv. (array(i) .ge. 5)
+    end do
+  end do
+  !$acc end parallel
+
+  !$acc parallel vector_length(vl) copy(lrv)
+  !$acc loop reduction(.eqv.:lrv) vector
+  do i = 1, n
+    do j = 1, n
+      lrv(j) = lrv(j) .eqv. (array(i) .ge. 5)
+    end do
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_gangs(ng) num_workers(nw) vector_length(vl) copy(lrc)
+  !$acc loop reduction(.eqv.:lrc) gang worker vector
+  do i = 1, n
+    do j = 1, n
+      lrc(j) = lrc(j) .eqv. (array(i) .ge. 5)
+    end do
+  end do
+  !$acc end parallel
+
+  ! Verify the results
+  do i = 1, n
+    do j = 1, n
+      lvresult(j) = lvresult(j) .eqv. (array(i) .ge. 5)
+    end do
+  end do
+
+  if (count (lrg .neqv. lvresult) .ne. 0) STOP 25
+  if (count (lrw .neqv. lvresult) .ne. 0) STOP 26
+  if (count (lrv .neqv. lvresult) .ne. 0) STOP 27
+  if (count (lrc .neqv. lvresult) .ne. 0) STOP 28
+
+  !
+  ! '.neqv.' reductions
+  !
+
+  lrg = .true.
+  lrw = .true.
+  lrv = .true.
+  lrc = .true.
+  lvresult = .true.
+
+  !$acc parallel num_gangs(ng) copy(lrg)
+  !$acc loop reduction(.neqv.:lrg) gang
+  do i = 1, n
+    do j = 1, n
+      lrg(j) = lrg(j) .neqv. (array(i) .ge. 5)
+    end do
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_workers(nw) copy(lrw)
+  !$acc loop reduction(.neqv.:lrw) worker
+  do i = 1, n
+    do j = 1, n
+      lrw(j) = lrw(j) .neqv. (array(i) .ge. 5)
+    end do
+  end do
+  !$acc end parallel
+
+  !$acc parallel vector_length(vl) copy(lrv)
+  !$acc loop reduction(.neqv.:lrv) vector
+  do i = 1, n
+    do j = 1, n
+      lrv(j) = lrv(j) .neqv. (array(i) .ge. 5)
+    end do
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_gangs(ng) num_workers(nw) vector_length(vl) copy(lrc)
+  !$acc loop reduction(.neqv.:lrc) gang worker vector
+  do i = 1, n
+    do j = 1, n
+      lrc(j) = lrc(j) .neqv. (array(i) .ge. 5)
+    end do
+  end do
+  !$acc end parallel
+
+  ! Verify the results
+  do i = 1, n
+    do j = 1, n
+      lvresult(j) = lvresult(j) .neqv. (array(i) .ge. 5)
+    end do
+  end do
+
+  if (count (lrg .neqv. lvresult) .ne. 0) STOP 29
+  if (count (lrw .neqv. lvresult) .ne. 0) STOP 30
+  if (count (lrv .neqv. lvresult) .ne. 0) STOP 31
+  if (count (lrc .neqv. lvresult) .ne. 0) STOP 32
+
+end program reduction_11
+
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/reduction-12.f90 b/libgomp/testsuite/libgomp.oacc-fortran/reduction-12.f90
new file mode 100644
index 00000000000..b609c7a294e
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-fortran/reduction-12.f90
@@ -0,0 +1,135 @@
+! { dg-do run }
+
+! complex array reductions
+
+program reduction_12
+  implicit none
+
+  integer, parameter     :: n = 10, ng = 8, nw = 4, vl = 32
+  integer                :: i, j
+  complex, dimension (n) :: vresult, rg, rw, rv, rc
+  logical, dimension (n) :: lrg, lrw, lrv, lrc, lvresult
+  complex, dimension (n) :: array
+
+  do i = 1, n
+     array(i) = i
+  end do
+
+  !
+  ! '+' reductions
+  !
+
+  rg = 0
+  rw = 0
+  rv = 0
+  rc = 0
+  vresult = 0
+
+  !$acc parallel num_gangs(ng) copy(rg)
+  !$acc loop reduction(+:rg) gang
+  do i = 1, n
+    do j = 1, n
+      rg(j) = rg(j) + array(i)
+    end do
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_workers(nw) copy(rw)
+  !$acc loop reduction(+:rw) worker
+  do i = 1, n
+    do j = 1, n
+      rw(j) = rw(j) + array(i)
+    end do
+  end do
+  !$acc end parallel
+
+  !$acc parallel vector_length(vl) copy(rv)
+  !$acc loop reduction(+:rv) vector
+  do i = 1, n
+    do j = 1, n
+      rv(j) = rv(j) + array(i)
+    end do
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_gangs(ng) num_workers(nw) vector_length(vl) copy(rc)
+  !$acc loop reduction(+:rc) gang worker vector
+  do i = 1, n
+    do j = 1, n
+      rc(j) = rc(j) + array(i)
+    end do
+  end do
+  !$acc end parallel
+
+  ! Verify the results
+  do i = 1, n
+    do j = 1, n
+      vresult(j) = vresult(j) + array(i)
+    end do
+  end do
+
+  if (count (rg .ne. vresult) .ne. 0) STOP 1
+  if (count (rw .ne. vresult) .ne. 0) STOP 2
+  if (count (rv .ne. vresult) .ne. 0) STOP 3
+  if (count (rc .ne. vresult) .ne. 0) STOP 4
+
+  !
+  ! '*' reductions
+  !
+
+  rg = 1
+  rw = 1
+  rv = 1
+  rc = 1
+  vresult = 1
+
+  !$acc parallel num_gangs(ng) copy(rg)
+  !$acc loop reduction(*:rg) gang
+  do i = 1, n
+    do j = 1, n
+      rg(j) = rg(j) * array(i)
+    end do
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_workers(nw) copy(rw)
+  !$acc loop reduction(*:rw) worker
+  do i = 1, n
+    do j = 1, n
+      rw(j) = rw(j) * array(i)
+    end do
+  end do
+  !$acc end parallel
+
+  !$acc parallel vector_length(vl) copy(rv)
+  !$acc loop reduction(*:rv) vector
+  do i = 1, n
+    do j = 1, n
+      rv(j) = rv(j) * array(i)
+    end do
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_gangs(ng) num_workers(nw) vector_length(vl) copy(rc)
+  !$acc loop reduction(*:rc) gang worker vector
+  do i = 1, n
+    do j = 1, n
+      rc(j) = rc(j) * array(i)
+    end do
+  end do
+  !$acc end parallel
+
+  ! Verify the results
+  do i = 1, n
+    do j = 1, n
+      vresult(j) = vresult(j) * array(i)
+    end do
+  end do
+
+  if (count (rg .ne. vresult) .ne. 0) STOP 5
+  if (count (rw .ne. vresult) .ne. 0) STOP 6
+  if (count (rv .ne. vresult) .ne. 0) STOP 7
+  if (count (rc .ne. vresult) .ne. 0) STOP 8
+
+end program reduction_12
+
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/reduction-13.f90 b/libgomp/testsuite/libgomp.oacc-fortran/reduction-13.f90
new file mode 100644
index 00000000000..088c5cd3b04
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-fortran/reduction-13.f90
@@ -0,0 +1,66 @@
+! { dg-do run }
+
+! record type reductions
+
+program reduction_13
+  implicit none
+
+  type t1
+     integer :: i
+     real :: r
+  end type t1
+
+  type t2
+     real :: r
+     integer :: i
+     double precision :: d
+  end type t2
+
+  integer, parameter :: n = 10, ng = 8, nw = 4, vl = 32
+  integer :: i
+  type(t1) :: v1, a1
+  type (t2) :: v2, a2
+
+  v1%i = 0
+  v1%r = 0
+  !$acc parallel num_gangs(ng) num_workers(nw) vector_length(vl) copy(v1)
+  !$acc loop reduction (+:v1)
+  do i = 1, n
+     v1%i = v1%i + 1
+     v1%r = v1%r + 2
+  end do
+  !$acc end parallel
+  a1%i = 0
+  a1%r = 0
+  do i = 1, n
+     a1%i = a1%i + 1
+     a1%r = a1%r + 2
+  end do
+  if (v1%i .ne. a1%i) STOP 1
+  if (v1%r .ne. a1%r) STOP 2
+
+  v2%i = 1
+  v2%r = 1
+  v2%d = 1
+  !$acc parallel num_gangs(ng) num_workers(nw) vector_length(vl) copy(v2)
+  !$acc loop reduction (*:v2)
+  do i = 1, n
+     v2%i = v2%i * 2
+     v2%r = v2%r * 1.1
+     v2%d = v2%d * 1.3
+  end do
+  !$acc end parallel
+  a2%i = 1
+  a2%r = 1
+  a2%d = 1
+  do i = 1, n
+     a2%i = a2%i * 2
+     a2%r = a2%r * 1.1
+     a2%d = a2%d * 1.3
+  end do
+
+  if (v2%i .ne. a2%i) STOP 3
+  if (v2%r .ne. a2%r) STOP 4
+  if (v2%d .ne. a2%d) STOP 5
+
+end program reduction_13
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/reduction-9.f90 b/libgomp/testsuite/libgomp.oacc-fortran/reduction-9.f90
new file mode 100644
index 00000000000..43ab155aa73
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-fortran/reduction-9.f90
@@ -0,0 +1,657 @@
+! { dg-do run }
+
+! integer array reductions
+
+program reduction_9
+  implicit none
+
+  integer, parameter     :: n = 10, ng = 8, nw = 4, vl = 32
+  integer                :: i, j
+  integer, dimension (n) :: vresult, rg, rw, rv, rc
+  logical, dimension (n) :: lrg, lrw, lrv, lrc, lvresult
+  integer, dimension (n) :: array
+
+  do i = 1, n
+     array(i) = i
+  end do
+
+  !
+  ! '+' reductions
+  !
+
+  rg = 0
+  rw = 0
+  rv = 0
+  rc = 0
+  vresult = 0
+
+  !$acc parallel num_gangs(ng) copy(rg)
+  !$acc loop reduction(+:rg) gang
+  do i = 1, n
+    do j = 1, n
+      rg(j) = rg(j) + array(i)
+    end do
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_workers(nw) copy(rw)
+  !$acc loop reduction(+:rw) worker
+  do i = 1, n
+    do j = 1, n
+      rw(j) = rw(j) + array(i)
+    end do
+  end do
+  !$acc end parallel
+
+  !$acc parallel vector_length(vl) copy(rv)
+  !$acc loop reduction(+:rv) vector
+  do i = 1, n
+    do j = 1, n
+      rv(j) = rv(j) + array(i)
+    end do
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_gangs(ng) num_workers(nw) vector_length(vl) copy(rc)
+  !$acc loop reduction(+:rc) gang worker vector
+  do i = 1, n
+    do j = 1, n
+      rc(j) = rc(j) + array(i)
+    end do
+  end do
+  !$acc end parallel
+
+  ! Verify the results
+  do i = 1, n
+    do j = 1, n
+      vresult(j) = vresult(j) + array(i)
+    end do
+  end do
+
+  if (count (rg .ne. vresult) .ne. 0) STOP 1
+  if (count (rw .ne. vresult) .ne. 0) STOP 2
+  if (count (rv .ne. vresult) .ne. 0) STOP 3
+  if (count (rc .ne. vresult) .ne. 0) STOP 4
+
+  !
+  ! '*' reductions
+  !
+
+  rg = 1
+  rw = 1
+  rv = 1
+  rc = 1
+  vresult = 1
+
+  !$acc parallel num_gangs(ng) copy(rg)
+  !$acc loop reduction(*:rg) gang
+  do i = 1, n
+    do j = 1, n
+      rg(j) = rg(j) * array(i)
+    end do
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_workers(nw) copy(rw)
+  !$acc loop reduction(*:rw) worker
+  do i = 1, n
+    do j = 1, n
+      rw(j) = rw(j) * array(i)
+    end do
+  end do
+  !$acc end parallel
+
+  !$acc parallel vector_length(vl) copy(rv)
+  !$acc loop reduction(*:rv) vector
+  do i = 1, n
+    do j = 1, n
+      rv(j) = rv(j) * array(i)
+    end do
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_gangs(ng) num_workers(nw) vector_length(vl) copy(rc)
+  !$acc loop reduction(*:rc) gang worker vector
+  do i = 1, n
+    do j = 1, n
+      rc(j) = rc(j) * array(i)
+    end do
+  end do
+  !$acc end parallel
+
+  ! Verify the results
+  do i = 1, n
+    do j = 1, n
+      vresult(j) = vresult(j) * array(i)
+    end do
+  end do
+
+  if (count (rg .ne. vresult) .ne. 0) STOP 5
+  if (count (rw .ne. vresult) .ne. 0) STOP 6
+  if (count (rv .ne. vresult) .ne. 0) STOP 7
+  if (count (rc .ne. vresult) .ne. 0) STOP 8
+
+  !
+  ! 'max' reductions
+  !
+
+  rg = 0
+  rw = 0
+  rv = 0
+  rc = 0
+  vresult = 0
+
+  !$acc parallel num_gangs(ng) copy(rg)
+  !$acc loop reduction(max:rg) gang
+  do i = 1, n
+    do j = 1, n
+      rg(j) = max (rg(j), array(i))
+    end do
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_workers(nw) copy(rw)
+  !$acc loop reduction(max:rw) worker
+  do i = 1, n
+    do j = 1, n
+      rw(j) = max (rw(j), array(i))
+    end do
+  end do
+  !$acc end parallel
+
+  !$acc parallel vector_length(vl) copy(rv)
+  !$acc loop reduction(max:rv) vector
+  do i = 1, n
+    do j = 1, n
+      rv(j) = max (rv(j), array(i))
+    end do
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_gangs(ng) num_workers(nw) vector_length(vl) copy(rc)
+  !$acc loop reduction(max:rc) gang worker vector
+  do i = 1, n
+    do j = 1, n
+      rc(j) = max (rc(j), array(i))
+    end do
+  end do
+  !$acc end parallel
+
+  ! Verify the results
+  do i = 1, n
+    do j = 1, n
+      vresult(j) = max (vresult(j), array(i))
+    end do
+  end do
+
+  if (count (rg .ne. vresult) .ne. 0) STOP 9
+  if (count (rw .ne. vresult) .ne. 0) STOP 10
+  if (count (rv .ne. vresult) .ne. 0) STOP 11
+  if (count (rc .ne. vresult) .ne. 0) STOP 12
+
+  !
+  ! 'min' reductions
+  !
+
+  rg = 0
+  rw = 0
+  rv = 0
+  rc = 0
+  vresult = 0
+
+  !$acc parallel num_gangs(ng) copy(rg)
+  !$acc loop reduction(min:rg) gang
+  do i = 1, n
+    do j = 1, n
+      rg(j) = min (rg(j), array(i))
+    end do
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_workers(nw) copy(rw)
+  !$acc loop reduction(min:rw) worker
+  do i = 1, n
+    do j = 1, n
+      rw(j) = min (rw(j), array(i))
+    end do
+  end do
+  !$acc end parallel
+
+  !$acc parallel vector_length(vl) copy(rv)
+  !$acc loop reduction(min:rv) vector
+  do i = 1, n
+    do j = 1, n
+      rv(j) = min (rv(j), array(i))
+    end do
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_gangs(ng) num_workers(nw) vector_length(vl) copy(rc)
+  !$acc loop reduction(min:rc) gang worker vector
+  do i = 1, n
+    do j = 1, n
+      rc(j) = min (rc(j), array(i))
+    end do
+  end do
+  !$acc end parallel
+
+  ! Verify the results
+  do i = 1, n
+    do j = 1, n
+      vresult(j) = min (vresult(j), array(i))
+    end do
+  end do
+
+  if (count (rg .ne. vresult) .ne. 0) STOP 13
+  if (count (rw .ne. vresult) .ne. 0) STOP 14
+  if (count (rv .ne. vresult) .ne. 0) STOP 15
+  if (count (rc .ne. vresult) .ne. 0) STOP 16
+
+  !
+  ! 'iand' reductions
+  !
+
+  rg = 1
+  rw = 1
+  rv = 1
+  rc = 1
+  vresult = 1
+
+  !$acc parallel num_gangs(ng) copy(rg)
+  !$acc loop reduction(iand:rg) gang
+  do i = 1, n
+    do j = 1, n
+      rg(j) = iand (rg(j), array(i))
+    end do
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_workers(nw) copy(rw)
+  !$acc loop reduction(iand:rw) worker
+  do i = 1, n
+    do j = 1, n
+      rw(j) = iand (rw(j), array(i))
+    end do
+  end do
+  !$acc end parallel
+
+  !$acc parallel vector_length(vl) copy(rv)
+  !$acc loop reduction(iand:rv) vector
+  do i = 1, n
+    do j = 1, n
+      rv(j) = iand (rv(j), array(i))
+    end do
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_gangs(ng) num_workers(nw) vector_length(vl) copy(rc)
+  !$acc loop reduction(iand:rc) gang worker vector
+  do i = 1, n
+    do j = 1, n
+      rc(j) = iand (rc(j), array(i))
+    end do
+  end do
+  !$acc end parallel
+
+  ! Verify the results
+  do i = 1, n
+    do j = 1, n
+      vresult(j) = iand (vresult(j), array(i))
+    end do
+  end do
+
+  if (count (rg .ne. vresult) .ne. 0) STOP 17
+  if (count (rw .ne. vresult) .ne. 0) STOP 18
+  if (count (rv .ne. vresult) .ne. 0) STOP 19
+  if (count (rc .ne. vresult) .ne. 0) STOP 20
+
+  !
+  ! 'ior' reductions
+  !
+
+  rg = 0
+  rw = 0
+  rv = 0
+  rc = 0
+  vresult = 0
+
+  !$acc parallel num_gangs(ng) copy(rg)
+  !$acc loop reduction(ior:rg) gang
+  do i = 1, n
+    do j = 1, n
+      rg(j) = ior (rg(j), array(i))
+    end do
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_workers(nw) copy(rw)
+  !$acc loop reduction(ior:rw) worker
+  do i = 1, n
+    do j = 1, n
+      rw(j) = ior (rw(j), array(i))
+    end do
+  end do
+  !$acc end parallel
+
+  !$acc parallel vector_length(vl) copy(rv)
+  !$acc loop reduction(ior:rv) vector
+  do i = 1, n
+    do j = 1, n
+      rv(j) = ior (rv(j), array(i))
+    end do
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_gangs(ng) num_workers(nw) vector_length(vl) copy(rc)
+  !$acc loop reduction(ior:rc) gang worker vector
+  do i = 1, n
+    do j = 1, n
+      rc(j) = ior (rc(j), array(i))
+    end do
+  end do
+  !$acc end parallel
+
+  ! Verify the results
+  do i = 1, n
+    do j = 1, n
+      vresult(j) = ior (vresult(j), array(i))
+    end do
+  end do
+
+  if (count (rg .ne. vresult) .ne. 0) STOP 21
+  if (count (rw .ne. vresult) .ne. 0) STOP 22
+  if (count (rv .ne. vresult) .ne. 0) STOP 23
+  if (count (rc .ne. vresult) .ne. 0) STOP 24
+
+  !
+  ! 'ieor' reductions
+  !
+
+  rg = 0
+  rw = 0
+  rv = 0
+  rc = 0
+  vresult = 0
+
+  !$acc parallel num_gangs(ng) copy(rg)
+  !$acc loop reduction(ieor:rg) gang
+  do i = 1, n
+    do j = 1, n
+      rg(j) = ieor (rg(j), array(i))
+    end do
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_workers(nw) copy(rw)
+  !$acc loop reduction(ieor:rw) worker
+  do i = 1, n
+    do j = 1, n
+      rw(j) = ieor (rw(j), array(i))
+    end do
+  end do
+  !$acc end parallel
+
+  !$acc parallel vector_length(vl) copy(rv)
+  !$acc loop reduction(ieor:rv) vector
+  do i = 1, n
+    do j = 1, n
+      rv(j) = ieor (rv(j), array(i))
+    end do
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_gangs(ng) num_workers(nw) vector_length(vl) copy(rc)
+  !$acc loop reduction(ieor:rc) gang worker vector
+  do i = 1, n
+    do j = 1, n
+      rc(j) = ieor (rc(j), array(i))
+    end do
+  end do
+  !$acc end parallel
+
+  ! Verify the results
+  do i = 1, n
+    do j = 1, n
+      vresult(j) = ieor (vresult(j), array(i))
+    end do
+  end do
+
+  if (count (rg .ne. vresult) .ne. 0) STOP 25
+  if (count (rw .ne. vresult) .ne. 0) STOP 26
+  if (count (rv .ne. vresult) .ne. 0) STOP 27
+  if (count (rc .ne. vresult) .ne. 0) STOP 28
+
+  !
+  ! '.and.' reductions
+  !
+
+  lrg = .true.
+  lrw = .true.
+  lrv = .true.
+  lrc = .true.
+  lvresult = .true.
+
+  !$acc parallel num_gangs(ng) copy(lrg)
+  !$acc loop reduction(.and.:lrg) gang
+  do i = 1, n
+    do j = 1, n
+      lrg(j) = lrg(j) .and. (array(i) .ge. 5)
+    end do
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_workers(nw) copy(lrw)
+  !$acc loop reduction(.and.:lrw) worker
+  do i = 1, n
+    do j = 1, n
+      lrw(j) = lrw(j) .and. (array(i) .ge. 5)
+    end do
+  end do
+  !$acc end parallel
+
+  !$acc parallel vector_length(vl) copy(lrv)
+  !$acc loop reduction(.and.:lrv) vector
+  do i = 1, n
+    do j = 1, n
+      lrv(j) = lrv(j) .and. (array(i) .ge. 5)
+    end do
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_gangs(ng) num_workers(nw) vector_length(vl) copy(lrc)
+  !$acc loop reduction(.and.:lrc) gang worker vector
+  do i = 1, n
+    do j = 1, n
+      lrc(j) = lrc(j) .and. (array(i) .ge. 5)
+    end do
+  end do
+  !$acc end parallel
+
+  ! Verify the results
+  do i = 1, n
+    do j = 1, n
+      lvresult(j) = lvresult(j) .and. (array(i) .ge. 5)
+    end do
+  end do
+
+  if (count (lrg .neqv. lvresult) .ne. 0) STOP 29
+  if (count (lrw .neqv. lvresult) .ne. 0) STOP 30
+  if (count (lrv .neqv. lvresult) .ne. 0) STOP 31
+  if (count (lrc .neqv. lvresult) .ne. 0) STOP 32
+
+  !
+  ! '.or.' reductions
+  !
+
+  lrg = .true.
+  lrw = .true.
+  lrv = .true.
+  lrc = .true.
+  lvresult = .true.
+
+  !$acc parallel num_gangs(ng) copy(lrg)
+  !$acc loop reduction(.or.:lrg) gang
+  do i = 1, n
+    do j = 1, n
+      lrg(j) = lrg(j) .or. (array(i) .ge. 5)
+    end do
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_workers(nw) copy(lrw)
+  !$acc loop reduction(.or.:lrw) worker
+  do i = 1, n
+    do j = 1, n
+      lrw(j) = lrw(j) .or. (array(i) .ge. 5)
+    end do
+  end do
+  !$acc end parallel
+
+  !$acc parallel vector_length(vl) copy(lrv)
+  !$acc loop reduction(.or.:lrv) vector
+  do i = 1, n
+    do j = 1, n
+      lrv(j) = lrv(j) .or. (array(i) .ge. 5)
+    end do
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_gangs(ng) num_workers(nw) vector_length(vl) copy(lrc)
+  !$acc loop reduction(.or.:lrc) gang worker vector
+  do i = 1, n
+    do j = 1, n
+      lrc(j) = lrc(j) .or. (array(i) .ge. 5)
+    end do
+  end do
+  !$acc end parallel
+
+  ! Verify the results
+  do i = 1, n
+    do j = 1, n
+      lvresult(j) = lvresult(j) .or. (array(i) .ge. 5)
+    end do
+  end do
+
+  if (count (lrg .neqv. lvresult) .ne. 0) STOP 33
+  if (count (lrw .neqv. lvresult) .ne. 0) STOP 34
+  if (count (lrv .neqv. lvresult) .ne. 0) STOP 35
+  if (count (lrc .neqv. lvresult) .ne. 0) STOP 36
+
+  !
+  ! '.eqv.' reductions
+  !
+
+  lrg = .true.
+  lrw = .true.
+  lrv = .true.
+  lrc = .true.
+  lvresult = .true.
+
+  !$acc parallel num_gangs(ng) copy(lrg)
+  !$acc loop reduction(.eqv.:lrg) gang
+  do i = 1, n
+    do j = 1, n
+      lrg(j) = lrg(j) .eqv. (array(i) .ge. 5)
+    end do
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_workers(nw) copy(lrw)
+  !$acc loop reduction(.eqv.:lrw) worker
+  do i = 1, n
+    do j = 1, n
+      lrw(j) = lrw(j) .eqv. (array(i) .ge. 5)
+    end do
+  end do
+  !$acc end parallel
+
+  !$acc parallel vector_length(vl) copy(lrv)
+  !$acc loop reduction(.eqv.:lrv) vector
+  do i = 1, n
+    do j = 1, n
+      lrv(j) = lrv(j) .eqv. (array(i) .ge. 5)
+    end do
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_gangs(ng) num_workers(nw) vector_length(vl) copy(lrc)
+  !$acc loop reduction(.eqv.:lrc) gang worker vector
+  do i = 1, n
+    do j = 1, n
+      lrc(j) = lrc(j) .eqv. (array(i) .ge. 5)
+    end do
+  end do
+  !$acc end parallel
+
+  ! Verify the results
+  do i = 1, n
+    do j = 1, n
+      lvresult(j) = lvresult(j) .eqv. (array(i) .ge. 5)
+    end do
+  end do
+
+  if (count (lrg .neqv. lvresult) .ne. 0) STOP 37
+  if (count (lrw .neqv. lvresult) .ne. 0) STOP 38
+  if (count (lrv .neqv. lvresult) .ne. 0) STOP 39
+  if (count (lrc .neqv. lvresult) .ne. 0) STOP 40
+
+  !
+  ! '.neqv.' reductions
+  !
+
+  lrg = .true.
+  lrw = .true.
+  lrv = .true.
+  lrc = .true.
+  lvresult = .true.
+
+  !$acc parallel num_gangs(ng) copy(lrg)
+  !$acc loop reduction(.neqv.:lrg) gang
+  do i = 1, n
+    do j = 1, n
+      lrg(j) = lrg(j) .neqv. (array(i) .ge. 5)
+    end do
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_workers(nw) copy(lrw)
+  !$acc loop reduction(.neqv.:lrw) worker
+  do i = 1, n
+    do j = 1, n
+      lrw(j) = lrw(j) .neqv. (array(i) .ge. 5)
+    end do
+  end do
+  !$acc end parallel
+
+  !$acc parallel vector_length(vl) copy(lrv)
+  !$acc loop reduction(.neqv.:lrv) vector
+  do i = 1, n
+    do j = 1, n
+      lrv(j) = lrv(j) .neqv. (array(i) .ge. 5)
+    end do
+  end do
+  !$acc end parallel
+
+  !$acc parallel num_gangs(ng) num_workers(nw) vector_length(vl) copy(lrc)
+  !$acc loop reduction(.neqv.:lrc) gang worker vector
+  do i = 1, n
+    do j = 1, n
+      lrc(j) = lrc(j) .neqv. (array(i) .ge. 5)
+    end do
+  end do
+  !$acc end parallel
+
+  ! Verify the results
+  do i = 1, n
+    do j = 1, n
+      lvresult(j) = lvresult(j) .neqv. (array(i) .ge. 5)
+    end do
+  end do
+
+  if (count (lrg .neqv. lvresult) .ne. 0) STOP 41
+  if (count (lrw .neqv. lvresult) .ne. 0) STOP 42
+  if (count (lrv .neqv. lvresult) .ne. 0) STOP 43
+  if (count (lrc .neqv. lvresult) .ne. 0) STOP 44
+
+end program reduction_9
+

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH, OpenACC 2.7] Implement reductions for arrays and structs
  2024-01-02 15:21 [PATCH, OpenACC 2.7] Implement reductions for arrays and structs Chung-Lin Tang
  2024-01-10 11:33 ` Julian Brown
  2024-02-08 14:47 ` [PATCH, OpenACC 2.7] struct/array reductions for Fortran Chung-Lin Tang
@ 2024-03-13 17:05 ` Tobias Burnus
  2 siblings, 0 replies; 6+ messages in thread
From: Tobias Burnus @ 2024-03-13 17:05 UTC (permalink / raw)
  To: Chung-Lin Tang, gcc-patches, Thomas Schwinge

Hi Chung-Lin,


https://gcc.gnu.org/pipermail/gcc-patches/2024-January/641669.html

Chung-Lin Tang wrote:
> this patch implements reductions for arrays and structs for OpenACC. Following the pattern for OpenACC reductions [...]

(Stumbled over while looking at the Fortran patch, but applying to 
C/C++, hence mentioned here; the Fortran patch is at 
https://gcc.gnu.org/pipermail/gcc-patches/2024-February/645205.html )


OpenACC permits array elements and subarrays. I have not checked whether 
array elements are currently rejected or fully supported, but I miss a 
testcase for both array elements (unless there is one already) and array 
sections.

If implemented, I think there should be a working run-time test.
If not supported, there should be a sorry_at error for those.

Note: the parser should handle array sections as OpenMP handles them.

The testcase should cover something like the following:

void f(int n)
{
   int x[5][5]; // Multimensional array;
   int y[n]; // VLA
   int *z = (int*)malloc(5*5*sizeof(int)); // Allocated array

... reduction(+:x)
... reduction(+:y)

... reduction(+:x[0:5][2:1])  // OK
... reduction(+:x[1:4][2:1])
   // invalid - while contiguous, first dim does not span the whole array
... reduction(+:y[2:2])  // OK
... reduction(+:y[3:])  // OK - same as [3:n-3]
... reduction(+:y[:2])  // OK - same as [0:2]
... reduction(+:z[1:2][1:6])  // OK

And the same where at least one of the const number is replaced by
a variable.

Note: The 'invalid' reduction is fine in terms of being contiguous (last 
dimension contains a single element, hence, the dimension before does 
not need to span the whole extend) - but OpenACC requires the all 
dimensions but the last to span the whole range.

See "2.7.1 Data Specification in Data Clauses" for the subarray description.

I think - if known at compile time - there should be also a diagnostic 
if the any dimension but the last does not span the whole range.

Thanks,

Tobias

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH, OpenACC 2.7] struct/array reductions for Fortran
  2024-02-08 14:47 ` [PATCH, OpenACC 2.7] struct/array reductions for Fortran Chung-Lin Tang
@ 2024-03-13 18:59   ` Tobias Burnus
  2024-03-18 16:39   ` Thomas Schwinge
  1 sibling, 0 replies; 6+ messages in thread
From: Tobias Burnus @ 2024-03-13 18:59 UTC (permalink / raw)
  To: Chung-Lin Tang, gcc-patches, gfortran, Thomas Schwinge

Hi Chung-Lin, hi Thomas, hello world,

some thoughts glancing at the patch.

Chung-Lin Tang wrote:
> There is still some shortcomings in the current state, mainly that only explicit-shaped arrays can be used (like its C counterpart). Anything else is currently a bit more complicated in the middle-end, since the existing reduction code creates an "init-op" (literal of initial values) which can't be done when say TYPE_MAX_VALUE (TYPE_DOMAIN (array_type)) is not a tree constant. I think we'll be on the hook to solve this later, but I think the current state is okay to submit.

I think having some initial support is fine, but it needs an 
understandable and somewhat complete error diagnostic and testcases. 
More to this below.

> +      if (!TREE_CONSTANT (min_tree) || !TREE_CONSTANT (max_tree))
> +	{
> +	  error_at (loc, "array in reduction must be of constant size");
> +	  return error_mark_node;
> +	}
Shouldn't this use a sorry_at instead?

> +	  /* OpenACC current only supports array reductions on explicit-shape
> +	     arrays.  */
> +	  if ((n->sym->as && n->sym->as->type != AS_EXPLICIT)
> +	      || n->sym->attr.codimension)
>   	    gfc_error ("Array %qs is not permitted in reduction at %L",
>   		       n->sym->name, &n->where);
[Coarray excursion. I am in favor of allowing it for the reasons above, 
but it could be also rejected but I would prefer to have a proper error 
message in that case.]

While coarrays are unspecified, I do not see a reason why a corray 
shouldn't be permitted here – as long as it is not coindexed. At the 
end, it is just a normal array with some additional properties, which 
make it possible to remotely access it.

Note: For coarray scalars, we have 'sym->as', thus the check should be 
'(n->sym->as && n->sym->as->rank)' to permit scalar coarrays.

* * *

Coarray excursion: A coarray variables exists in multiple processes 
("images", e.g. MPI processes). If 'caf' and 'caf2' are coarrays, then 
'caf = 5' and 'i = caf2' refer to the local variable.

On the other hand, 'caf[n] = 5' or 'i = caf[3,m]' refers to the 'caf' 
variable on image 'n' or [3,m]', respectively, which implies in general 
some function call to read or set the remote data, unless the memory is 
directly accessible (→ e.g. some offset calculation) and the compiler 
already knows how to handle this.

While a coarrary might be allocated in some special memory, as long as 
one uses the local version (i.e. not coindexed / without the image index 
in brackets).

Assume for the example above, e.g., integer :: caf[*], caf2[3:6, 7:*].

* * *

Thus, in terms of OpenACC or OpenMP, there is no reason to fret a 
coarray as long as it is not coindexed and as long as OpenMP/OpenACC 
does not interfere with the memory allocation – either directly ('!$omp 
allocators') or indirectly by placing it into special memory (pinned, 
pseudo-unified-shared memory → OG13's -foffload-memory=pinned/unified).

In the meanwhile, OpenMP actually explicitly allows coarrays with few 
exceptions while OpenACC talks about unspecified behavior.

* * *

Back to generic comments:

If I look at the existing code, I see at gfc_match_omp_clause_reduction:

>  if (gfc_match_omp_variable_list (" :", &c->lists[list_idx], false, NULL,
>                                   &head, openacc, allow_derived) != 
> MATCH_YES)

If 'openacc' is true, array sections are permitted - but the code added 
(see quote above) does not handle n->expr at all and only n->sym.

I think there needs to be at least a "gfc_error ("Sorry, subarrays/array 
sections not yet handled" [subarray is the OpenACC wording, 'array 
section' is the Fortran one, which might be clearer.

But you could consider to handle at least array elements, i.e. 
n->expr->rank == 0.

Additionally, I think the current error message is completely unhelpful 
given that some arrays are supported but most are not.

I think there should be also some testcases for the not-yet-supported 
case. I think the following will trigger the omp-low.cc 'sorry_at' (or 
currently 'error' - but I think it should be a sorry):

subroutine foo(n)

integer :: n, A(n)

... reduction(+:A)

And most others will trigger in openmp.cc; for those, you should have an 
allocatable/pointer and assumed-shape arrays for the diagnostic testcase 
as well.

* * *

I have not really experimented with the code, but does it handle 
multi-dimensional constant arrays like 'integer :: a(3:6,10,-1:1)' ? — I 
bet it does, at least after handling my example [2] for the C patch [1].

Thanks,

Tobias

[1] https://gcc.gnu.org/pipermail/gcc-patches/2024-January/641669.html

[2] https://gcc.gnu.org/pipermail/gcc-patches/2024-March/647704.html


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH, OpenACC 2.7] struct/array reductions for Fortran
  2024-02-08 14:47 ` [PATCH, OpenACC 2.7] struct/array reductions for Fortran Chung-Lin Tang
  2024-03-13 18:59   ` Tobias Burnus
@ 2024-03-18 16:39   ` Thomas Schwinge
  1 sibling, 0 replies; 6+ messages in thread
From: Thomas Schwinge @ 2024-03-18 16:39 UTC (permalink / raw)
  To: Chung-Lin Tang; +Cc: gcc-patches, fortran, Tobias Burnus

Hi Chung-Lin!

Thanks for your work here, which I'm beginning to look into (prerequisite
"[PATCH, OpenACC 2.7] Implement reductions for arrays and structs",
first, of course); it'll take me some time.


In non-offloading testing, I noticed for x86_64-pc-linux-gnu '-m32':

    +PASS: libgomp.oacc-fortran/reduction-13.f90 -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable  -O0  (test for excess errors)
    +PASS: libgomp.oacc-fortran/reduction-13.f90 -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable  -O0  execution test
    +PASS: libgomp.oacc-fortran/reduction-13.f90 -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable  -O1  (test for excess errors)
    +FAIL: libgomp.oacc-fortran/reduction-13.f90 -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable  -O1  execution test
    +PASS: libgomp.oacc-fortran/reduction-13.f90 -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable  -O2  (test for excess errors)
    +FAIL: libgomp.oacc-fortran/reduction-13.f90 -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable  -O2  execution test
    +PASS: libgomp.oacc-fortran/reduction-13.f90 -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable  -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  (test for excess errors)
    +FAIL: libgomp.oacc-fortran/reduction-13.f90 -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable  -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
    +PASS: libgomp.oacc-fortran/reduction-13.f90 -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable  -O3 -g  (test for excess errors)
    +FAIL: libgomp.oacc-fortran/reduction-13.f90 -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable  -O3 -g  execution test
    +PASS: libgomp.oacc-fortran/reduction-13.f90 -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable  -Os  (test for excess errors)
    +FAIL: libgomp.oacc-fortran/reduction-13.f90 -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable  -Os  execution test

With optimizations enabled, it runs into 'STOP 4'.

Per '-Wextra':

    [...]/libgomp.oacc-fortran/reduction-13.f90:40:6: Warning: Inequality comparison for REAL(4) at (1) [-Wcompare-reals]
    [...]/libgomp.oacc-fortran/reduction-13.f90:63:6: Warning: Inequality comparison for REAL(4) at (1) [-Wcompare-reals]
    [...]/libgomp.oacc-fortran/reduction-13.f90:64:6: Warning: Inequality comparison for REAL(8) at (1) [-Wcompare-reals]

Do we need to allow for some epsilon (generally in such test cases), or
is there another problem?

For reference:

On 2024-02-08T22:47:13+0800, Chung-Lin Tang <cltang@baylibre.com> wrote:
> --- /dev/null
> +++ b/libgomp/testsuite/libgomp.oacc-fortran/reduction-13.f90
> @@ -0,0 +1,66 @@
> +! { dg-do run }
> +
> +! record type reductions
> +
> +program reduction_13
> +  implicit none
> +
> +  type t1
> +     integer :: i
> +     real :: r
> +  end type t1
> +
> +  type t2
> +     real :: r
> +     integer :: i
> +     double precision :: d
> +  end type t2
> +
> +  integer, parameter :: n = 10, ng = 8, nw = 4, vl = 32
> +  integer :: i
> +  type(t1) :: v1, a1
> +  type (t2) :: v2, a2
> +
> +  v1%i = 0
> +  v1%r = 0
> +  !$acc parallel num_gangs(ng) num_workers(nw) vector_length(vl) copy(v1)
> +  !$acc loop reduction (+:v1)
> +  do i = 1, n
> +     v1%i = v1%i + 1
> +     v1%r = v1%r + 2
> +  end do
> +  !$acc end parallel
> +  a1%i = 0
> +  a1%r = 0
> +  do i = 1, n
> +     a1%i = a1%i + 1
> +     a1%r = a1%r + 2
> +  end do
> +  if (v1%i .ne. a1%i) STOP 1
> +  if (v1%r .ne. a1%r) STOP 2
> +
> +  v2%i = 1
> +  v2%r = 1
> +  v2%d = 1
> +  !$acc parallel num_gangs(ng) num_workers(nw) vector_length(vl) copy(v2)
> +  !$acc loop reduction (*:v2)
> +  do i = 1, n
> +     v2%i = v2%i * 2
> +     v2%r = v2%r * 1.1
> +     v2%d = v2%d * 1.3
> +  end do
> +  !$acc end parallel
> +  a2%i = 1
> +  a2%r = 1
> +  a2%d = 1
> +  do i = 1, n
> +     a2%i = a2%i * 2
> +     a2%r = a2%r * 1.1
> +     a2%d = a2%d * 1.3
> +  end do
> +
> +  if (v2%i .ne. a2%i) STOP 3
> +  if (v2%r .ne. a2%r) STOP 4
> +  if (v2%d .ne. a2%d) STOP 5
> +
> +end program reduction_13


Grüße
 Thomas

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2024-03-18 16:39 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-01-02 15:21 [PATCH, OpenACC 2.7] Implement reductions for arrays and structs Chung-Lin Tang
2024-01-10 11:33 ` Julian Brown
2024-02-08 14:47 ` [PATCH, OpenACC 2.7] struct/array reductions for Fortran Chung-Lin Tang
2024-03-13 18:59   ` Tobias Burnus
2024-03-18 16:39   ` Thomas Schwinge
2024-03-13 17:05 ` [PATCH, OpenACC 2.7] Implement reductions for arrays and structs Tobias Burnus

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).