public inbox for gcc-patches@gcc.gnu.org
* [gomp4] add support for gang local storage allocation in shared memory
@ 2017-02-27 16:21 Cesar Philippidis
  2018-08-13 16:22 ` [PATCH, OpenACC] Add " Julian Brown
  0 siblings, 1 reply; 26+ messages in thread
From: Cesar Philippidis @ 2017-02-27 16:21 UTC
  To: gcc-patches, Chung-Lin Tang

[-- Attachment #1: Type: text/plain, Size: 1589 bytes --]

This patch, largely implemented by Chung-Lin, is a first step towards
teaching the C and C++ FEs how to allocate shared memory for gang-local
variables. For example:

  #pragma acc parallel
  {
    int some_array[N], some_var;
    ...
  }

Both some_array and some_var will be stored in shared memory with this
patch.

Shared memory is allocated for local variables in a similar fashion to
worker reductions. The nvptx BE maintains a single global
__gangprivate_shared variable that backs all of the local variables
requiring shared memory. During RTL expansion, decls are checked for an
"oacc gangprivate" attribute, and those that carry it are remapped to an
offset within __gangprivate_shared via the new expand_accel_var target
hook. That hook is also responsible for reserving shared memory for each
such decl in the offloaded program. The C and C++ FEs attach the
"oacc gangprivate" attribute to decls immediately after they process
OpenACC kernels and parallel regions.
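
The reservation step described above amounts to a bump allocator over a
single block. The following is only a minimal sketch of that idea, not
the code in the patch: reserve_slot, block_size and block_align are
hypothetical names, and sizes and alignments are assumed to be in bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the back end's bookkeeping.  All values
   are in bytes in this sketch.  */
static size_t block_size;       /* Bytes reserved so far.  */
static size_t block_align = 4;  /* Alignment required by the block.  */

/* Reserve SIZE bytes aligned to ALIGN (a power of two) and return the
   offset of the new slot from the start of the block.  */
static size_t
reserve_slot (size_t size, size_t align)
{
  assert (align != 0 && (align & (align - 1)) == 0);

  /* Round the running size up to the slot's alignment.  */
  block_size = (block_size + align - 1) & ~(align - 1);

  /* The block as a whole must satisfy its strictest member.  */
  if (block_align < align)
    block_align = align;

  size_t offset = block_size;
  block_size += size;
  return offset;
}
```

Reserving a 1-byte, a 40-byte (4-aligned) and an 8-byte (8-aligned)
variable in that order yields offsets 0, 4 and 48, because the running
size is padded before each slot is handed out.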

This implementation still has a number of limitations, which will be
addressed in follow-up patches:

 * Currently variables in private clauses inside acc loops will not
   utilize shared memory.
 * OpenACC routines don't use shared memory, except for reductions and
   worker state propagation.
 * Variables local to worker loops don't use shared memory.
 * Variables local to automatically partitioned gang and worker loops
   don't use shared memory.
 * Shared memory is allocated globally, not locally on a per-function
   basis. We're not sure whether that matters, though.
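
To make the cases in the list above concrete, here is a small
illustrative function. The names (sum_squares, gang_tmp, scratch, local)
are made up, and the comments only restate the list; they are not
verified against generated PTX.

```c
#define N 32

/* Sum of i*i for 0 <= i < N, computed inside an OpenACC parallel
   region.  Without OpenACC support the pragmas are ignored and the
   code runs serially with the same result.  */
int
sum_squares (void)
{
  int result = 0;

  #pragma acc parallel num_gangs(1) copy(result)
  {
    /* Declared directly in the parallel region: the gang-local case
       this patch is meant to place in shared memory.  */
    int gang_tmp[N];

    /* Named in a loop private clause below: per the list above, not
       placed in shared memory yet.  */
    int scratch;

    #pragma acc loop worker private(scratch)
    for (int i = 0; i < N; i++)
      {
        /* Local to a worker loop: also not covered yet.  */
        int local = i * i;
        scratch = local;
        gang_tmp[i] = scratch;
      }

    /* Plain sequential loop over the gang-local array.  */
    for (int i = 0; i < N; i++)
      result += gang_tmp[i];
  }

  return result;
}
```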

This patch has been applied to gomp-4_0-branch.

Cesar

[-- Attachment #2: gomp4-gang-local-data.diff --]
[-- Type: text/x-patch, Size: 15297 bytes --]

2017-02-27  Chung-Lin Tang  <cltang@codesourcery.com>
	    Cesar Philippidis  <cesar@codesourcery.com>

	gcc/c/
	* c-parser.c (mark_vars_oacc_gangprivate): New function.
	(c_parser_oacc_kernels_parallel): Call it to mark gang local variables
	with attribute "oacc gangprivate".

	gcc/cp/
	* cp-tree.h (mark_vars_oacc_gangprivate): Declare.
	* parser.c (mark_vars_oacc_gangprivate): New function.
	(cp_parser_oacc_kernels_parallel): Call it to mark gang local variables
	with attribute "oacc gangprivate".
	* pt.c (tsubst_expr): Likewise.

	gcc/
	* config/nvptx/nvptx.c (gangprivate_shared_size): New global variable.
	(gangprivate_shared_align): Likewise.
	(gangprivate_shared_sym): Likewise.
	(gangprivate_shared_hmap): Likewise.
	(nvptx_option_override): Initialize gangprivate_shared_sym.
	(nvptx_file_end): Output gangprivate_shared_sym.
	(nvptx_goacc_expand_accel_var): New function.
	(nvptx_set_current_function): New function.
	(TARGET_SET_CURRENT_FUNCTION): Define hook.
	(TARGET_GOACC_EXPAND_ACCEL_VAR): Likewise.
	* doc/tm.texi (TARGET_GOACC_EXPAND_ACCEL_VAR): Document new hook.
	* doc/tm.texi.in (TARGET_GOACC_EXPAND_ACCEL_VAR): Likewise.
	* expr.c (expand_expr_real_1): Remap decls marked with the
	"oacc gangprivate" attribute.
	* omp-low.c (scan_sharing_clauses): Strip out any "oacc gangprivate"
	attributes from acc loop private clauses.
	* target.def (expand_accel_var): New hook.

	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/gang-private-1.c: New test.


diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index 3f994e3..728c31b 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -14086,6 +14086,32 @@ c_parser_oacc_loop (location_t loc, c_parser *parser, char *p_name,
 	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_WAIT) )
 
 static tree
+mark_vars_oacc_gangprivate (tree *tp,
+			    int *walk_subtrees ATTRIBUTE_UNUSED,
+			    void *data ATTRIBUTE_UNUSED)
+{
+  /* We back away from nested OpenACC non-gang loop directives.  */
+  if (TREE_CODE (*tp) == OACC_LOOP
+      && find_omp_clause (OMP_FOR_CLAUSES (*tp), OMP_CLAUSE_GANG) == NULL_TREE)
+    {
+      return *tp;
+    }
+  if (TREE_CODE (*tp) == BIND_EXPR)
+    {
+      tree block = BIND_EXPR_BLOCK (*tp);
+      for (tree var = BLOCK_VARS (block); var; var = DECL_CHAIN (var))
+	{
+	  gcc_assert (TREE_CODE (var) == VAR_DECL);
+	  DECL_ATTRIBUTES (var)
+	    = tree_cons (get_identifier ("oacc gangprivate"),
+			 NULL, DECL_ATTRIBUTES (var));
+	  c_mark_addressable (var);
+	}
+    }
+  return NULL;
+}
+
+static tree
 c_parser_oacc_kernels_parallel (location_t loc, c_parser *parser,
 				enum pragma_kind p_kind, char *p_name,
 				bool *if_p)
@@ -14119,7 +14145,9 @@ c_parser_oacc_kernels_parallel (location_t loc, c_parser *parser,
 	  tree block = c_begin_omp_parallel ();
 	  tree clauses;
 	  c_parser_oacc_loop (loc, parser, p_name, mask, &clauses, if_p);
-	  return c_finish_omp_construct (loc, code, block, clauses);
+	  block = c_finish_omp_construct (loc, code, block, clauses);
+	  walk_tree_1 (&block, mark_vars_oacc_gangprivate, NULL, NULL, NULL);
+	  return block;
 	}
     }
 
@@ -14128,7 +14156,9 @@ c_parser_oacc_kernels_parallel (location_t loc, c_parser *parser,
   tree block = c_begin_omp_parallel ();
   add_stmt (c_parser_omp_structured_block (parser, if_p));
 
-  return c_finish_omp_construct (loc, code, block, clauses);
+  block = c_finish_omp_construct (loc, code, block, clauses);
+  walk_tree_1 (&block, mark_vars_oacc_gangprivate, NULL, NULL, NULL);
+  return block;
 }
 
 /* OpenACC 2.0:
diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index a9822e268..f790728 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -66,6 +66,7 @@
 #include "tree-phinodes.h"
 #include "cfgloop.h"
 #include "fold-const.h"
+#include "tree-hash-traits.h"
 
 /* This file should be included last.  */
 #include "target-def.h"
@@ -136,6 +137,12 @@ static unsigned worker_red_size;
 static unsigned worker_red_align;
 static GTY(()) rtx worker_red_sym;
 
+/* Shared memory block for gang-private variables.  */
+static unsigned gangprivate_shared_size;
+static unsigned gangprivate_shared_align;
+static GTY(()) rtx gangprivate_shared_sym;
+static hash_map<tree_decl_hash, unsigned int> gangprivate_shared_hmap;
+
 /* Global lock variable, needed for 128bit worker & gang reductions.  */
 static GTY(()) tree global_lock_var;
 
@@ -167,7 +174,7 @@ nvptx_option_override (void)
   needed_fndecls_htab = hash_table<tree_hasher>::create_ggc (17);
   declared_libfuncs_htab
     = hash_table<declared_libfunc_hasher>::create_ggc (17);
-
+  
   worker_bcast_sym = gen_rtx_SYMBOL_REF (Pmode, "__worker_bcast");
   SET_SYMBOL_DATA_AREA (worker_bcast_sym, DATA_AREA_SHARED);
   worker_bcast_align = GET_MODE_ALIGNMENT (SImode) / BITS_PER_UNIT;
@@ -175,6 +182,11 @@ nvptx_option_override (void)
   worker_red_sym = gen_rtx_SYMBOL_REF (Pmode, "__worker_red");
   SET_SYMBOL_DATA_AREA (worker_red_sym, DATA_AREA_SHARED);
   worker_red_align = GET_MODE_ALIGNMENT (SImode) / BITS_PER_UNIT;
+
+  gangprivate_shared_sym
+    = gen_rtx_SYMBOL_REF (Pmode, "__gangprivate_shared");
+  SET_SYMBOL_DATA_AREA (gangprivate_shared_sym, DATA_AREA_SHARED);
+  gangprivate_shared_align = GET_MODE_ALIGNMENT (SImode) / BITS_PER_UNIT;
 }
 
 /* Return a ptx type for MODE.  If PROMOTE, then use .u32 for QImode to
@@ -4048,6 +4060,10 @@ nvptx_file_end (void)
   if (worker_red_size)
     write_worker_buffer (asm_out_file, worker_red_sym,
 			 worker_red_align, worker_red_size);
+
+  if (gangprivate_shared_size)
+    write_worker_buffer (asm_out_file, gangprivate_shared_sym,
+			 gangprivate_shared_align, gangprivate_shared_size);
 }
 
 /* Expander for the shuffle builtins.  */
@@ -5073,6 +5089,47 @@ nvptx_goacc_reduction (gcall *call)
     }
 }
 
+static rtx
+nvptx_goacc_expand_accel_var (tree var)
+{
+  if (TREE_CODE (var) == VAR_DECL
+      && lookup_attribute ("oacc gangprivate", DECL_ATTRIBUTES (var)))
+    {
+      unsigned int offset, *poffset;
+      poffset = gangprivate_shared_hmap.get (var);
+      if (poffset)
+	offset = *poffset;
+      else
+	{
+	  unsigned HOST_WIDE_INT align = DECL_ALIGN (var);
+	  gangprivate_shared_size =
+	    (gangprivate_shared_size + align - 1) & ~(align - 1);
+	  if (gangprivate_shared_align < align)
+	    gangprivate_shared_align = align;
+
+	  offset = gangprivate_shared_size;
+	  bool existed = gangprivate_shared_hmap.put (var, offset);
+	  gcc_assert (!existed);
+	  gangprivate_shared_size += tree_to_uhwi (DECL_SIZE_UNIT (var));
+	}
+      rtx addr = plus_constant (Pmode, gangprivate_shared_sym, offset);
+      return gen_rtx_MEM (TYPE_MODE (TREE_TYPE (var)), addr);
+    }
+  return NULL_RTX;
+}
+
+static GTY(()) tree nvptx_previous_fndecl;
+
+static void
+nvptx_set_current_function (tree fndecl)
+{
+  if (!fndecl || fndecl == nvptx_previous_fndecl)
+    return;
+  
+  gangprivate_shared_hmap.empty ();
+  nvptx_previous_fndecl = fndecl;
+}
+
 #undef TARGET_OPTION_OVERRIDE
 #define TARGET_OPTION_OVERRIDE nvptx_option_override
 
@@ -5169,6 +5226,9 @@ nvptx_goacc_reduction (gcall *call)
 #undef  TARGET_BUILTIN_DECL
 #define TARGET_BUILTIN_DECL nvptx_builtin_decl
 
+#undef TARGET_SET_CURRENT_FUNCTION
+#define TARGET_SET_CURRENT_FUNCTION nvptx_set_current_function
+
 #undef TARGET_GOACC_VALIDATE_DIMS
 #define TARGET_GOACC_VALIDATE_DIMS nvptx_goacc_validate_dims
 
@@ -5181,6 +5241,9 @@ nvptx_goacc_reduction (gcall *call)
 #undef TARGET_GOACC_REDUCTION
 #define TARGET_GOACC_REDUCTION nvptx_goacc_reduction
 
+#undef TARGET_GOACC_EXPAND_ACCEL_VAR
+#define TARGET_GOACC_EXPAND_ACCEL_VAR nvptx_goacc_expand_accel_var
+
 struct gcc_target targetm = TARGET_INITIALIZER;
 
 #include "gt-nvptx.h"
diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 8a635ba..7bd337a 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -6015,6 +6015,8 @@ extern bool maybe_clone_body			(tree);
 extern tree cp_convert_range_for (tree, tree, tree, bool);
 extern bool parsing_nsdmi (void);
 extern void inject_this_parameter (tree, cp_cv_quals);
+extern tree mark_vars_oacc_gangprivate (tree *, int *, void *);
+
 
 /* in pt.c */
 extern bool check_template_shadow		(tree);
diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index ddb0ab1..6dcc099 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -35757,6 +35757,34 @@ cp_parser_oacc_loop (cp_parser *parser, cp_token *pragma_tok, char *p_name,
 	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_VECTOR_LENGTH)	\
 	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_WAIT))
 
+tree
+mark_vars_oacc_gangprivate (tree *tp,
+			    int *walk_subtrees ATTRIBUTE_UNUSED,
+			    void *data ATTRIBUTE_UNUSED)
+{
+  /* We back away from nested OpenACC non-gang loop directives.  */
+  if (TREE_CODE (*tp) == OACC_LOOP
+      && find_omp_clause (OMP_FOR_CLAUSES (*tp), OMP_CLAUSE_GANG) == NULL_TREE)
+    {
+      return *tp;
+    }
+  if (TREE_CODE (*tp) == BIND_EXPR)
+    {
+      tree block = BIND_EXPR_BLOCK (*tp);
+      if (block == NULL)
+	return NULL;
+      for (tree var = BLOCK_VARS (block); var; var = DECL_CHAIN (var))
+	{
+	  gcc_assert (TREE_CODE (var) == VAR_DECL);
+	  DECL_ATTRIBUTES (var)
+	    = tree_cons (get_identifier ("oacc gangprivate"),
+			 NULL, DECL_ATTRIBUTES (var));
+	  cxx_mark_addressable (var);
+	}
+    }
+  return NULL;
+}
+
 static tree
 cp_parser_oacc_kernels_parallel (cp_parser *parser, cp_token *pragma_tok,
 				 char *p_name, bool *if_p)
@@ -35793,7 +35821,9 @@ cp_parser_oacc_kernels_parallel (cp_parser *parser, cp_token *pragma_tok,
 	  tree stmt = cp_parser_oacc_loop (parser, pragma_tok, p_name, mask,
 					   &clauses, if_p);
 	  protected_set_expr_location (stmt, pragma_tok->location);
-	  return finish_omp_construct (code, block, clauses);
+	  block =  finish_omp_construct (code, block, clauses);
+	  walk_tree_1 (&block, mark_vars_oacc_gangprivate, NULL, NULL, NULL);
+	  return block;
 	}
     }
 
@@ -35804,7 +35834,9 @@ cp_parser_oacc_kernels_parallel (cp_parser *parser, cp_token *pragma_tok,
   unsigned int save = cp_parser_begin_omp_structured_block (parser);
   cp_parser_statement (parser, NULL_TREE, false, if_p);
   cp_parser_end_omp_structured_block (parser, save);
-  return finish_omp_construct (code, block, clauses);
+  block = finish_omp_construct (code, block, clauses);
+  walk_tree_1 (&block, mark_vars_oacc_gangprivate, NULL, NULL, NULL);
+  return block;
 }
 
 /* OpenACC 2.0:
diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 2e13a01..56758d6 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -15530,6 +15530,7 @@ tsubst_expr (tree t, tree args, tsubst_flags_t complain, tree in_decl,
       stmt = begin_omp_parallel ();
       RECUR (OMP_BODY (t));
       finish_omp_construct (TREE_CODE (t), stmt, tmp);
+      walk_tree_1 (&OMP_BODY (t), mark_vars_oacc_gangprivate, NULL, NULL, NULL);
       break;
 
     case OMP_PARALLEL:
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 3de3554..0ab7231 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -5801,6 +5801,14 @@ expanded sequence has been inserted.  This hook is also responsible
 for allocating any storage for reductions when necessary.
 @end deftypefn
 
+@deftypefn {Target Hook} rtx TARGET_GOACC_EXPAND_ACCEL_VAR (tree @var{var})
+This hook, if defined, is used by accelerator target back-ends to expand
+specially handled kinds of VAR_DECL expressions.  A particular use is to
+place variables with specific attributes inside special accelerator
+memories.  A return value of NULL indicates that the target does not
+handle this VAR_DECL, and normal RTL expanding is resumed.
+@end deftypefn
+
 @node Anchored Addresses
 @section Anchored Addresses
 @cindex anchored addresses
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index f31c763..3b66a1d 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -4271,6 +4271,8 @@ address;  but often a machine-dependent strategy can generate better code.
 
 @hook TARGET_GOACC_REDUCTION
 
+@hook TARGET_GOACC_EXPAND_ACCEL_VAR
+
 @node Anchored Addresses
 @section Anchored Addresses
 @cindex anchored addresses
diff --git a/gcc/expr.c b/gcc/expr.c
index 70540f0..79e7ce5 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -9591,8 +9591,19 @@ expand_expr_real_1 (tree exp, rtx target, machine_mode tmode,
       exp = SSA_NAME_VAR (ssa_name);
       goto expand_decl_rtl;
 
-    case PARM_DECL:
     case VAR_DECL:
+      /* Allow accel compiler to handle specific cases of variables,
+	 specifically those tagged with the "oacc gangprivate" attribute,
+	 which may be intended to be placed in special memory in GPUs.  */
+      if (flag_openacc && targetm.goacc.expand_accel_var)
+	{
+	  temp = targetm.goacc.expand_accel_var (exp);
+	  if (temp)
+	    return temp;
+	}
+      /* ... fall through ...  */
+
+    case PARM_DECL:
       /* If a static var's type was incomplete when the decl was written,
 	 but the type is complete now, lay out the decl now.  */
       if (DECL_SIZE (exp) == 0
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 40f2003..73666d4 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -2061,7 +2061,19 @@ scan_sharing_clauses (tree clauses, omp_context *ctx,
 	  if (OMP_CLAUSE_PRIVATE_OUTER_REF (c))
 	    goto do_private;
 	  else if (!is_variable_sized (decl))
-	    install_var_local (decl, ctx);
+	    {
+	      tree new_decl = install_var_local (decl, ctx);
+	      /* FIXME: The "oacc gangprivate" attribute conflicts with
+		 the privatization of acc loops.  Remove that attribute,
+		 if present.  */
+	      if (!is_oacc_parallel (ctx))
+		{
+		  tree attributes = DECL_ATTRIBUTES (new_decl);
+		  attributes = remove_attribute ("oacc gangprivate",
+						 attributes);
+		  DECL_ATTRIBUTES (new_decl) = attributes;
+		}
+	    }
 	  break;
 
 	case OMP_CLAUSE_SHARED:
diff --git a/gcc/target.def b/gcc/target.def
index bf8b7d8..c25f30b 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -1689,6 +1689,16 @@ for allocating any storage for reductions when necessary.",
 void, (gcall *call),
 default_goacc_reduction)
 
+DEFHOOK
+(expand_accel_var,
+"This hook, if defined, is used by accelerator target back-ends to expand\n\
+specially handled kinds of VAR_DECL expressions.  A particular use is to\n\
+place variables with specific attributes inside special accelerator\n\
+memories.  A return value of NULL indicates that the target does not\n\
+handle this VAR_DECL, and normal RTL expanding is resumed.",
+rtx, (tree var),
+NULL)
+
 HOOK_VECTOR_END (goacc)
 
 /* Functions relating to vectorization.  */
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/gang-private-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/gang-private-1.c
new file mode 100644
index 0000000..40f8b91
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/gang-private-1.c
@@ -0,0 +1,38 @@
+#include <assert.h>
+
+int main (void)
+{
+  int ret;
+  
+  #pragma acc parallel num_gangs(1) num_workers(32) copyout(ret)
+  {
+    int w = 0;
+    
+    #pragma acc loop worker
+    for (int i = 0; i < 32; i++)
+      {
+        #pragma acc atomic update
+	w++;
+      }
+
+    ret = (w == 32);
+  }
+  assert (ret);
+  
+  #pragma acc parallel num_gangs(1) vector_length(32) copyout(ret)
+  {
+    int v = 0;
+
+    #pragma acc loop vector
+    for (int i = 0; i < 32; i++)
+      {
+        #pragma acc atomic update
+	v++;
+      }
+
+    ret = (v == 32);
+  }
+  assert (ret);
+
+  return 0;
+}


* [PATCH, OpenACC] Add support for gang local storage allocation in shared memory
@ 2018-08-13 16:22 ` Julian Brown
  2018-08-13 18:42   ` Cesar Philippidis
  0 siblings, 1 reply; 26+ messages in thread
From: Julian Brown @ 2018-08-13 16:22 UTC
  To: gcc-patches; +Cc: Tom de Vries, Chung-Lin Tang

[-- Attachment #1: Type: text/plain, Size: 2967 bytes --]

This patch adds support for placing gang-private variables in NVPTX
per-CU shared memory. This is done by marking up addressable variables
declared at the appropriate parallelism level with an attribute ("oacc
gangprivate") in omp-low.c.

Target-dependent code in the NVPTX backend then modifies the symbol
associated with the variable at expand time via a new target hook
(TARGET_GOACC_EXPAND_ACCEL_VAR) in order to place it in shared memory,
which is faster to access than the ".local" memory that would otherwise
be used for such variables. This has consequences (theoretical, at
least) for program semantics, in that the shared memory is statically
allocated rather than obeying stack discipline -- but you can't have
recursive routine calls in OpenACC anyway, so that's no big deal.
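
As a host-side analogy only (plain C, nothing NVPTX-specific, with
made-up function names), the difference between stack discipline and
static allocation shows up once a function recurses:

```c
/* Stack discipline: each activation has its own copy of MINE, so the
   value survives the recursive call.  */
static int
with_stack (int depth)
{
  int mine = depth;
  if (depth > 0)
    with_stack (depth - 1);
  return mine;
}

/* Static allocation: every activation shares one slot, so the
   innermost call overwrites the value seen by the outer ones.  */
static int
with_static (int depth)
{
  static int slot;
  slot = depth;
  if (depth > 0)
    with_static (depth - 1);
  return slot;
}
```

Here with_stack (3) returns 3 while with_static (3) returns 0. Without
recursion the two behave identically, which is the situation the
paragraph above relies on.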

Other targets can use the same attribute in different ways, as
appropriate.

OK for trunk?

Thanks,

Julian

2018-08-10  Julian Brown  <julian@codesourcery.com>
            Chung-Lin Tang  <cltang@codesourcery.com>

        gcc/
        * config/nvptx/nvptx.c (tree-hash-traits.h): Include.
        (gangprivate_shared_size): New global variable.
        (gangprivate_shared_align): Likewise.
        (gangprivate_shared_sym): Likewise.
        (gangprivate_shared_hmap): Likewise.
        (nvptx_option_override): Initialize gangprivate_shared_sym,
        gangprivate_shared_align.
        (nvptx_file_end): Output gangprivate_shared_sym.
        (nvptx_goacc_expand_accel_var): New function.
        (nvptx_set_current_function): New function.
        (TARGET_SET_CURRENT_FUNCTION): Define hook.
        (TARGET_GOACC_EXPAND_ACCEL_VAR): Likewise.
        * doc/tm.texi (TARGET_GOACC_EXPAND_ACCEL_VAR): Document new hook.
        * doc/tm.texi.in (TARGET_GOACC_EXPAND_ACCEL_VAR): Likewise.
        * expr.c (expand_expr_real_1): Remap decls marked with the
        "oacc gangprivate" attribute.
        * omp-low.c (omp_context): Add oacc_partitioning_levels and
        oacc_decls fields.
        (new_omp_context): Initialize oacc_decls in new omp_context.
        (delete_omp_context): Delete oacc_decls in old omp_context.
        (lower_oacc_head_tail): Record partitioning-level count in omp context.
        (oacc_record_private_var_clauses, oacc_record_vars_in_bind)
        (mark_oacc_gangprivate): New functions.
        (lower_omp_for): Call oacc_record_private_var_clauses with "for"
        clauses.  Call mark_oacc_gangprivate for gang-partitioned loops.
        (lower_omp_target): Call oacc_record_private_var_clauses with "target"
        clauses.
        Call mark_oacc_gangprivate for offloaded target regions.
        (lower_omp_1): Call oacc_record_vars_in_bind for GIMPLE_BIND within
        OMP regions.
        * target.def (expand_accel_var): New hook.

        libgomp/
        * testsuite/libgomp.oacc-c-c++-common/gang-private-1.c: New test.
        * testsuite/libgomp.oacc-c-c++-common/loop-gwv-2.c: New test.
        * testsuite/libgomp.oacc-c/pr85465.c: New test.

[-- Attachment #2: gang-local-storage-in-shm-1.diff --]
[-- Type: text/x-patch, Size: 17926 bytes --]

commit 9637e7ea887e100f35d99b8d12101f9f8a9b94e3
Author: Julian Brown <julian@codesourcery.com>
Date:   Thu Aug 9 20:27:04 2018 -0700

    [OpenACC] Add support for gang local storage allocation in shared memory
    
    2018-08-10  Julian Brown  <julian@codesourcery.com>
    	    Chung-Lin Tang  <cltang@codesourcery.com>
    
    	gcc/
    	* config/nvptx/nvptx.c (tree-hash-traits.h): Include.
    	(gangprivate_shared_size): New global variable.
    	(gangprivate_shared_align): Likewise.
    	(gangprivate_shared_sym): Likewise.
    	(gangprivate_shared_hmap): Likewise.
    	(nvptx_option_override): Initialize gangprivate_shared_sym,
    	gangprivate_shared_align.
    	(nvptx_file_end): Output gangprivate_shared_sym.
    	(nvptx_goacc_expand_accel_var): New function.
    	(nvptx_set_current_function): New function.
    	(TARGET_SET_CURRENT_FUNCTION): Define hook.
    	(TARGET_GOACC_EXPAND_ACCEL_VAR): Likewise.
    	* doc/tm.texi (TARGET_GOACC_EXPAND_ACCEL_VAR): Document new hook.
    	* doc/tm.texi.in (TARGET_GOACC_EXPAND_ACCEL_VAR): Likewise.
    	* expr.c (expand_expr_real_1): Remap decls marked with the
    	"oacc gangprivate" attribute.
    	* omp-low.c (omp_context): Add oacc_partitioning_levels and
    	oacc_decls fields.
    	(new_omp_context): Initialize oacc_decls in new omp_context.
    	(delete_omp_context): Delete oacc_decls in old omp_context.
    	(lower_oacc_head_tail): Record partitioning-level count in omp context.
    	(oacc_record_private_var_clauses, oacc_record_vars_in_bind)
    	(mark_oacc_gangprivate): New functions.
    	(lower_omp_for): Call oacc_record_private_var_clauses with "for"
    	clauses.  Call mark_oacc_gangprivate for gang-partitioned loops.
    	(lower_omp_target): Call oacc_record_private_var_clauses with "target"
    	clauses.
    	Call mark_oacc_gangprivate for offloaded target regions.
    	(lower_omp_1): Call oacc_record_vars_in_bind for GIMPLE_BIND within
    	OMP regions.
    	* target.def (expand_accel_var): New hook.
    
    	libgomp/
    	* testsuite/libgomp.oacc-c-c++-common/gang-private-1.c: New test.
    	* testsuite/libgomp.oacc-c-c++-common/loop-gwv-2.c: New test.
    	* testsuite/libgomp.oacc-c/pr85465.c: New test.

diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index c0b0a2e..14eb842 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -73,6 +73,7 @@
 #include "cfgloop.h"
 #include "fold-const.h"
 #include "intl.h"
+#include "tree-hash-traits.h"
 
 /* This file should be included last.  */
 #include "target-def.h"
@@ -137,6 +138,12 @@ static unsigned worker_red_size;
 static unsigned worker_red_align;
 static GTY(()) rtx worker_red_sym;
 
+/* Shared memory block for gang-private variables.  */
+static unsigned gangprivate_shared_size;
+static unsigned gangprivate_shared_align;
+static GTY(()) rtx gangprivate_shared_sym;
+static hash_map<tree_decl_hash, unsigned int> gangprivate_shared_hmap;
+
 /* Global lock variable, needed for 128bit worker & gang reductions.  */
 static GTY(()) tree global_lock_var;
 
@@ -210,6 +217,10 @@ nvptx_option_override (void)
   SET_SYMBOL_DATA_AREA (worker_red_sym, DATA_AREA_SHARED);
   worker_red_align = GET_MODE_ALIGNMENT (SImode) / BITS_PER_UNIT;
 
+  gangprivate_shared_sym = gen_rtx_SYMBOL_REF (Pmode, "__gangprivate_shared");
+  SET_SYMBOL_DATA_AREA (gangprivate_shared_sym, DATA_AREA_SHARED);
+  gangprivate_shared_align = GET_MODE_ALIGNMENT (SImode) / BITS_PER_UNIT;
+
   diagnose_openacc_conflict (TARGET_GOMP, "-mgomp");
   diagnose_openacc_conflict (TARGET_SOFT_STACK, "-msoft-stack");
   diagnose_openacc_conflict (TARGET_UNIFORM_SIMT, "-muniform-simt");
@@ -4968,6 +4979,10 @@ nvptx_file_end (void)
     write_worker_buffer (asm_out_file, worker_red_sym,
 			 worker_red_align, worker_red_size);
 
+  if (gangprivate_shared_size)
+    write_worker_buffer (asm_out_file, gangprivate_shared_sym,
+			 gangprivate_shared_align, gangprivate_shared_size);
+
   if (need_softstack_decl)
     {
       write_var_marker (asm_out_file, false, true, "__nvptx_stacks");
@@ -5915,6 +5930,47 @@ nvptx_can_change_mode_class (machine_mode, machine_mode, reg_class_t)
   return false;
 }
 
+static rtx
+nvptx_goacc_expand_accel_var (tree var)
+{
+  if (TREE_CODE (var) == VAR_DECL
+      && lookup_attribute ("oacc gangprivate", DECL_ATTRIBUTES (var)))
+    {
+      unsigned int offset, *poffset;
+      poffset = gangprivate_shared_hmap.get (var);
+      if (poffset)
+	offset = *poffset;
+      else
+	{
+	  unsigned HOST_WIDE_INT align = DECL_ALIGN (var);
+	  gangprivate_shared_size =
+	    (gangprivate_shared_size + align - 1) & ~(align - 1);
+	  if (gangprivate_shared_align < align)
+	    gangprivate_shared_align = align;
+
+	  offset = gangprivate_shared_size;
+	  bool existed = gangprivate_shared_hmap.put (var, offset);
+	  gcc_assert (!existed);
+	  gangprivate_shared_size += tree_to_uhwi (DECL_SIZE_UNIT (var));
+	}
+      rtx addr = plus_constant (Pmode, gangprivate_shared_sym, offset);
+      return gen_rtx_MEM (TYPE_MODE (TREE_TYPE (var)), addr);
+    }
+  return NULL_RTX;
+}
+
+static GTY(()) tree nvptx_previous_fndecl;
+
+static void
+nvptx_set_current_function (tree fndecl)
+{
+  if (!fndecl || fndecl == nvptx_previous_fndecl)
+    return;
+
+  gangprivate_shared_hmap.empty ();
+  nvptx_previous_fndecl = fndecl;
+}
+
 #undef TARGET_OPTION_OVERRIDE
 #define TARGET_OPTION_OVERRIDE nvptx_option_override
 
@@ -6051,6 +6107,12 @@ nvptx_can_change_mode_class (machine_mode, machine_mode, reg_class_t)
 #undef TARGET_HAVE_SPECULATION_SAFE_VALUE
 #define TARGET_HAVE_SPECULATION_SAFE_VALUE speculation_safe_value_not_needed
 
+#undef TARGET_GOACC_EXPAND_ACCEL_VAR
+#define TARGET_GOACC_EXPAND_ACCEL_VAR nvptx_goacc_expand_accel_var
+
+#undef TARGET_SET_CURRENT_FUNCTION
+#define TARGET_SET_CURRENT_FUNCTION nvptx_set_current_function
+
 struct gcc_target targetm = TARGET_INITIALIZER;
 
 #include "gt-nvptx.h"
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index a40f45a..fb87f67 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -6064,6 +6064,14 @@ like @code{cond_add@var{m}}.  The default implementation returns a zero
 constant of type @var{type}.
 @end deftypefn
 
+@deftypefn {Target Hook} rtx TARGET_GOACC_EXPAND_ACCEL_VAR (tree @var{var})
+This hook, if defined, is used by accelerator target back-ends to expand
+specially handled kinds of VAR_DECL expressions.  A particular use is to
+place variables with specific attributes inside special accelerator
+memories.  A return value of NULL indicates that the target does not
+handle this VAR_DECL, and normal RTL expanding is resumed.
+@end deftypefn
+
 @node Anchored Addresses
 @section Anchored Addresses
 @cindex anchored addresses
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 39a214e..beace61 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -4151,6 +4151,8 @@ address;  but often a machine-dependent strategy can generate better code.
 
 @hook TARGET_PREFERRED_ELSE_VALUE
 
+@hook TARGET_GOACC_EXPAND_ACCEL_VAR
+
 @node Anchored Addresses
 @section Anchored Addresses
 @cindex anchored addresses
diff --git a/gcc/expr.c b/gcc/expr.c
index de6709d..2c62bf9 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -9854,8 +9854,19 @@ expand_expr_real_1 (tree exp, rtx target, machine_mode tmode,
       exp = SSA_NAME_VAR (ssa_name);
       goto expand_decl_rtl;
 
-    case PARM_DECL:
     case VAR_DECL:
+      /* Allow accel compiler to handle specific cases of variables,
+	 specifically those tagged with the "oacc gangprivate" attribute,
+	 which may be intended to be placed in special memory in GPUs.  */
+      if (flag_openacc && targetm.goacc.expand_accel_var)
+	{
+	  temp = targetm.goacc.expand_accel_var (exp);
+	  if (temp)
+	    return temp;
+	}
+      /* ... fall through ...  */
+
+    case PARM_DECL:
       /* If a static var's type was incomplete when the decl was written,
 	 but the type is complete now, lay out the decl now.  */
       if (DECL_SIZE (exp) == 0
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 843c66f..354e182 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -124,6 +124,12 @@ struct omp_context
 
   /* True if this construct can be cancelled.  */
   bool cancellable;
+
+  /* The number of levels of OpenACC partitioning invoked in this context.  */
+  int oacc_partitioning_levels;
+
+  /* Decls in this context.  */
+  vec<tree> *oacc_decls;
 };
 
 static splay_tree all_contexts;
@@ -850,6 +856,7 @@ new_omp_context (gimple *stmt, omp_context *outer_ctx)
     }
 
   ctx->cb.decl_map = new hash_map<tree, tree>;
+  ctx->oacc_decls = new vec<tree> ();
 
   return ctx;
 }
@@ -925,6 +932,8 @@ delete_omp_context (splay_tree_value value)
   if (is_task_ctx (ctx))
     finalize_task_copyfn (as_a <gomp_task *> (ctx->stmt));
 
+  delete ctx->oacc_decls;
+
   XDELETE (ctx);
 }
 
@@ -5716,6 +5725,9 @@ lower_oacc_head_tail (location_t loc, tree clauses,
   tree join_kind = build_int_cst (unsigned_type_node, IFN_UNIQUE_OACC_JOIN);
 
   gcc_assert (count);
+
+  ctx->oacc_partitioning_levels = count;
+
   for (unsigned done = 1; count; count--, done++)
     {
       gimple_seq fork_seq = NULL;
@@ -6732,6 +6744,66 @@ lower_omp_for_lastprivate (struct omp_for_data *fd, gimple_seq *body_p,
     }
 }
 
+/* Record vars listed in private clauses in CLAUSES in CTX.  This information
+   is used to mark up variables that should be made private per-gang.  */
+
+static void
+oacc_record_private_var_clauses (omp_context *ctx, tree clauses)
+{
+  tree c;
+
+  if (!ctx)
+    return;
+
+  for (c = clauses; c; c = OMP_CLAUSE_CHAIN (c))
+    switch (OMP_CLAUSE_CODE (c))
+      {
+      case OMP_CLAUSE_PRIVATE:
+	{
+	  tree decl = OMP_CLAUSE_DECL (c);
+	  ctx->oacc_decls->safe_push (decl);
+	}
+	break;
+
+      default:
+	/* Empty.  */;
+      }
+}
+
+/* Record vars declared in BINDVARS in CTX.  This information is used to mark
+   up variables that should be made private per-gang.  */
+
+static void
+oacc_record_vars_in_bind (omp_context *ctx, tree bindvars)
+{
+  if (!ctx)
+    return;
+
+  for (tree v = bindvars; v; v = DECL_CHAIN (v))
+    ctx->oacc_decls->safe_push (v);
+}
+
+/* Mark variables which are declared implicitly or explicitly as gang private
+   with a special attribute.  These may need to have their declarations altered
+   later on in compilation (e.g. in execute_oacc_device_lower or the backend,
+   depending on how the OpenACC execution model is implemented on a given
+   target) to ensure that sharing semantics are correct.
+   Only variables which have their address taken need to be considered.  */
+
+static void
+mark_oacc_gangprivate (vec<tree> *decls)
+{
+  int i;
+  tree decl;
+
+  FOR_EACH_VEC_ELT (*decls, i, decl)
+    {
+      if (TREE_CODE (decl) == VAR_DECL && TREE_ADDRESSABLE (decl))
+	DECL_ATTRIBUTES (decl)
+	  = tree_cons (get_identifier ("oacc gangprivate"),
+		       NULL, DECL_ATTRIBUTES (decl));
+    }
+}
 
 /* Lower code for an OMP loop directive.  */
 
@@ -6748,6 +6820,8 @@ lower_omp_for (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 
   push_gimplify_context ();
 
+  oacc_record_private_var_clauses (ctx, gimple_omp_for_clauses (stmt));
+
   lower_omp (gimple_omp_for_pre_body_ptr (stmt), ctx);
 
   block = make_node (BLOCK);
@@ -6878,7 +6952,20 @@ lower_omp_for (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 
   /* Add OpenACC partitioning and reduction markers just before the loop.  */
   if (oacc_head)
-    gimple_seq_add_seq (&body, oacc_head);
+    {
+      gimple_seq_add_seq (&body, oacc_head);
+
+      int level_total = 0;
+      omp_context *thisctx;
+
+      for (thisctx = ctx; thisctx; thisctx = thisctx->outer)
+        level_total += thisctx->oacc_partitioning_levels;
+
+      /* If the current context and parent contexts are distributed over a
+	 total of one parallelism level, we have gang partitioning.  */
+      if (level_total == 1)
+        mark_oacc_gangprivate (ctx->oacc_decls);
+    }
 
   lower_omp_for_lastprivate (&fd, &body, &dlist, ctx);
 
@@ -7511,6 +7598,8 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 
   clauses = gimple_omp_target_clauses (stmt);
 
+  oacc_record_private_var_clauses (ctx, clauses);
+
   gimple_seq dep_ilist = NULL;
   gimple_seq dep_olist = NULL;
   if (omp_find_clause (clauses, OMP_CLAUSE_DEPEND))
@@ -7761,6 +7850,8 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 
   if (offloaded)
     {
+      mark_oacc_gangprivate (ctx->oacc_decls);
+
       /* Declare all the variables created by mapping and the variables
 	 declared in the scope of the target body.  */
       record_vars_into (ctx->block_vars, child_fn);
@@ -8755,6 +8846,7 @@ lower_omp_1 (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 		 ctx);
       break;
     case GIMPLE_BIND:
+      oacc_record_vars_in_bind (ctx, gimple_bind_vars (as_a <gbind *> (stmt)));
       lower_omp (gimple_bind_body_ptr (as_a <gbind *> (stmt)), ctx);
       maybe_remove_omp_member_access_dummy_vars (as_a <gbind *> (stmt));
       break;
diff --git a/gcc/target.def b/gcc/target.def
index c570f38..b3b24b8 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -1701,6 +1701,16 @@ for allocating any storage for reductions when necessary.",
 void, (gcall *call),
 default_goacc_reduction)
 
+DEFHOOK
+(expand_accel_var,
+"This hook, if defined, is used by accelerator target back-ends to expand\n\
+specially handled kinds of VAR_DECL expressions.  A particular use is to\n\
+place variables with specific attributes inside special accelerator\n\
+memories.  A return value of NULL indicates that the target does not\n\
+handle this VAR_DECL, and normal RTL expanding is resumed.",
+rtx, (tree var),
+NULL)
+
 HOOK_VECTOR_END (goacc)
 
 /* Functions relating to vectorization.  */
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/gang-private-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/gang-private-1.c
new file mode 100644
index 0000000..f378346
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/gang-private-1.c
@@ -0,0 +1,38 @@
+#include <assert.h>
+
+int main (void)
+{
+  int ret;
+
+  #pragma acc parallel num_gangs(1) num_workers(32) copyout(ret)
+  {
+    int w = 0;
+
+    #pragma acc loop worker
+    for (int i = 0; i < 32; i++)
+      {
+        #pragma acc atomic update
+	w++;
+      }
+
+    ret = (w == 32);
+  }
+  assert (ret);
+
+  #pragma acc parallel num_gangs(1) vector_length(32) copyout(ret)
+  {
+    int v = 0;
+
+    #pragma acc loop vector
+    for (int i = 0; i < 32; i++)
+      {
+        #pragma acc atomic update
+	v++;
+      }
+
+    ret = (v == 32);
+  }
+  assert (ret);
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-gwv-2.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-gwv-2.c
new file mode 100644
index 0000000..2fa708a
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-gwv-2.c
@@ -0,0 +1,106 @@
+/* { dg-xfail-run-if "gangprivate failure" { openacc_nvidia_accel_selected } { "-O0" } { "" } } */
+
+#include <stdio.h>
+#include <openacc.h>
+#include <alloca.h>
+#include <string.h>
+#include <gomp-constants.h>
+#include <stdlib.h>
+
+#if 0
+#define DEBUG(DIM, IDX, VAL) \
+  fprintf (stderr, "%sdist[%d] = %d\n", (DIM), (IDX), (VAL))
+#else
+#define DEBUG(DIM, IDX, VAL)
+#endif
+
+#define N (32*32*32)
+
+int
+check (const char *dim, int *dist, int dimsize)
+{
+  int ix;
+  int exit = 0;
+
+  for (ix = 0; ix < dimsize; ix++)
+    {
+      DEBUG(dim, ix, dist[ix]);
+      if (dist[ix] < (N) / (dimsize + 0.5)
+	  || dist[ix] > (N) / (dimsize - 0.5))
+	{
+	  fprintf (stderr, "did not distribute to %ss (%d not between %d "
+		   "and %d)\n", dim, dist[ix], (int) ((N) / (dimsize + 0.5)),
+		   (int) ((N) / (dimsize - 0.5)));
+	  exit |= 1;
+	}
+    }
+
+  return exit;
+}
+
+int main ()
+{
+  int ary[N];
+  int ix;
+  int exit = 0;
+  int ondev = 0;
+  int gangsize = 0, workersize = 0, vectorsize = 0;
+  int *gangdist, *workerdist, *vectordist;
+
+  for (ix = 0; ix < N;ix++)
+    ary[ix] = -1;
+
+#pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) copy(ary) copy(ondev) copyout(gangsize, workersize, vectorsize)
+  {
+#pragma acc loop gang worker vector
+    for (unsigned ix = 0; ix < N; ix++)
+      {
+	if (acc_on_device (acc_device_not_host))
+	  {
+	    int g, w, v;
+
+	    g = __builtin_goacc_parlevel_id (GOMP_DIM_GANG);
+	    w = __builtin_goacc_parlevel_id (GOMP_DIM_WORKER);
+	    v = __builtin_goacc_parlevel_id (GOMP_DIM_VECTOR);
+
+	    ary[ix] = (g << 16) | (w << 8) | v;
+	    ondev = 1;
+	  }
+	else
+	  ary[ix] = ix;
+      }
+
+    gangsize = __builtin_goacc_parlevel_size (GOMP_DIM_GANG);
+    workersize = __builtin_goacc_parlevel_size (GOMP_DIM_WORKER);
+    vectorsize = __builtin_goacc_parlevel_size (GOMP_DIM_VECTOR);
+  }
+
+  gangdist = (int *) alloca (gangsize * sizeof (int));
+  workerdist = (int *) alloca (workersize * sizeof (int));
+  vectordist = (int *) alloca (vectorsize * sizeof (int));
+  memset (gangdist, 0, gangsize * sizeof (int));
+  memset (workerdist, 0, workersize * sizeof (int));
+  memset (vectordist, 0, vectorsize * sizeof (int));
+
+  /* Test that work is shared approximately equally amongst each active
+     gang/worker/vector.  */
+  for (ix = 0; ix < N; ix++)
+    {
+      if (ondev)
+	{
+	  int g = (ary[ix] >> 16) & 255;
+	  int w = (ary[ix] >> 8) & 255;
+	  int v = ary[ix] & 255;
+
+	  gangdist[g]++;
+	  workerdist[w]++;
+	  vectordist[v]++;
+	}
+    }
+
+  exit = check ("gang", gangdist, gangsize);
+  exit |= check ("worker", workerdist, workersize);
+  exit |= check ("vector", vectordist, vectorsize);
+
+  return exit;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c/pr85465.c b/libgomp/testsuite/libgomp.oacc-c/pr85465.c
new file mode 100644
index 0000000..329e8a0
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c/pr85465.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-w" } */
+
+int
+main (void)
+{
+#pragma acc parallel
+  foo ();
+
+  return 0;
+}

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH, OpenACC] Add support for gang local storage allocation in shared memory
  2018-08-13 16:22 ` [PATCH, OpenACC] Add " Julian Brown
@ 2018-08-13 18:42   ` Cesar Philippidis
  2018-08-13 19:06     ` Cesar Philippidis
  2018-08-13 20:42     ` Julian Brown
  0 siblings, 2 replies; 26+ messages in thread
From: Cesar Philippidis @ 2018-08-13 18:42 UTC (permalink / raw)
  To: Julian Brown, gcc-patches; +Cc: Tom de Vries, Chung-Lin Tang

On 08/13/2018 09:21 AM, Julian Brown wrote:

> diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-gwv-2.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-gwv-2.c
> new file mode 100644
> index 0000000..2fa708a
> --- /dev/null
> +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-gwv-2.c
> @@ -0,0 +1,106 @@
> +/* { dg-xfail-run-if "gangprivate failure" { openacc_nvidia_accel_selected } { "-O0" } { "" } } */

As a quick comment, I like the approach that you've taken with this
patch, but the og8 patch only applies the gangprivate attribute in the
c/c++ FE. I'd have to review the notes, but I seem to recall that
excluding that clause in fortran was deliberate. Chung-Lin, do you
recall the rationale behind that?

With that aside, is the above xfail still necessary? It seems to xpass
for me on nvptx. However, I see this regression on the host:

FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/loop-gwv-2.c
-DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1  -O2  execution test

There could be other regressions, but I only tested the new tests
introduced by the patch so far.

Cesar

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH, OpenACC] Add support for gang local storage allocation in shared memory
  2018-08-13 18:42   ` Cesar Philippidis
@ 2018-08-13 19:06     ` Cesar Philippidis
  2018-08-15 16:46       ` Julian Brown
  2018-08-13 20:42     ` Julian Brown
  1 sibling, 1 reply; 26+ messages in thread
From: Cesar Philippidis @ 2018-08-13 19:06 UTC (permalink / raw)
  To: Julian Brown, gcc-patches; +Cc: Tom de Vries, Chung-Lin Tang

On 08/13/2018 11:42 AM, Cesar Philippidis wrote:
> On 08/13/2018 09:21 AM, Julian Brown wrote:
> 
>> diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-gwv-2.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-gwv-2.c
>> new file mode 100644
>> index 0000000..2fa708a
>> --- /dev/null
>> +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-gwv-2.c
>> @@ -0,0 +1,106 @@
>> +/* { dg-xfail-run-if "gangprivate failure" { openacc_nvidia_accel_selected } { "-O0" } { "" } } */
> 
> As a quick comment, I like the approach that you've taken with this
> patch, but the og8 patch only applies the gangprivate attribute in the
> c/c++ FE. I'd have to review the notes, but I seem to recall that
> excluding that clause in fortran was deliberate. Chung-Lin, do you
> recall the rationale behind that?

I found this in an old email:

  The older version of fortran that OpenACC supports doesn't have a
  concept of lexically scoped blocks like c/c++, so this isn't relevant
  except for explicit gang private variables.

So in other words, this is safe for fortran. It probably could use a
fortran test, because that functionality wasn't explicitly exercised in
og7/og8.
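
To put the distinction in C terms, here is a rough sketch (not taken from
the patch; the function names are made up for illustration). The first
function relies on a block-scoped local, the case the note above says
older Fortran lacks. The second uses an explicit private clause, the form
that exists in both languages. Built without -fopenacc the pragmas are
simply ignored and both functions run serially on the host.

```c
/* Sketch only: count_implicit and count_explicit are hypothetical names,
   not part of the patch or the testsuite.  */

/* W is implicitly gang-private: it is declared in a block nested inside
   the parallel region.  */
int
count_implicit (void)
{
  int ret = 0;

#pragma acc parallel num_gangs(1) num_workers(32) copyout(ret)
  {
    int w = 0;

#pragma acc loop worker
    for (int i = 0; i < 32; i++)
      {
#pragma acc atomic update
	w++;
      }

    ret = w;
  }

  return ret;
}

/* W is explicitly gang-private: it is declared outside the region and
   named in a private clause on the parallel construct.  */
int
count_explicit (void)
{
  int ret = 0;
  int w;

#pragma acc parallel num_gangs(1) num_workers(32) private(w) copyout(ret)
  {
    w = 0;

#pragma acc loop worker
    for (int i = 0; i < 32; i++)
      {
#pragma acc atomic update
	w++;
      }

    ret = w;
  }

  return ret;
}
```

Both functions are meant to return 32, since every worker in the single
gang updates the same copy of w.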

Cesar

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH, OpenACC] Add support for gang local storage allocation in shared memory
  2018-08-13 18:42   ` Cesar Philippidis
  2018-08-13 19:06     ` Cesar Philippidis
@ 2018-08-13 20:42     ` Julian Brown
  2021-05-19 12:10       ` Add 'libgomp.oacc-c-c++-common/loop-gwv-2.c' (was: [PATCH, OpenACC] Add support for gang local storage allocation in shared memory) Thomas Schwinge
  1 sibling, 1 reply; 26+ messages in thread
From: Julian Brown @ 2018-08-13 20:42 UTC (permalink / raw)
  To: Cesar Philippidis; +Cc: gcc-patches, Tom de Vries, Chung-Lin Tang

[-- Attachment #1: Type: text/plain, Size: 1564 bytes --]

On Mon, 13 Aug 2018 11:42:26 -0700
Cesar Philippidis <cesar@codesourcery.com> wrote:

> On 08/13/2018 09:21 AM, Julian Brown wrote:
> 
> > diff --git
> > a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-gwv-2.c
> > b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-gwv-2.c new file
> > mode 100644 index 0000000..2fa708a --- /dev/null
> > +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-gwv-2.c
> > @@ -0,0 +1,106 @@
> > +/* { dg-xfail-run-if "gangprivate
> > failure" { openacc_nvidia_accel_selected } { "-O0" } { "" } } */  
> 
> As a quick comment, I like the approach that you've taken with this
> patch, but the og8 patch only applies the gangprivate attribute in the
> c/c++ FE. I'd have to review the notes, but I seem to recall that
> excluding that clause in fortran was deliberate. Chung-Lin, do you
> recall the rationale behind that?
> 
> With that aside, is the above xfail still necessary? It seems to xpass
> for me on nvptx. However, I see this regression on the host:
> 
> FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/loop-gwv-2.c
> -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1  -O2  execution test
> 
> There could be other regressions, but I only tested the new tests
> introduced by the patch so far.

Oops, this was the version of the patch I meant to post (and the one I
tested). The XFAIL on loop-gwv-2.c isn't necessary, plus that test
needed some other fixes to make it pass for NVPTX (it was written for
GCN to start with).

Everything else is the same. I'll see what I can come up with for a
Fortran test.

Thanks,

Julian

[-- Attachment #2: gang-local-storage-in-shm-2.diff --]
[-- Type: text/x-patch, Size: 17676 bytes --]

commit 7834b2f0dffec3e56e510c04e1663424b778fdfb
Author: Julian Brown <julian@codesourcery.com>
Date:   Thu Aug 9 20:27:04 2018 -0700

    [OpenACC] Add support for gang local storage allocation in shared memory
    
    2018-08-10  Julian Brown  <julian@codesourcery.com>
    	    Chung-Lin Tang  <cltang@codesourcery.com>
    
    	gcc/
    	* config/nvptx/nvptx.c (tree-hash-traits.h): Include.
    	(gangprivate_shared_size): New global variable.
    	(gangprivate_shared_align): Likewise.
    	(gangprivate_shared_sym): Likewise.
    	(gangprivate_shared_hmap): Likewise.
    	(nvptx_option_override): Initialize gangprivate_shared_sym,
    	gangprivate_shared_align.
    	(nvptx_file_end): Output gangprivate_shared_sym.
    	(nvptx_goacc_expand_accel_var): New function.
    	(nvptx_set_current_function): New function.
    	(TARGET_SET_CURRENT_FUNCTION): Define hook.
    	(TARGET_GOACC_EXPAND_ACCEL_VAR): Likewise.
    	* doc/tm.texi (TARGET_GOACC_EXPAND_ACCEL_VAR): Document new hook.
    	* doc/tm.texi.in (TARGET_GOACC_EXPAND_ACCEL_VAR): Likewise.
    	* expr.c (expand_expr_real_1): Remap decls marked with the
    	"oacc gangprivate" attribute.
    	* omp-low.c (omp_context): Add oacc_partitioning_levels and oacc_decls
    	fields.
    	(new_omp_context): Initialize oacc_decls in new omp_context.
    	(delete_omp_context): Delete oacc_decls in old omp_context.
    	(lower_oacc_head_tail): Record partitioning-level count in omp context.
    	(oacc_record_private_var_clauses, oacc_record_vars_in_bind)
    	(mark_oacc_gangprivate): New functions.
    	(lower_omp_for): Call oacc_record_private_var_clauses with "for"
    	clauses.  Call mark_oacc_gangprivate for gang-partitioned loops.
    	(lower_omp_target): Call oacc_record_private_var_clauses with "target"
    	clauses.
    	Call mark_oacc_gangprivate for offloaded target regions.
    	(lower_omp_1): Call oacc_record_vars_in_bind for GIMPLE_BIND within OMP
    	regions.
    	* target.def (expand_accel_var): New hook.
    
    	libgomp/
    	* testsuite/libgomp.oacc-c-c++-common/gang-private-1.c: New test.
    	* testsuite/libgomp.oacc-c-c++-common/loop-gwv-2.c: New test.
    	* testsuite/libgomp.oacc-c/pr85465.c: New test.

diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index c0b0a2e..14eb842 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -73,6 +73,7 @@
 #include "cfgloop.h"
 #include "fold-const.h"
 #include "intl.h"
+#include "tree-hash-traits.h"
 
 /* This file should be included last.  */
 #include "target-def.h"
@@ -137,6 +138,12 @@ static unsigned worker_red_size;
 static unsigned worker_red_align;
 static GTY(()) rtx worker_red_sym;
 
+/* Shared memory block for gang-private variables.  */
+static unsigned gangprivate_shared_size;
+static unsigned gangprivate_shared_align;
+static GTY(()) rtx gangprivate_shared_sym;
+static hash_map<tree_decl_hash, unsigned int> gangprivate_shared_hmap;
+
 /* Global lock variable, needed for 128bit worker & gang reductions.  */
 static GTY(()) tree global_lock_var;
 
@@ -210,6 +217,10 @@ nvptx_option_override (void)
   SET_SYMBOL_DATA_AREA (worker_red_sym, DATA_AREA_SHARED);
   worker_red_align = GET_MODE_ALIGNMENT (SImode) / BITS_PER_UNIT;
 
+  gangprivate_shared_sym = gen_rtx_SYMBOL_REF (Pmode, "__gangprivate_shared");
+  SET_SYMBOL_DATA_AREA (gangprivate_shared_sym, DATA_AREA_SHARED);
+  gangprivate_shared_align = GET_MODE_ALIGNMENT (SImode) / BITS_PER_UNIT;
+
   diagnose_openacc_conflict (TARGET_GOMP, "-mgomp");
   diagnose_openacc_conflict (TARGET_SOFT_STACK, "-msoft-stack");
   diagnose_openacc_conflict (TARGET_UNIFORM_SIMT, "-muniform-simt");
@@ -4968,6 +4979,10 @@ nvptx_file_end (void)
     write_worker_buffer (asm_out_file, worker_red_sym,
 			 worker_red_align, worker_red_size);
 
+  if (gangprivate_shared_size)
+    write_worker_buffer (asm_out_file, gangprivate_shared_sym,
+			 gangprivate_shared_align, gangprivate_shared_size);
+
   if (need_softstack_decl)
     {
       write_var_marker (asm_out_file, false, true, "__nvptx_stacks");
@@ -5915,6 +5930,47 @@ nvptx_can_change_mode_class (machine_mode, machine_mode, reg_class_t)
   return false;
 }
 
+static rtx
+nvptx_goacc_expand_accel_var (tree var)
+{
+  if (TREE_CODE (var) == VAR_DECL
+      && lookup_attribute ("oacc gangprivate", DECL_ATTRIBUTES (var)))
+    {
+      unsigned int offset, *poffset;
+      poffset = gangprivate_shared_hmap.get (var);
+      if (poffset)
+	offset = *poffset;
+      else
+	{
+	  unsigned HOST_WIDE_INT align = DECL_ALIGN (var);
+	  gangprivate_shared_size =
+	    (gangprivate_shared_size + align - 1) & ~(align - 1);
+	  if (gangprivate_shared_align < align)
+	    gangprivate_shared_align = align;
+
+	  offset = gangprivate_shared_size;
+	  bool existed = gangprivate_shared_hmap.put (var, offset);
+	  gcc_assert (!existed);
+	  gangprivate_shared_size += tree_to_uhwi (DECL_SIZE_UNIT (var));
+	}
+      rtx addr = plus_constant (Pmode, gangprivate_shared_sym, offset);
+      return gen_rtx_MEM (TYPE_MODE (TREE_TYPE (var)), addr);
+    }
+  return NULL_RTX;
+}
+
+static GTY(()) tree nvptx_previous_fndecl;
+
+static void
+nvptx_set_current_function (tree fndecl)
+{
+  if (!fndecl || fndecl == nvptx_previous_fndecl)
+    return;
+
+  gangprivate_shared_hmap.empty ();
+  nvptx_previous_fndecl = fndecl;
+}
+
 #undef TARGET_OPTION_OVERRIDE
 #define TARGET_OPTION_OVERRIDE nvptx_option_override
 
@@ -6051,6 +6107,12 @@ nvptx_can_change_mode_class (machine_mode, machine_mode, reg_class_t)
 #undef TARGET_HAVE_SPECULATION_SAFE_VALUE
 #define TARGET_HAVE_SPECULATION_SAFE_VALUE speculation_safe_value_not_needed
 
+#undef TARGET_GOACC_EXPAND_ACCEL_VAR
+#define TARGET_GOACC_EXPAND_ACCEL_VAR nvptx_goacc_expand_accel_var
+
+#undef TARGET_SET_CURRENT_FUNCTION
+#define TARGET_SET_CURRENT_FUNCTION nvptx_set_current_function
+
 struct gcc_target targetm = TARGET_INITIALIZER;
 
 #include "gt-nvptx.h"
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index a40f45a..fb87f67 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -6064,6 +6064,14 @@ like @code{cond_add@var{m}}.  The default implementation returns a zero
 constant of type @var{type}.
 @end deftypefn
 
+@deftypefn {Target Hook} rtx TARGET_GOACC_EXPAND_ACCEL_VAR (tree @var{var})
+This hook, if defined, is used by accelerator target back-ends to expand
+specially handled kinds of VAR_DECL expressions.  A particular use is to
+place variables with specific attributes inside special accelerator
+memories.  A return value of NULL indicates that the target does not
+handle this VAR_DECL, and normal RTL expanding is resumed.
+@end deftypefn
+
 @node Anchored Addresses
 @section Anchored Addresses
 @cindex anchored addresses
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 39a214e..beace61 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -4151,6 +4151,8 @@ address;  but often a machine-dependent strategy can generate better code.
 
 @hook TARGET_PREFERRED_ELSE_VALUE
 
+@hook TARGET_GOACC_EXPAND_ACCEL_VAR
+
 @node Anchored Addresses
 @section Anchored Addresses
 @cindex anchored addresses
diff --git a/gcc/expr.c b/gcc/expr.c
index de6709d..2c62bf9 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -9854,8 +9854,19 @@ expand_expr_real_1 (tree exp, rtx target, machine_mode tmode,
       exp = SSA_NAME_VAR (ssa_name);
       goto expand_decl_rtl;
 
-    case PARM_DECL:
     case VAR_DECL:
+      /* Allow accel compiler to handle specific cases of variables,
+	 specifically those tagged with the "oacc gangprivate" attribute,
+	 which may be intended to be placed in special memory in GPUs.  */
+      if (flag_openacc && targetm.goacc.expand_accel_var)
+	{
+	  temp = targetm.goacc.expand_accel_var (exp);
+	  if (temp)
+	    return temp;
+	}
+      /* ... fall through ...  */
+
+    case PARM_DECL:
       /* If a static var's type was incomplete when the decl was written,
 	 but the type is complete now, lay out the decl now.  */
       if (DECL_SIZE (exp) == 0
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 843c66f..354e182 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -124,6 +124,12 @@ struct omp_context
 
   /* True if this construct can be cancelled.  */
   bool cancellable;
+
+  /* The number of levels of OpenACC partitioning invoked in this context.  */
+  int oacc_partitioning_levels;
+
+  /* Decls in this context.  */
+  vec<tree> *oacc_decls;
 };
 
 static splay_tree all_contexts;
@@ -850,6 +856,7 @@ new_omp_context (gimple *stmt, omp_context *outer_ctx)
     }
 
   ctx->cb.decl_map = new hash_map<tree, tree>;
+  ctx->oacc_decls = new vec<tree> ();
 
   return ctx;
 }
@@ -925,6 +932,8 @@ delete_omp_context (splay_tree_value value)
   if (is_task_ctx (ctx))
     finalize_task_copyfn (as_a <gomp_task *> (ctx->stmt));
 
+  delete ctx->oacc_decls;
+
   XDELETE (ctx);
 }
 
@@ -5716,6 +5725,9 @@ lower_oacc_head_tail (location_t loc, tree clauses,
   tree join_kind = build_int_cst (unsigned_type_node, IFN_UNIQUE_OACC_JOIN);
 
   gcc_assert (count);
+
+  ctx->oacc_partitioning_levels = count;
+
   for (unsigned done = 1; count; count--, done++)
     {
       gimple_seq fork_seq = NULL;
@@ -6732,6 +6744,66 @@ lower_omp_for_lastprivate (struct omp_for_data *fd, gimple_seq *body_p,
     }
 }
 
+/* Record vars listed in private clauses in CLAUSES in CTX.  This information
+   is used to mark up variables that should be made private per-gang.  */
+
+static void
+oacc_record_private_var_clauses (omp_context *ctx, tree clauses)
+{
+  tree c;
+
+  if (!ctx)
+    return;
+
+  for (c = clauses; c; c = OMP_CLAUSE_CHAIN (c))
+    switch (OMP_CLAUSE_CODE (c))
+      {
+      case OMP_CLAUSE_PRIVATE:
+	{
+	  tree decl = OMP_CLAUSE_DECL (c);
+	  ctx->oacc_decls->safe_push (decl);
+	}
+	break;
+
+      default:
+	/* Empty.  */;
+      }
+}
+
+/* Record vars declared in BINDVARS in CTX.  This information is used to mark
+   up variables that should be made private per-gang.  */
+
+static void
+oacc_record_vars_in_bind (omp_context *ctx, tree bindvars)
+{
+  if (!ctx)
+    return;
+
+  for (tree v = bindvars; v; v = DECL_CHAIN (v))
+    ctx->oacc_decls->safe_push (v);
+}
+
+/* Mark variables which are declared implicitly or explicitly as gang private
+   with a special attribute.  These may need to have their declarations altered
+   later on in compilation (e.g. in execute_oacc_device_lower or the backend,
+   depending on how the OpenACC execution model is implemented on a given
+   target) to ensure that sharing semantics are correct.
+   Only variables which have their address taken need to be considered.  */
+
+static void
+mark_oacc_gangprivate (vec<tree> *decls)
+{
+  int i;
+  tree decl;
+
+  FOR_EACH_VEC_ELT (*decls, i, decl)
+    {
+      if (TREE_CODE (decl) == VAR_DECL && TREE_ADDRESSABLE (decl))
+	DECL_ATTRIBUTES (decl)
+	  = tree_cons (get_identifier ("oacc gangprivate"),
+		       NULL, DECL_ATTRIBUTES (decl));
+    }
+}
 
 /* Lower code for an OMP loop directive.  */
 
@@ -6748,6 +6820,8 @@ lower_omp_for (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 
   push_gimplify_context ();
 
+  oacc_record_private_var_clauses (ctx, gimple_omp_for_clauses (stmt));
+
   lower_omp (gimple_omp_for_pre_body_ptr (stmt), ctx);
 
   block = make_node (BLOCK);
@@ -6878,7 +6952,20 @@ lower_omp_for (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 
   /* Add OpenACC partitioning and reduction markers just before the loop.  */
   if (oacc_head)
-    gimple_seq_add_seq (&body, oacc_head);
+    {
+      gimple_seq_add_seq (&body, oacc_head);
+
+      int level_total = 0;
+      omp_context *thisctx;
+
+      for (thisctx = ctx; thisctx; thisctx = thisctx->outer)
+        level_total += thisctx->oacc_partitioning_levels;
+
+      /* If the current context and parent contexts are distributed over a
+	 total of one parallelism level, we have gang partitioning.  */
+      if (level_total == 1)
+        mark_oacc_gangprivate (ctx->oacc_decls);
+    }
 
   lower_omp_for_lastprivate (&fd, &body, &dlist, ctx);
 
@@ -7511,6 +7598,8 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 
   clauses = gimple_omp_target_clauses (stmt);
 
+  oacc_record_private_var_clauses (ctx, clauses);
+
   gimple_seq dep_ilist = NULL;
   gimple_seq dep_olist = NULL;
   if (omp_find_clause (clauses, OMP_CLAUSE_DEPEND))
@@ -7761,6 +7850,8 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 
   if (offloaded)
     {
+      mark_oacc_gangprivate (ctx->oacc_decls);
+
       /* Declare all the variables created by mapping and the variables
 	 declared in the scope of the target body.  */
       record_vars_into (ctx->block_vars, child_fn);
@@ -8755,6 +8846,7 @@ lower_omp_1 (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 		 ctx);
       break;
     case GIMPLE_BIND:
+      oacc_record_vars_in_bind (ctx, gimple_bind_vars (as_a <gbind *> (stmt)));
       lower_omp (gimple_bind_body_ptr (as_a <gbind *> (stmt)), ctx);
       maybe_remove_omp_member_access_dummy_vars (as_a <gbind *> (stmt));
       break;
diff --git a/gcc/target.def b/gcc/target.def
index c570f38..b3b24b8 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -1701,6 +1701,16 @@ for allocating any storage for reductions when necessary.",
 void, (gcall *call),
 default_goacc_reduction)
 
+DEFHOOK
+(expand_accel_var,
+"This hook, if defined, is used by accelerator target back-ends to expand\n\
+specially handled kinds of VAR_DECL expressions.  A particular use is to\n\
+place variables with specific attributes inside special accelerator\n\
+memories.  A return value of NULL indicates that the target does not\n\
+handle this VAR_DECL, and normal RTL expanding is resumed.",
+rtx, (tree var),
+NULL)
+
 HOOK_VECTOR_END (goacc)
 
 /* Functions relating to vectorization.  */
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/gang-private-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/gang-private-1.c
new file mode 100644
index 0000000..f378346
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/gang-private-1.c
@@ -0,0 +1,38 @@
+#include <assert.h>
+
+int main (void)
+{
+  int ret;
+
+  #pragma acc parallel num_gangs(1) num_workers(32) copyout(ret)
+  {
+    int w = 0;
+
+    #pragma acc loop worker
+    for (int i = 0; i < 32; i++)
+      {
+        #pragma acc atomic update
+	w++;
+      }
+
+    ret = (w == 32);
+  }
+  assert (ret);
+
+  #pragma acc parallel num_gangs(1) vector_length(32) copyout(ret)
+  {
+    int v = 0;
+
+    #pragma acc loop vector
+    for (int i = 0; i < 32; i++)
+      {
+        #pragma acc atomic update
+	v++;
+      }
+
+    ret = (v == 32);
+  }
+  assert (ret);
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-gwv-2.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-gwv-2.c
new file mode 100644
index 0000000..a4f81a3
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-gwv-2.c
@@ -0,0 +1,95 @@
+#include <stdio.h>
+#include <openacc.h>
+#include <alloca.h>
+#include <string.h>
+#include <gomp-constants.h>
+#include <stdlib.h>
+
+#if 0
+#define DEBUG(DIM, IDX, VAL) \
+  fprintf (stderr, "%sdist[%d] = %d\n", (DIM), (IDX), (VAL))
+#else
+#define DEBUG(DIM, IDX, VAL)
+#endif
+
+#define N (32*32*32)
+
+int
+check (const char *dim, int *dist, int dimsize)
+{
+  int ix;
+  int exit = 0;
+
+  for (ix = 0; ix < dimsize; ix++)
+    {
+      DEBUG(dim, ix, dist[ix]);
+      if (dist[ix] < (N) / (dimsize + 0.5)
+	  || dist[ix] > (N) / (dimsize - 0.5))
+	{
+	  fprintf (stderr, "did not distribute to %ss (%d not between %d "
+		   "and %d)\n", dim, dist[ix], (int) ((N) / (dimsize + 0.5)),
+		   (int) ((N) / (dimsize - 0.5)));
+	  exit |= 1;
+	}
+    }
+
+  return exit;
+}
+
+int main ()
+{
+  int ary[N];
+  int ix;
+  int exit = 0;
+  int gangsize = 0, workersize = 0, vectorsize = 0;
+  int *gangdist, *workerdist, *vectordist;
+
+  for (ix = 0; ix < N;ix++)
+    ary[ix] = -1;
+
+#pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
+	    copy(ary) copyout(gangsize, workersize, vectorsize)
+  {
+#pragma acc loop gang worker vector
+    for (unsigned ix = 0; ix < N; ix++)
+      {
+	int g, w, v;
+
+	g = __builtin_goacc_parlevel_id (GOMP_DIM_GANG);
+	w = __builtin_goacc_parlevel_id (GOMP_DIM_WORKER);
+	v = __builtin_goacc_parlevel_id (GOMP_DIM_VECTOR);
+
+	ary[ix] = (g << 16) | (w << 8) | v;
+      }
+
+    gangsize = __builtin_goacc_parlevel_size (GOMP_DIM_GANG);
+    workersize = __builtin_goacc_parlevel_size (GOMP_DIM_WORKER);
+    vectorsize = __builtin_goacc_parlevel_size (GOMP_DIM_VECTOR);
+  }
+
+  gangdist = (int *) alloca (gangsize * sizeof (int));
+  workerdist = (int *) alloca (workersize * sizeof (int));
+  vectordist = (int *) alloca (vectorsize * sizeof (int));
+  memset (gangdist, 0, gangsize * sizeof (int));
+  memset (workerdist, 0, workersize * sizeof (int));
+  memset (vectordist, 0, vectorsize * sizeof (int));
+
+  /* Test that work is shared approximately equally amongst each active
+     gang/worker/vector.  */
+  for (ix = 0; ix < N; ix++)
+    {
+      int g = (ary[ix] >> 16) & 255;
+      int w = (ary[ix] >> 8) & 255;
+      int v = ary[ix] & 255;
+
+      gangdist[g]++;
+      workerdist[w]++;
+      vectordist[v]++;
+    }
+
+  exit = check ("gang", gangdist, gangsize);
+  exit |= check ("worker", workerdist, workersize);
+  exit |= check ("vector", vectordist, vectorsize);
+
+  return exit;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c/pr85465.c b/libgomp/testsuite/libgomp.oacc-c/pr85465.c
new file mode 100644
index 0000000..329e8a0
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c/pr85465.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-w" } */
+
+int
+main (void)
+{
+#pragma acc parallel
+  foo ();
+
+  return 0;
+}

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH, OpenACC] Add support for gang local storage allocation in shared memory
  2018-08-13 19:06     ` Cesar Philippidis
@ 2018-08-15 16:46       ` Julian Brown
  2018-08-15 19:57         ` Bernhard Reutner-Fischer
  0 siblings, 1 reply; 26+ messages in thread
From: Julian Brown @ 2018-08-15 16:46 UTC (permalink / raw)
  To: Cesar Philippidis; +Cc: gcc-patches, Tom de Vries, Chung-Lin Tang, jakub

[-- Attachment #1: Type: text/plain, Size: 2758 bytes --]

On Mon, 13 Aug 2018 12:06:21 -0700
Cesar Philippidis <cesar@codesourcery.com> wrote:

> So in other words, this is safe for fortran. It probably could use a
> fortran test, because that functionality wasn't explicitly exercised
> in og7/og8.

Here's a new version of the patch with a Fortran test case. It's not
too easy to write a test that depends on whether gang-local variables
actually end up in the right kind of memory, so I wrote one that scans
the omplower dump instead. Many other (including execution) tests will
already trigger the new behaviour.
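
As background on why placement is hard to observe from a test: on nvptx,
each marked decl only receives an aligned offset into a single shared
block. Below is a rough stand-alone model of that offset assignment. It
is not the GCC code: struct shared_block and assign_offset are made-up
names, and the model assumes alignments are powers of two expressed in
bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-alone model of the shared block; not GCC code.  */
struct shared_block
{
  size_t size;   /* Bytes reserved so far.  */
  size_t align;  /* Largest alignment requested so far, in bytes.  */
};

/* Reserve SIZE bytes at the next multiple of ALIGN within BLK and return
   the offset of the reservation.  ALIGN is assumed to be a power of two
   given in bytes.  */
static size_t
assign_offset (struct shared_block *blk, size_t size, size_t align)
{
  assert (align != 0 && (align & (align - 1)) == 0);

  /* Round the current size up to the requested alignment.  */
  blk->size = (blk->size + align - 1) & ~(align - 1);

  /* The block as a whole needs the strictest alignment seen.  */
  if (blk->align < align)
    blk->align = align;

  size_t offset = blk->size;
  blk->size += size;
  return offset;
}
```

Starting from an empty block with 4-byte alignment, reserving 1 byte at
alignment 1, then 32 bytes at alignment 4, then 8 bytes at alignment 8
gives offsets 0, 4 and 40, a total size of 48 and a block alignment of 8.
Nothing in those offsets is visible to the program, which is why the test
checks the dump rather than the memory.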

Tested with offloading to NVPTX.

OK?

Thanks,

Julian

2018-08-10  Julian Brown  <julian@codesourcery.com>
            Chung-Lin Tang  <cltang@codesourcery.com>

        gcc/
        * config/nvptx/nvptx.c (tree-hash-traits.h): Include.
        (gangprivate_shared_size): New global variable.
        (gangprivate_shared_align): Likewise.
        (gangprivate_shared_sym): Likewise.
        (gangprivate_shared_hmap): Likewise.
        (nvptx_option_override): Initialize gangprivate_shared_sym,
        gangprivate_shared_align.
        (nvptx_file_end): Output gangprivate_shared_sym.
        (nvptx_goacc_expand_accel_var): New function.
        (nvptx_set_current_function): New function.
        (TARGET_SET_CURRENT_FUNCTION): Define hook.
        (TARGET_GOACC_EXPAND_ACCEL_VAR): Likewise.
        * doc/tm.texi (TARGET_GOACC_EXPAND_ACCEL_VAR): Document new hook.
        * doc/tm.texi.in (TARGET_GOACC_EXPAND_ACCEL_VAR): Likewise.
        * expr.c (expand_expr_real_1): Remap decls marked with the
        "oacc gangprivate" atttribute.
        * omp-low.c (omp_context): Add oacc_partitioning_levels and oacc_decls
        fields.
        (new_omp_context): Initialize oacc_decls in new omp_context.
        (delete_omp_context): Delete oacc_decls in old omp_context.
        (lower_oacc_head_tail): Record partitioning-level count in omp context.
        (oacc_record_private_var_clauses, oacc_record_vars_in_bind)
        (mark_oacc_gangprivate): New functions.
        (lower_omp_for): Call oacc_record_private_var_clauses with "for"
        clauses.  Call mark_oacc_gangprivate for gang-partitioned loops.
        (lower_omp_target): Call oacc_record_private_var_clauses with "target"
        clauses.
        Call mark_oacc_gangprivate for offloaded target regions.
        (lower_omp_1): Call oacc_record_vars_in_bind for GIMPLE_BIND within
        OMP regions.
        * target.def (expand_accel_var): New hook.

        libgomp/
        * testsuite/libgomp.oacc-c-c++-common/gang-private-1.c: New test.
        * testsuite/libgomp.oacc-c-c++-common/loop-gwv-2.c: New test.
        * testsuite/libgomp.oacc-c/pr85465.c: New test.
        * testsuite/libgomp.oacc-fortran/gangprivate-attrib-1.f90: New test.

[-- Attachment #2: gang-local-storage-in-shm-3.diff --]
[-- Type: text/x-patch, Size: 19042 bytes --]

commit b73428237720be8d5b6e793f8615204356336d30
Author: Julian Brown <julian@codesourcery.com>
Date:   Thu Aug 9 20:27:04 2018 -0700

    [OpenACC] Add support for gang local storage allocation in shared memory
    
    2018-08-10  Julian Brown  <julian@codesourcery.com>
    	    Chung-Lin Tang  <cltang@codesourcery.com>
    
    	gcc/
    	* config/nvptx/nvptx.c (tree-hash-traits.h): Include.
    	(gangprivate_shared_size): New global variable.
    	(gangprivate_shared_align): Likewise.
    	(gangprivate_shared_sym): Likewise.
    	(gangprivate_shared_hmap): Likewise.
    	(nvptx_option_override): Initialize gangprivate_shared_sym,
    	gangprivate_shared_align.
    	(nvptx_file_end): Output gangprivate_shared_sym.
    	(nvptx_goacc_expand_accel_var): New function.
    	(nvptx_set_current_function): New function.
    	(TARGET_SET_CURRENT_FUNCTION): Define hook.
    	(TARGET_GOACC_EXPAND_ACCEL_VAR): Likewise.
    	* doc/tm.texi (TARGET_GOACC_EXPAND_ACCEL_VAR): Document new hook.
    	* doc/tm.texi.in (TARGET_GOACC_EXPAND_ACCEL_VAR): Likewise.
    	* expr.c (expand_expr_real_1): Remap decls marked with the
    	"oacc gangprivate" atttribute.
    	* omp-low.c (omp_context): Add oacc_partitioning_levels and oacc_decls
    	fields.
    	(new_omp_context): Initialize oacc_decls in new omp_context.
    	(delete_omp_context): Delete oacc_decls in old omp_context.
    	(lower_oacc_head_tail): Record partitioning-level count in omp context.
    	(oacc_record_private_var_clauses, oacc_record_vars_in_bind)
    	(mark_oacc_gangprivate): New functions.
    	(lower_omp_for): Call oacc_record_private_var_clauses with "for"
    	clauses.  Call mark_oacc_gangprivate for gang-partitioned loops.
    	(lower_omp_target): Call oacc_record_private_var_clauses with "target"
    	clauses.
    	Call mark_oacc_gangprivate for offloaded target regions.
    	(lower_omp_1): Call oacc_record_vars_in_bind for GIMPLE_BIND within
    	OMP regions.
    	* target.def (expand_accel_var): New hook.
    
    	libgomp/
    	* testsuite/libgomp.oacc-c-c++-common/gang-private-1.c: New test.
    	* testsuite/libgomp.oacc-c-c++-common/loop-gwv-2.c: New test.
    	* testsuite/libgomp.oacc-c/pr85465.c: New test.
    	* testsuite/libgomp.oacc-fortran/gangprivate-attrib-1.f90: New test.

diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index c0b0a2e..14eb842 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -73,6 +73,7 @@
 #include "cfgloop.h"
 #include "fold-const.h"
 #include "intl.h"
+#include "tree-hash-traits.h"
 
 /* This file should be included last.  */
 #include "target-def.h"
@@ -137,6 +138,12 @@ static unsigned worker_red_size;
 static unsigned worker_red_align;
 static GTY(()) rtx worker_red_sym;
 
+/* Shared memory block for gang-private variables.  */
+static unsigned gangprivate_shared_size;
+static unsigned gangprivate_shared_align;
+static GTY(()) rtx gangprivate_shared_sym;
+static hash_map<tree_decl_hash, unsigned int> gangprivate_shared_hmap;
+
 /* Global lock variable, needed for 128bit worker & gang reductions.  */
 static GTY(()) tree global_lock_var;
 
@@ -210,6 +217,10 @@ nvptx_option_override (void)
   SET_SYMBOL_DATA_AREA (worker_red_sym, DATA_AREA_SHARED);
   worker_red_align = GET_MODE_ALIGNMENT (SImode) / BITS_PER_UNIT;
 
+  gangprivate_shared_sym = gen_rtx_SYMBOL_REF (Pmode, "__gangprivate_shared");
+  SET_SYMBOL_DATA_AREA (gangprivate_shared_sym, DATA_AREA_SHARED);
+  gangprivate_shared_align = GET_MODE_ALIGNMENT (SImode) / BITS_PER_UNIT;
+
   diagnose_openacc_conflict (TARGET_GOMP, "-mgomp");
   diagnose_openacc_conflict (TARGET_SOFT_STACK, "-msoft-stack");
   diagnose_openacc_conflict (TARGET_UNIFORM_SIMT, "-muniform-simt");
@@ -4968,6 +4979,10 @@ nvptx_file_end (void)
     write_worker_buffer (asm_out_file, worker_red_sym,
 			 worker_red_align, worker_red_size);
 
+  if (gangprivate_shared_size)
+    write_worker_buffer (asm_out_file, gangprivate_shared_sym,
+			 gangprivate_shared_align, gangprivate_shared_size);
+
   if (need_softstack_decl)
     {
       write_var_marker (asm_out_file, false, true, "__nvptx_stacks");
@@ -5915,6 +5930,47 @@ nvptx_can_change_mode_class (machine_mode, machine_mode, reg_class_t)
   return false;
 }
 
+static rtx
+nvptx_goacc_expand_accel_var (tree var)
+{
+  if (TREE_CODE (var) == VAR_DECL
+      && lookup_attribute ("oacc gangprivate", DECL_ATTRIBUTES (var)))
+    {
+      unsigned int offset, *poffset;
+      poffset = gangprivate_shared_hmap.get (var);
+      if (poffset)
+	offset = *poffset;
+      else
+	{
+	  unsigned HOST_WIDE_INT align = DECL_ALIGN (var);
+	  gangprivate_shared_size =
+	    (gangprivate_shared_size + align - 1) & ~(align - 1);
+	  if (gangprivate_shared_align < align)
+	    gangprivate_shared_align = align;
+
+	  offset = gangprivate_shared_size;
+	  bool existed = gangprivate_shared_hmap.put (var, offset);
+	  gcc_assert (!existed);
+	  gangprivate_shared_size += tree_to_uhwi (DECL_SIZE_UNIT (var));
+	}
+      rtx addr = plus_constant (Pmode, gangprivate_shared_sym, offset);
+      return gen_rtx_MEM (TYPE_MODE (TREE_TYPE (var)), addr);
+    }
+  return NULL_RTX;
+}
+
+static GTY(()) tree nvptx_previous_fndecl;
+
+static void
+nvptx_set_current_function (tree fndecl)
+{
+  if (!fndecl || fndecl == nvptx_previous_fndecl)
+    return;
+
+  gangprivate_shared_hmap.empty ();
+  nvptx_previous_fndecl = fndecl;
+}
+
 #undef TARGET_OPTION_OVERRIDE
 #define TARGET_OPTION_OVERRIDE nvptx_option_override
 
@@ -6051,6 +6107,12 @@ nvptx_can_change_mode_class (machine_mode, machine_mode, reg_class_t)
 #undef TARGET_HAVE_SPECULATION_SAFE_VALUE
 #define TARGET_HAVE_SPECULATION_SAFE_VALUE speculation_safe_value_not_needed
 
+#undef TARGET_GOACC_EXPAND_ACCEL_VAR
+#define TARGET_GOACC_EXPAND_ACCEL_VAR nvptx_goacc_expand_accel_var
+
+#undef TARGET_SET_CURRENT_FUNCTION
+#define TARGET_SET_CURRENT_FUNCTION nvptx_set_current_function
+
 struct gcc_target targetm = TARGET_INITIALIZER;
 
 #include "gt-nvptx.h"
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index a40f45a..fb87f67 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -6064,6 +6064,14 @@ like @code{cond_add@var{m}}.  The default implementation returns a zero
 constant of type @var{type}.
 @end deftypefn
 
+@deftypefn {Target Hook} rtx TARGET_GOACC_EXPAND_ACCEL_VAR (tree @var{var})
+This hook, if defined, is used by accelerator target back-ends to expand
+specially handled kinds of VAR_DECL expressions.  A particular use is to
+place variables with specific attributes inside special accelerator
+memories.  A return value of NULL indicates that the target does not
+handle this VAR_DECL, and normal RTL expanding is resumed.
+@end deftypefn
+
 @node Anchored Addresses
 @section Anchored Addresses
 @cindex anchored addresses
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 39a214e..beace61 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -4151,6 +4151,8 @@ address;  but often a machine-dependent strategy can generate better code.
 
 @hook TARGET_PREFERRED_ELSE_VALUE
 
+@hook TARGET_GOACC_EXPAND_ACCEL_VAR
+
 @node Anchored Addresses
 @section Anchored Addresses
 @cindex anchored addresses
diff --git a/gcc/expr.c b/gcc/expr.c
index de6709d..2c62bf9 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -9854,8 +9854,19 @@ expand_expr_real_1 (tree exp, rtx target, machine_mode tmode,
       exp = SSA_NAME_VAR (ssa_name);
       goto expand_decl_rtl;
 
-    case PARM_DECL:
     case VAR_DECL:
+      /* Allow accel compiler to handle specific cases of variables,
+	 specifically those tagged with the "oacc gangprivate" attribute,
+	 which may be intended to be placed in special memory in GPUs.  */
+      if (flag_openacc && targetm.goacc.expand_accel_var)
+	{
+	  temp = targetm.goacc.expand_accel_var (exp);
+	  if (temp)
+	    return temp;
+	}
+      /* ... fall through ...  */
+
+    case PARM_DECL:
       /* If a static var's type was incomplete when the decl was written,
 	 but the type is complete now, lay out the decl now.  */
       if (DECL_SIZE (exp) == 0
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 843c66f..b0e173d 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -124,6 +124,12 @@ struct omp_context
 
   /* True if this construct can be cancelled.  */
   bool cancellable;
+
+  /* The number of levels of OpenACC partitioning invoked in this context.  */
+  int oacc_partitioning_levels;
+
+  /* Decls in this context.  */
+  vec<tree> *oacc_decls;
 };
 
 static splay_tree all_contexts;
@@ -850,6 +856,7 @@ new_omp_context (gimple *stmt, omp_context *outer_ctx)
     }
 
   ctx->cb.decl_map = new hash_map<tree, tree>;
+  ctx->oacc_decls = new vec<tree> ();
 
   return ctx;
 }
@@ -925,6 +932,8 @@ delete_omp_context (splay_tree_value value)
   if (is_task_ctx (ctx))
     finalize_task_copyfn (as_a <gomp_task *> (ctx->stmt));
 
+  delete ctx->oacc_decls;
+
   XDELETE (ctx);
 }
 
@@ -5716,6 +5725,9 @@ lower_oacc_head_tail (location_t loc, tree clauses,
   tree join_kind = build_int_cst (unsigned_type_node, IFN_UNIQUE_OACC_JOIN);
 
   gcc_assert (count);
+
+  ctx->oacc_partitioning_levels = count;
+
   for (unsigned done = 1; count; count--, done++)
     {
       gimple_seq fork_seq = NULL;
@@ -6732,6 +6744,77 @@ lower_omp_for_lastprivate (struct omp_for_data *fd, gimple_seq *body_p,
     }
 }
 
+/* Record vars listed in private clauses in CLAUSES in CTX.  This information
+   is used to mark up variables that should be made private per-gang.  */
+
+static void
+oacc_record_private_var_clauses (omp_context *ctx, tree clauses)
+{
+  tree c;
+
+  if (!ctx)
+    return;
+
+  for (c = clauses; c; c = OMP_CLAUSE_CHAIN (c))
+    switch (OMP_CLAUSE_CODE (c))
+      {
+      case OMP_CLAUSE_PRIVATE:
+	{
+	  tree decl = OMP_CLAUSE_DECL (c);
+	  ctx->oacc_decls->safe_push (decl);
+	}
+	break;
+
+      default:
+	/* Empty.  */;
+      }
+}
+
+/* Record vars declared in BINDVARS in CTX.  This information is used to mark
+   up variables that should be made private per-gang.  */
+
+static void
+oacc_record_vars_in_bind (omp_context *ctx, tree bindvars)
+{
+  if (!ctx)
+    return;
+
+  for (tree v = bindvars; v; v = DECL_CHAIN (v))
+    ctx->oacc_decls->safe_push (v);
+}
+
+/* Mark variables which are declared implicitly or explicitly as gang private
+   with a special attribute.  These may need to have their declarations altered
+   later on in compilation (e.g. in execute_oacc_device_lower or the backend,
+   depending on how the OpenACC execution model is implemented on a given
+   target) to ensure that sharing semantics are correct.
+   Only variables which have their address taken need to be considered.  */
+
+static void
+mark_oacc_gangprivate (vec<tree> *decls)
+{
+  int i;
+  tree decl;
+
+  FOR_EACH_VEC_ELT (*decls, i, decl)
+    {
+      if (TREE_CODE (decl) == VAR_DECL
+	  && TREE_ADDRESSABLE (decl)
+	  && !lookup_attribute ("oacc gangprivate", DECL_ATTRIBUTES (decl)))
+	{
+	  if (dump_file && (dump_flags & TDF_DETAILS))
+	    {
+	      fprintf (dump_file,
+		       "Setting 'oacc gangprivate' attribute for decl:");
+	      print_generic_decl (dump_file, decl, TDF_SLIM);
+	      fputc ('\n', dump_file);
+	    }
+	  DECL_ATTRIBUTES (decl)
+	    = tree_cons (get_identifier ("oacc gangprivate"),
+			 NULL, DECL_ATTRIBUTES (decl));
+	}
+    }
+}
 
 /* Lower code for an OMP loop directive.  */
 
@@ -6748,6 +6831,8 @@ lower_omp_for (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 
   push_gimplify_context ();
 
+  oacc_record_private_var_clauses (ctx, gimple_omp_for_clauses (stmt));
+
   lower_omp (gimple_omp_for_pre_body_ptr (stmt), ctx);
 
   block = make_node (BLOCK);
@@ -6878,7 +6963,20 @@ lower_omp_for (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 
   /* Add OpenACC partitioning and reduction markers just before the loop.  */
   if (oacc_head)
-    gimple_seq_add_seq (&body, oacc_head);
+    {
+      gimple_seq_add_seq (&body, oacc_head);
+
+      int level_total = 0;
+      omp_context *thisctx;
+
+      for (thisctx = ctx; thisctx; thisctx = thisctx->outer)
+        level_total += thisctx->oacc_partitioning_levels;
+
+      /* If the current context and parent contexts are distributed over a
+	 total of one parallelism level, we have gang partitioning.  */
+      if (level_total == 1)
+        mark_oacc_gangprivate (ctx->oacc_decls);
+    }
 
   lower_omp_for_lastprivate (&fd, &body, &dlist, ctx);
 
@@ -7511,6 +7609,8 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 
   clauses = gimple_omp_target_clauses (stmt);
 
+  oacc_record_private_var_clauses (ctx, clauses);
+
   gimple_seq dep_ilist = NULL;
   gimple_seq dep_olist = NULL;
   if (omp_find_clause (clauses, OMP_CLAUSE_DEPEND))
@@ -7761,6 +7861,8 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 
   if (offloaded)
     {
+      mark_oacc_gangprivate (ctx->oacc_decls);
+
       /* Declare all the variables created by mapping and the variables
 	 declared in the scope of the target body.  */
       record_vars_into (ctx->block_vars, child_fn);
@@ -8755,6 +8857,7 @@ lower_omp_1 (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 		 ctx);
       break;
     case GIMPLE_BIND:
+      oacc_record_vars_in_bind (ctx, gimple_bind_vars (as_a <gbind *> (stmt)));
       lower_omp (gimple_bind_body_ptr (as_a <gbind *> (stmt)), ctx);
       maybe_remove_omp_member_access_dummy_vars (as_a <gbind *> (stmt));
       break;
diff --git a/gcc/target.def b/gcc/target.def
index c570f38..b3b24b8 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -1701,6 +1701,16 @@ for allocating any storage for reductions when necessary.",
 void, (gcall *call),
 default_goacc_reduction)
 
+DEFHOOK
+(expand_accel_var,
+"This hook, if defined, is used by accelerator target back-ends to expand\n\
+specially handled kinds of VAR_DECL expressions.  A particular use is to\n\
+place variables with specific attributes inside special accelerator\n\
+memories.  A return value of NULL indicates that the target does not\n\
+handle this VAR_DECL, and normal RTL expanding is resumed.",
+rtx, (tree var),
+NULL)
+
 HOOK_VECTOR_END (goacc)
 
 /* Functions relating to vectorization.  */
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/gang-private-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/gang-private-1.c
new file mode 100644
index 0000000..f378346
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/gang-private-1.c
@@ -0,0 +1,38 @@
+#include <assert.h>
+
+int main (void)
+{
+  int ret;
+
+  #pragma acc parallel num_gangs(1) num_workers(32) copyout(ret)
+  {
+    int w = 0;
+
+    #pragma acc loop worker
+    for (int i = 0; i < 32; i++)
+      {
+        #pragma acc atomic update
+	w++;
+      }
+
+    ret = (w == 32);
+  }
+  assert (ret);
+
+  #pragma acc parallel num_gangs(1) vector_length(32) copyout(ret)
+  {
+    int v = 0;
+
+    #pragma acc loop vector
+    for (int i = 0; i < 32; i++)
+      {
+        #pragma acc atomic update
+	v++;
+      }
+
+    ret = (v == 32);
+  }
+  assert (ret);
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-gwv-2.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-gwv-2.c
new file mode 100644
index 0000000..a4f81a3
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-gwv-2.c
@@ -0,0 +1,95 @@
+#include <stdio.h>
+#include <openacc.h>
+#include <alloca.h>
+#include <string.h>
+#include <gomp-constants.h>
+#include <stdlib.h>
+
+#if 0
+#define DEBUG(DIM, IDX, VAL) \
+  fprintf (stderr, "%sdist[%d] = %d\n", (DIM), (IDX), (VAL))
+#else
+#define DEBUG(DIM, IDX, VAL)
+#endif
+
+#define N (32*32*32)
+
+int
+check (const char *dim, int *dist, int dimsize)
+{
+  int ix;
+  int exit = 0;
+
+  for (ix = 0; ix < dimsize; ix++)
+    {
+      DEBUG(dim, ix, dist[ix]);
+      if (dist[ix] < (N) / (dimsize + 0.5)
+	  || dist[ix] > (N) / (dimsize - 0.5))
+	{
+	  fprintf (stderr, "did not distribute to %ss (%d not between %d "
+		   "and %d)\n", dim, dist[ix], (int) ((N) / (dimsize + 0.5)),
+		   (int) ((N) / (dimsize - 0.5)));
+	  exit |= 1;
+	}
+    }
+
+  return exit;
+}
+
+int main ()
+{
+  int ary[N];
+  int ix;
+  int exit = 0;
+  int gangsize = 0, workersize = 0, vectorsize = 0;
+  int *gangdist, *workerdist, *vectordist;
+
+  for (ix = 0; ix < N;ix++)
+    ary[ix] = -1;
+
+#pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
+	    copy(ary) copyout(gangsize, workersize, vectorsize)
+  {
+#pragma acc loop gang worker vector
+    for (unsigned ix = 0; ix < N; ix++)
+      {
+	int g, w, v;
+
+	g = __builtin_goacc_parlevel_id (GOMP_DIM_GANG);
+	w = __builtin_goacc_parlevel_id (GOMP_DIM_WORKER);
+	v = __builtin_goacc_parlevel_id (GOMP_DIM_VECTOR);
+
+	ary[ix] = (g << 16) | (w << 8) | v;
+      }
+
+    gangsize = __builtin_goacc_parlevel_size (GOMP_DIM_GANG);
+    workersize = __builtin_goacc_parlevel_size (GOMP_DIM_WORKER);
+    vectorsize = __builtin_goacc_parlevel_size (GOMP_DIM_VECTOR);
+  }
+
+  gangdist = (int *) alloca (gangsize * sizeof (int));
+  workerdist = (int *) alloca (workersize * sizeof (int));
+  vectordist = (int *) alloca (vectorsize * sizeof (int));
+  memset (gangdist, 0, gangsize * sizeof (int));
+  memset (workerdist, 0, workersize * sizeof (int));
+  memset (vectordist, 0, vectorsize * sizeof (int));
+
+  /* Test that work is shared approximately equally amongst each active
+     gang/worker/vector.  */
+  for (ix = 0; ix < N; ix++)
+    {
+      int g = (ary[ix] >> 16) & 255;
+      int w = (ary[ix] >> 8) & 255;
+      int v = ary[ix] & 255;
+
+      gangdist[g]++;
+      workerdist[w]++;
+      vectordist[v]++;
+    }
+
+  exit = check ("gang", gangdist, gangsize);
+  exit |= check ("worker", workerdist, workersize);
+  exit |= check ("vector", vectordist, vectorsize);
+
+  return exit;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c/pr85465.c b/libgomp/testsuite/libgomp.oacc-c/pr85465.c
new file mode 100644
index 0000000..329e8a0
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c/pr85465.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-w" } */
+
+int
+main (void)
+{
+#pragma acc parallel
+  foo ();
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/gangprivate-attrib-1.f90 b/libgomp/testsuite/libgomp.oacc-fortran/gangprivate-attrib-1.f90
new file mode 100644
index 0000000..5f8a5e6
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-fortran/gangprivate-attrib-1.f90
@@ -0,0 +1,25 @@
+! Test for "oacc gangprivate" attribute on gang-private variables
+
+! { dg-do run }
+! { dg-additional-options "-fdump-tree-omplower-details" }
+! { dg-final { scan-tree-dump-times "Setting 'oacc gangprivate' attribute for decl:  integer\\(kind=4\\) w;" 1 "omplower" } }
+
+program main
+  integer :: w, arr(0:31)
+
+  !$acc parallel num_gangs(32) num_workers(32) copyout(arr)
+    !$acc loop gang private(w)
+    do j = 0, 31
+      w = 0
+      !$acc loop seq
+      do i = 0, 31
+        !$acc atomic update
+        w = w + 1
+        !$acc end atomic
+      end do
+      arr(j) = w
+    end do
+  !$acc end parallel
+
+  if (any (arr .ne. 32)) stop 1
+end program main


* Re: [PATCH, OpenACC] Add support for gang local storage allocation in shared memory
  2018-08-15 16:46       ` Julian Brown
@ 2018-08-15 19:57         ` Bernhard Reutner-Fischer
  2018-08-16 15:47           ` Julian Brown
  0 siblings, 1 reply; 26+ messages in thread
From: Bernhard Reutner-Fischer @ 2018-08-15 19:57 UTC (permalink / raw)
  To: gcc-patches, Julian Brown, Cesar Philippidis
  Cc: gcc-patches, Tom de Vries, Chung-Lin Tang, jakub

On 15 August 2018 18:46:37 CEST, Julian Brown <julian@codesourcery.com> wrote:
>On Mon, 13 Aug 2018 12:06:21 -0700
>Cesar Philippidis <cesar@codesourcery.com> wrote:

atttribute has more t than strictly necessary. 
Don't like signed integer levels where they should be some unsigned. 
Also don't like single switch cases instead of if.
And omitting function comments even if the hook way above is documented may be ok ish but is a bit lazy ;)
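
For context, the "levels" in question are the per-context partitioning
counts that lower_omp_for sums over the current and enclosing contexts to
decide whether a loop is gang-partitioned. A minimal sketch of that check
with an unsigned counter, written outside GCC: struct ctx_model and
is_gang_partitioned are hypothetical names, not identifiers from the
patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, cut-down stand-in for omp_context.  */
struct ctx_model
{
  unsigned int partitioning_levels;  /* levels used by this construct */
  const struct ctx_model *outer;     /* enclosing context, or NULL */
};

/* Return true if CTX and all of its enclosing contexts together use
   exactly one level of parallelism.  */
bool
is_gang_partitioned (const struct ctx_model *ctx)
{
  unsigned int level_total = 0;

  for (const struct ctx_model *c = ctx; c != NULL; c = c->outer)
    level_total += c->partitioning_levels;

  return level_total == 1;
}
```

A loop using one level directly inside a region that uses none passes the
check; a loop nested inside it, or a loop that uses three levels at once,
does not.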

thanks, 

>
>> So in other words, this is safe for fortran. It probably could use a
>> fortran test, because that functionality wasn't explicitly exercised
>> in og7/og8.
>
>Here's a new version of the patch with a Fortran test case. It's not
>too easy to write a test that depends on whether gang-local variables
>actually end up in the right kind of memory, so I wrote one that scans
>the omplower dump instead. Many other (including execution) tests will
>already trigger the new behaviour.
>
>Tested with offloading to NVPTX.
>
>OK?
>
>Thanks,
>
>Julian
>
>2018-08-10  Julian Brown  <julian@codesourcery.com>
>            Chung-Lin Tang  <cltang@codesourcery.com>
>
>        gcc/
>        * config/nvptx/nvptx.c (tree-hash-traits.h): Include.
>        (gangprivate_shared_size): New global variable.
>        (gangprivate_shared_align): Likewise.
>        (gangprivate_shared_sym): Likewise.
>        (gangprivate_shared_hmap): Likewise.
>        (nvptx_option_override): Initialize gangprivate_shared_sym,
>        gangprivate_shared_align.
>        (nvptx_file_end): Output gangprivate_shared_sym.
>        (nvptx_goacc_expand_accel_var): New function.
>        (nvptx_set_current_function): New function.
>        (TARGET_SET_CURRENT_FUNCTION): Define hook.
>        (TARGET_GOACC_EXPAND_ACCEL): Likewise.
>      * doc/tm.texi (TARGET_GOACC_EXPAND_ACCEL_VAR): Document new hook.
>        * doc/tm.texi.in (TARGET_GOACC_EXPAND_ACCEL_VAR): Likewise.
>        * expr.c (expand_expr_real_1): Remap decls marked with the
>        "oacc gangprivate" atttribute.
>  * omp-low.c (omp_context): Add oacc_partitioning_level and oacc_decls
>        fields.
>        (new_omp_context): Initialize oacc_decls in new omp_context.
>        (delete_omp_context): Delete oacc_decls in old omp_context.
>(lower_oacc_head_tail): Record partitioning-level count in omp context.
>        (oacc_record_private_var_clauses, oacc_record_vars_in_bind)
>        (mark_oacc_gangprivate): New functions.
>       (lower_omp_for): Call oacc_record_private_var_clauses with "for"
>       clauses.  Call mark_oacc_gangprivate for gang-partitioned loops.
> (lower_omp_target): Call oacc_record_private_var_clauses with "target"
>        clauses.
>        Call mark_oacc_gangprivate for offloaded target regions.
>   (lower_omp_1): Call vars_in_bind for GIMPLE_BIND within OMP regions.
>        * target.def (expand_accel_var): New hook.
>
>        libgomp/
>      * testsuite/libgomp.oacc-c-c++-common/gang-private-1.c: New test.
>        * testsuite/libgomp.oacc-c-c++-common/loop-gwv-2.c: New test.
>        * testsuite/libgomp.oacc-c/pr85465.c: New test.
>   * testsuite/libgomp.oacc-fortran/gangprivate-attrib-1.f90: New test.


* Re: [PATCH, OpenACC] Add support for gang local storage allocation in shared memory
  2018-08-15 19:57         ` Bernhard Reutner-Fischer
@ 2018-08-16 15:47           ` Julian Brown
  2018-08-17 16:39             ` Bernhard Reutner-Fischer
  2018-10-05 14:07             ` [PATCH, OpenACC] Add support for gang local storage allocation in shared memory Tom de Vries
  0 siblings, 2 replies; 26+ messages in thread
From: Julian Brown @ 2018-08-16 15:47 UTC (permalink / raw)
  To: Bernhard Reutner-Fischer
  Cc: gcc-patches, Cesar Philippidis, Tom de Vries, Chung-Lin Tang, jakub

[-- Attachment #1: Type: text/plain, Size: 3010 bytes --]

On Wed, 15 Aug 2018 21:56:54 +0200
Bernhard Reutner-Fischer <rep.dot.nop@gmail.com> wrote:

> On 15 August 2018 18:46:37 CEST, Julian Brown
> <julian@codesourcery.com> wrote:
> >On Mon, 13 Aug 2018 12:06:21 -0700
> >Cesar Philippidis <cesar@codesourcery.com> wrote:  
> 
> atttribute has more t than strictly necessary. 
> Don't like signed integer levels where they should be some unsigned. 
> Also don't like single switch cases instead of if.
> And omitting function comments even if the hook way above is
> documented may be ok ish but is a bit lazy ;)

Here's a new version with those comments addressed. I also changed the
logic around a little to avoid adding decls to the vec in omp_context
which would never be given the gang-private attribute.
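
The idea of that last change, filtering when a decl is recorded rather
than when the attribute is applied, can be sketched outside GCC as below.
This is an illustration only: struct decl_model, struct decl_list and
record_candidate are hypothetical names, and the two flags stand in for
the TREE_CODE (decl) == VAR_DECL and TREE_ADDRESSABLE (decl) tests that
mark_oacc_gangprivate performs in the previous version of the patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a GCC decl; only the two properties that
   matter for the gang-private attribute are modelled.  */
struct decl_model
{
  const char *name;
  bool is_var;       /* models TREE_CODE (decl) == VAR_DECL */
  bool addressable;  /* models TREE_ADDRESSABLE (decl) */
};

#define MAX_RECORDED 16

/* Hypothetical stand-in for the per-context vector of candidates.  */
struct decl_list
{
  const struct decl_model *items[MAX_RECORDED];
  size_t count;
};

/* Record DECL only if it could later receive the attribute and the
   fixed-size list still has room.  Return true if it was recorded.  */
bool
record_candidate (struct decl_list *list, const struct decl_model *decl)
{
  if (!decl->is_var || !decl->addressable || list->count == MAX_RECORDED)
    return false;

  list->items[list->count++] = decl;
  return true;
}
```

With this shape the marking step can walk the list without re-testing
each entry, and the list never holds decls that would be skipped anyway.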

Re-tested with offloading to NVPTX.

OK?

Julian

2018-08-10  Julian Brown  <julian@codesourcery.com>
            Chung-Lin Tang  <cltang@codesourcery.com>

        gcc/
        * config/nvptx/nvptx.c (tree-hash-traits.h): Include.
        (gangprivate_shared_size): New global variable.
        (gangprivate_shared_align): Likewise.
        (gangprivate_shared_sym): Likewise.
        (gangprivate_shared_hmap): Likewise.
        (nvptx_option_override): Initialize gangprivate_shared_sym,
        gangprivate_shared_align.
        (nvptx_file_end): Output gangprivate_shared_sym.
        (nvptx_goacc_expand_accel_var): New function.
        (nvptx_set_current_function): New function.
        (TARGET_SET_CURRENT_FUNCTION): Define hook.
        (TARGET_GOACC_EXPAND_ACCEL_VAR): Likewise.
        * doc/tm.texi (TARGET_GOACC_EXPAND_ACCEL_VAR): Document new hook.
        * doc/tm.texi.in (TARGET_GOACC_EXPAND_ACCEL_VAR): Likewise.
        * expr.c (expand_expr_real_1): Remap decls marked with the
        "oacc gangprivate" attribute.
        * omp-low.c (omp_context): Add oacc_partitioning_levels and
        oacc_addressable_var_decls fields.
        (new_omp_context): Initialize oacc_addressable_var_decls in new
        omp_context.
        (delete_omp_context): Delete oacc_addressable_var_decls in old
        omp_context.
        (lower_oacc_head_tail): Record partitioning-level count in omp context.
        (oacc_record_private_var_clauses, oacc_record_vars_in_bind)
        (mark_oacc_gangprivate): New functions.
        (lower_omp_for): Call oacc_record_private_var_clauses with "for"
        clauses.  Call mark_oacc_gangprivate for gang-partitioned loops.
        (lower_omp_target): Call oacc_record_private_var_clauses with "target"
        clauses.
        Call mark_oacc_gangprivate for offloaded target regions.
        (lower_omp_1): Call oacc_record_vars_in_bind for GIMPLE_BIND within
        OMP regions.
        * target.def (expand_accel_var): New hook.

        libgomp/
        * testsuite/libgomp.oacc-c-c++-common/gang-private-1.c: New test.
        * testsuite/libgomp.oacc-c-c++-common/loop-gwv-2.c: New test.
        * testsuite/libgomp.oacc-c/pr85465.c: New test.
        * testsuite/libgomp.oacc-fortran/gangprivate-attrib-1.f90: New test.

[-- Attachment #2: gang-local-storage-in-shm-5.diff --]
[-- Type: text/x-patch, Size: 19358 bytes --]

commit e276442550a85b62866ba13890eacf4e946d1079
Author: Julian Brown <julian@codesourcery.com>
Date:   Thu Aug 9 20:27:04 2018 -0700

    [OpenACC] Add support for gang local storage allocation in shared memory
    
    2018-08-10  Julian Brown  <julian@codesourcery.com>
    	    Chung-Lin Tang  <cltang@codesourcery.com>
    
    	gcc/
    	* config/nvptx/nvptx.c (tree-hash-traits.h): Include.
    	(gangprivate_shared_size): New global variable.
    	(gangprivate_shared_align): Likewise.
    	(gangprivate_shared_sym): Likewise.
    	(gangprivate_shared_hmap): Likewise.
    	(nvptx_option_override): Initialize gangprivate_shared_sym,
    	gangprivate_shared_align.
    	(nvptx_file_end): Output gangprivate_shared_sym.
    	(nvptx_goacc_expand_accel_var): New function.
    	(nvptx_set_current_function): New function.
    	(TARGET_SET_CURRENT_FUNCTION): Define hook.
    	(TARGET_GOACC_EXPAND_ACCEL_VAR): Likewise.
    	* doc/tm.texi (TARGET_GOACC_EXPAND_ACCEL_VAR): Document new hook.
    	* doc/tm.texi.in (TARGET_GOACC_EXPAND_ACCEL_VAR): Likewise.
    	* expr.c (expand_expr_real_1): Remap decls marked with the
    	"oacc gangprivate" attribute.
    	* omp-low.c (omp_context): Add oacc_partitioning_levels and
    	oacc_addressable_var_decls fields.
    	(new_omp_context): Initialize oacc_addressable_var_decls in new
    	omp_context.
    	(delete_omp_context): Delete oacc_addressable_var_decls in old
    	omp_context.
    	(lower_oacc_head_tail): Record partitioning-level count in omp context.
    	(oacc_record_private_var_clauses, oacc_record_vars_in_bind)
    	(mark_oacc_gangprivate): New functions.
    	(lower_omp_for): Call oacc_record_private_var_clauses with "for"
    	clauses.  Call mark_oacc_gangprivate for gang-partitioned loops.
    	(lower_omp_target): Call oacc_record_private_var_clauses with "target"
    	clauses.
    	Call mark_oacc_gangprivate for offloaded target regions.
    	(lower_omp_1): Call oacc_record_vars_in_bind for GIMPLE_BIND within
    	OMP regions.
    	* target.def (expand_accel_var): New hook.
    
    	libgomp/
    	* testsuite/libgomp.oacc-c-c++-common/gang-private-1.c: New test.
    	* testsuite/libgomp.oacc-c-c++-common/loop-gwv-2.c: New test.
    	* testsuite/libgomp.oacc-c/pr85465.c: New test.
    	* testsuite/libgomp.oacc-fortran/gangprivate-attrib-1.f90: New test.

diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index c0b0a2e..7aeefdb 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -73,6 +73,7 @@
 #include "cfgloop.h"
 #include "fold-const.h"
 #include "intl.h"
+#include "tree-hash-traits.h"
 
 /* This file should be included last.  */
 #include "target-def.h"
@@ -137,6 +138,12 @@ static unsigned worker_red_size;
 static unsigned worker_red_align;
 static GTY(()) rtx worker_red_sym;
 
+/* Shared memory block for gang-private variables.  */
+static unsigned gangprivate_shared_size;
+static unsigned gangprivate_shared_align;
+static GTY(()) rtx gangprivate_shared_sym;
+static hash_map<tree_decl_hash, unsigned int> gangprivate_shared_hmap;
+
 /* Global lock variable, needed for 128bit worker & gang reductions.  */
 static GTY(()) tree global_lock_var;
 
@@ -210,6 +217,10 @@ nvptx_option_override (void)
   SET_SYMBOL_DATA_AREA (worker_red_sym, DATA_AREA_SHARED);
   worker_red_align = GET_MODE_ALIGNMENT (SImode) / BITS_PER_UNIT;
 
+  gangprivate_shared_sym = gen_rtx_SYMBOL_REF (Pmode, "__gangprivate_shared");
+  SET_SYMBOL_DATA_AREA (gangprivate_shared_sym, DATA_AREA_SHARED);
+  gangprivate_shared_align = GET_MODE_ALIGNMENT (SImode) / BITS_PER_UNIT;
+
   diagnose_openacc_conflict (TARGET_GOMP, "-mgomp");
   diagnose_openacc_conflict (TARGET_SOFT_STACK, "-msoft-stack");
   diagnose_openacc_conflict (TARGET_UNIFORM_SIMT, "-muniform-simt");
@@ -4968,6 +4979,10 @@ nvptx_file_end (void)
     write_worker_buffer (asm_out_file, worker_red_sym,
 			 worker_red_align, worker_red_size);
 
+  if (gangprivate_shared_size)
+    write_worker_buffer (asm_out_file, gangprivate_shared_sym,
+			 gangprivate_shared_align, gangprivate_shared_size);
+
   if (need_softstack_decl)
     {
       write_var_marker (asm_out_file, false, true, "__nvptx_stacks");
@@ -5915,6 +5930,52 @@ nvptx_can_change_mode_class (machine_mode, machine_mode, reg_class_t)
   return false;
 }
 
+/* Implement TARGET_GOACC_EXPAND_ACCEL_VAR.  Place "oacc gangprivate"
+   variables in shared memory.  */
+
+static rtx
+nvptx_goacc_expand_accel_var (tree var)
+{
+  if (TREE_CODE (var) == VAR_DECL
+      && lookup_attribute ("oacc gangprivate", DECL_ATTRIBUTES (var)))
+    {
+      unsigned int offset, *poffset;
+      poffset = gangprivate_shared_hmap.get (var);
+      if (poffset)
+	offset = *poffset;
+      else
+	{
+	  unsigned HOST_WIDE_INT align = DECL_ALIGN (var);
+	  gangprivate_shared_size =
+	    (gangprivate_shared_size + align - 1) & ~(align - 1);
+	  if (gangprivate_shared_align < align)
+	    gangprivate_shared_align = align;
+
+	  offset = gangprivate_shared_size;
+	  bool existed = gangprivate_shared_hmap.put (var, offset);
+	  gcc_assert (!existed);
+	  gangprivate_shared_size += tree_to_uhwi (DECL_SIZE_UNIT (var));
+	}
+      rtx addr = plus_constant (Pmode, gangprivate_shared_sym, offset);
+      return gen_rtx_MEM (TYPE_MODE (TREE_TYPE (var)), addr);
+    }
+  return NULL_RTX;
+}
+
+static GTY(()) tree nvptx_previous_fndecl;
+
+/* Implement TARGET_SET_CURRENT_FUNCTION.  Reset per-function context.  */
+
+static void
+nvptx_set_current_function (tree fndecl)
+{
+  if (!fndecl || fndecl == nvptx_previous_fndecl)
+    return;
+
+  gangprivate_shared_hmap.empty ();
+  nvptx_previous_fndecl = fndecl;
+}
+
 #undef TARGET_OPTION_OVERRIDE
 #define TARGET_OPTION_OVERRIDE nvptx_option_override
 
@@ -6051,6 +6112,12 @@ nvptx_can_change_mode_class (machine_mode, machine_mode, reg_class_t)
 #undef TARGET_HAVE_SPECULATION_SAFE_VALUE
 #define TARGET_HAVE_SPECULATION_SAFE_VALUE speculation_safe_value_not_needed
 
+#undef TARGET_GOACC_EXPAND_ACCEL_VAR
+#define TARGET_GOACC_EXPAND_ACCEL_VAR nvptx_goacc_expand_accel_var
+
+#undef TARGET_SET_CURRENT_FUNCTION
+#define TARGET_SET_CURRENT_FUNCTION nvptx_set_current_function
+
 struct gcc_target targetm = TARGET_INITIALIZER;
 
 #include "gt-nvptx.h"
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index a40f45a..fb87f67 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -6064,6 +6064,14 @@ like @code{cond_add@var{m}}.  The default implementation returns a zero
 constant of type @var{type}.
 @end deftypefn
 
+@deftypefn {Target Hook} rtx TARGET_GOACC_EXPAND_ACCEL_VAR (tree @var{var})
+This hook, if defined, is used by accelerator target back-ends to expand
+specially handled kinds of VAR_DECL expressions.  A particular use is to
+place variables with specific attributes inside special accelerator
+memories.  A return value of NULL indicates that the target does not
+handle this VAR_DECL, and normal RTL expanding is resumed.
+@end deftypefn
+
 @node Anchored Addresses
 @section Anchored Addresses
 @cindex anchored addresses
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 39a214e..beace61 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -4151,6 +4151,8 @@ address;  but often a machine-dependent strategy can generate better code.
 
 @hook TARGET_PREFERRED_ELSE_VALUE
 
+@hook TARGET_GOACC_EXPAND_ACCEL_VAR
+
 @node Anchored Addresses
 @section Anchored Addresses
 @cindex anchored addresses
diff --git a/gcc/expr.c b/gcc/expr.c
index de6709d..f186a41 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -9854,8 +9854,19 @@ expand_expr_real_1 (tree exp, rtx target, machine_mode tmode,
       exp = SSA_NAME_VAR (ssa_name);
       goto expand_decl_rtl;
 
-    case PARM_DECL:
     case VAR_DECL:
+      /* Allow accel compiler to handle specific cases of variables,
+	 specifically those tagged with the "oacc gangprivate" attribute,
+	 which may be intended to be placed in special memory in GPUs.  */
+      if (flag_openacc && targetm.goacc.expand_accel_var)
+	{
+	  temp = targetm.goacc.expand_accel_var (exp);
+	  if (temp)
+	    return temp;
+	}
+      /* ... fall through ...  */
+
+    case PARM_DECL:
       /* If a static var's type was incomplete when the decl was written,
 	 but the type is complete now, lay out the decl now.  */
       if (DECL_SIZE (exp) == 0
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 843c66f..a649d2e 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -124,6 +124,12 @@ struct omp_context
 
   /* True if this construct can be cancelled.  */
   bool cancellable;
+
+  /* The number of levels of OpenACC partitioning invoked in this context.  */
+  unsigned oacc_partitioning_levels;
+
+  /* Addressable variable decls in this context.  */
+  vec<tree> *oacc_addressable_var_decls;
 };
 
 static splay_tree all_contexts;
@@ -850,6 +856,7 @@ new_omp_context (gimple *stmt, omp_context *outer_ctx)
     }
 
   ctx->cb.decl_map = new hash_map<tree, tree>;
+  ctx->oacc_addressable_var_decls = new vec<tree> ();
 
   return ctx;
 }
@@ -925,6 +932,8 @@ delete_omp_context (splay_tree_value value)
   if (is_task_ctx (ctx))
     finalize_task_copyfn (as_a <gomp_task *> (ctx->stmt));
 
+  delete ctx->oacc_addressable_var_decls;
+
   XDELETE (ctx);
 }
 
@@ -5716,6 +5725,9 @@ lower_oacc_head_tail (location_t loc, tree clauses,
   tree join_kind = build_int_cst (unsigned_type_node, IFN_UNIQUE_OACC_JOIN);
 
   gcc_assert (count);
+
+  ctx->oacc_partitioning_levels = count;
+
   for (unsigned done = 1; count; count--, done++)
     {
       gimple_seq fork_seq = NULL;
@@ -6732,6 +6744,68 @@ lower_omp_for_lastprivate (struct omp_for_data *fd, gimple_seq *body_p,
     }
 }
 
+/* Record vars listed in private clauses in CLAUSES in CTX.  This information
+   is used to mark up variables that should be made private per-gang.  */
+
+static void
+oacc_record_private_var_clauses (omp_context *ctx, tree clauses)
+{
+  tree c;
+
+  if (!ctx)
+    return;
+
+  for (c = clauses; c; c = OMP_CLAUSE_CHAIN (c))
+    if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_PRIVATE)
+      {
+	tree decl = OMP_CLAUSE_DECL (c);
+	if (TREE_CODE (decl) == VAR_DECL && TREE_ADDRESSABLE (decl))
+	  ctx->oacc_addressable_var_decls->safe_push (decl);
+      }
+}
+
+/* Record addressable vars declared in BINDVARS in CTX.  This information is
+   used to mark up variables that should be made private per-gang.  */
+
+static void
+oacc_record_vars_in_bind (omp_context *ctx, tree bindvars)
+{
+  if (!ctx)
+    return;
+
+  for (tree v = bindvars; v; v = DECL_CHAIN (v))
+    if (TREE_CODE (v) == VAR_DECL && TREE_ADDRESSABLE (v))
+      ctx->oacc_addressable_var_decls->safe_push (v);
+}
+
+/* Mark addressable variables which are declared implicitly or explicitly as
+   gang private with a special attribute.  These may need to have their
+   declarations altered later on in compilation (e.g. in
+   execute_oacc_device_lower or the backend, depending on how the OpenACC
+   execution model is implemented on a given target) to ensure that sharing
+   semantics are correct.  */
+
+static void
+mark_oacc_gangprivate (vec<tree> *decls)
+{
+  int i;
+  tree decl;
+
+  FOR_EACH_VEC_ELT (*decls, i, decl)
+    if (!lookup_attribute ("oacc gangprivate", DECL_ATTRIBUTES (decl)))
+      {
+	if (dump_file && (dump_flags & TDF_DETAILS))
+	  {
+	    fprintf (dump_file,
+		     "Setting 'oacc gangprivate' attribute for decl:");
+	    print_generic_decl (dump_file, decl, TDF_SLIM);
+	    fputc ('\n', dump_file);
+	  }
+	DECL_ATTRIBUTES (decl)
+	  = tree_cons (get_identifier ("oacc gangprivate"),
+		       NULL, DECL_ATTRIBUTES (decl));
+      }
+}
 
 /* Lower code for an OMP loop directive.  */
 
@@ -6748,6 +6822,8 @@ lower_omp_for (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 
   push_gimplify_context ();
 
+  oacc_record_private_var_clauses (ctx, gimple_omp_for_clauses (stmt));
+
   lower_omp (gimple_omp_for_pre_body_ptr (stmt), ctx);
 
   block = make_node (BLOCK);
@@ -6878,7 +6954,20 @@ lower_omp_for (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 
   /* Add OpenACC partitioning and reduction markers just before the loop.  */
   if (oacc_head)
-    gimple_seq_add_seq (&body, oacc_head);
+    {
+      gimple_seq_add_seq (&body, oacc_head);
+
+      unsigned level_total = 0;
+      omp_context *thisctx;
+
+      for (thisctx = ctx; thisctx; thisctx = thisctx->outer)
+        level_total += thisctx->oacc_partitioning_levels;
+
+      /* If the current context and parent contexts are distributed over a
+	 total of one parallelism level, we have gang partitioning.  */
+      if (level_total == 1)
+        mark_oacc_gangprivate (ctx->oacc_addressable_var_decls);
+    }
 
   lower_omp_for_lastprivate (&fd, &body, &dlist, ctx);
 
@@ -7511,6 +7600,8 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 
   clauses = gimple_omp_target_clauses (stmt);
 
+  oacc_record_private_var_clauses (ctx, clauses);
+
   gimple_seq dep_ilist = NULL;
   gimple_seq dep_olist = NULL;
   if (omp_find_clause (clauses, OMP_CLAUSE_DEPEND))
@@ -7761,6 +7852,8 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 
   if (offloaded)
     {
+      mark_oacc_gangprivate (ctx->oacc_addressable_var_decls);
+
       /* Declare all the variables created by mapping and the variables
 	 declared in the scope of the target body.  */
       record_vars_into (ctx->block_vars, child_fn);
@@ -8755,6 +8848,7 @@ lower_omp_1 (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 		 ctx);
       break;
     case GIMPLE_BIND:
+      oacc_record_vars_in_bind (ctx, gimple_bind_vars (as_a <gbind *> (stmt)));
       lower_omp (gimple_bind_body_ptr (as_a <gbind *> (stmt)), ctx);
       maybe_remove_omp_member_access_dummy_vars (as_a <gbind *> (stmt));
       break;
diff --git a/gcc/target.def b/gcc/target.def
index c570f38..b3b24b8 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -1701,6 +1701,16 @@ for allocating any storage for reductions when necessary.",
 void, (gcall *call),
 default_goacc_reduction)
 
+DEFHOOK
+(expand_accel_var,
+"This hook, if defined, is used by accelerator target back-ends to expand\n\
+specially handled kinds of VAR_DECL expressions.  A particular use is to\n\
+place variables with specific attributes inside special accelerator\n\
+memories.  A return value of NULL indicates that the target does not\n\
+handle this VAR_DECL, and normal RTL expanding is resumed.",
+rtx, (tree var),
+NULL)
+
 HOOK_VECTOR_END (goacc)
 
 /* Functions relating to vectorization.  */
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/gang-private-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/gang-private-1.c
new file mode 100644
index 0000000..f378346
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/gang-private-1.c
@@ -0,0 +1,38 @@
+#include <assert.h>
+
+int main (void)
+{
+  int ret;
+
+  #pragma acc parallel num_gangs(1) num_workers(32) copyout(ret)
+  {
+    int w = 0;
+
+    #pragma acc loop worker
+    for (int i = 0; i < 32; i++)
+      {
+        #pragma acc atomic update
+	w++;
+      }
+
+    ret = (w == 32);
+  }
+  assert (ret);
+
+  #pragma acc parallel num_gangs(1) vector_length(32) copyout(ret)
+  {
+    int v = 0;
+
+    #pragma acc loop vector
+    for (int i = 0; i < 32; i++)
+      {
+        #pragma acc atomic update
+	v++;
+      }
+
+    ret = (v == 32);
+  }
+  assert (ret);
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-gwv-2.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-gwv-2.c
new file mode 100644
index 0000000..a4f81a3
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-gwv-2.c
@@ -0,0 +1,95 @@
+#include <stdio.h>
+#include <openacc.h>
+#include <alloca.h>
+#include <string.h>
+#include <gomp-constants.h>
+#include <stdlib.h>
+
+#if 0
+#define DEBUG(DIM, IDX, VAL) \
+  fprintf (stderr, "%sdist[%d] = %d\n", (DIM), (IDX), (VAL))
+#else
+#define DEBUG(DIM, IDX, VAL)
+#endif
+
+#define N (32*32*32)
+
+int
+check (const char *dim, int *dist, int dimsize)
+{
+  int ix;
+  int exit = 0;
+
+  for (ix = 0; ix < dimsize; ix++)
+    {
+      DEBUG(dim, ix, dist[ix]);
+      if (dist[ix] < (N) / (dimsize + 0.5)
+	  || dist[ix] > (N) / (dimsize - 0.5))
+	{
+	  fprintf (stderr, "did not distribute to %ss (%d not between %d "
+		   "and %d)\n", dim, dist[ix], (int) ((N) / (dimsize + 0.5)),
+		   (int) ((N) / (dimsize - 0.5)));
+	  exit |= 1;
+	}
+    }
+
+  return exit;
+}
+
+int main ()
+{
+  int ary[N];
+  int ix;
+  int exit = 0;
+  int gangsize = 0, workersize = 0, vectorsize = 0;
+  int *gangdist, *workerdist, *vectordist;
+
+  for (ix = 0; ix < N;ix++)
+    ary[ix] = -1;
+
+#pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
+	    copy(ary) copyout(gangsize, workersize, vectorsize)
+  {
+#pragma acc loop gang worker vector
+    for (unsigned ix = 0; ix < N; ix++)
+      {
+	int g, w, v;
+
+	g = __builtin_goacc_parlevel_id (GOMP_DIM_GANG);
+	w = __builtin_goacc_parlevel_id (GOMP_DIM_WORKER);
+	v = __builtin_goacc_parlevel_id (GOMP_DIM_VECTOR);
+
+	ary[ix] = (g << 16) | (w << 8) | v;
+      }
+
+    gangsize = __builtin_goacc_parlevel_size (GOMP_DIM_GANG);
+    workersize = __builtin_goacc_parlevel_size (GOMP_DIM_WORKER);
+    vectorsize = __builtin_goacc_parlevel_size (GOMP_DIM_VECTOR);
+  }
+
+  gangdist = (int *) alloca (gangsize * sizeof (int));
+  workerdist = (int *) alloca (workersize * sizeof (int));
+  vectordist = (int *) alloca (vectorsize * sizeof (int));
+  memset (gangdist, 0, gangsize * sizeof (int));
+  memset (workerdist, 0, workersize * sizeof (int));
+  memset (vectordist, 0, vectorsize * sizeof (int));
+
+  /* Test that work is shared approximately equally amongst each active
+     gang/worker/vector.  */
+  for (ix = 0; ix < N; ix++)
+    {
+      int g = (ary[ix] >> 16) & 255;
+      int w = (ary[ix] >> 8) & 255;
+      int v = ary[ix] & 255;
+
+      gangdist[g]++;
+      workerdist[w]++;
+      vectordist[v]++;
+    }
+
+  exit = check ("gang", gangdist, gangsize);
+  exit |= check ("worker", workerdist, workersize);
+  exit |= check ("vector", vectordist, vectorsize);
+
+  return exit;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c/pr85465.c b/libgomp/testsuite/libgomp.oacc-c/pr85465.c
new file mode 100644
index 0000000..329e8a0
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c/pr85465.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-w" } */
+
+int
+main (void)
+{
+#pragma acc parallel
+  foo ();
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/gangprivate-attrib-1.f90 b/libgomp/testsuite/libgomp.oacc-fortran/gangprivate-attrib-1.f90
new file mode 100644
index 0000000..5f8a5e6
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-fortran/gangprivate-attrib-1.f90
@@ -0,0 +1,25 @@
+! Test for "oacc gangprivate" attribute on gang-private variables
+
+! { dg-do run }
+! { dg-additional-options "-fdump-tree-omplower-details" }
+! { dg-final { scan-tree-dump-times "Setting 'oacc gangprivate' attribute for decl:  integer\\(kind=4\\) w;" 1 "omplower" } }
+
+program main
+  integer :: w, arr(0:31)
+
+  !$acc parallel num_gangs(32) num_workers(32) copyout(arr)
+    !$acc loop gang private(w)
+    do j = 0, 31
+      w = 0
+      !$acc loop seq
+      do i = 0, 31
+        !$acc atomic update
+        w = w + 1
+        !$acc end atomic
+      end do
+      arr(j) = w
+    end do
+  !$acc end parallel
+
+  if (any (arr .ne. 32)) stop 1
+end program main
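
Note on the offset computation in nvptx_goacc_expand_accel_var: each new
decl is placed by rounding the running block size up to the decl's
alignment, recording that offset, and then growing the block by the
decl's size. A minimal standalone sketch of that arithmetic in C,
assuming power-of-two alignments and byte units throughout; struct
shared_block, round_up and reserve are hypothetical names used for
illustration and do not appear in the patch:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the backend's running size/alignment pair;
   this is not the actual nvptx.c state.  */
struct shared_block
{
  size_t size;   /* Bytes reserved so far.  */
  size_t align;  /* Largest alignment requested so far.  */
};

/* Round SIZE up to the next multiple of ALIGN, which must be a power
   of two.  */
static size_t
round_up (size_t size, size_t align)
{
  assert (align != 0 && (align & (align - 1)) == 0);
  return (size + align - 1) & ~(align - 1);
}

/* Reserve BYTES bytes aligned to ALIGN in BLOCK and return the offset
   of the reservation from the start of the block.  */
static size_t
reserve (struct shared_block *block, size_t bytes, size_t align)
{
  size_t offset = round_up (block->size, align);

  if (block->align < align)
    block->align = align;
  block->size = offset + bytes;
  return offset;
}
```

Starting from an empty block, reserving 4 bytes at alignment 4, then
1 byte at alignment 1, then 8 bytes at alignment 8 yields offsets 0, 4
and 8, and a final block size of 16 bytes.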


* Re: [PATCH, OpenACC] Add support for gang local storage allocation in shared memory
  2018-08-16 15:47           ` Julian Brown
@ 2018-08-17 16:39             ` Bernhard Reutner-Fischer
  2018-12-11 15:08               ` Julian Brown
  2018-10-05 14:07             ` [PATCH, OpenACC] Add support for gang local storage allocation in shared memory Tom de Vries
  1 sibling, 1 reply; 26+ messages in thread
From: Bernhard Reutner-Fischer @ 2018-08-17 16:39 UTC (permalink / raw)
  To: Julian Brown
  Cc: gcc-patches, Cesar Philippidis, Tom de Vries, Chung-Lin Tang, jakub

On 16 August 2018 17:46:43 CEST, Julian Brown <julian@codesourcery.com> wrote:
>On Wed, 15 Aug 2018 21:56:54 +0200
>Bernhard Reutner-Fischer <rep.dot.nop@gmail.com> wrote:
>
>> On 15 August 2018 18:46:37 CEST, Julian Brown
>> <julian@codesourcery.com> wrote:
>> >On Mon, 13 Aug 2018 12:06:21 -0700
>> >Cesar Philippidis <cesar@codesourcery.com> wrote:  
>> 
>> atttribute has more t than strictly necessary. 
>> Don't like signed integer levels where they should be some unsigned. 
>> Also don't like single switch cases instead of if.
>> And omitting function comments even if the hook way above is
>> documented may be ok ish but is a bit lazy ;)
>
>Here's a new version with those comments addressed. I also changed the
>logic around a little to avoid adding decls to the vec in omp_context
>which would never be given the gang-private attribute.
>
>Re-tested with offloading to NVPTX.
>
>OK?

(TREE_CODE (var) == VAR_DECL
Is nowadays known as VAR_P (decl), FWIW.

ISTM that global variables are not JIT-friendly.
No further comments from me.

Thanks,


* Re: [PATCH, OpenACC] Add support for gang local storage allocation in shared memory
  2018-08-16 15:47           ` Julian Brown
  2018-08-17 16:39             ` Bernhard Reutner-Fischer
@ 2018-10-05 14:07             ` Tom de Vries
  1 sibling, 0 replies; 26+ messages in thread
From: Tom de Vries @ 2018-10-05 14:07 UTC (permalink / raw)
  To: Julian Brown
  Cc: Bernhard Reutner-Fischer, gcc-patches, Cesar Philippidis,
	Chung-Lin Tang, jakub

On 8/16/18 5:46 PM, Julian Brown wrote:
> On Wed, 15 Aug 2018 21:56:54 +0200
> Bernhard Reutner-Fischer <rep.dot.nop@gmail.com> wrote:
> 
>> On 15 August 2018 18:46:37 CEST, Julian Brown
>> <julian@codesourcery.com> wrote:
>>> On Mon, 13 Aug 2018 12:06:21 -0700
>>> Cesar Philippidis <cesar@codesourcery.com> wrote:  
>>
>> atttribute has more t than strictly necessary. 
>> Don't like signed integer levels where they should be some unsigned. 
>> Also don't like single switch cases instead of if.
>> And omitting function comments even if the hook way above is
>> documented may be ok ish but is a bit lazy ;)
> 
> Here's a new version with those comments addressed. I also changed the
> logic around a little to avoid adding decls to the vec in omp_context
> which would never be given the gang-private attribute.
> 
> Re-tested with offloading to NVPTX.
> 
> OK?

As far as the nvptx part is concerned, I see:
...
=== ERROR type #4: trailing operator (1 error(s)) ===
gcc/config/nvptx/nvptx.c:5946:27:         gangprivate_shared_size =
...

Otherwise, the nvptx part is OK.

Thanks,
- Tom
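
For context on the "trailing operator" diagnostic: it looks like output
from a GNU-style checker (an assumption; the message does not name the
tool). GNU coding style breaks a long expression before the operator, so
the continuation line begins with it. A small sketch of the two layouts,
using a hypothetical helper align_up rather than the patch's own code:

```c
#include <assert.h>

/* Flagged layout, with the operator left at the end of the line:

     total =
       (total + align - 1) & ~(align - 1);

   Preferred GNU layout, with the continuation line starting with the
   operator, as used in the function body below.  ALIGN must be a power
   of two.  */
static unsigned
align_up (unsigned total, unsigned align)
{
  assert (align != 0 && (align & (align - 1)) == 0);
  total
    = (total + align - 1) & ~(align - 1);
  return total;
}
```

Both layouts compile to the same code; only the placement of the line
break differs.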

> 
> Julian
> 
> 2018-08-10  Julian Brown  <julian@codesourcery.com>
>             Chung-Lin Tang  <cltang@codesourcery.com>
> 
>         gcc/
>         * config/nvptx/nvptx.c (tree-hash-traits.h): Include.
>         (gangprivate_shared_size): New global variable.
>         (gangprivate_shared_align): Likewise.
>         (gangprivate_shared_sym): Likewise.
>         (gangprivate_shared_hmap): Likewise.
>         (nvptx_option_override): Initialize gangprivate_shared_sym,
>         gangprivate_shared_align.
>         (nvptx_file_end): Output gangprivate_shared_sym.
>         (nvptx_goacc_expand_accel_var): New function.
>         (nvptx_set_current_function): New function.
>         (TARGET_SET_CURRENT_FUNCTION): Define hook.
>         (TARGET_GOACC_EXPAND_ACCEL): Likewise.
>         * doc/tm.texi (TARGET_GOACC_EXPAND_ACCEL_VAR): Document new hook.
>         * doc/tm.texi.in (TARGET_GOACC_EXPAND_ACCEL_VAR): Likewise.
>         * expr.c (expand_expr_real_1): Remap decls marked with the
>         "oacc gangprivate" attribute.
>         * omp-low.c (omp_context): Add oacc_partitioning_level and
>         oacc_addressable_var_decls fields.
>         (new_omp_context): Initialize oacc_addressable_var_decls in new
>         omp_context.
>         (delete_omp_context): Delete oacc_addressable_var_decls in old
>         omp_context.
>         (lower_oacc_head_tail): Record partitioning-level count in omp context.
>         (oacc_record_private_var_clauses, oacc_record_vars_in_bind)
>         (mark_oacc_gangprivate): New functions.
>         (lower_omp_for): Call oacc_record_private_var_clauses with "for"
>         clauses.  Call mark_oacc_gangprivate for gang-partitioned loops.
>         (lower_omp_target): Call oacc_record_private_var_clauses with "target"
>         clauses.
>         Call mark_oacc_gangprivate for offloaded target regions.
>         (lower_omp_1): Call vars_in_bind for GIMPLE_BIND within OMP regions.
>         * target.def (expand_accel_var): New hook.
> 
>         libgomp/
>         * testsuite/libgomp.oacc-c-c++-common/gang-private-1.c: New test.
>         * testsuite/libgomp.oacc-c-c++-common/loop-gwv-2.c: New test.
>         * testsuite/libgomp.oacc-c/pr85465.c: New test.
>         * testsuite/libgomp.oacc-fortran/gangprivate-attrib-1.f90: New test.
> 


* Re: [PATCH, OpenACC] Add support for gang local storage allocation in shared memory
  2018-08-17 16:39             ` Bernhard Reutner-Fischer
@ 2018-12-11 15:08               ` Julian Brown
  2019-06-03 16:03                 ` Julian Brown
  0 siblings, 1 reply; 26+ messages in thread
From: Julian Brown @ 2018-12-11 15:08 UTC (permalink / raw)
  To: Bernhard Reutner-Fischer
  Cc: gcc-patches, Cesar Philippidis, Tom de Vries, Chung-Lin Tang, jakub

[-- Attachment #1: Type: text/plain, Size: 1615 bytes --]

On Fri, 17 Aug 2018 18:39:00 +0200
Bernhard Reutner-Fischer <rep.dot.nop@gmail.com> wrote:

> On 16 August 2018 17:46:43 CEST, Julian Brown
> <julian@codesourcery.com> wrote:
> >On Wed, 15 Aug 2018 21:56:54 +0200
> >Bernhard Reutner-Fischer <rep.dot.nop@gmail.com> wrote:
> >  
> >> On 15 August 2018 18:46:37 CEST, Julian Brown
> >> <julian@codesourcery.com> wrote:  
> >> >On Mon, 13 Aug 2018 12:06:21 -0700
> >> >Cesar Philippidis <cesar@codesourcery.com> wrote:    
> >> 
> >> atttribute has more t than strictly necessary. 
> >> Don't like signed integer levels where they should be some
> >> unsigned. Also don't like single switch cases instead of if.
> >> And omitting function comments even if the hook way above is
> >> documented may be ok ish but is a bit lazy ;)  
> >
> >Here's a new version with those comments addressed. I also changed
> >the logic around a little to avoid adding decls to the vec in
> >omp_context which would never be given the gang-private attribute.
> >
> >Re-tested with offloading to NVPTX.
> >
> >OK?  
> 
> (TREE_CODE (var) == VAR_DECL
> Is nowadays known as VAR_P (decl), FWIW.

Fixed. (And also Tom's formatting nit mentioned in another email.)

> ISTM that global variables are not JIT-friendly.
> No further comments from me.

Probably true, but AFAIK nobody's trying to use the (GCC) JIT with the
PTX backend, and the backend already uses global variables for several
other purposes. Of course PTX code is JIT'ted itself by the NVidia
runtime, but I guess that's not what you were referring to!

Is this version OK? Re-tested with offloading to NVPTX.

Thanks,

Julian

[-- Attachment #2: gang-local-storage-in-shm-6.diff --]
[-- Type: text/x-patch, Size: 19272 bytes --]

commit 3335ddfa72944be5359280116e8eb4febd4ed3c7
Author: Julian Brown <julian@codesourcery.com>
Date:   Thu Aug 9 20:27:04 2018 -0700

    [OpenACC] Add support for gang local storage allocation in shared memory
    
    2018-08-10  Julian Brown  <julian@codesourcery.com>
    	    Chung-Lin Tang  <cltang@codesourcery.com>
    
    	gcc/
    	* config/nvptx/nvptx.c (tree-hash-traits.h): Include.
    	(gangprivate_shared_size): New global variable.
    	(gangprivate_shared_align): Likewise.
    	(gangprivate_shared_sym): Likewise.
    	(gangprivate_shared_hmap): Likewise.
    	(nvptx_option_override): Initialize gangprivate_shared_sym,
    	gangprivate_shared_align.
    	(nvptx_file_end): Output gangprivate_shared_sym.
    	(nvptx_goacc_expand_accel_var): New function.
    	(nvptx_set_current_function): New function.
    	(TARGET_SET_CURRENT_FUNCTION): Define hook.
    	(TARGET_GOACC_EXPAND_ACCEL_VAR): Likewise.
    	* doc/tm.texi (TARGET_GOACC_EXPAND_ACCEL_VAR): Document new hook.
    	* doc/tm.texi.in (TARGET_GOACC_EXPAND_ACCEL_VAR): Likewise.
    	* expr.c (expand_expr_real_1): Remap decls marked with the
    	"oacc gangprivate" attribute.
    	* omp-low.c (omp_context): Add oacc_partitioning_levels and
    	oacc_addressable_var_decls fields.
    	(new_omp_context): Initialize oacc_addressable_var_decls in new
    	omp_context.
    	(delete_omp_context): Delete oacc_addressable_var_decls in old
    	omp_context.
    	(lower_oacc_head_tail): Record partitioning-level count in omp context.
    	(oacc_record_private_var_clauses, oacc_record_vars_in_bind)
    	(mark_oacc_gangprivate): New functions.
    	(lower_omp_for): Call oacc_record_private_var_clauses with "for"
    	clauses.  Call mark_oacc_gangprivate for gang-partitioned loops.
    	(lower_omp_target): Call oacc_record_private_var_clauses with "target"
    	clauses.
    	Call mark_oacc_gangprivate for offloaded target regions.
    	(lower_omp_1): Call oacc_record_vars_in_bind for GIMPLE_BIND within
    	OMP regions.
    	* target.def (expand_accel_var): New hook.
    
    	libgomp/
    	* testsuite/libgomp.oacc-c-c++-common/gang-private-1.c: New test.
    	* testsuite/libgomp.oacc-c-c++-common/loop-gwv-2.c: New test.
    	* testsuite/libgomp.oacc-c/pr85465.c: New test.
    	* testsuite/libgomp.oacc-fortran/gangprivate-attrib-1.f90: New test.

diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index 9903a27..02c2847 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -73,6 +73,7 @@
 #include "cfgloop.h"
 #include "fold-const.h"
 #include "intl.h"
+#include "tree-hash-traits.h"
 
 /* This file should be included last.  */
 #include "target-def.h"
@@ -137,6 +138,12 @@ static unsigned worker_red_size;
 static unsigned worker_red_align;
 static GTY(()) rtx worker_red_sym;
 
+/* Shared memory block for gang-private variables.  */
+static unsigned gangprivate_shared_size;
+static unsigned gangprivate_shared_align;
+static GTY(()) rtx gangprivate_shared_sym;
+static hash_map<tree_decl_hash, unsigned int> gangprivate_shared_hmap;
+
 /* Global lock variable, needed for 128bit worker & gang reductions.  */
 static GTY(()) tree global_lock_var;
 
@@ -210,6 +217,10 @@ nvptx_option_override (void)
   SET_SYMBOL_DATA_AREA (worker_red_sym, DATA_AREA_SHARED);
   worker_red_align = GET_MODE_ALIGNMENT (SImode) / BITS_PER_UNIT;
 
+  gangprivate_shared_sym = gen_rtx_SYMBOL_REF (Pmode, "__gangprivate_shared");
+  SET_SYMBOL_DATA_AREA (gangprivate_shared_sym, DATA_AREA_SHARED);
+  gangprivate_shared_align = GET_MODE_ALIGNMENT (SImode) / BITS_PER_UNIT;
+
   diagnose_openacc_conflict (TARGET_GOMP, "-mgomp");
   diagnose_openacc_conflict (TARGET_SOFT_STACK, "-msoft-stack");
   diagnose_openacc_conflict (TARGET_UNIFORM_SIMT, "-muniform-simt");
@@ -4971,6 +4982,10 @@ nvptx_file_end (void)
     write_worker_buffer (asm_out_file, worker_red_sym,
 			 worker_red_align, worker_red_size);
 
+  if (gangprivate_shared_size)
+    write_worker_buffer (asm_out_file, gangprivate_shared_sym,
+			 gangprivate_shared_align, gangprivate_shared_size);
+
   if (need_softstack_decl)
     {
       write_var_marker (asm_out_file, false, true, "__nvptx_stacks");
@@ -5918,6 +5933,52 @@ nvptx_can_change_mode_class (machine_mode, machine_mode, reg_class_t)
   return false;
 }
 
+/* Implement TARGET_GOACC_EXPAND_ACCEL_VAR.  Place "oacc gangprivate"
+   variables in shared memory.  */
+
+static rtx
+nvptx_goacc_expand_accel_var (tree var)
+{
+  if (VAR_P (var)
+      && lookup_attribute ("oacc gangprivate", DECL_ATTRIBUTES (var)))
+    {
+      unsigned int offset, *poffset;
+      poffset = gangprivate_shared_hmap.get (var);
+      if (poffset)
+	offset = *poffset;
+      else
+	{
+	  unsigned HOST_WIDE_INT align = DECL_ALIGN (var);
+	  gangprivate_shared_size
+	    = (gangprivate_shared_size + align - 1) & ~(align - 1);
+	  if (gangprivate_shared_align < align)
+	    gangprivate_shared_align = align;
+
+	  offset = gangprivate_shared_size;
+	  bool existed = gangprivate_shared_hmap.put (var, offset);
+	  gcc_assert (!existed);
+	  gangprivate_shared_size += tree_to_uhwi (DECL_SIZE_UNIT (var));
+	}
+      rtx addr = plus_constant (Pmode, gangprivate_shared_sym, offset);
+      return gen_rtx_MEM (TYPE_MODE (TREE_TYPE (var)), addr);
+    }
+  return NULL_RTX;
+}
+
+static GTY(()) tree nvptx_previous_fndecl;
+
+/* Implement TARGET_SET_CURRENT_FUNCTION.  Reset per-function context.  */
+
+static void
+nvptx_set_current_function (tree fndecl)
+{
+  if (!fndecl || fndecl == nvptx_previous_fndecl)
+    return;
+
+  gangprivate_shared_hmap.empty ();
+  nvptx_previous_fndecl = fndecl;
+}
+
 #undef TARGET_OPTION_OVERRIDE
 #define TARGET_OPTION_OVERRIDE nvptx_option_override
 
@@ -6054,6 +6115,12 @@ nvptx_can_change_mode_class (machine_mode, machine_mode, reg_class_t)
 #undef TARGET_HAVE_SPECULATION_SAFE_VALUE
 #define TARGET_HAVE_SPECULATION_SAFE_VALUE speculation_safe_value_not_needed
 
+#undef TARGET_GOACC_EXPAND_ACCEL_VAR
+#define TARGET_GOACC_EXPAND_ACCEL_VAR nvptx_goacc_expand_accel_var
+
+#undef TARGET_SET_CURRENT_FUNCTION
+#define TARGET_SET_CURRENT_FUNCTION nvptx_set_current_function
+
 struct gcc_target targetm = TARGET_INITIALIZER;
 
 #include "gt-nvptx.h"
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index e348f0a..9164917 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -6124,6 +6124,14 @@ like @code{cond_add@var{m}}.  The default implementation returns a zero
 constant of type @var{type}.
 @end deftypefn
 
+@deftypefn {Target Hook} rtx TARGET_GOACC_EXPAND_ACCEL_VAR (tree @var{var})
+This hook, if defined, is used by accelerator target back-ends to expand
+specially handled kinds of VAR_DECL expressions.  A particular use is to
+place variables with specific attributes inside special accelerator
+memories.  A return value of NULL indicates that the target does not
+handle this VAR_DECL, and normal RTL expanding is resumed.
+@end deftypefn
+
 @node Anchored Addresses
 @section Anchored Addresses
 @cindex anchored addresses
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index f1ad80d..3cdaca2 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -4202,6 +4202,8 @@ address;  but often a machine-dependent strategy can generate better code.
 
 @hook TARGET_PREFERRED_ELSE_VALUE
 
+@hook TARGET_GOACC_EXPAND_ACCEL_VAR
+
 @node Anchored Addresses
 @section Anchored Addresses
 @cindex anchored addresses
diff --git a/gcc/expr.c b/gcc/expr.c
index 85b7847..0f73deb 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -9874,8 +9874,19 @@ expand_expr_real_1 (tree exp, rtx target, machine_mode tmode,
       exp = SSA_NAME_VAR (ssa_name);
       goto expand_decl_rtl;
 
-    case PARM_DECL:
     case VAR_DECL:
+      /* Allow accel compiler to handle specific cases of variables,
+	 specifically those tagged with the "oacc gangprivate" attribute,
+	 which may be intended to be placed in special memory in GPUs.  */
+      if (flag_openacc && targetm.goacc.expand_accel_var)
+	{
+	  temp = targetm.goacc.expand_accel_var (exp);
+	  if (temp)
+	    return temp;
+	}
+      /* ... fall through ...  */
+
+    case PARM_DECL:
       /* If a static var's type was incomplete when the decl was written,
 	 but the type is complete now, lay out the decl now.  */
       if (DECL_SIZE (exp) == 0
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index b406ce7..f078110 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -133,6 +133,12 @@ struct omp_context
 
   /* True if this construct can be cancelled.  */
   bool cancellable;
+
+  /* The number of levels of OpenACC partitioning invoked in this context.  */
+  unsigned oacc_partitioning_levels;
+
+  /* Addressable variable decls in this context.  */
+  vec<tree> *oacc_addressable_var_decls;
 };
 
 static splay_tree all_contexts;
@@ -872,6 +878,7 @@ new_omp_context (gimple *stmt, omp_context *outer_ctx)
     }
 
   ctx->cb.decl_map = new hash_map<tree, tree>;
+  ctx->oacc_addressable_var_decls = new vec<tree> ();
 
   return ctx;
 }
@@ -953,6 +960,8 @@ delete_omp_context (splay_tree_value value)
       delete ctx->task_reduction_map;
     }
 
+  delete ctx->oacc_addressable_var_decls;
+
   XDELETE (ctx);
 }
 
@@ -6470,6 +6479,9 @@ lower_oacc_head_tail (location_t loc, tree clauses,
   tree join_kind = build_int_cst (unsigned_type_node, IFN_UNIQUE_OACC_JOIN);
 
   gcc_assert (count);
+
+  ctx->oacc_partitioning_levels = count;
+
   for (unsigned done = 1; count; count--, done++)
     {
       gimple_seq fork_seq = NULL;
@@ -8144,6 +8156,68 @@ lower_omp_for_lastprivate (struct omp_for_data *fd, gimple_seq *body_p,
     }
 }
 
+/* Record vars listed in private clauses in CLAUSES in CTX.  This information
+   is used to mark up variables that should be made private per-gang.  */
+
+static void
+oacc_record_private_var_clauses (omp_context *ctx, tree clauses)
+{
+  tree c;
+
+  if (!ctx)
+    return;
+
+  for (c = clauses; c; c = OMP_CLAUSE_CHAIN (c))
+    if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_PRIVATE)
+      {
+	tree decl = OMP_CLAUSE_DECL (c);
+	if (VAR_P (decl) && TREE_ADDRESSABLE (decl))
+	  ctx->oacc_addressable_var_decls->safe_push (decl);
+      }
+}
+
+/* Record addressable vars declared in BINDVARS in CTX.  This information is
+   used to mark up variables that should be made private per-gang.  */
+
+static void
+oacc_record_vars_in_bind (omp_context *ctx, tree bindvars)
+{
+  if (!ctx)
+    return;
+
+  for (tree v = bindvars; v; v = DECL_CHAIN (v))
+    if (VAR_P (v) && TREE_ADDRESSABLE (v))
+      ctx->oacc_addressable_var_decls->safe_push (v);
+}
+
+/* Mark addressable variables which are declared implicitly or explicitly as
+   gang private with a special attribute.  These may need to have their
+   declarations altered later on in compilation (e.g. in
+   execute_oacc_device_lower or the backend, depending on how the OpenACC
+   execution model is implemented on a given target) to ensure that sharing
+   semantics are correct.  */
+
+static void
+mark_oacc_gangprivate (vec<tree> *decls)
+{
+  int i;
+  tree decl;
+
+  FOR_EACH_VEC_ELT (*decls, i, decl)
+    if (!lookup_attribute ("oacc gangprivate", DECL_ATTRIBUTES (decl)))
+      {
+	if (dump_file && (dump_flags & TDF_DETAILS))
+	  {
+	    fprintf (dump_file,
+		     "Setting 'oacc gangprivate' attribute for decl:");
+	    print_generic_decl (dump_file, decl, TDF_SLIM);
+	    fputc ('\n', dump_file);
+	  }
+	DECL_ATTRIBUTES (decl)
+	  = tree_cons (get_identifier ("oacc gangprivate"),
+		       NULL, DECL_ATTRIBUTES (decl));
+      }
+}
 
 /* Lower code for an OMP loop directive.  */
 
@@ -8161,6 +8235,8 @@ lower_omp_for (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 
   push_gimplify_context ();
 
+  oacc_record_private_var_clauses (ctx, gimple_omp_for_clauses (stmt));
+
   lower_omp (gimple_omp_for_pre_body_ptr (stmt), ctx);
 
   block = make_node (BLOCK);
@@ -8316,7 +8392,20 @@ lower_omp_for (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 
   /* Add OpenACC partitioning and reduction markers just before the loop.  */
   if (oacc_head)
-    gimple_seq_add_seq (&body, oacc_head);
+    {
+      gimple_seq_add_seq (&body, oacc_head);
+
+      unsigned level_total = 0;
+      omp_context *thisctx;
+
+      for (thisctx = ctx; thisctx; thisctx = thisctx->outer)
+        level_total += thisctx->oacc_partitioning_levels;
+
+      /* If the current context and parent contexts are distributed over a
+	 total of one parallelism level, we have gang partitioning.  */
+      if (level_total == 1)
+        mark_oacc_gangprivate (ctx->oacc_addressable_var_decls);
+    }
 
   lower_omp_for_lastprivate (&fd, &body, &dlist, ctx);
 
@@ -9092,6 +9181,8 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 
   clauses = gimple_omp_target_clauses (stmt);
 
+  oacc_record_private_var_clauses (ctx, clauses);
+
   gimple_seq dep_ilist = NULL;
   gimple_seq dep_olist = NULL;
   if (omp_find_clause (clauses, OMP_CLAUSE_DEPEND))
@@ -9342,6 +9433,8 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 
   if (offloaded)
     {
+      mark_oacc_gangprivate (ctx->oacc_addressable_var_decls);
+
       /* Declare all the variables created by mapping and the variables
 	 declared in the scope of the target body.  */
       record_vars_into (ctx->block_vars, child_fn);
@@ -10336,6 +10429,7 @@ lower_omp_1 (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 		 ctx);
       break;
     case GIMPLE_BIND:
+      oacc_record_vars_in_bind (ctx, gimple_bind_vars (as_a <gbind *> (stmt)));
       lower_omp (gimple_bind_body_ptr (as_a <gbind *> (stmt)), ctx);
       maybe_remove_omp_member_access_dummy_vars (as_a <gbind *> (stmt));
       break;
diff --git a/gcc/target.def b/gcc/target.def
index 96f37e0..e154b17 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -1707,6 +1707,16 @@ for allocating any storage for reductions when necessary.",
 void, (gcall *call),
 default_goacc_reduction)
 
+DEFHOOK
+(expand_accel_var,
+"This hook, if defined, is used by accelerator target back-ends to expand\n\
+specially handled kinds of VAR_DECL expressions.  A particular use is to\n\
+place variables with specific attributes inside special accelerator\n\
+memories.  A return value of NULL indicates that the target does not\n\
+handle this VAR_DECL, and normal RTL expanding is resumed.",
+rtx, (tree var),
+NULL)
+
 HOOK_VECTOR_END (goacc)
 
 /* Functions relating to vectorization.  */
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/gang-private-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/gang-private-1.c
new file mode 100644
index 0000000..f378346
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/gang-private-1.c
@@ -0,0 +1,38 @@
+#include <assert.h>
+
+int main (void)
+{
+  int ret;
+
+  #pragma acc parallel num_gangs(1) num_workers(32) copyout(ret)
+  {
+    int w = 0;
+
+    #pragma acc loop worker
+    for (int i = 0; i < 32; i++)
+      {
+        #pragma acc atomic update
+	w++;
+      }
+
+    ret = (w == 32);
+  }
+  assert (ret);
+
+  #pragma acc parallel num_gangs(1) vector_length(32) copyout(ret)
+  {
+    int v = 0;
+
+    #pragma acc loop vector
+    for (int i = 0; i < 32; i++)
+      {
+        #pragma acc atomic update
+	v++;
+      }
+
+    ret = (v == 32);
+  }
+  assert (ret);
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-gwv-2.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-gwv-2.c
new file mode 100644
index 0000000..a4f81a3
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-gwv-2.c
@@ -0,0 +1,95 @@
+#include <stdio.h>
+#include <openacc.h>
+#include <alloca.h>
+#include <string.h>
+#include <gomp-constants.h>
+#include <stdlib.h>
+
+#if 0
+#define DEBUG(DIM, IDX, VAL) \
+  fprintf (stderr, "%sdist[%d] = %d\n", (DIM), (IDX), (VAL))
+#else
+#define DEBUG(DIM, IDX, VAL)
+#endif
+
+#define N (32*32*32)
+
+int
+check (const char *dim, int *dist, int dimsize)
+{
+  int ix;
+  int exit = 0;
+
+  for (ix = 0; ix < dimsize; ix++)
+    {
+      DEBUG(dim, ix, dist[ix]);
+      if (dist[ix] < (N) / (dimsize + 0.5)
+	  || dist[ix] > (N) / (dimsize - 0.5))
+	{
+	  fprintf (stderr, "did not distribute to %ss (%d not between %d "
+		   "and %d)\n", dim, dist[ix], (int) ((N) / (dimsize + 0.5)),
+		   (int) ((N) / (dimsize - 0.5)));
+	  exit |= 1;
+	}
+    }
+
+  return exit;
+}
+
+int main ()
+{
+  int ary[N];
+  int ix;
+  int exit = 0;
+  int gangsize = 0, workersize = 0, vectorsize = 0;
+  int *gangdist, *workerdist, *vectordist;
+
+  for (ix = 0; ix < N;ix++)
+    ary[ix] = -1;
+
+#pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
+	    copy(ary) copyout(gangsize, workersize, vectorsize)
+  {
+#pragma acc loop gang worker vector
+    for (unsigned ix = 0; ix < N; ix++)
+      {
+	int g, w, v;
+
+	g = __builtin_goacc_parlevel_id (GOMP_DIM_GANG);
+	w = __builtin_goacc_parlevel_id (GOMP_DIM_WORKER);
+	v = __builtin_goacc_parlevel_id (GOMP_DIM_VECTOR);
+
+	ary[ix] = (g << 16) | (w << 8) | v;
+      }
+
+    gangsize = __builtin_goacc_parlevel_size (GOMP_DIM_GANG);
+    workersize = __builtin_goacc_parlevel_size (GOMP_DIM_WORKER);
+    vectorsize = __builtin_goacc_parlevel_size (GOMP_DIM_VECTOR);
+  }
+
+  gangdist = (int *) alloca (gangsize * sizeof (int));
+  workerdist = (int *) alloca (workersize * sizeof (int));
+  vectordist = (int *) alloca (vectorsize * sizeof (int));
+  memset (gangdist, 0, gangsize * sizeof (int));
+  memset (workerdist, 0, workersize * sizeof (int));
+  memset (vectordist, 0, vectorsize * sizeof (int));
+
+  /* Test that work is shared approximately equally amongst each active
+     gang/worker/vector.  */
+  for (ix = 0; ix < N; ix++)
+    {
+      int g = (ary[ix] >> 16) & 255;
+      int w = (ary[ix] >> 8) & 255;
+      int v = ary[ix] & 255;
+
+      gangdist[g]++;
+      workerdist[w]++;
+      vectordist[v]++;
+    }
+
+  exit = check ("gang", gangdist, gangsize);
+  exit |= check ("worker", workerdist, workersize);
+  exit |= check ("vector", vectordist, vectorsize);
+
+  return exit;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c/pr85465.c b/libgomp/testsuite/libgomp.oacc-c/pr85465.c
new file mode 100644
index 0000000..329e8a0
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c/pr85465.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-w" } */
+
+int
+main (void)
+{
+#pragma acc parallel
+  foo ();
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/gangprivate-attrib-1.f90 b/libgomp/testsuite/libgomp.oacc-fortran/gangprivate-attrib-1.f90
new file mode 100644
index 0000000..5f8a5e6
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-fortran/gangprivate-attrib-1.f90
@@ -0,0 +1,25 @@
+! Test for "oacc gangprivate" attribute on gang-private variables
+
+! { dg-do run }
+! { dg-additional-options "-fdump-tree-omplower-details" }
+! { dg-final { scan-tree-dump-times "Setting 'oacc gangprivate' attribute for decl:  integer\\(kind=4\\) w;" 1 "omplower" } }
+
+program main
+  integer :: w, arr(0:31)
+
+  !$acc parallel num_gangs(32) num_workers(32) copyout(arr)
+    !$acc loop gang private(w)
+    do j = 0, 31
+      w = 0
+      !$acc loop seq
+      do i = 0, 31
+        !$acc atomic update
+        w = w + 1
+        !$acc end atomic
+      end do
+      arr(j) = w
+    end do
+  !$acc end parallel
+
+  if (any (arr .ne. 32)) stop 1
+end program main

* Re: [PATCH, OpenACC] Add support for gang local storage allocation in shared memory
  2018-12-11 15:08               ` Julian Brown
@ 2019-06-03 16:03                 ` Julian Brown
  2019-06-03 16:23                   ` Jakub Jelinek
  2022-02-14 15:56                   ` Thomas Schwinge
  0 siblings, 2 replies; 26+ messages in thread
From: Julian Brown @ 2019-06-03 16:03 UTC (permalink / raw)
  To: Bernhard Reutner-Fischer; +Cc: gcc-patches, Tom de Vries, Chung-Lin Tang, jakub

[-- Attachment #1: Type: text/plain, Size: 3290 bytes --]

On Tue, 11 Dec 2018 15:08:11 +0000
Julian Brown <julian@codesourcery.com> wrote:

> Is this version OK? Re-tested with offloading to NVPTX.

This is a ping for the patch posted here:

https://gcc.gnu.org/ml/gcc-patches/2018-08/msg00749.html

This is a new version of the patch, rebased and with a couple of
additional bugfixes, as follows:

Firstly, in mark_oacc_gangprivate, each decl is looked up (using
maybe_lookup_decl) to apply the "oacc gangprivate" attribute to the
innermost-nested copy of the decl.
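
As a rough illustration of that lookup order (not the code in the patch
itself), the walk can be modelled with toy types. Everything named
toy_* below is hypothetical and only stands in for omp_context and
maybe_lookup_decl; decls are represented as plain strings:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one entry of a context's decl map.  */
struct toy_map_entry
{
  const char *from;	/* Name of the original decl.  */
  const char *to;	/* Name of the copy made for this context.  */
};

/* Hypothetical stand-in for omp_context: a decl map plus a link to the
   enclosing context.  */
struct toy_ctx
{
  struct toy_ctx *outer;
  const struct toy_map_entry *map;
  size_t map_len;
};

/* Return the copy of DECL recorded in CTX itself, or NULL if CTX has
   none.  Enclosing contexts are not consulted here.  */
static const char *
toy_maybe_lookup (const char *decl, const struct toy_ctx *ctx)
{
  for (size_t i = 0; i < ctx->map_len; i++)
    if (strcmp (ctx->map[i].from, decl) == 0)
      return ctx->map[i].to;
  return NULL;
}

/* Walk from CTX outwards and return the first copy of DECL found, so
   the innermost-nested copy wins.  If no context remaps DECL, return
   DECL unchanged.  */
static const char *
toy_resolve (const char *decl, const struct toy_ctx *ctx)
{
  assert (decl != NULL);

  for (const struct toy_ctx *c = ctx; c; c = c->outer)
    {
      const char *inner = toy_maybe_lookup (decl, c);
      if (inner)
	return inner;
    }
  return decl;
}
```

The point of stopping at the first hit is that the attribute ends up on
the copy the innermost region actually uses, rather than on an outer
decl that the offloaded code never references.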

Secondly, I'd misunderstood when the maximum parallelism level was
calculated for each nested omp_context, meaning that the code to
trigger adding the "oacc gangprivate" attribute could trigger in the
wrong circumstances. I've fixed this by moving the attribute-setting to
execute_lower_omp.
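
A minimal sketch of the check as it now runs, assuming every context
has already had its partitioning-level count recorded. The names
lvl_ctx, toy_level_total and toy_is_gang_partitioned are made up for
illustration; only the "sum over the context chain, compare with one"
logic mirrors the patch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for omp_context, reduced to the two fields the
   check needs.  */
struct lvl_ctx
{
  struct lvl_ctx *outer;
  unsigned partitioning_levels;	/* Levels used by this construct.  */
};

/* Sum the partitioning levels of CTX and all enclosing contexts.  */
static unsigned
toy_level_total (const struct lvl_ctx *ctx)
{
  unsigned total = 0;

  for (const struct lvl_ctx *c = ctx; c; c = c->outer)
    total += c->partitioning_levels;

  /* OpenACC has three levels: gang, worker and vector.  */
  assert (total <= 3);
  return total;
}

/* Mirror the test in the patch: a total of exactly one level across
   the context chain is treated as gang partitioning.  */
static bool
toy_is_gang_partitioned (const struct lvl_ctx *ctx)
{
  return toy_level_total (ctx) == 1;
}
```

Because the sum depends on counts stored in enclosing contexts, it is
only meaningful once all of them have been filled in, which is why the
check is deferred to the end of lowering.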

I've also added a new testcase (gangprivate-attrib-2.f90). Re-tested
with offloading to nvptx.

OK for trunk?

Thank you,

Julian

2019-06-03  Julian Brown  <julian@codesourcery.com>
            Chung-Lin Tang  <cltang@codesourcery.com>

        gcc/
        * config/nvptx/nvptx.c (tree-hash-traits.h): Include.
        (gangprivate_shared_size): New global variable.
        (gangprivate_shared_align): Likewise.
        (gangprivate_shared_sym): Likewise.
        (gangprivate_shared_hmap): Likewise.
        (nvptx_option_override): Initialize gangprivate_shared_sym,
        gangprivate_shared_align.
        (nvptx_file_end): Output gangprivate_shared_sym.
        (nvptx_goacc_expand_accel_var): New function.
        (nvptx_set_current_function): Initialise gangprivate_shared_hmap. Add
        function comment.
        (TARGET_GOACC_EXPAND_ACCEL_VAR): Define.
        * doc/tm.texi (TARGET_GOACC_EXPAND_ACCEL_VAR): Document new hook.
        * doc/tm.texi.in (TARGET_GOACC_EXPAND_ACCEL_VAR): Likewise.
        * expr.c (expand_expr_real_1): Remap VAR_DECLs marked with the
        "oacc gangprivate" attribute.
        * omp-low.c (omp_context): Add oacc_partitioning_levels and
        oacc_addressable_var_decls fields.
        (new_omp_context): Initialize oacc_addressable_var_decls in new
        omp_context.
        (delete_omp_context): Delete oacc_addressable_var_decls in old
        omp_context.
        (lower_oacc_head_tail): Record partitioning-level count in omp context.
        (oacc_record_private_var_clauses, oacc_record_vars_in_bind,
        mark_oacc_gangprivate): New functions.
        (lower_omp_for): Call oacc_record_private_var_clauses with "for"
        clauses.
        (lower_omp_target): Likewise, for "target" clauses.
        Call mark_oacc_gangprivate for offloaded target regions.
        (process_oacc_gangprivate_1): New function.
        (lower_omp_1): Call oacc_record_vars_in_bind for GIMPLE_BIND within OMP
        regions.
        (execute_lower_omp): Call process_oacc_gangprivate_1 for each OMP
        context.
        * target.def (expand_accel_var): New hook.

        libgomp/
        * testsuite/libgomp.oacc-c-c++-common/gang-private-1.c: New test.
        * testsuite/libgomp.oacc-c-c++-common/loop-gwv-2.c: New test.
        * testsuite/libgomp.oacc-c/pr85465.c: New test.
        * testsuite/libgomp.oacc-fortran/gangprivate-attrib-1.f90: New test.
        * testsuite/libgomp.oacc-fortran/gangprivate-attrib-2.f90: New test.

[-- Attachment #2: 0001-OpenACC-Add-support-for-gang-local-storage-allocatio.patch --]
[-- Type: text/x-patch, Size: 21812 bytes --]

From 917189cd07fcb68ba289c5fbcd768b7d4dff785f Mon Sep 17 00:00:00 2001
From: Julian Brown <julian@codesourcery.com>
Date: Thu, 9 Aug 2018 20:27:04 -0700
Subject: [PATCH] [OpenACC] Add support for gang local storage allocation in
 shared memory

2019-06-03  Julian Brown  <julian@codesourcery.com>
	    Chung-Lin Tang  <cltang@codesourcery.com>

	gcc/
	* config/nvptx/nvptx.c (tree-hash-traits.h): Include.
	(gangprivate_shared_size): New global variable.
	(gangprivate_shared_align): Likewise.
	(gangprivate_shared_sym): Likewise.
	(gangprivate_shared_hmap): Likewise.
	(nvptx_option_override): Initialize gangprivate_shared_sym,
	gangprivate_shared_align.
	(nvptx_file_end): Output gangprivate_shared_sym.
	(nvptx_goacc_expand_accel_var): New function.
	(nvptx_set_current_function): Initialise gangprivate_shared_hmap. Add
	function comment.
	(TARGET_GOACC_EXPAND_ACCEL_VAR): Define.
	* doc/tm.texi (TARGET_GOACC_EXPAND_ACCEL_VAR): Document new hook.
	* doc/tm.texi.in (TARGET_GOACC_EXPAND_ACCEL_VAR): Likewise.
	* expr.c (expand_expr_real_1): Remap VAR_DECLs marked with the
	"oacc gangprivate" attribute.
	* omp-low.c (omp_context): Add oacc_partitioning_levels and
	oacc_addressable_var_decls fields.
	(new_omp_context): Initialize oacc_addressable_var_decls in new
	omp_context.
	(delete_omp_context): Delete oacc_addressable_var_decls in old
	omp_context.
	(lower_oacc_head_tail): Record partitioning-level count in omp context.
	(oacc_record_private_var_clauses, oacc_record_vars_in_bind,
	mark_oacc_gangprivate): New functions.
	(lower_omp_for): Call oacc_record_private_var_clauses with "for"
	clauses.
	(lower_omp_target): Likewise, for "target" clauses.
	Call mark_oacc_gangprivate for offloaded target regions.
	(process_oacc_gangprivate_1): New function.
	(lower_omp_1): Call oacc_record_vars_in_bind for GIMPLE_BIND within OMP
	regions.
	(execute_lower_omp): Call process_oacc_gangprivate_1 for each OMP
	context.
	* target.def (expand_accel_var): New hook.

	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/gang-private-1.c: New test.
	* testsuite/libgomp.oacc-c-c++-common/loop-gwv-2.c: New test.
	* testsuite/libgomp.oacc-c/pr85465.c: New test.
	* testsuite/libgomp.oacc-fortran/gangprivate-attrib-1.f90: New test.
	* testsuite/libgomp.oacc-fortran/gangprivate-attrib-2.f90: New test.
---
 gcc/config/nvptx/nvptx.c                      |  53 +++++++++
 gcc/doc/tm.texi                               |   8 ++
 gcc/doc/tm.texi.in                            |   2 +
 gcc/expr.c                                    |  13 +-
 gcc/omp-low.c                                 | 111 ++++++++++++++++++
 gcc/target.def                                |  10 ++
 .../gang-private-1.c                          |  38 ++++++
 .../libgomp.oacc-c-c++-common/loop-gwv-2.c    |  95 +++++++++++++++
 libgomp/testsuite/libgomp.oacc-c/pr85465.c    |  11 ++
 .../gangprivate-attrib-1.f90                  |  25 ++++
 .../gangprivate-attrib-2.f90                  |  23 ++++
 11 files changed, 388 insertions(+), 1 deletion(-)
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/gang-private-1.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/loop-gwv-2.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c/pr85465.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-fortran/gangprivate-attrib-1.f90
 create mode 100644 libgomp/testsuite/libgomp.oacc-fortran/gangprivate-attrib-2.f90

diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index a28099ac89d..c93fb926609 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -74,6 +74,7 @@
 #include "cfgloop.h"
 #include "fold-const.h"
 #include "intl.h"
+#include "tree-hash-traits.h"
 
 /* This file should be included last.  */
 #include "target-def.h"
@@ -166,6 +167,12 @@ static unsigned vector_red_align;
 static unsigned vector_red_partition;
 static GTY(()) rtx vector_red_sym;
 
+/* Shared memory block for gang-private variables.  */
+static unsigned gangprivate_shared_size;
+static unsigned gangprivate_shared_align;
+static GTY(()) rtx gangprivate_shared_sym;
+static hash_map<tree_decl_hash, unsigned int> gangprivate_shared_hmap;
+
 /* Global lock variable, needed for 128bit worker & gang reductions.  */
 static GTY(()) tree global_lock_var;
 
@@ -247,6 +254,10 @@ nvptx_option_override (void)
   vector_red_align = GET_MODE_ALIGNMENT (SImode) / BITS_PER_UNIT;
   vector_red_partition = 0;
 
+  gangprivate_shared_sym = gen_rtx_SYMBOL_REF (Pmode, "__gangprivate_shared");
+  SET_SYMBOL_DATA_AREA (gangprivate_shared_sym, DATA_AREA_SHARED);
+  gangprivate_shared_align = GET_MODE_ALIGNMENT (SImode) / BITS_PER_UNIT;
+
   diagnose_openacc_conflict (TARGET_GOMP, "-mgomp");
   diagnose_openacc_conflict (TARGET_SOFT_STACK, "-msoft-stack");
   diagnose_openacc_conflict (TARGET_UNIFORM_SIMT, "-muniform-simt");
@@ -5237,6 +5248,10 @@ nvptx_file_end (void)
     write_shared_buffer (asm_out_file, vector_red_sym,
 			 vector_red_align, vector_red_size);
 
+  if (gangprivate_shared_size)
+    write_shared_buffer (asm_out_file, gangprivate_shared_sym,
+			 gangprivate_shared_align, gangprivate_shared_size);
+
   if (need_softstack_decl)
     {
       write_var_marker (asm_out_file, false, true, "__nvptx_stacks");
@@ -6430,14 +6445,49 @@ nvptx_can_change_mode_class (machine_mode, machine_mode, reg_class_t)
   return false;
 }
 
+/* Implement TARGET_GOACC_EXPAND_ACCEL_VAR.  Place "oacc gangprivate"
+   variables in shared memory.  */
+
+static rtx
+nvptx_goacc_expand_accel_var (tree var)
+{
+  if (VAR_P (var)
+      && lookup_attribute ("oacc gangprivate", DECL_ATTRIBUTES (var)))
+    {
+      unsigned int offset, *poffset;
+      poffset = gangprivate_shared_hmap.get (var);
+      if (poffset)
+	offset = *poffset;
+      else
+	{
+	  unsigned HOST_WIDE_INT align = DECL_ALIGN (var);
+	  gangprivate_shared_size
+	    = (gangprivate_shared_size + align - 1) & ~(align - 1);
+	  if (gangprivate_shared_align < align)
+	    gangprivate_shared_align = align;
+
+	  offset = gangprivate_shared_size;
+	  bool existed = gangprivate_shared_hmap.put (var, offset);
+	  gcc_assert (!existed);
+	  gangprivate_shared_size += tree_to_uhwi (DECL_SIZE_UNIT (var));
+	}
+      rtx addr = plus_constant (Pmode, gangprivate_shared_sym, offset);
+      return gen_rtx_MEM (TYPE_MODE (TREE_TYPE (var)), addr);
+    }
+  return NULL_RTX;
+}
+
 static GTY(()) tree nvptx_previous_fndecl;
 
+/* Implement TARGET_SET_CURRENT_FUNCTION.  Reset per-function context.  */
+
 static void
 nvptx_set_current_function (tree fndecl)
 {
   if (!fndecl || fndecl == nvptx_previous_fndecl)
     return;
 
+  gangprivate_shared_hmap.empty ();
   nvptx_previous_fndecl = fndecl;
   vector_red_partition = 0;
   oacc_bcast_partition = 0;
@@ -6579,6 +6629,9 @@ nvptx_set_current_function (tree fndecl)
 #undef TARGET_HAVE_SPECULATION_SAFE_VALUE
 #define TARGET_HAVE_SPECULATION_SAFE_VALUE speculation_safe_value_not_needed
 
+#undef TARGET_GOACC_EXPAND_ACCEL_VAR
+#define TARGET_GOACC_EXPAND_ACCEL_VAR nvptx_goacc_expand_accel_var
+
 #undef TARGET_SET_CURRENT_FUNCTION
 #define TARGET_SET_CURRENT_FUNCTION nvptx_set_current_function
 
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 622e8cf240f..61da9709268 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -6161,6 +6161,14 @@ like @code{cond_add@var{m}}.  The default implementation returns a zero
 constant of type @var{type}.
 @end deftypefn
 
+@deftypefn {Target Hook} rtx TARGET_GOACC_EXPAND_ACCEL_VAR (tree @var{var})
+This hook, if defined, is used by accelerator target back-ends to expand
+specially handled kinds of VAR_DECL expressions.  A particular use is to
+place variables with specific attributes inside special accelerator
+memories.  A return value of NULL indicates that the target does not
+handle this VAR_DECL, and normal RTL expanding is resumed.
+@end deftypefn
+
 @node Anchored Addresses
 @section Anchored Addresses
 @cindex anchored addresses
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 17560fce6b7..5579623e331 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -4210,6 +4210,8 @@ address;  but often a machine-dependent strategy can generate better code.
 
 @hook TARGET_PREFERRED_ELSE_VALUE
 
+@hook TARGET_GOACC_EXPAND_ACCEL_VAR
+
 @node Anchored Addresses
 @section Anchored Addresses
 @cindex anchored addresses
diff --git a/gcc/expr.c b/gcc/expr.c
index c78bc74c0d9..34510aab55d 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -9974,8 +9974,19 @@ expand_expr_real_1 (tree exp, rtx target, machine_mode tmode,
       exp = SSA_NAME_VAR (ssa_name);
       goto expand_decl_rtl;
 
-    case PARM_DECL:
     case VAR_DECL:
+      /* Allow accel compiler to handle specific cases of variables,
+	 specifically those tagged with the "oacc gangprivate" attribute,
+	 which may be intended to be placed in special memory in GPUs.  */
+      if (flag_openacc && targetm.goacc.expand_accel_var)
+	{
+	  temp = targetm.goacc.expand_accel_var (exp);
+	  if (temp)
+	    return temp;
+	}
+      /* ... fall through ...  */
+
+    case PARM_DECL:
       /* If a static var's type was incomplete when the decl was written,
 	 but the type is complete now, lay out the decl now.  */
       if (DECL_SIZE (exp) == 0
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index cfc237cd895..d0ed5c2255c 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -137,6 +137,12 @@ struct omp_context
 
   /* True if this construct can be cancelled.  */
   bool cancellable;
+
+  /* The number of levels of OpenACC partitioning invoked in this context.  */
+  unsigned oacc_partitioning_levels;
+
+  /* Addressable variable decls in this context.  */
+  vec<tree> *oacc_addressable_var_decls;
 };
 
 static splay_tree all_contexts;
@@ -878,6 +884,7 @@ new_omp_context (gimple *stmt, omp_context *outer_ctx)
     }
 
   ctx->cb.decl_map = new hash_map<tree, tree>;
+  ctx->oacc_addressable_var_decls = new vec<tree> ();
 
   return ctx;
 }
@@ -960,6 +967,7 @@ delete_omp_context (splay_tree_value value)
     }
 
   delete ctx->lastprivate_conditional_map;
+  delete ctx->oacc_addressable_var_decls;
 
   XDELETE (ctx);
 }
@@ -6757,6 +6765,9 @@ lower_oacc_head_tail (location_t loc, tree clauses,
   tree join_kind = build_int_cst (unsigned_type_node, IFN_UNIQUE_OACC_JOIN);
 
   gcc_assert (count);
+
+  ctx->oacc_partitioning_levels = count;
+
   for (unsigned done = 1; count; count--, done++)
     {
       gimple_seq fork_seq = NULL;
@@ -8458,6 +8469,79 @@ lower_omp_for_lastprivate (struct omp_for_data *fd, gimple_seq *body_p,
     }
 }
 
+/* Record vars listed in private clauses in CLAUSES in CTX.  This information
+   is used to mark up variables that should be made private per-gang.  */
+
+static void
+oacc_record_private_var_clauses (omp_context *ctx, tree clauses)
+{
+  tree c;
+
+  if (!ctx)
+    return;
+
+  for (c = clauses; c; c = OMP_CLAUSE_CHAIN (c))
+    if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_PRIVATE)
+      {
+	tree decl = OMP_CLAUSE_DECL (c);
+	if (VAR_P (decl) && TREE_ADDRESSABLE (decl))
+	  ctx->oacc_addressable_var_decls->safe_push (decl);
+      }
+}
+
+/* Record addressable vars declared in BINDVARS in CTX.  This information is
+   used to mark up variables that should be made private per-gang.  */
+
+static void
+oacc_record_vars_in_bind (omp_context *ctx, tree bindvars)
+{
+  if (!ctx)
+    return;
+
+  for (tree v = bindvars; v; v = DECL_CHAIN (v))
+    if (VAR_P (v) && TREE_ADDRESSABLE (v))
+      ctx->oacc_addressable_var_decls->safe_push (v);
+}
+
+/* Mark addressable variables which are declared implicitly or explicitly as
+   gang private with a special attribute.  These may need to have their
+   declarations altered later on in compilation (e.g. in
+   execute_oacc_device_lower or the backend, depending on how the OpenACC
+   execution model is implemented on a given target) to ensure that sharing
+   semantics are correct.  */
+
+static void
+mark_oacc_gangprivate (vec<tree> *decls, omp_context *ctx)
+{
+  int i;
+  tree decl;
+
+  FOR_EACH_VEC_ELT (*decls, i, decl)
+    {
+      for (omp_context *thisctx = ctx; thisctx; thisctx = thisctx->outer)
+	{
+	  tree inner_decl = maybe_lookup_decl (decl, thisctx);
+	  if (inner_decl)
+	    {
+	      decl = inner_decl;
+	      break;
+	    }
+	}
+      if (!lookup_attribute ("oacc gangprivate", DECL_ATTRIBUTES (decl)))
+	{
+	  if (dump_file && (dump_flags & TDF_DETAILS))
+	    {
+	      fprintf (dump_file,
+		       "Setting 'oacc gangprivate' attribute for decl:");
+	      print_generic_decl (dump_file, decl, TDF_SLIM);
+	      fputc ('\n', dump_file);
+	    }
+	  DECL_ATTRIBUTES (decl)
+	    = tree_cons (get_identifier ("oacc gangprivate"),
+			 NULL, DECL_ATTRIBUTES (decl));
+	}
+    }
+}
 
 /* Lower code for an OMP loop directive.  */
 
@@ -8475,6 +8559,8 @@ lower_omp_for (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 
   push_gimplify_context ();
 
+  oacc_record_private_var_clauses (ctx, gimple_omp_for_clauses (stmt));
+
   lower_omp (gimple_omp_for_pre_body_ptr (stmt), ctx);
 
   block = make_node (BLOCK);
@@ -9420,6 +9506,8 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 
   clauses = gimple_omp_target_clauses (stmt);
 
+  oacc_record_private_var_clauses (ctx, clauses);
+
   gimple_seq dep_ilist = NULL;
   gimple_seq dep_olist = NULL;
   if (omp_find_clause (clauses, OMP_CLAUSE_DEPEND))
@@ -9670,6 +9758,8 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 
   if (offloaded)
     {
+      mark_oacc_gangprivate (ctx->oacc_addressable_var_decls, ctx);
+
       /* Declare all the variables created by mapping and the variables
 	 declared in the scope of the target body.  */
       record_vars_into (ctx->block_vars, child_fn);
@@ -10521,6 +10611,25 @@ lower_omp_grid_body (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 		       gimple_build_omp_return (false));
 }
 
+/* Find gang-private variables in a context.  */
+
+static int
+process_oacc_gangprivate_1 (splay_tree_node node, void * ARG_UNUSED (data))
+{
+  omp_context *ctx = (omp_context *) node->value;
+  unsigned level_total = 0;
+  omp_context *thisctx;
+
+  for (thisctx = ctx; thisctx; thisctx = thisctx->outer)
+    level_total += thisctx->oacc_partitioning_levels;
+
+  /* If the current context and parent contexts are distributed over a
+     total of one parallelism level, we have gang partitioning.  */
+  if (level_total == 1)
+    mark_oacc_gangprivate (ctx->oacc_addressable_var_decls, ctx);
+
+  return 0;
+}
 
 /* Callback for lower_omp_1.  Return non-NULL if *tp needs to be
    regimplified.  If DATA is non-NULL, lower_omp_1 is outside
@@ -10665,6 +10774,7 @@ lower_omp_1 (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 		 ctx);
       break;
     case GIMPLE_BIND:
+      oacc_record_vars_in_bind (ctx, gimple_bind_vars (as_a <gbind *> (stmt)));
       lower_omp (gimple_bind_body_ptr (as_a <gbind *> (stmt)), ctx);
       maybe_remove_omp_member_access_dummy_vars (as_a <gbind *> (stmt));
       break;
@@ -10905,6 +11015,7 @@ execute_lower_omp (void)
 
   if (all_contexts)
     {
+      splay_tree_foreach (all_contexts, process_oacc_gangprivate_1, NULL);
       splay_tree_delete (all_contexts);
       all_contexts = NULL;
     }
diff --git a/gcc/target.def b/gcc/target.def
index 7d52102c815..5334c206afa 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -1719,6 +1719,16 @@ for allocating any storage for reductions when necessary.",
 void, (gcall *call),
 default_goacc_reduction)
 
+DEFHOOK
+(expand_accel_var,
+"This hook, if defined, is used by accelerator target back-ends to expand\n\
+specially handled kinds of VAR_DECL expressions.  A particular use is to\n\
+place variables with specific attributes inside special accelerator\n\
+memories.  A return value of NULL indicates that the target does not\n\
+handle this VAR_DECL, and normal RTL expanding is resumed.",
+rtx, (tree var),
+NULL)
+
 HOOK_VECTOR_END (goacc)
 
 /* Functions relating to vectorization.  */
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/gang-private-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/gang-private-1.c
new file mode 100644
index 00000000000..f378346ed0a
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/gang-private-1.c
@@ -0,0 +1,38 @@
+#include <assert.h>
+
+int main (void)
+{
+  int ret;
+
+  #pragma acc parallel num_gangs(1) num_workers(32) copyout(ret)
+  {
+    int w = 0;
+
+    #pragma acc loop worker
+    for (int i = 0; i < 32; i++)
+      {
+        #pragma acc atomic update
+	w++;
+      }
+
+    ret = (w == 32);
+  }
+  assert (ret);
+
+  #pragma acc parallel num_gangs(1) vector_length(32) copyout(ret)
+  {
+    int v = 0;
+
+    #pragma acc loop vector
+    for (int i = 0; i < 32; i++)
+      {
+        #pragma acc atomic update
+	v++;
+      }
+
+    ret = (v == 32);
+  }
+  assert (ret);
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-gwv-2.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-gwv-2.c
new file mode 100644
index 00000000000..a4f81a39e24
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-gwv-2.c
@@ -0,0 +1,95 @@
+#include <stdio.h>
+#include <openacc.h>
+#include <alloca.h>
+#include <string.h>
+#include <gomp-constants.h>
+#include <stdlib.h>
+
+#if 0
+#define DEBUG(DIM, IDX, VAL) \
+  fprintf (stderr, "%sdist[%d] = %d\n", (DIM), (IDX), (VAL))
+#else
+#define DEBUG(DIM, IDX, VAL)
+#endif
+
+#define N (32*32*32)
+
+int
+check (const char *dim, int *dist, int dimsize)
+{
+  int ix;
+  int exit = 0;
+
+  for (ix = 0; ix < dimsize; ix++)
+    {
+      DEBUG(dim, ix, dist[ix]);
+      if (dist[ix] < (N) / (dimsize + 0.5)
+	  || dist[ix] > (N) / (dimsize - 0.5))
+	{
+	  fprintf (stderr, "did not distribute to %ss (%d not between %d "
+		   "and %d)\n", dim, dist[ix], (int) ((N) / (dimsize + 0.5)),
+		   (int) ((N) / (dimsize - 0.5)));
+	  exit |= 1;
+	}
+    }
+
+  return exit;
+}
+
+int main ()
+{
+  int ary[N];
+  int ix;
+  int exit = 0;
+  int gangsize = 0, workersize = 0, vectorsize = 0;
+  int *gangdist, *workerdist, *vectordist;
+
+  for (ix = 0; ix < N;ix++)
+    ary[ix] = -1;
+
+#pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
+	    copy(ary) copyout(gangsize, workersize, vectorsize)
+  {
+#pragma acc loop gang worker vector
+    for (unsigned ix = 0; ix < N; ix++)
+      {
+	int g, w, v;
+
+	g = __builtin_goacc_parlevel_id (GOMP_DIM_GANG);
+	w = __builtin_goacc_parlevel_id (GOMP_DIM_WORKER);
+	v = __builtin_goacc_parlevel_id (GOMP_DIM_VECTOR);
+
+	ary[ix] = (g << 16) | (w << 8) | v;
+      }
+
+    gangsize = __builtin_goacc_parlevel_size (GOMP_DIM_GANG);
+    workersize = __builtin_goacc_parlevel_size (GOMP_DIM_WORKER);
+    vectorsize = __builtin_goacc_parlevel_size (GOMP_DIM_VECTOR);
+  }
+
+  gangdist = (int *) alloca (gangsize * sizeof (int));
+  workerdist = (int *) alloca (workersize * sizeof (int));
+  vectordist = (int *) alloca (vectorsize * sizeof (int));
+  memset (gangdist, 0, gangsize * sizeof (int));
+  memset (workerdist, 0, workersize * sizeof (int));
+  memset (vectordist, 0, vectorsize * sizeof (int));
+
+  /* Test that work is shared approximately equally amongst each active
+     gang/worker/vector.  */
+  for (ix = 0; ix < N; ix++)
+    {
+      int g = (ary[ix] >> 16) & 255;
+      int w = (ary[ix] >> 8) & 255;
+      int v = ary[ix] & 255;
+
+      gangdist[g]++;
+      workerdist[w]++;
+      vectordist[v]++;
+    }
+
+  exit = check ("gang", gangdist, gangsize);
+  exit |= check ("worker", workerdist, workersize);
+  exit |= check ("vector", vectordist, vectorsize);
+
+  return exit;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c/pr85465.c b/libgomp/testsuite/libgomp.oacc-c/pr85465.c
new file mode 100644
index 00000000000..329e8a09cf9
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c/pr85465.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-w" } */
+
+int
+main (void)
+{
+#pragma acc parallel
+  foo ();
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/gangprivate-attrib-1.f90 b/libgomp/testsuite/libgomp.oacc-fortran/gangprivate-attrib-1.f90
new file mode 100644
index 00000000000..5f8a5e650ea
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-fortran/gangprivate-attrib-1.f90
@@ -0,0 +1,25 @@
+! Test for "oacc gangprivate" attribute on gang-private variables
+
+! { dg-do run }
+! { dg-additional-options "-fdump-tree-omplower-details" }
+! { dg-final { scan-tree-dump-times "Setting 'oacc gangprivate' attribute for decl:  integer\\(kind=4\\) w;" 1 "omplower" } }
+
+program main
+  integer :: w, arr(0:31)
+
+  !$acc parallel num_gangs(32) num_workers(32) copyout(arr)
+    !$acc loop gang private(w)
+    do j = 0, 31
+      w = 0
+      !$acc loop seq
+      do i = 0, 31
+        !$acc atomic update
+        w = w + 1
+        !$acc end atomic
+      end do
+      arr(j) = w
+    end do
+  !$acc end parallel
+
+  if (any (arr .ne. 32)) stop 1
+end program main
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/gangprivate-attrib-2.f90 b/libgomp/testsuite/libgomp.oacc-fortran/gangprivate-attrib-2.f90
new file mode 100644
index 00000000000..d147229d91e
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-fortran/gangprivate-attrib-2.f90
@@ -0,0 +1,23 @@
+! Test for lack of "oacc gangprivate" attribute on worker-private variables
+
+! { dg-do run }
+! { dg-additional-options "-fdump-tree-omplower-details" }
+! { dg-final { scan-tree-dump-times "Setting 'oacc gangprivate' attribute for decl" 0 "omplower" } }
+
+program main
+  integer :: w, arr(0:31)
+
+  !$acc parallel num_gangs(32) num_workers(32) copyout(arr)
+    !$acc loop gang worker private(w)
+    do j = 0, 31
+      w = 0
+      !$acc loop seq
+      do i = 0, 31
+        w = w + 1
+      end do
+      arr(j) = w
+    end do
+  !$acc end parallel
+
+  if (any (arr .ne. 32)) stop 1
+end program main
-- 
2.20.1



* Re: [PATCH, OpenACC] Add support for gang local storage allocation in shared memory
  2019-06-03 16:03                 ` Julian Brown
@ 2019-06-03 16:23                   ` Jakub Jelinek
  2019-06-07 14:08                     ` Julian Brown
  2022-02-14 15:56                   ` Thomas Schwinge
  1 sibling, 1 reply; 26+ messages in thread
From: Jakub Jelinek @ 2019-06-03 16:23 UTC (permalink / raw)
  To: Julian Brown
  Cc: Bernhard Reutner-Fischer, gcc-patches, Tom de Vries, Chung-Lin Tang

On Mon, Jun 03, 2019 at 05:02:45PM +0100, Julian Brown wrote:
>         * omp-low.c (omp_context): Add oacc_partitioning_level and
>         oacc_addressable_var_decls fields.
>         (new_omp_context): Initialize oacc_addressable_var_decls in new
>         omp_context.
>         (delete_omp_context): Delete oacc_addressable_var_decls in old
>         omp_context.
>         (lower_oacc_head_tail): Record partitioning-level count in omp context.
>         (oacc_record_private_var_clauses, oacc_record_vars_in_bind,
>         mark_oacc_gangprivate): New functions.
>         (lower_omp_for): Call oacc_record_private_var_clauses with "for"
>         clauses.
>         (lower_omp_target): Likewise, for "target" clauses.
>         Call mark_oacc_gangprivate for offloaded target regions.
>         (process_oacc_gangprivate_1): New function.
>         (lower_omp_1): Call oacc_record_vars_in_bind for GIMPLE_BIND within OMP
>         regions.
>         (execute_lower_omp): Call process_oacc_gangprivate_1 for each OMP
>         context.

Just commenting on the above part:

> --- a/gcc/omp-low.c
> +++ b/gcc/omp-low.c
> @@ -137,6 +137,12 @@ struct omp_context
>  
>    /* True if this construct can be cancelled.  */
>    bool cancellable;
> +
> +  /* The number of levels of OpenACC partitioning invoked in this context.  */
> +  unsigned oacc_partitioning_levels;
> +
> +  /* Addressable variable decls in this context.  */
> +  vec<tree> *oacc_addressable_var_decls;

Why vec<tree> * rather than vec<tree>?

> @@ -878,6 +884,7 @@ new_omp_context (gimple *stmt, omp_context *outer_ctx)
>      }
>  
>    ctx->cb.decl_map = new hash_map<tree, tree>;
> +  ctx->oacc_addressable_var_decls = new vec<tree> ();

You then don't have to new it here and delete it below.  As the context is
cleared with XCNEW, you don't need to do anything here; just release it
when deleting.  Note, even if a pointer were needed for some reason (not in
this case), an unconditional new for something used only by a small subset
of contexts is unacceptable; it would then be desirable to create it only
when needed.
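
The by-value approach can be sketched roughly as follows.  This is only an
illustration with toy types: toy_vec and toy_context are hypothetical
stand-ins for vec<tree> and omp_context, and calloc stands in for XCNEW.
The only property relied on is that an all-zero vector is a valid empty
one, so creation needs no extra code and deletion only has to release the
storage.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical stand-in for vec<tree>: all-zero bytes mean "empty".
struct toy_vec
{
  int *data;
  unsigned len;
  unsigned cap;

  // Append V, growing the heap storage on demand.
  void safe_push (int v)
  {
    assert (len <= cap);
    if (len == cap)
      {
        unsigned new_cap = cap ? cap * 2 : 4;
        int *p = static_cast<int *> (std::realloc (data,
                                                   new_cap * sizeof (int)));
        if (!p)
          std::abort ();
        data = p;
        cap = new_cap;
      }
    data[len++] = v;
  }

  // Free the storage and return to the empty state.
  void release ()
  {
    std::free (data);
    data = nullptr;
    len = cap = 0;
  }
};

// Hypothetical stand-in for omp_context, holding the vector by value.
struct toy_context
{
  bool cancellable;
  toy_vec addressable_decls;
};

// Zeroed allocation (the role XCNEW plays); no "new" for the vector.
toy_context *new_toy_context ()
{
  void *p = std::calloc (1, sizeof (toy_context));
  if (!p)
    std::abort ();
  return static_cast<toy_context *> (p);
}

// Only the vector's storage needs releasing before the context is freed.
void delete_toy_context (toy_context *ctx)
{
  ctx->addressable_decls.release ();
  std::free (ctx);
}
```

Contexts that never record a variable never allocate anything for the
vector, which is the cost concern raised above.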

>  
>    return ctx;
>  }
> @@ -960,6 +967,7 @@ delete_omp_context (splay_tree_value value)
>      }
>  
>    delete ctx->lastprivate_conditional_map;
> +  delete ctx->oacc_addressable_var_decls;
>  
>    XDELETE (ctx);
>  }
> @@ -8458,6 +8469,79 @@ lower_omp_for_lastprivate (struct omp_for_data *fd, gimple_seq *body_p,
>      }
>  }
>  
> +/* Record vars listed in private clauses in CLAUSES in CTX.  This information
> +   is used to mark up variables that should be made private per-gang.  */
> +
> +static void
> +oacc_record_private_var_clauses (omp_context *ctx, tree clauses)
> +{
> +  tree c;
> +
> +  if (!ctx)
> +    return;
> +
> +  for (c = clauses; c; c = OMP_CLAUSE_CHAIN (c))
> +    if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_PRIVATE)
> +      {
> +	tree decl = OMP_CLAUSE_DECL (c);
> +	if (VAR_P (decl) && TREE_ADDRESSABLE (decl))
> +	  ctx->oacc_addressable_var_decls->safe_push (decl);
> +      }
> +}

You don't want to do this for every GOMP_FOR or GOMP_TARGET context; I'd hope
you only want to do it for OpenACC contexts.  Perhaps it is OK
to bail out early if the context isn't an OpenACC one.  On the other hand, the
if (!ctx) condition makes no sense; the callers of course guarantee that ctx
is non-NULL.

> @@ -10665,6 +10774,7 @@ lower_omp_1 (gimple_stmt_iterator *gsi_p, omp_context *ctx)
>  		 ctx);
>        break;
>      case GIMPLE_BIND:
> +      oacc_record_vars_in_bind (ctx, gimple_bind_vars (as_a <gbind *> (stmt)));

Again, why is this done unconditionally?  It should be relevant to gather it
only in some subset of contexts, so guard that and don't do it otherwise.

>        lower_omp (gimple_bind_body_ptr (as_a <gbind *> (stmt)), ctx);
>        maybe_remove_omp_member_access_dummy_vars (as_a <gbind *> (stmt));
>        break;
> @@ -10905,6 +11015,7 @@ execute_lower_omp (void)
>  
>    if (all_contexts)
>      {
> +      splay_tree_foreach (all_contexts, process_oacc_gangprivate_1, NULL);

Similarly.  Either guard with if (flag_openacc), or have some flag cleared
at the start of the pass and set only if you find something interesting so
that the splay_tree_foreach does something.

	Jakub


* Re: [PATCH, OpenACC] Add support for gang local storage allocation in shared memory
  2019-06-03 16:23                   ` Jakub Jelinek
@ 2019-06-07 14:08                     ` Julian Brown
  2019-06-12 10:23                       ` Jakub Jelinek
                                         ` (2 more replies)
  0 siblings, 3 replies; 26+ messages in thread
From: Julian Brown @ 2019-06-07 14:08 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: Bernhard Reutner-Fischer, gcc-patches, Tom de Vries, Chung-Lin Tang

[-- Attachment #1: Type: text/plain, Size: 3303 bytes --]

Hi Jakub,

Thanks for the review! I believe I've addressed all your comments in
the attached version of the patch.

On Mon, 3 Jun 2019 18:23:00 +0200
Jakub Jelinek <jakub@redhat.com> wrote:

> Why vec<tree> * rather than vec<tree>?

> > @@ -878,6 +884,7 @@ new_omp_context (gimple *stmt, omp_context
> > *outer_ctx) }
> >  
> >    ctx->cb.decl_map = new hash_map<tree, tree>;
> > +  ctx->oacc_addressable_var_decls = new vec<tree> ();  
> 
> You then don't have to new it here and delete below.  As the context
> is cleared with XCNEW, you don't need to do anything here, and just
> release when deleting.  Note, even if using a pointer for some reason
> was needed (not in this case), using unconditional new for something
> only used for small subset of contexts is unacceptable, it would be
> then desirable to only create when needed.

Fixed.

> > +/* Record vars listed in private clauses in CLAUSES in CTX.  This
> > information
> > +   is used to mark up variables that should be made private
> > per-gang.  */ +
> > +static void
> > +oacc_record_private_var_clauses (omp_context *ctx, tree clauses)
> > +{
> > +  tree c;
> > +
> > +  if (!ctx)
> > +    return;
> > +
> > +  for (c = clauses; c; c = OMP_CLAUSE_CHAIN (c))
> > +    if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_PRIVATE)
> > +      {
> > +	tree decl = OMP_CLAUSE_DECL (c);
> > +	if (VAR_P (decl) && TREE_ADDRESSABLE (decl))
> > +	  ctx->oacc_addressable_var_decls->safe_push (decl);
> > +      }
> > +}  
> 
> You don't want to do this for all GOMP_FOR or GOMP_TARGET context,
> I'd hope you only want to do that for OpenACC contexts.  Perhaps it
> is ok to bail out early if the context isn't OpenACC one.  On the
> other side, the if (!ctx) condition makes no sense, the callers of
> course guarantee that ctx is non-NULL.

I'm not sure where that came from -- ctx can be NULL at the top-level
of lower_omp as called from execute_lower_omp. Maybe that was left over
from an earlier version of the patch. Anyway, I've removed that bit
and fixed the patch to only call oacc_record_private_var_clauses in
OpenACC contexts.
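
For illustration only, the shape of the guard is roughly as below.  The
types and names are toy stand-ins (toy_context, is_oacc, should_record and
lower_bind are hypothetical, not the real omp-low.c interfaces); the point
is that the null check and the OpenACC check live at the call site where
ctx may be absent, not in the recording helper.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for omp_context.
struct toy_context
{
  bool is_oacc;                        // stands in for is_gimple_omp_oacc
  std::vector<int> addressable_decls;  // stands in for the decl vector
};

// True only when there is an enclosing context and it is an OpenACC one.
bool should_record (const toy_context *ctx)
{
  return ctx && ctx->is_oacc;
}

// Stand-in for the GIMPLE_BIND case: CTX may be null at the top level,
// so the caller checks it before recording anything.
void lower_bind (toy_context *ctx, const std::vector<int> &bind_vars)
{
  if (should_record (ctx))
    {
      assert (ctx != nullptr);
      for (int v : bind_vars)
        ctx->addressable_decls.push_back (v);
    }
}
```

A null context and a non-OpenACC context both leave everything untouched;
only an OpenACC context accumulates variables.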

> > @@ -10665,6 +10774,7 @@ lower_omp_1 (gimple_stmt_iterator *gsi_p,
> > omp_context *ctx) ctx);
> >        break;
> >      case GIMPLE_BIND:
> > +      oacc_record_vars_in_bind (ctx, gimple_bind_vars (as_a <gbind
> > *> (stmt)));  
> 
> Again, why is this done unconditionally?  It should be relevant to
> gather it only in some subset of context, so guard that and don't do
> it otherwise.

And here (where ctx *can* be NULL).

> >        lower_omp (gimple_bind_body_ptr (as_a <gbind *> (stmt)),
> > ctx); maybe_remove_omp_member_access_dummy_vars (as_a <gbind *>
> > (stmt)); break;
> > @@ -10905,6 +11015,7 @@ execute_lower_omp (void)
> >  
> >    if (all_contexts)
> >      {
> > +      splay_tree_foreach (all_contexts,
> > process_oacc_gangprivate_1, NULL);  
> 
> Similarly.  Either guard with if (flag_openacc), or have some flag
> cleared at the start of the pass and set only if you find something
> interesting so that the splay_tree_foreach does something.

I've introduced maybe_oacc_gangprivate_vars, and the splay tree walk is
only called if that's true. It's set whenever something's put in
oacc_addressable_var_decls in some omp context.
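
As a rough sketch of that flag pattern (toy code; toy_context, begin_pass,
record_var and finish_pass are hypothetical names, and a std::vector of
contexts stands in for the all_contexts splay tree): the flag is cleared
before lowering, set whenever a variable is recorded, and the final walk
over all contexts is skipped when it is still clear.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for omp_context.
struct toy_context
{
  std::vector<int> addressable_decls;
};

static bool maybe_gangprivate_vars;   // mirrors maybe_oacc_gangprivate_vars
static unsigned contexts_visited;     // instrumentation for this sketch only

// Clear the flag before lowering starts.
void begin_pass ()
{
  maybe_gangprivate_vars = false;
}

// Record a variable and note that the final walk has work to do.
void record_var (toy_context &ctx, int decl)
{
  ctx.addressable_decls.push_back (decl);
  maybe_gangprivate_vars = true;
}

// Walk every context only if something was recorded, then tear down.
// Returns the number of recorded variables seen by the walk.
unsigned finish_pass (std::vector<toy_context> &all)
{
  unsigned marked = 0;
  if (maybe_gangprivate_vars)
    for (toy_context &ctx : all)
      {
        ++contexts_visited;
        marked += static_cast<unsigned> (ctx.addressable_decls.size ());
      }
  // Nothing can have been marked if the flag was never set.
  assert (maybe_gangprivate_vars || marked == 0);
  all.clear ();
  return marked;
}
```

Functions with no recorded variables therefore pay only for one boolean
test at the end of the pass.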

Re-tested with offloading to NVPTX. OK?

Thanks,

Julian


[-- Attachment #2: gang-local-storage-in-shm-8.diff --]
[-- Type: text/x-patch, Size: 21744 bytes --]

commit 6c2a018b940d0b132395048b0600f7d897319ee2
Author: Julian Brown <julian@codesourcery.com>
Date:   Thu Aug 9 20:27:04 2018 -0700

    [OpenACC] Add support for gang local storage allocation in shared memory
    
    2019-06-03  Julian Brown  <julian@codesourcery.com>
                Chung-Lin Tang  <cltang@codesourcery.com>
    
            gcc/
            * config/nvptx/nvptx.c (tree-hash-traits.h): Include.
            (gangprivate_shared_size): New global variable.
            (gangprivate_shared_align): Likewise.
            (gangprivate_shared_sym): Likewise.
            (gangprivate_shared_hmap): Likewise.
            (nvptx_option_override): Initialize gangprivate_shared_sym,
            gangprivate_shared_align.
            (nvptx_file_end): Output gangprivate_shared_sym.
            (nvptx_goacc_expand_accel_var): New function.
            (nvptx_set_current_function): Empty gangprivate_shared_hmap.  Add
            function comment.
            (TARGET_GOACC_EXPAND_ACCEL_VAR): Define.
            * doc/tm.texi (TARGET_GOACC_EXPAND_ACCEL_VAR): Document new hook.
            * doc/tm.texi.in (TARGET_GOACC_EXPAND_ACCEL_VAR): Likewise.
            * expr.c (expand_expr_real_1): Remap VAR_DECLs marked with the
            "oacc gangprivate" attribute.
            * omp-low.c (omp_context): Add oacc_partitioning_levels and
            oacc_addressable_var_decls fields.
            (maybe_oacc_gangprivate_vars): New global variable.
            (delete_omp_context): Release oacc_addressable_var_decls in old
            omp_context.
            (lower_oacc_head_tail): Record partitioning-level count in omp context.
            (oacc_record_private_var_clauses, oacc_record_vars_in_bind,
            mark_oacc_gangprivate): New functions.
            (lower_omp_for): Call oacc_record_private_var_clauses with OpenACC "for"
            clauses.
            (lower_omp_target): Likewise, for OpenACC "target" clauses.
            Call mark_oacc_gangprivate for offloaded target regions.
            (process_oacc_gangprivate): New function.
            (lower_omp_1): Call oacc_record_vars_in_bind for GIMPLE_BIND within
            OpenACC regions.
            (execute_lower_omp): Call process_oacc_gangprivate for each OMP
            context.
            * target.def (expand_accel_var): New hook.
    
            libgomp/
            * testsuite/libgomp.oacc-c-c++-common/gang-private-1.c: New test.
            * testsuite/libgomp.oacc-c-c++-common/loop-gwv-2.c: New test.
            * testsuite/libgomp.oacc-c/pr85465.c: New test.
            * testsuite/libgomp.oacc-fortran/gangprivate-attrib-1.f90: New test.
            * testsuite/libgomp.oacc-fortran/gangprivate-attrib-2.f90: New test.

diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index a28099ac89d..c93fb926609 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -74,6 +74,7 @@
 #include "cfgloop.h"
 #include "fold-const.h"
 #include "intl.h"
+#include "tree-hash-traits.h"
 
 /* This file should be included last.  */
 #include "target-def.h"
@@ -166,6 +167,12 @@ static unsigned vector_red_align;
 static unsigned vector_red_partition;
 static GTY(()) rtx vector_red_sym;
 
+/* Shared memory block for gang-private variables.  */
+static unsigned gangprivate_shared_size;
+static unsigned gangprivate_shared_align;
+static GTY(()) rtx gangprivate_shared_sym;
+static hash_map<tree_decl_hash, unsigned int> gangprivate_shared_hmap;
+
 /* Global lock variable, needed for 128bit worker & gang reductions.  */
 static GTY(()) tree global_lock_var;
 
@@ -247,6 +254,10 @@ nvptx_option_override (void)
   vector_red_align = GET_MODE_ALIGNMENT (SImode) / BITS_PER_UNIT;
   vector_red_partition = 0;
 
+  gangprivate_shared_sym = gen_rtx_SYMBOL_REF (Pmode, "__gangprivate_shared");
+  SET_SYMBOL_DATA_AREA (gangprivate_shared_sym, DATA_AREA_SHARED);
+  gangprivate_shared_align = GET_MODE_ALIGNMENT (SImode) / BITS_PER_UNIT;
+
   diagnose_openacc_conflict (TARGET_GOMP, "-mgomp");
   diagnose_openacc_conflict (TARGET_SOFT_STACK, "-msoft-stack");
   diagnose_openacc_conflict (TARGET_UNIFORM_SIMT, "-muniform-simt");
@@ -5237,6 +5248,10 @@ nvptx_file_end (void)
     write_shared_buffer (asm_out_file, vector_red_sym,
 			 vector_red_align, vector_red_size);
 
+  if (gangprivate_shared_size)
+    write_shared_buffer (asm_out_file, gangprivate_shared_sym,
+			 gangprivate_shared_align, gangprivate_shared_size);
+
   if (need_softstack_decl)
     {
       write_var_marker (asm_out_file, false, true, "__nvptx_stacks");
@@ -6430,14 +6445,49 @@ nvptx_can_change_mode_class (machine_mode, machine_mode, reg_class_t)
   return false;
 }
 
+/* Implement TARGET_GOACC_EXPAND_ACCEL_VAR.  Place "oacc gangprivate"
+   variables in shared memory.  */
+
+static rtx
+nvptx_goacc_expand_accel_var (tree var)
+{
+  if (VAR_P (var)
+      && lookup_attribute ("oacc gangprivate", DECL_ATTRIBUTES (var)))
+    {
+      unsigned int offset, *poffset;
+      poffset = gangprivate_shared_hmap.get (var);
+      if (poffset)
+	offset = *poffset;
+      else
+	{
+	  unsigned HOST_WIDE_INT align = DECL_ALIGN (var);
+	  gangprivate_shared_size
+	    = (gangprivate_shared_size + align - 1) & ~(align - 1);
+	  if (gangprivate_shared_align < align)
+	    gangprivate_shared_align = align;
+
+	  offset = gangprivate_shared_size;
+	  bool existed = gangprivate_shared_hmap.put (var, offset);
+	  gcc_assert (!existed);
+	  gangprivate_shared_size += tree_to_uhwi (DECL_SIZE_UNIT (var));
+	}
+      rtx addr = plus_constant (Pmode, gangprivate_shared_sym, offset);
+      return gen_rtx_MEM (TYPE_MODE (TREE_TYPE (var)), addr);
+    }
+  return NULL_RTX;
+}
+
 static GTY(()) tree nvptx_previous_fndecl;
 
+/* Implement TARGET_SET_CURRENT_FUNCTION.  Reset per-function context.  */
+
 static void
 nvptx_set_current_function (tree fndecl)
 {
   if (!fndecl || fndecl == nvptx_previous_fndecl)
     return;
 
+  gangprivate_shared_hmap.empty ();
   nvptx_previous_fndecl = fndecl;
   vector_red_partition = 0;
   oacc_bcast_partition = 0;
@@ -6579,6 +6629,9 @@ nvptx_set_current_function (tree fndecl)
 #undef TARGET_HAVE_SPECULATION_SAFE_VALUE
 #define TARGET_HAVE_SPECULATION_SAFE_VALUE speculation_safe_value_not_needed
 
+#undef TARGET_GOACC_EXPAND_ACCEL_VAR
+#define TARGET_GOACC_EXPAND_ACCEL_VAR nvptx_goacc_expand_accel_var
+
 #undef TARGET_SET_CURRENT_FUNCTION
 #define TARGET_SET_CURRENT_FUNCTION nvptx_set_current_function
 
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 622e8cf240f..61da9709268 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -6161,6 +6161,14 @@ like @code{cond_add@var{m}}.  The default implementation returns a zero
 constant of type @var{type}.
 @end deftypefn
 
+@deftypefn {Target Hook} rtx TARGET_GOACC_EXPAND_ACCEL_VAR (tree @var{var})
+This hook, if defined, is used by accelerator target back-ends to expand
+specially handled kinds of VAR_DECL expressions.  A particular use is to
+place variables with specific attributes inside special accelerator
+memories.  A return value of NULL indicates that the target does not
+handle this VAR_DECL, and normal RTL expanding is resumed.
+@end deftypefn
+
 @node Anchored Addresses
 @section Anchored Addresses
 @cindex anchored addresses
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 17560fce6b7..5579623e331 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -4210,6 +4210,8 @@ address;  but often a machine-dependent strategy can generate better code.
 
 @hook TARGET_PREFERRED_ELSE_VALUE
 
+@hook TARGET_GOACC_EXPAND_ACCEL_VAR
+
 @node Anchored Addresses
 @section Anchored Addresses
 @cindex anchored addresses
diff --git a/gcc/expr.c b/gcc/expr.c
index c78bc74c0d9..34510aab55d 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -9974,8 +9974,19 @@ expand_expr_real_1 (tree exp, rtx target, machine_mode tmode,
       exp = SSA_NAME_VAR (ssa_name);
       goto expand_decl_rtl;
 
-    case PARM_DECL:
     case VAR_DECL:
+      /* Allow accel compiler to handle specific cases of variables,
+	 specifically those tagged with the "oacc gangprivate" attribute,
+	 which may be intended to be placed in special memory in GPUs.  */
+      if (flag_openacc && targetm.goacc.expand_accel_var)
+	{
+	  temp = targetm.goacc.expand_accel_var (exp);
+	  if (temp)
+	    return temp;
+	}
+      /* ... fall through ...  */
+
+    case PARM_DECL:
       /* If a static var's type was incomplete when the decl was written,
 	 but the type is complete now, lay out the decl now.  */
       if (DECL_SIZE (exp) == 0
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index a7f35ffe416..67e1e82ec00 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -141,6 +141,12 @@ struct omp_context
   /* True if lower_omp_1 should look up lastprivate conditional in parent
      context.  */
   bool combined_into_simd_safelen0;
+
+  /* The number of levels of OpenACC partitioning invoked in this context.  */
+  unsigned oacc_partitioning_levels;
+
+  /* Addressable variable decls in this context.  */
+  vec<tree> oacc_addressable_var_decls;
 };
 
 static splay_tree all_contexts;
@@ -148,6 +154,7 @@ static int taskreg_nesting_level;
 static int target_nesting_level;
 static bitmap task_shared_vars;
 static vec<omp_context *> taskreg_contexts;
+static bool maybe_oacc_gangprivate_vars;
 
 static void scan_omp (gimple_seq *, omp_context *);
 static tree scan_omp_1_op (tree *, int *, void *);
@@ -964,6 +971,7 @@ delete_omp_context (splay_tree_value value)
     }
 
   delete ctx->lastprivate_conditional_map;
+  ctx->oacc_addressable_var_decls.release ();
 
   XDELETE (ctx);
 }
@@ -6881,6 +6889,9 @@ lower_oacc_head_tail (location_t loc, tree clauses,
   tree join_kind = build_int_cst (unsigned_type_node, IFN_UNIQUE_OACC_JOIN);
 
   gcc_assert (count);
+
+  ctx->oacc_partitioning_levels = count;
+
   for (unsigned done = 1; count; count--, done++)
     {
       gimple_seq fork_seq = NULL;
@@ -8582,6 +8593,77 @@ lower_omp_for_lastprivate (struct omp_for_data *fd, gimple_seq *body_p,
     }
 }
 
+/* Record vars listed in private clauses in CLAUSES in CTX.  This information
+   is used to mark up variables that should be made private per-gang.  */
+
+static void
+oacc_record_private_var_clauses (omp_context *ctx, tree clauses)
+{
+  for (tree c = clauses; c; c = OMP_CLAUSE_CHAIN (c))
+    if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_PRIVATE)
+      {
+	tree decl = OMP_CLAUSE_DECL (c);
+	if (VAR_P (decl) && TREE_ADDRESSABLE (decl))
+	  {
+	    ctx->oacc_addressable_var_decls.safe_push (decl);
+	    maybe_oacc_gangprivate_vars = true;
+	  }
+      }
+}
+
+/* Record addressable vars declared in BINDVARS in CTX.  This information is
+   used to mark up variables that should be made private per-gang.  */
+
+static void
+oacc_record_vars_in_bind (omp_context *ctx, tree bindvars)
+{
+  for (tree v = bindvars; v; v = DECL_CHAIN (v))
+    if (VAR_P (v) && TREE_ADDRESSABLE (v))
+      {
+	ctx->oacc_addressable_var_decls.safe_push (v);
+	maybe_oacc_gangprivate_vars = true;
+      }
+}
+
+/* Mark addressable variables which are declared implicitly or explicitly as
+   gang private with a special attribute.  These may need to have their
+   declarations altered later on in compilation (e.g. in
+   execute_oacc_device_lower or the backend, depending on how the OpenACC
+   execution model is implemented on a given target) to ensure that sharing
+   semantics are correct.  */
+
+static void
+mark_oacc_gangprivate (vec<tree> *decls, omp_context *ctx)
+{
+  int i;
+  tree decl;
+
+  FOR_EACH_VEC_ELT (*decls, i, decl)
+    {
+      for (omp_context *thisctx = ctx; thisctx; thisctx = thisctx->outer)
+	{
+	  tree inner_decl = maybe_lookup_decl (decl, thisctx);
+	  if (inner_decl)
+	    {
+	      decl = inner_decl;
+	      break;
+	    }
+	}
+      if (!lookup_attribute ("oacc gangprivate", DECL_ATTRIBUTES (decl)))
+	{
+	  if (dump_file && (dump_flags & TDF_DETAILS))
+	    {
+	      fprintf (dump_file,
+		       "Setting 'oacc gangprivate' attribute for decl:");
+	      print_generic_decl (dump_file, decl, TDF_SLIM);
+	      fputc ('\n', dump_file);
+	    }
+	  DECL_ATTRIBUTES (decl)
+	    = tree_cons (get_identifier ("oacc gangprivate"),
+			 NULL, DECL_ATTRIBUTES (decl));
+	}
+    }
+}
 
 /* Lower code for an OMP loop directive.  */
 
@@ -8599,6 +8681,9 @@ lower_omp_for (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 
   push_gimplify_context ();
 
+  if (is_gimple_omp_oacc (ctx->stmt))
+    oacc_record_private_var_clauses (ctx, gimple_omp_for_clauses (stmt));
+
   lower_omp (gimple_omp_for_pre_body_ptr (stmt), ctx);
 
   block = make_node (BLOCK);
@@ -9544,6 +9629,9 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 
   clauses = gimple_omp_target_clauses (stmt);
 
+  if (is_gimple_omp_oacc (ctx->stmt))
+    oacc_record_private_var_clauses (ctx, clauses);
+
   gimple_seq dep_ilist = NULL;
   gimple_seq dep_olist = NULL;
   if (omp_find_clause (clauses, OMP_CLAUSE_DEPEND))
@@ -9794,6 +9882,8 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 
   if (offloaded)
     {
+      mark_oacc_gangprivate (&ctx->oacc_addressable_var_decls, ctx);
+
       /* Declare all the variables created by mapping and the variables
 	 declared in the scope of the target body.  */
       record_vars_into (ctx->block_vars, child_fn);
@@ -10645,6 +10735,25 @@ lower_omp_grid_body (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 		       gimple_build_omp_return (false));
 }
 
+/* Find gang-private variables in a context.  */
+
+static int
+process_oacc_gangprivate (splay_tree_node node, void * ARG_UNUSED (data))
+{
+  omp_context *ctx = (omp_context *) node->value;
+  unsigned level_total = 0;
+  omp_context *thisctx;
+
+  for (thisctx = ctx; thisctx; thisctx = thisctx->outer)
+    level_total += thisctx->oacc_partitioning_levels;
+
+  /* If the current context and parent contexts are distributed over a
+     total of one parallelism level, we have gang partitioning.  */
+  if (level_total == 1)
+    mark_oacc_gangprivate (&ctx->oacc_addressable_var_decls, ctx);
+
+  return 0;
+}
 
 /* Callback for lower_omp_1.  Return non-NULL if *tp needs to be
    regimplified.  If DATA is non-NULL, lower_omp_1 is outside
@@ -10789,6 +10898,9 @@ lower_omp_1 (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 		 ctx);
       break;
     case GIMPLE_BIND:
+      if (ctx && is_gimple_omp_oacc (ctx->stmt))
+	oacc_record_vars_in_bind (ctx,
+				  gimple_bind_vars (as_a <gbind *> (stmt)));
       lower_omp (gimple_bind_body_ptr (as_a <gbind *> (stmt)), ctx);
       maybe_remove_omp_member_access_dummy_vars (as_a <gbind *> (stmt));
       break;
@@ -11024,6 +11136,7 @@ execute_lower_omp (void)
   FOR_EACH_VEC_ELT (taskreg_contexts, i, ctx)
     finish_taskreg_scan (ctx);
   taskreg_contexts.release ();
+  maybe_oacc_gangprivate_vars = false;
 
   if (all_contexts->root)
     {
@@ -11036,6 +11149,8 @@ execute_lower_omp (void)
 
   if (all_contexts)
     {
+      if (maybe_oacc_gangprivate_vars)
+	splay_tree_foreach (all_contexts, process_oacc_gangprivate, NULL);
       splay_tree_delete (all_contexts);
       all_contexts = NULL;
     }
diff --git a/gcc/target.def b/gcc/target.def
index 7d52102c815..5334c206afa 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -1719,6 +1719,16 @@ for allocating any storage for reductions when necessary.",
 void, (gcall *call),
 default_goacc_reduction)
 
+DEFHOOK
+(expand_accel_var,
+"This hook, if defined, is used by accelerator target back-ends to expand\n\
+specially handled kinds of VAR_DECL expressions.  A particular use is to\n\
+place variables with specific attributes inside special accelerator\n\
+memories.  A return value of NULL indicates that the target does not\n\
+handle this VAR_DECL, and normal RTL expanding is resumed.",
+rtx, (tree var),
+NULL)
+
 HOOK_VECTOR_END (goacc)
 
 /* Functions relating to vectorization.  */
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/gang-private-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/gang-private-1.c
new file mode 100644
index 00000000000..f378346ed0a
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/gang-private-1.c
@@ -0,0 +1,38 @@
+#include <assert.h>
+
+int main (void)
+{
+  int ret;
+
+  #pragma acc parallel num_gangs(1) num_workers(32) copyout(ret)
+  {
+    int w = 0;
+
+    #pragma acc loop worker
+    for (int i = 0; i < 32; i++)
+      {
+        #pragma acc atomic update
+	w++;
+      }
+
+    ret = (w == 32);
+  }
+  assert (ret);
+
+  #pragma acc parallel num_gangs(1) vector_length(32) copyout(ret)
+  {
+    int v = 0;
+
+    #pragma acc loop vector
+    for (int i = 0; i < 32; i++)
+      {
+        #pragma acc atomic update
+	v++;
+      }
+
+    ret = (v == 32);
+  }
+  assert (ret);
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-gwv-2.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-gwv-2.c
new file mode 100644
index 00000000000..a4f81a39e24
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-gwv-2.c
@@ -0,0 +1,95 @@
+#include <stdio.h>
+#include <openacc.h>
+#include <alloca.h>
+#include <string.h>
+#include <gomp-constants.h>
+#include <stdlib.h>
+
+#if 0
+#define DEBUG(DIM, IDX, VAL) \
+  fprintf (stderr, "%sdist[%d] = %d\n", (DIM), (IDX), (VAL))
+#else
+#define DEBUG(DIM, IDX, VAL)
+#endif
+
+#define N (32*32*32)
+
+int
+check (const char *dim, int *dist, int dimsize)
+{
+  int ix;
+  int exit = 0;
+
+  for (ix = 0; ix < dimsize; ix++)
+    {
+      DEBUG(dim, ix, dist[ix]);
+      if (dist[ix] < (N) / (dimsize + 0.5)
+	  || dist[ix] > (N) / (dimsize - 0.5))
+	{
+	  fprintf (stderr, "did not distribute to %ss (%d not between %d "
+		   "and %d)\n", dim, dist[ix], (int) ((N) / (dimsize + 0.5)),
+		   (int) ((N) / (dimsize - 0.5)));
+	  exit |= 1;
+	}
+    }
+
+  return exit;
+}
+
+int main ()
+{
+  int ary[N];
+  int ix;
+  int exit = 0;
+  int gangsize = 0, workersize = 0, vectorsize = 0;
+  int *gangdist, *workerdist, *vectordist;
+
+  for (ix = 0; ix < N;ix++)
+    ary[ix] = -1;
+
+#pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
+	    copy(ary) copyout(gangsize, workersize, vectorsize)
+  {
+#pragma acc loop gang worker vector
+    for (unsigned ix = 0; ix < N; ix++)
+      {
+	int g, w, v;
+
+	g = __builtin_goacc_parlevel_id (GOMP_DIM_GANG);
+	w = __builtin_goacc_parlevel_id (GOMP_DIM_WORKER);
+	v = __builtin_goacc_parlevel_id (GOMP_DIM_VECTOR);
+
+	ary[ix] = (g << 16) | (w << 8) | v;
+      }
+
+    gangsize = __builtin_goacc_parlevel_size (GOMP_DIM_GANG);
+    workersize = __builtin_goacc_parlevel_size (GOMP_DIM_WORKER);
+    vectorsize = __builtin_goacc_parlevel_size (GOMP_DIM_VECTOR);
+  }
+
+  gangdist = (int *) alloca (gangsize * sizeof (int));
+  workerdist = (int *) alloca (workersize * sizeof (int));
+  vectordist = (int *) alloca (vectorsize * sizeof (int));
+  memset (gangdist, 0, gangsize * sizeof (int));
+  memset (workerdist, 0, workersize * sizeof (int));
+  memset (vectordist, 0, vectorsize * sizeof (int));
+
+  /* Test that work is shared approximately equally amongst each active
+     gang/worker/vector.  */
+  for (ix = 0; ix < N; ix++)
+    {
+      int g = (ary[ix] >> 16) & 255;
+      int w = (ary[ix] >> 8) & 255;
+      int v = ary[ix] & 255;
+
+      gangdist[g]++;
+      workerdist[w]++;
+      vectordist[v]++;
+    }
+
+  exit = check ("gang", gangdist, gangsize);
+  exit |= check ("worker", workerdist, workersize);
+  exit |= check ("vector", vectordist, vectorsize);
+
+  return exit;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c/pr85465.c b/libgomp/testsuite/libgomp.oacc-c/pr85465.c
new file mode 100644
index 00000000000..329e8a09cf9
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c/pr85465.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-w" } */
+
+int
+main (void)
+{
+#pragma acc parallel
+  foo ();
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/gangprivate-attrib-1.f90 b/libgomp/testsuite/libgomp.oacc-fortran/gangprivate-attrib-1.f90
new file mode 100644
index 00000000000..5f8a5e650ea
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-fortran/gangprivate-attrib-1.f90
@@ -0,0 +1,25 @@
+! Test for "oacc gangprivate" attribute on gang-private variables
+
+! { dg-do run }
+! { dg-additional-options "-fdump-tree-omplower-details" }
+! { dg-final { scan-tree-dump-times "Setting 'oacc gangprivate' attribute for decl:  integer\\(kind=4\\) w;" 1 "omplower" } } */
+
+program main
+  integer :: w, arr(0:31)
+
+  !$acc parallel num_gangs(32) num_workers(32) copyout(arr)
+    !$acc loop gang private(w)
+    do j = 0, 31
+      w = 0
+      !$acc loop seq
+      do i = 0, 31
+        !$acc atomic update
+        w = w + 1
+        !$acc end atomic
+      end do
+      arr(j) = w
+    end do
+  !$acc end parallel
+
+  if (any (arr .ne. 32)) stop 1
+end program main
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/gangprivate-attrib-2.f90 b/libgomp/testsuite/libgomp.oacc-fortran/gangprivate-attrib-2.f90
new file mode 100644
index 00000000000..d147229d91e
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-fortran/gangprivate-attrib-2.f90
@@ -0,0 +1,23 @@
+! Test for lack of "oacc gangprivate" attribute on worker-private variables
+
+! { dg-do run }
+! { dg-additional-options "-fdump-tree-omplower-details" }
+! { dg-final { scan-tree-dump-times "Setting 'oacc gangprivate' attribute for decl" 0 "omplower" } } */
+
+program main
+  integer :: w, arr(0:31)
+
+  !$acc parallel num_gangs(32) num_workers(32) copyout(arr)
+    !$acc loop gang worker private(w)
+    do j = 0, 31
+      w = 0
+      !$acc loop seq
+      do i = 0, 31
+        w = w + 1
+      end do
+      arr(j) = w
+    end do
+  !$acc end parallel
+
+  if (any (arr .ne. 32)) stop 1
+end program main

* Re: [PATCH, OpenACC] Add support for gang local storage allocation in shared memory
  2019-06-07 14:08                     ` Julian Brown
@ 2019-06-12 10:23                       ` Jakub Jelinek
  2019-06-12 10:32                         ` Tom de Vries
  2019-06-12 11:57                       ` Thomas Schwinge
  2021-05-21 19:05                       ` Thomas Schwinge
  2 siblings, 1 reply; 26+ messages in thread
From: Jakub Jelinek @ 2019-06-12 10:23 UTC (permalink / raw)
  To: Julian Brown
  Cc: Bernhard Reutner-Fischer, gcc-patches, Tom de Vries, Chung-Lin Tang

On Fri, Jun 07, 2019 at 03:08:37PM +0100, Julian Brown wrote:
> diff --git a/gcc/omp-low.c b/gcc/omp-low.c
> index a7f35ffe416..67e1e82ec00 100644
> --- a/gcc/omp-low.c
> +++ b/gcc/omp-low.c
> @@ -9794,6 +9882,8 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx)
>  
>    if (offloaded)
>      {
> +      mark_oacc_gangprivate (&ctx->oacc_addressable_var_decls, ctx);
> +

The above one still doesn't seem to be guarded for OpenACC constructs only.

As for the rest of the patch, you need Tom to look over the nvptx changes.

	Jakub

* Re: [PATCH, OpenACC] Add support for gang local storage allocation in shared memory
  2019-06-12 10:23                       ` Jakub Jelinek
@ 2019-06-12 10:32                         ` Tom de Vries
  0 siblings, 0 replies; 26+ messages in thread
From: Tom de Vries @ 2019-06-12 10:32 UTC (permalink / raw)
  To: Jakub Jelinek, Julian Brown
  Cc: Bernhard Reutner-Fischer, gcc-patches, Chung-Lin Tang

On 12-06-19 12:22, Jakub Jelinek wrote:
> On Fri, Jun 07, 2019 at 03:08:37PM +0100, Julian Brown wrote:
>> diff --git a/gcc/omp-low.c b/gcc/omp-low.c
>> index a7f35ffe416..67e1e82ec00 100644
>> --- a/gcc/omp-low.c
>> +++ b/gcc/omp-low.c
>> @@ -9794,6 +9882,8 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx)
>>  
>>    if (offloaded)
>>      {
>> +      mark_oacc_gangprivate (&ctx->oacc_addressable_var_decls, ctx);
>> +
> 
> The above one still doesn't seem to be guarded for OpenACC constructs only.
> 
> As for the rest of the patch, you need Tom to look over the nvptx changes.

I haven't seen any nvptx changes mentioned since I ok-ed the nvptx part
( https://gcc.gnu.org/ml/gcc-patches/2018-10/msg00324.html ), so on that
basis I'd say it's still ok.

Thanks,
- Tom

* Re: [PATCH, OpenACC] Add support for gang local storage allocation in shared memory
  2019-06-07 14:08                     ` Julian Brown
  2019-06-12 10:23                       ` Jakub Jelinek
@ 2019-06-12 11:57                       ` Thomas Schwinge
  2019-06-12 19:43                         ` Julian Brown
  2021-05-21 19:05                       ` Thomas Schwinge
  2 siblings, 1 reply; 26+ messages in thread
From: Thomas Schwinge @ 2019-06-12 11:57 UTC (permalink / raw)
  To: Julian Brown
  Cc: Bernhard Reutner-Fischer, gcc-patches, Tom de Vries,
	Chung-Lin Tang, Jakub Jelinek

[-- Attachment #1: Type: text/plain, Size: 9549 bytes --]

Hi!

First, thanks for picking this up, and improving the patch you inherited.


Then, just a few individual comments, not a complete review.

(As far as I'm concerned, and as far as relevant, these can be addressed
later, incrementally, of course.)


I understand right that this will address some aspects of PR90115
"OpenACC: predetermined private levels for variables declared in blocks"
(so please mention that one in the ChangeLog updates, and commit log),
but it doesn't address all of these aspects (and see also Cesar's list in
<http://mid.mail-archive.com/70d27ebd-762e-59a3-082f-48fa0c687212@codesourcery.com>),
and also not yet PR90114 "Predetermined private levels for variables
declared in OpenACC accelerator routines"?


On Fri, 7 Jun 2019 15:08:37 +0100, Julian Brown <julian@codesourcery.com> wrote:
> --- a/gcc/config/nvptx/nvptx.c
> +++ b/gcc/config/nvptx/nvptx.c

> @@ -5237,6 +5248,10 @@ nvptx_file_end (void)
>      write_shared_buffer (asm_out_file, vector_red_sym,
>  			 vector_red_align, vector_red_size);
>  
> +  if (gangprivate_shared_size)
> +    write_shared_buffer (asm_out_file, gangprivate_shared_sym,
> +			 gangprivate_shared_align, gangprivate_shared_size);

Curious, what is the reason that we maintain this '__gangprivate_shared'
variable on a per-file basis instead of on a per-function basis (with
names '__gangprivate_shared_[function]', or similar), which should make
it more obvious where each block of '.shared' memory belongs to?


> --- a/gcc/doc/tm.texi
> +++ b/gcc/doc/tm.texi

> +@deftypefn {Target Hook} rtx TARGET_GOACC_EXPAND_ACCEL_VAR (tree @var{var})
> +This hook, if defined, is used by accelerator target back-ends to expand
> +specially handled kinds of VAR_DECL expressions.  A particular use is to
> +place variables with specific attributes inside special accelarator
> +memories.  A return value of NULL indicates that the target does not
> +handle this VAR_DECL, and normal RTL expanding is resumed.
> +@end deftypefn

I guess I'm not terribly happy with the 'goacc.expand_accel_var' name.
Using different "memories" for specially tagged DECLs seems to be a
pretty generic concept (address spaces?), and...

> --- a/gcc/expr.c
> +++ b/gcc/expr.c
> @@ -9974,8 +9974,19 @@ expand_expr_real_1 (tree exp, rtx target, machine_mode tmode,
>        exp = SSA_NAME_VAR (ssa_name);
>        goto expand_decl_rtl;
>  
> -    case PARM_DECL:
>      case VAR_DECL:
> +      /* Allow accel compiler to handle specific cases of variables,
> +	 specifically those tagged with the "oacc gangprivate" attribute,
> +	 which may be intended to be placed in special memory in GPUs.  */
> +      if (flag_openacc && targetm.goacc.expand_accel_var)
> +	{
> +	  temp = targetm.goacc.expand_accel_var (exp);
> +	  if (temp)
> +	    return temp;
> +	}
> +      /* ... fall through ...  */
> +
> +    case PARM_DECL:

... I'm thus confused that there isn't already a generic mechanism
available in GCC, that we can just use instead of adding a new one here?
Thinking about the "address spaces" stuff in 'gcc/target.def' -- or is
that the wrong concept?  (I'm not familiar with all that, and haven't
looked closely.)


> --- a/gcc/omp-low.c
> +++ b/gcc/omp-low.c

> +/* Record vars listed in private clauses in CLAUSES in CTX.  This information
> +   is used to mark up variables that should be made private per-gang.  */
> +
> +static void
> +oacc_record_private_var_clauses (omp_context *ctx, tree clauses)
> +{
> +  for (tree c = clauses; c; c = OMP_CLAUSE_CHAIN (c))
> +    if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_PRIVATE)
> +      {
> +	tree decl = OMP_CLAUSE_DECL (c);
> +	if (VAR_P (decl) && TREE_ADDRESSABLE (decl))
> +	  {
> +	    ctx->oacc_addressable_var_decls.safe_push (decl);
> +	    maybe_oacc_gangprivate_vars = true;
> +	  }
> +      }
> +}

Are all the relevant variables addressable?  And/or, need only those be
considered?

> +/* Record addressable vars declared in BINDVARS in CTX.  This information is
> +   used to mark up variables that should be made private per-gang.  */
> +
> +static void
> +oacc_record_vars_in_bind (omp_context *ctx, tree bindvars)
> +{
> +  for (tree v = bindvars; v; v = DECL_CHAIN (v))
> +    if (VAR_P (v) && TREE_ADDRESSABLE (v))
> +      {
> +	ctx->oacc_addressable_var_decls.safe_push (v);
> +	maybe_oacc_gangprivate_vars = true;
> +      }
> +}

Likewise.


> +/* Mark addressable variables which are declared implicitly or explicitly as
> +   gang private with a special attribute.  These may need to have their
> +   declarations altered later on in compilation (e.g. in
> +   execute_oacc_device_lower or the backend, depending on how the OpenACC
> +   execution model is implemented on a given target) to ensure that sharing
> +   semantics are correct.  */
> +
> +static void
> +mark_oacc_gangprivate (vec<tree> *decls, omp_context *ctx)
> +{
> +  int i;
> +  tree decl;
> +
> +  FOR_EACH_VEC_ELT (*decls, i, decl)
> +    {
> +      for (omp_context *thisctx = ctx; thisctx; thisctx = thisctx->outer)
> +	{
> +	  tree inner_decl = maybe_lookup_decl (decl, thisctx);
> +	  if (inner_decl)
> +	    {
> +	      decl = inner_decl;
> +	      break;
> +	    }
> +	}
> +      if (!lookup_attribute ("oacc gangprivate", DECL_ATTRIBUTES (decl)))
> +	{
> +	  if (dump_file && (dump_flags & TDF_DETAILS))
> +	    {
> +	      fprintf (dump_file,
> +		       "Setting 'oacc gangprivate' attribute for decl:");
> +	      print_generic_decl (dump_file, decl, TDF_SLIM);
> +	      fputc ('\n', dump_file);
> +	    }
> +	  DECL_ATTRIBUTES (decl)
> +	    = tree_cons (get_identifier ("oacc gangprivate"),
> +			 NULL, DECL_ATTRIBUTES (decl));
> +	}
> +    }
> +}

So I'm confused how that can be done here ('omplower'), given that the
decision about how levels of parallelism (gang, worker, vector) are
assigned is only done later ('oaccdevlow'), separately/differently per
offloading target?

The following seems relevant:

> +/* Find gang-private variables in a context.  */
> +
> +static int
> +process_oacc_gangprivate (splay_tree_node node, void * ARG_UNUSED (data))
> +{
> +  omp_context *ctx = (omp_context *) node->value;
> +  unsigned level_total = 0;
> +  omp_context *thisctx;
> +
> +  for (thisctx = ctx; thisctx; thisctx = thisctx->outer)
> +    level_total += thisctx->oacc_partitioning_levels;
> +
> +  /* If the current context and parent contexts are distributed over a
> +     total of one parallelism level, we have gang partitioning.  */
> +  if (level_total == 1)
> +    mark_oacc_gangprivate (&ctx->oacc_addressable_var_decls, ctx);
> +
> +  return 0;
> +}

..., but I didn't quickly manage to grok that.  (I shall try harder,
later on.)

But still then, this looks like it might work for the outer level (gang)
only (because all offloading targets are expected to assign gang level to
the outermost loop -- might that be the underlying assumption?), but it
won't work for inner loop/privatization levels?  (..., which I understand
this patch isn't doing anything about.)


> --- /dev/null
> +++ b/libgomp/testsuite/libgomp.oacc-c/pr85465.c
> @@ -0,0 +1,11 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-w" } */
> +
> +int
> +main (void)
> +{
> +#pragma acc parallel
> +  foo ();
> +
> +  return 0;
> +}

I think that given your re-work of the implementation (move stuff from
front ends into OMP lowering) this test case isn't relevant anymore (was
a front end ICE).


> --- /dev/null
> +++ b/libgomp/testsuite/libgomp.oacc-fortran/gangprivate-attrib-1.f90
> @@ -0,0 +1,25 @@
> +! Test for "oacc gangprivate" attribute on gang-private variables
> +
> +! { dg-do run }
> +! { dg-additional-options "-fdump-tree-omplower-details" }
> +! { dg-final { scan-tree-dump-times "Setting 'oacc gangprivate' attribute for decl:  integer\\(kind=4\\) w;" 1 "omplower" } } */

I prefer if such scanning is placed close to relevant source code
constructs, so I'd move this 'scan-tree-dump-times'...

> +
> +program main
> +  integer :: w, arr(0:31)
> +
> +  !$acc parallel num_gangs(32) num_workers(32) copyout(arr)
> +    !$acc loop gang private(w)

... here.

(Just to make sure, a Fortran 'integer' will always be
'integer(kind=4)'?)

> +    do j = 0, 31
> +      w = 0
> +      !$acc loop seq
> +      do i = 0, 31
> +        !$acc atomic update
> +        w = w + 1
> +        !$acc end atomic
> +      end do
> +      arr(j) = w
> +    end do
> +  !$acc end parallel
> +
> +  if (any (arr .ne. 32)) stop 1
> +end program main

> --- /dev/null
> +++ b/libgomp/testsuite/libgomp.oacc-fortran/gangprivate-attrib-2.f90
> @@ -0,0 +1,23 @@
> +! Test for lack of "oacc gangprivate" attribute on worker-private variables
> +
> +! { dg-do run }
> +! { dg-additional-options "-fdump-tree-omplower-details" }
> +! { dg-final { scan-tree-dump-times "Setting 'oacc gangprivate' attribute for decl" 0 "omplower" } } */

Likewise...

> +
> +program main
> +  integer :: w, arr(0:31)
> +
> +  !$acc parallel num_gangs(32) num_workers(32) copyout(arr)
> +    !$acc loop gang worker private(w)

... here (I suppose).

> +    do j = 0, 31
> +      w = 0
> +      !$acc loop seq
> +      do i = 0, 31
> +        w = w + 1
> +      end do
> +      arr(j) = w
> +    end do
> +  !$acc end parallel
> +
> +  if (any (arr .ne. 32)) stop 1
> +end program main


Grüße
 Thomas

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 658 bytes --]

* Re: [PATCH, OpenACC] Add support for gang local storage allocation in shared memory
  2019-06-12 11:57                       ` Thomas Schwinge
@ 2019-06-12 19:43                         ` Julian Brown
  2019-11-06 22:59                           ` Julian Brown
  0 siblings, 1 reply; 26+ messages in thread
From: Julian Brown @ 2019-06-12 19:43 UTC (permalink / raw)
  To: Thomas Schwinge
  Cc: Bernhard Reutner-Fischer, gcc-patches, Tom de Vries,
	Chung-Lin Tang, Jakub Jelinek

On Wed, 12 Jun 2019 13:57:22 +0200
Thomas Schwinge <thomas@codesourcery.com> wrote:

> Hi!
> 
> First, thanks for picking this up, and improving the patch you
> inherited.

Thanks for the review!

> I understand right that this will address some aspects of PR90115
> "OpenACC: predetermined private levels for variables declared in
> blocks" (so please mention that one in the ChangeLog updates, and
> commit log), but it doesn't address all of these aspects (and see
> also Cesar's list in
> <http://mid.mail-archive.com/70d27ebd-762e-59a3-082f-48fa0c687212@codesourcery.com>),
> and also not yet PR90114 "Predetermined private levels for variables
> declared in OpenACC accelerator routines"?

There are two possible reasons for placing gang-private variables in
shared memory: correct implementation of OpenACC semantics, or
optimisation, since shared memory is faster than local memory (on NVIDIA
devices). Handling of private variables is intimately tied with the
execution model for gangs/workers/vectors implemented by a particular
target: for PTX, that's handled in the backend using a
broadcasting/neutering scheme.

That is sufficient for code that e.g. sets a variable in worker-single
mode and expects to use the value in worker-partitioned mode. The
difficulty (semantics-wise) comes when the user wants to do something
like an atomic operation in worker-partitioned mode and expects a
worker-single variable to be shared across each partitioned worker.
Forcing use of shared memory for such variables makes that work
properly.
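
For illustration only (this is not one of the tests in the patch), a
minimal C sketch of that case. The function name and the constants are
made up for the example. The variable `w` is declared once per gang and
then updated atomically from a worker-partitioned loop, so the expected
count is only reached if every worker of a gang sees the same object.

```c
#define GANGS 32
#define WORKERS 32

/* Hypothetical example: returns the number of gangs whose counter did
   not reach WORKERS.  */
int
count_gang_private_mismatches (void)
{
  int arr[GANGS];
  int bad = 0;

#pragma acc parallel num_gangs(GANGS) num_workers(WORKERS) copyout(arr)
  {
#pragma acc loop gang
    for (int j = 0; j < GANGS; j++)
      {
        /* Declared in worker-single mode: one copy per gang.  */
        int w = 0;

#pragma acc loop worker
        for (int i = 0; i < WORKERS; i++)
          {
            /* Every worker of the gang must update the same object.  */
#pragma acc atomic update
            w++;
          }

        arr[j] = w;
      }
  }

  for (int j = 0; j < GANGS; j++)
    if (arr[j] != WORKERS)
      bad++;

  return bad;
}
```

Built without -fopenacc the pragmas are ignored and the function simply
runs sequentially on the host, so it only exercises the sharing
behaviour when offloaded.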

It is *not* sufficient for the next level down, though -- expecting to
perform atomic operations in vector-partitioned mode on a variable
that is declared in vector-single mode, i.e. so that it is supposed to
be shared across all vector elements. AFAIK, that's not
straightforward, and we haven't attempted to implement it.

I think the original motivation for this patch was optimisation, though
-- typical code won't try to use atomics in this way. Cesar's list of
caveats that you linked to seems to support that notion.

> On Fri, 7 Jun 2019 15:08:37 +0100, Julian Brown
> <julian@codesourcery.com> wrote:
> > --- a/gcc/config/nvptx/nvptx.c
> > +++ b/gcc/config/nvptx/nvptx.c  
> 
> > @@ -5237,6 +5248,10 @@ nvptx_file_end (void)
> >      write_shared_buffer (asm_out_file, vector_red_sym,
> >  			 vector_red_align, vector_red_size);
> >  
> > +  if (gangprivate_shared_size)
> > +    write_shared_buffer (asm_out_file, gangprivate_shared_sym,
> > +			 gangprivate_shared_align, gangprivate_shared_size);
> 
> Curious, what is the reason that we maintain this
> '__gangprivate_shared' variable on a per-file basis instead of on a
> per-function basis (with names '__gangprivate_shared_[function]', or
> similar), which should make it more obvious where each block of
> '.shared' memory belongs to?

I can't comment on that, I'm afraid: that was a part of the patch that I
inherited and didn't alter much...

> > --- a/gcc/doc/tm.texi
> > +++ b/gcc/doc/tm.texi  
> 
> > +@deftypefn {Target Hook} rtx TARGET_GOACC_EXPAND_ACCEL_VAR (tree @var{var})
> > +This hook, if defined, is used by accelerator target back-ends to expand
> > +specially handled kinds of VAR_DECL expressions.  A particular use is to
> > +place variables with specific attributes inside special accelarator
> > +memories.  A return value of NULL indicates that the target does not
> > +handle this VAR_DECL, and normal RTL expanding is resumed.
> > +@end deftypefn
> 
> I guess I'm not terribly happy with the 'goacc.expand_accel_var' name.
> Using different "memories" for specially tagged DECLs seems to be a
> pretty generic concept (address spaces?), and...

This is partly another NVPTX weirdness -- the target uses address
spaces, but only within the backend, and without using the generic
middle-end address space machinery. The other reason for using an
attribute instead of assigning an address space is that the former can
be detected by the target compiler, but will be ignored by the host
compiler. Forcing use of an address space this early would mean that
the same non-standard address space would have to make sense for both
host and offloaded code.

For AMD GCN, we do use the generic address space support, and I found
that I could re-use the "oacc gangprivate" attribute -- but not the
expand_accel_var hook (expand time is too late for that target).
Instead, another new hook "TARGET_GOACC_ADJUST_GANGPRIVATE_DECL" is
called from omp-offload.c:execute_oacc_device_lower for variables that
have the "oacc gangprivate" attribute set. Those bits haven't been
posted upstream yet, though.

> > --- a/gcc/expr.c
> > +++ b/gcc/expr.c
> > @@ -9974,8 +9974,19 @@ expand_expr_real_1 (tree exp, rtx target, machine_mode tmode,
> >        exp = SSA_NAME_VAR (ssa_name);
> >        goto expand_decl_rtl;
> >  
> > -    case PARM_DECL:
> >      case VAR_DECL:
> > +      /* Allow accel compiler to handle specific cases of variables,
> > +	 specifically those tagged with the "oacc gangprivate" attribute,
> > +	 which may be intended to be placed in special memory in GPUs.  */
> > +      if (flag_openacc && targetm.goacc.expand_accel_var)
> > +	{
> > +	  temp = targetm.goacc.expand_accel_var (exp);
> > +	  if (temp)
> > +	    return temp;
> > +	}
> > +      /* ... fall through ...  */
> > +
> > +    case PARM_DECL:  
> 
> ... I'm thus confused that there isn't already a generic mechanism
> available in GCC, that we can just use instead of adding a new one
> here? Thinking about the "address spaces" stuff in 'gcc/target.def'
> -- or is that the wrong concept?  (I'm not familiar with all that,
> and haven't looked closely.)

Same point again -- the same address space would have to be supported
on the host and offload compiler. I'm happy to accept suggestions for
another name for the hook though?

> > --- a/gcc/omp-low.c
> > +++ b/gcc/omp-low.c  
> 
> > +/* Record vars listed in private clauses in CLAUSES in CTX.  This information
> > +   is used to mark up variables that should be made private per-gang.  */
> > +
> > +static void
> > +oacc_record_private_var_clauses (omp_context *ctx, tree clauses)
> > +{
> > +  for (tree c = clauses; c; c = OMP_CLAUSE_CHAIN (c))
> > +    if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_PRIVATE)
> > +      {
> > +	tree decl = OMP_CLAUSE_DECL (c);
> > +	if (VAR_P (decl) && TREE_ADDRESSABLE (decl))
> > +	  {
> > +	    ctx->oacc_addressable_var_decls.safe_push (decl);
> > +	    maybe_oacc_gangprivate_vars = true;
> > +	  }
> > +      }
> > +}  
> 
> Are all the relevant variables addressable?  And/or, need only those
> be considered?

Yes, I believe so. At least from a correctness perspective, a
non-addressable variable can't be accessed outside the current thread,
so it can go in a (faster than shared memory) register -- though that
register may need to be broadcast in some circumstances. A variable can
only meaningfully be "shared" across workers or vector lanes if its
address is taken, e.g. by a call to an atomic builtin.

From an optimisation perspective, the answer might be fuzzier: maybe
sometimes, using shared memory directly would be faster than
broadcasting.
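
To make the addressable/non-addressable distinction concrete, here is a
small plain-C sketch. The function names and the `bump` helper are
hypothetical and only serve the illustration. `__atomic_fetch_add` is
the GCC builtin, used here as a stand-in for "a call to an atomic
builtin".

```c
/* 'sum' never has its address taken, so the compiler is free to keep it
   in a register for the whole function.  */
int
sum_by_value (int n)
{
  int sum = 0;

  for (int i = 0; i < n; i++)
    sum += i;

  return sum;
}

/* Hypothetical helper: updates the object that P points to.  */
static void
bump (int *p)
{
  __atomic_fetch_add (p, 1, __ATOMIC_RELAXED);
}

/* 'count' has its address passed to bump, so it is addressable and has
   to live somewhere that other code can reach through a pointer.  */
int
count_by_address (int n)
{
  int count = 0;

  for (int i = 0; i < n; i++)
    bump (&count);

  return count;
}
```

Only variables of the second kind would be recorded by the
TREE_ADDRESSABLE checks quoted above.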

> > +/* Record addressable vars declared in BINDVARS in CTX.  This information is
> > +   used to mark up variables that should be made private per-gang.  */
> > +
> > +static void
> > +oacc_record_vars_in_bind (omp_context *ctx, tree bindvars)
> > +{
> > +  for (tree v = bindvars; v; v = DECL_CHAIN (v))
> > +    if (VAR_P (v) && TREE_ADDRESSABLE (v))
> > +      {
> > +	ctx->oacc_addressable_var_decls.safe_push (v);
> > +	maybe_oacc_gangprivate_vars = true;
> > +      }
> > +}  
> 
> Likewise.
> 
> 
> > +/* Mark addressable variables which are declared implicitly or explicitly as
> > +   gang private with a special attribute.  These may need to have their
> > +   declarations altered later on in compilation (e.g. in
> > +   execute_oacc_device_lower or the backend, depending on how the OpenACC
> > +   execution model is implemented on a given target) to ensure that sharing
> > +   semantics are correct.  */
> > +
> > +static void
> > +mark_oacc_gangprivate (vec<tree> *decls, omp_context *ctx)
> > +{
> > +  int i;
> > +  tree decl;
> > +
> > +  FOR_EACH_VEC_ELT (*decls, i, decl)
> > +    {
> > +      for (omp_context *thisctx = ctx; thisctx; thisctx = thisctx->outer)
> > +	{
> > +	  tree inner_decl = maybe_lookup_decl (decl, thisctx);
> > +	  if (inner_decl)
> > +	    {
> > +	      decl = inner_decl;
> > +	      break;
> > +	    }
> > +	}
> > +      if (!lookup_attribute ("oacc gangprivate", DECL_ATTRIBUTES (decl)))
> > +	{
> > +	  if (dump_file && (dump_flags & TDF_DETAILS))
> > +	    {
> > +	      fprintf (dump_file,
> > +		       "Setting 'oacc gangprivate' attribute for decl:");
> > +	      print_generic_decl (dump_file, decl, TDF_SLIM);
> > +	      fputc ('\n', dump_file);
> > +	    }
> > +	  DECL_ATTRIBUTES (decl)
> > +	    = tree_cons (get_identifier ("oacc gangprivate"),
> > +			 NULL, DECL_ATTRIBUTES (decl));
> > +	}
> > +    }
> > +}  
> 
> So I'm confused how that can be done here ('omplower'), given that the
> decision about how levels of parallelism (gang, worker, vector) are
> assigned is only done later ('oaccdevlow'), separately/differently per
> offloading target?
> 
> The following seems relevant:
> 
> > +/* Find gang-private variables in a context.  */
> > +
> > +static int
> > +process_oacc_gangprivate (splay_tree_node node, void * ARG_UNUSED (data))
> > +{
> > +  omp_context *ctx = (omp_context *) node->value;
> > +  unsigned level_total = 0;
> > +  omp_context *thisctx;
> > +
> > +  for (thisctx = ctx; thisctx; thisctx = thisctx->outer)
> > +    level_total += thisctx->oacc_partitioning_levels;
> > +
> > +  /* If the current context and parent contexts are distributed over a
> > +     total of one parallelism level, we have gang partitioning.  */
> > +  if (level_total == 1)
> > +    mark_oacc_gangprivate (&ctx->oacc_addressable_var_decls, ctx);
> > +
> > +  return 0;
> > +}  
> 
> ..., but I didn't quickly manage to grok that.  (I shall try harder,
> later on.)
> 
> But still then, this looks like it might work for the outer level
> (gang) only (because all offloading targets are expected to assign
> gang level to the outermost loop -- might that be the underlying
> assumption?), but it won't work for inner loop/privatization levels?
> (..., which I understand this patch isn't doing anything about.)

The "oacc gangprivate" only applies to variables that are (addressable
and) private per-gang, but the attribute marking works on both
top-level "acc parallel" directives and "acc loop" directives below
that -- so long as they don't explicitly use parallelism finer than
"gang" level. It also works on variables declared private() using
OpenACC clauses in all supported languages, or those that are declared
in an appropriate C/C++ scope.
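
As a rough C illustration of those three places (a sketch only: the
function and variable names are invented, and which decls actually get
the attribute depends on the compiler's addressability analysis), each
variable below is updated with an atomic so that its address is taken:

```c
#define GANGS 32
#define STEPS 4

/* Hypothetical example: returns the sum of the per-gang results.  */
int
sum_gang_private_scopes (void)
{
  int out[GANGS];
  int clause_private = 0;
  int total = 0;

#pragma acc parallel num_gangs(GANGS) copyout(out)
  {
    /* (1) Declared in the scope of the compute region itself.  */
    int region_local = 0;
#pragma acc atomic update
    region_local++;

    /* (2) Listed in a private clause on a gang-partitioned loop.  */
#pragma acc loop gang private(clause_private)
    for (int j = 0; j < GANGS; j++)
      {
        /* (3) Declared in a block nested inside the gang loop.  */
        int block_local = 0;

        clause_private = region_local;

#pragma acc loop seq
        for (int i = 0; i < STEPS; i++)
          {
#pragma acc atomic update
            clause_private++;
#pragma acc atomic update
            block_local++;
          }

        out[j] = clause_private + block_local;
      }
  }

  for (int j = 0; j < GANGS; j++)
    total += out[j];

  return total;
}
```

Changing the loop directive to "loop gang worker" is the case that the
gangprivate-attrib-2.f90 test expects not to be marked.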

At least for loops with reductions, gang-partitioned loops have
different semantics from worker and vector-partitioned loops. So I
think in general, it must be the case that it is possible to analyse
OpenACC code "lexically" to determine which loops are gang partitioned,
and which are partitioned at finer levels. It can't be deferred
entirely to the target. It's been a while since I read those bits of
the standard, though!

But yes, in GCC, omp-low only tries to calculate the maximum
partitioning level for each loop nest. The final determination isn't
made until oaccdevlow time. That's OK if shared memory is being used
only as an optimisation, much less OK if it's a necessary part of
implementing OpenACC semantics properly. It might be more of an issue
if we tried to support "vector-shared" variables properly.
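
The check in process_oacc_gangprivate quoted above can be modelled in
isolation as follows. This is a standalone sketch, not GCC code: the
`ctx_model` struct is a hypothetical stand-in for the two omp_context
fields involved, and it assumes that a loop context counts one level per
gang/worker/vector clause, which is inferred from the two Fortran tests
rather than shown in the quoted hunks.

```c
#include <stddef.h>

/* Hypothetical stand-in for the relevant parts of omp_context.  */
struct ctx_model
{
  struct ctx_model *outer;        /* enclosing context, or NULL */
  unsigned partitioning_levels;   /* levels requested by this context */
};

/* Returns nonzero if CTX and all of its enclosing contexts together are
   distributed over exactly one parallelism level, which the patch
   treats as gang partitioning.  */
int
is_gang_partitioned_only (const struct ctx_model *ctx)
{
  unsigned level_total = 0;

  for (const struct ctx_model *c = ctx; c != NULL; c = c->outer)
    level_total += c->partitioning_levels;

  return level_total == 1;
}
```

Under that assumption, a "loop gang" nested in a parallel region sums to
one level (and so does a "loop seq" inside it), while a "loop gang
worker" sums to two and is skipped.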

> > --- /dev/null
> > +++ b/libgomp/testsuite/libgomp.oacc-c/pr85465.c
> > @@ -0,0 +1,11 @@
> > +/* { dg-do compile } */
> > +/* { dg-additional-options "-w" } */
> > +
> > +int
> > +main (void)
> > +{
> > +#pragma acc parallel
> > +  foo ();
> > +
> > +  return 0;
> > +}  
> 
> I think that given your re-work of the implementation (move stuff from
> front ends into OMP lowering) this test case isn't relevant anymore
> (was a front end ICE).

OK, I can remove that.

> > --- /dev/null
> > +++ b/libgomp/testsuite/libgomp.oacc-fortran/gangprivate-attrib-1.f90
> > @@ -0,0 +1,25 @@
> > +! Test for "oacc gangprivate" attribute on gang-private variables
> > +
> > +! { dg-do run }
> > +! { dg-additional-options "-fdump-tree-omplower-details" }
> > +! { dg-final { scan-tree-dump-times "Setting 'oacc gangprivate' attribute for decl:  integer\\(kind=4\\) w;" 1 "omplower" } } */
> 
> I prefer if such scanning is placed close to relevant source code
> constructs, so I'd move this 'scan-tree-dump-times'...
> 
> > +
> > +program main
> > +  integer :: w, arr(0:31)
> > +
> > +  !$acc parallel num_gangs(32) num_workers(32) copyout(arr)
> > +    !$acc loop gang private(w)  
> 
> ... here.
> 
> (Just to make sure, a Fortran 'integer' will always be
> 'integer(kind=4)'?)

No idea! I can check.

> > +    do j = 0, 31
> > +      w = 0
> > +      !$acc loop seq
> > +      do i = 0, 31
> > +        !$acc atomic update
> > +        w = w + 1
> > +        !$acc end atomic
> > +      end do
> > +      arr(j) = w
> > +    end do
> > +  !$acc end parallel
> > +
> > +  if (any (arr .ne. 32)) stop 1
> > +end program main  
> 
> > --- /dev/null
> > +++ b/libgomp/testsuite/libgomp.oacc-fortran/gangprivate-attrib-2.f90
> > @@ -0,0 +1,23 @@
> > +! Test for lack of "oacc gangprivate" attribute on worker-private variables
> > +
> > +! { dg-do run }
> > +! { dg-additional-options "-fdump-tree-omplower-details" }
> > +! { dg-final { scan-tree-dump-times "Setting 'oacc gangprivate' attribute for decl" 0 "omplower" } } */
> 
> Likewise...
> 
> > +
> > +program main
> > +  integer :: w, arr(0:31)
> > +
> > +  !$acc parallel num_gangs(32) num_workers(32) copyout(arr)
> > +    !$acc loop gang worker private(w)  
> 
> ... here (I suppose).
> 
> > +    do j = 0, 31
> > +      w = 0
> > +      !$acc loop seq
> > +      do i = 0, 31
> > +        w = w + 1
> > +      end do
> > +      arr(j) = w
> > +    end do
> > +  !$acc end parallel
> > +
> > +  if (any (arr .ne. 32)) stop 1
> > +end program main  

Thanks,

Julian

* Re: [PATCH, OpenACC] Add support for gang local storage allocation in shared memory
  2019-06-12 19:43                         ` Julian Brown
@ 2019-11-06 22:59                           ` Julian Brown
  0 siblings, 0 replies; 26+ messages in thread
From: Julian Brown @ 2019-11-06 22:59 UTC (permalink / raw)
  To: Thomas Schwinge
  Cc: Bernhard Reutner-Fischer, gcc-patches, Tom de Vries,
	Chung-Lin Tang, Jakub Jelinek

[-- Attachment #1: Type: text/plain, Size: 16011 bytes --]

Hi!

This is a new patch that takes a different approach to the last-posted
version in this thread. I have combined the previous incremental patches
on the og9 branch that culminated in the following patch:

https://gcc.gnu.org/ml/gcc-patches/2019-10/msg01220.html

That email explained the previous approaches to calculating and
representing the partitioning level for OpenACC "private" variables in
the compiler, and how this patch differs:

 - The first (by Chung-Lin Tang) recorded which variables should be
   made private per-gang in each front end (i.e. separately in C, C++
   and Fortran) using a new attribute "oacc gangprivate". This was
   deemed too early; the final determination about which loops are
   assigned which parallelism level has not yet been made at parse time.

 - The second, last discussed here:

     https://gcc.gnu.org/ml/gcc-patches/2019-06/msg00726.html

   moved the analysis of OpenACC contexts to determine parallelism
   levels to omp-low.c (but kept the "oacc gangprivate" attribute and
   the NVPTX backend parts). However (as mentioned in that mail), this
   is still too early: in fact the final determination of the
   parallelism level for each loop (especially for loops without
   explicit gang/worker/vector clauses) does not happen until we reach
   the device compiler, in the oaccloops pass.

This patch builds on the second approach, but delays fixing the
parallelism level of each "private" variable (those that are
addressable, and declared private using OpenACC clauses or by defining
them in a scope nested within a compute region or partitioned loop)
until the oaccdevlow pass. This is done by adding a new internal UNIQUE
function (OACC_PRIVATE) that lists (the address of) each private
variable as an argument. These new internal functions fit into the
existing scheme for demarking OpenACC loops, as described in comments
in the patch.

Use of the "oacc gangprivate" attribute is now restricted to the NVPTX
backend (and could probably be replaced with some lighter-weight
mechanism as a followup).
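
For illustration only (this is not part of the patch), here is a minimal
C sketch of the kind of code affected, modelled loosely on the Fortran
test quoted later in this mail. The function name gang_private_demo is
hypothetical, and it is an assumption that the atomic update is enough
for the compiler to treat "w" as addressable:

```c
#include <assert.h>

#define N 32

/* Hypothetical demo function, not taken from the patch.  Returns 1 if
   every gang observed all N worker increments of its own copy of W.  */
int
gang_private_demo (void)
{
  int arr[N];
  int w;

  #pragma acc parallel num_gangs(N) num_workers(N) copyout(arr)
  {
    /* W is private per gang, but must be shared by the workers that
       execute the inner loop within that gang.  */
    #pragma acc loop gang private(w)
    for (int j = 0; j < N; j++)
      {
        w = 0;

        #pragma acc loop worker
        for (int i = 0; i < N; i++)
          {
            #pragma acc atomic update
            w++;
          }

        arr[j] = w;
      }
  }

  for (int j = 0; j < N; j++)
    if (arr[j] != N)
      return 0;
  return 1;
}
```

Built without -fopenacc the pragmas are ignored and the function
returns 1 trivially. With offloading, the same result depends on each
gang's "w" being visible to all of that gang's workers, which is the
case the shared-memory placement is meant to cover.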

On starting to write this email, I realised that I had not made some of
the cosmetic changes Thomas highlighted below, but I can do that (with
suitable retesting) if desired before committing.

On Wed, 12 Jun 2019 20:42:16 +0100
Julian Brown <julian@codesourcery.com> wrote:

> On Wed, 12 Jun 2019 13:57:22 +0200
> Thomas Schwinge <thomas@codesourcery.com> wrote:
> 
> > I understand right that this will address some aspects of PR90115
> > "OpenACC: predetermined private levels for variables declared in
> > blocks" (so please mention that one in the ChangeLog updates, and
> > commit log), but it doesn't address all of these aspects (and see
> > also Cesar's list in
> > <http://mid.mail-archive.com/70d27ebd-762e-59a3-082f-48fa0c687212@codesourcery.com>),
> > and also not yet PR90114 "Predetermined private levels for variables
> > declared in OpenACC accelerator routines"?  
> 
> There's two possible reasons for placing gang-private variables in
> shared memory: correct implementation of OpenACC semantics, or
> optimisation, since shared memory is faster than local memory (on
> NVidia devices). Handling of private variables is intimately tied
> with the execution model for gangs/workers/vectors implemented by a
> particular target: for PTX, that's handled in the backend using a
> broadcasting/neutering scheme.
> 
> That is sufficient for code that e.g. sets a variable in worker-single
> mode and expects to use the value in worker-partitioned mode. The
> difficulty (semantics-wise) comes when the user wants to do something
> like an atomic operation in worker-partitioned mode and expects a
> worker-single variable to be shared across each partitioned worker.
> Forcing use of shared memory for such variables makes that work
> properly.
> 
> It is *not* sufficient for the next level down, though -- expecting to
> perform atomic operations in vector-partitioned mode on a variable
> that is declared in vector-single mode, i.e. so that it is supposed to
> be shared across all vector elements. AFAIK, that's not
> straightforward, and we haven't attempted to implement it.
> 
> I think the original motivation for this patch was optimisation,
> though -- typical code won't try to use atomics in this way. Cesar's
> list of caveats that you linked to seems to support that notion.

After a little further investigation, I came to the conclusion that the
patch was originally about correctness, not optimisation. But that's
largely academic now.

> > I guess I'm not terribly happy with the 'goacc.expand_accel_var'
> > name. Using different "memories" for specially tagged DECLs seems
> > to be a pretty generic concept (address spaces?), and...  
> 
> This is partly another NVPTX weirdness -- the target uses address
> spaces, but only within the backend, and without using the generic
> middle-end address space machinery. The other reason for using an
> attribute instead of assigning an address space is that the former can
> be detected by the target compiler, but will be ignored by the host
> compiler. Forcing use of an address space this early would mean that
> the same non-standard address space would have to make sense for both
> host and offloaded code.
> 
> For AMD GCN, we do use the generic address space support, and I found
> that I could re-use the "oacc gangprivate" attribute -- but not the
> expand_accel_var hook (expand time is too late for that target).
> Instead, another new hook "TARGET_GOACC_ADJUST_GANGPRIVATE_DECL" is
> called from omp-offload.c:execute_oacc_device_lower for variables that
> have the "oacc gangprivate" attribute set. Those bits haven't been
> posted upstream yet, though.

This patch uses both target hooks: TARGET_GOACC_ADJUST_PRIVATE_DECL
(renamed from TARGET_GOACC_ADJUST_GANGPRIVATE_DECL) and
TARGET_GOACC_EXPAND_ACCEL_VAR. The first can tweak the decl at
oaccdevlow time, and the second at expand time. This version of the
patch doesn't provide full support for gang-private variables on AMD
GCN yet though, since that depends on other code that hasn't been
upstreamed yet. (GCN works with the equivalent patch to this on the og9
branch though.)
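
As a rough sketch of what the NVPTX expand-time hook does when it
reserves space in the shared block, the arithmetic amounts to a bump
allocator with align-up rounding. The struct and function names below
are hypothetical stand-ins for the backend's bookkeeping, sizes and
alignments are assumed to be in bytes, and the per-decl offset cache
that the patch keeps in a hash map is omitted:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the shared block: total size reserved so far
   and the strictest alignment requested so far.  */
struct shm_block
{
  size_t size;
  size_t align;
};

/* Reserve VAR_SIZE bytes aligned to VAR_ALIGN (a power of two) in BLK
   and return the offset of the new slot from the start of the block.  */
size_t
shm_reserve (struct shm_block *blk, size_t var_size, size_t var_align)
{
  assert (var_align != 0 && (var_align & (var_align - 1)) == 0);

  /* Round the current size up to the next multiple of VAR_ALIGN.  */
  blk->size = (blk->size + var_align - 1) & ~(var_align - 1);

  /* The block as a whole needs the strictest alignment seen.  */
  if (blk->align < var_align)
    blk->align = var_align;

  size_t offset = blk->size;
  blk->size += var_size;
  return offset;
}
```

In the patch itself the resulting offset is added to the
__gangprivate_shared symbol to form the variable's address, and the
block is only emitted at the end of the file if its size is nonzero.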

> > > --- a/gcc/expr.c
> > > +++ b/gcc/expr.c
> > > @@ -9974,8 +9974,19 @@ expand_expr_real_1 (tree exp, rtx target,
> > > machine_mode tmode, exp = SSA_NAME_VAR (ssa_name);
> > >        goto expand_decl_rtl;
> > >  
> > > -    case PARM_DECL:
> > >      case VAR_DECL:
> > > +      /* Allow accel compiler to handle specific cases of
> > > variables,
> > > +	 specifically those tagged with the "oacc gangprivate"
> > > attribute,
> > > +	 which may be intended to be placed in special memory in
> > > GPUs.  */
> > > +      if (flag_openacc && targetm.goacc.expand_accel_var)
> > > +	{
> > > +	  temp = targetm.goacc.expand_accel_var (exp);
> > > +	  if (temp)
> > > +	    return temp;
> > > +	}
> > > +      /* ... fall through ...  */
> > > +
> > > +    case PARM_DECL:    
> > 
> > ... I'm thus confused that there isn't already a generic mechanism
> > available in GCC, that we can just use instead of adding a new one
> > here? Thinking about the "address spaces" stuff in 'gcc/target.def'
> > -- or is that the wrong concept?  (I'm not familiar with all that,
> > and haven't looked closely.)  
> 
> Same point again -- the same address space would have to be supported
> on the host and offload compiler. I'm happy to accept suggestions for
> another name for the hook though?

(Still not renamed in this version, sorry.)

> > > +/* Mark addressable variables which are declared implicitly or
> > > explicitly as
> > > +   gang private with a special attribute.  These may need to have
> > > their
> > > +   declarations altered later on in compilation (e.g. in
> > > +   execute_oacc_device_lower or the backend, depending on how the
> > > OpenACC
> > > +   execution model is implemented on a given target) to ensure
> > > that sharing
> > > +   semantics are correct.  */
> > > +
> > > +static void
> > > +mark_oacc_gangprivate (vec<tree> *decls, omp_context *ctx)
> > > +{
> > > +  int i;
> > > +  tree decl;
> > > +
> > > +  FOR_EACH_VEC_ELT (*decls, i, decl)
> > > +    {
> > > +      for (omp_context *thisctx = ctx; thisctx; thisctx =
> > > thisctx->outer)
> > > +	{
> > > +	  tree inner_decl = maybe_lookup_decl (decl, thisctx);
> > > +	  if (inner_decl)
> > > +	    {
> > > +	      decl = inner_decl;
> > > +	      break;
> > > +	    }
> > > +	}
> > > +      if (!lookup_attribute ("oacc gangprivate", DECL_ATTRIBUTES
> > > (decl)))
> > > +	{
> > > +	  if (dump_file && (dump_flags & TDF_DETAILS))
> > > +	    {
> > > +	      fprintf (dump_file,
> > > +		       "Setting 'oacc gangprivate' attribute for
> > > decl:");
> > > +	      print_generic_decl (dump_file, decl, TDF_SLIM);
> > > +	      fputc ('\n', dump_file);
> > > +	    }
> > > +	  DECL_ATTRIBUTES (decl)
> > > +	    = tree_cons (get_identifier ("oacc gangprivate"),
> > > +			 NULL, DECL_ATTRIBUTES (decl));
> > > +	}
> > > +    }
> > > +}    
> > 
> > So I'm confused how that can be done here ('omplower'), given that
> > the decision about how levels of parallelism (gang, worker, vector)
> > are assigned is only done later ('oaccdevlow'),
> > separately/differently per offloading target?
> > 
> > The following seems relevant:
> >   
> > > +/* Find gang-private variables in a context.  */
> > > +
> > > +static int
> > > +process_oacc_gangprivate (splay_tree_node node, void * ARG_UNUSED
> > > (data)) +{
> > > +  omp_context *ctx = (omp_context *) node->value;
> > > +  unsigned level_total = 0;
> > > +  omp_context *thisctx;
> > > +
> > > +  for (thisctx = ctx; thisctx; thisctx = thisctx->outer)
> > > +    level_total += thisctx->oacc_partitioning_levels;
> > > +
> > > +  /* If the current context and parent contexts are distributed
> > > over a
> > > +     total of one parallelism level, we have gang partitioning.
> > > */
> > > +  if (level_total == 1)
> > > +    mark_oacc_gangprivate (&ctx->oacc_addressable_var_decls,
> > > ctx); +
> > > +  return 0;
> > > +}    
> > 
> > ..., but I didn't quickly manage to grok that.  (I shall try harder,
> > later on.)
> > 
> > But still then, this looks like it might work for the outer level
> > (gang) only (because all offloading targets are expected to assign
> > gang level to the outermost loop -- might that be the underlying
> > assumption?), but it won't work for inner loop/privatization levels?
> > (..., which I understand this patch isn't doing anything about.)  
> 
> The "oacc gangprivate" only applies to variables that are (addressable
> and) private per-gang, but the attribute marking works on both
> top-level "acc parallel" directives and "acc loop" directives below
> that -- so long as they don't explicitly use parallelism finer than
> "gang" level. It also works on variables declared private() using
> OpenACC clauses in all supported languages, or those that are declared
> in an appropriate C/C++ scope.
> 
> At least for loops with reductions, gang-partitioned loops have
> different semantics from worker and vector-partitioned loops. So I
> think in general, it must be the case that it is possible to analyse
> OpenACC code "lexically" to determine which loops are gang
> partitioned, and which are partitioned at finer levels. It can't be
> deferred entirely to the target. It's been a while since I read those
> bits of the standard, though!
> 
> But yes, in GCC, omp-low only tries to calculate the maximum
> partitioning level for each loop nest. The final determination isn't
> made until oaccdevlow time. That's OK if shared memory is being used
> only as an optimisation, much less OK if it's a necessary part of
> implementing OpenACC semantics properly. It might be more of an issue
> if we tried to support "vector-shared" variables properly.

So: this version moves the partitioning-level calculation for private
variables out of omp-low, so this isn't an issue any more. Variables
are privatized according to the "true" partitioning level of the scope
inside the parallel region that they are associated with (i.e.
"parallel" region, or loop).

> > > +
> > > +program main
> > > +  integer :: w, arr(0:31)
> > > +
> > > +  !$acc parallel num_gangs(32) num_workers(32) copyout(arr)
> > > +    !$acc loop gang private(w)    
> > 
> > ... here.
> > 
> > (Just to make sure, a Fortran 'integer' will always be
> > 'integer(kind=4)'?)  
> 
> No idea! I can check.

That's a yes, I think.

Re-tested with offloading to nvptx. OK for mainline?

Thanks,

Julian

2019-11-06  Julian Brown  <julian@codesourcery.com>
	    Chung-Lin Tang  <cltang@codesourcery.com>

	gcc/
	* config/gcn/gcn-protos.h (gcn_goacc_adjust_gangprivate_decl): Rename
	to...
	(gcn_goacc_adjust_private_decl): ...this.  Add and use LEVEL parameter.
	* config/gcn/gcn-tree.c (gcn_goacc_adjust_gangprivate_decl): Rename
	to...
	(gcn_goacc_adjust_private_decl): ...this.  Add LEVEL parameter.
	* config/gcn/gcn.c (TARGET_GOACC_ADJUST_GANGPRIVATE_DECL): Delete.
	(TARGET_GOACC_ADJUST_PRIVATE_DECL): Define using renamed
	gcn_goacc_adjust_private_decl.
	* config/nvptx/nvptx.c (tree-hash-traits.h, tree-pretty-print.h):
	Include.
	(gangprivate_shared_size): New global variable.
	(gangprivate_shared_align): Likewise.
	(gangprivate_shared_sym): Likewise.
	(gangprivate_shared_hmap): Likewise.
	(nvptx_option_override): Initialize gangprivate_shared_sym,
	gangprivate_shared_align.
	(nvptx_file_end): Output gangprivate_shared_sym.
	(nvptx_goacc_adjust_private_decl): New function.
	(nvptx_goacc_expand_accel_var): New function.
	(nvptx_set_current_function): New function.
	(TARGET_GOACC_ADJUST_PRIVATE_DECL, TARGET_GOACC_EXPAND_ACCEL_VAR):
	Define hooks.
	* doc/tm.texi.in (TARGET_GOACC_EXPAND_ACCEL_VAR,
	TARGET_GOACC_ADJUST_PRIVATE_DECL): Place new documentation hooks.
	* doc/tm.texi: Regenerate.
	* expr.c (expand_expr_real_1): Expand decls using the expand_accel_var
	OpenACC hook if defined.
	* internal-fn.c (expand_UNIQUE): Handle IFN_UNIQUE_OACC_PRIVATE.
	* internal-fn.h (IFN_UNIQUE_CODES): Add OACC_PRIVATE.
	* omp-low.c (omp_context): Add oacc_addressable_var_decls field.
	(new_omp_context): Initialize oacc_addressable_var_decls in new
	omp_context.
	(delete_omp_context): Delete oacc_addressable_var_decls in old
	omp_context.
	(lower_oacc_reductions): Add PRIVATE_MARKER parameter.  Insert private
	marker before fork.
	(lower_oacc_head_tail): Add PRIVATE_MARKER parameter.  Modify private
	marker's gimple call arguments, and pass it to lower_oacc_reductions.
	(oacc_record_private_var_clauses, oacc_record_vars_in_bind,
	make_oacc_private_marker): New functions.
	(lower_omp_for): Call oacc_record_private_var_clauses with "for"
	clauses.  Call oacc_record_vars_in_bind for OpenACC contexts.  Create
	private marker and pass to lower_oacc_head_tail.
	(lower_omp_target): Create private marker and pass to
	lower_oacc_reductions.
	(lower_omp_1): Call oacc_record_vars_in_bind for OpenACC bind contexts.
	* omp-offload.c (convert.h): Include.
	(oacc_loop_xform_head_tail): Treat private-variable markers like
	fork/join when transforming head/tail sequences.
	(execute_oacc_device_lower): Use IFN_UNIQUE_OACC_PRIVATE to determine
	partitioning level of private variables, and process any found via
	adjust_private_decl target hook.
	* target.def (expand_accel_var, adjust_private_decl): New target hooks.

	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/gang-private-1.c: New test.
	* testsuite/libgomp.oacc-c-c++-common/loop-gwv-2.c: New test.
	* testsuite/libgomp.oacc-fortran/gangprivate-attrib-1.f90: New test.
	* testsuite/libgomp.oacc-fortran/gangprivate-attrib-2.f90: New test.

[-- Attachment #2: gang-local-storage-in-shm-9.diff --]
[-- Type: text/x-patch, Size: 30301 bytes --]

commit ccbf9525701265f8522c78b13751b82adba78f62
Author: Julian Brown <julian@codesourcery.com>
Date:   Thu Mar 21 15:09:24 2019 -0700

    Add support for gang local storage allocation in shared memory
    
            gcc/
            * config/gcn/gcn-protos.h (gcn_goacc_adjust_gangprivate_decl): Rename
            to...
            (gcn_goacc_adjust_private_decl): ...this.  Add and use LEVEL parameter.
            * config/gcn/gcn-tree.c (gcn_goacc_adjust_gangprivate_decl): Rename
            to...
            (gcn_goacc_adjust_private_decl): ...this.  Add LEVEL parameter.
            * config/gcn/gcn.c (TARGET_GOACC_ADJUST_GANGPRIVATE_DECL): Delete.
            (TARGET_GOACC_ADJUST_PRIVATE_DECL): Define using renamed
            gcn_goacc_adjust_private_decl.
            * config/nvptx/nvptx.c (tree-hash-traits.h, tree-pretty-print.h):
            Include.
            (gangprivate_shared_size): New global variable.
            (gangprivate_shared_align): Likewise.
            (gangprivate_shared_sym): Likewise.
            (gangprivate_shared_hmap): Likewise.
            (nvptx_option_override): Initialize gangprivate_shared_sym,
            gangprivate_shared_align.
            (nvptx_file_end): Output gangprivate_shared_sym.
            (nvptx_goacc_adjust_private_decl): New function.
            (nvptx_goacc_expand_accel_var): New function.
            (nvptx_set_current_function): New function.
            (TARGET_GOACC_ADJUST_PRIVATE_DECL, TARGET_GOACC_EXPAND_ACCEL_VAR):
            Define hooks.
            * doc/tm.texi.in (TARGET_GOACC_EXPAND_ACCEL_VAR,
            TARGET_GOACC_ADJUST_PRIVATE_DECL): Place new documentation hooks.
            * doc/tm.texi: Regenerate.
            * expr.c (expand_expr_real_1): Expand decls using the expand_accel_var
            OpenACC hook if defined.
            * internal-fn.c (expand_UNIQUE): Handle IFN_UNIQUE_OACC_PRIVATE.
            * internal-fn.h (IFN_UNIQUE_CODES): Add OACC_PRIVATE.
            * omp-low.c (omp_context): Add oacc_addressable_var_decls field.
            (new_omp_context): Initialize oacc_addressable_var_decls in new
            omp_context.
            (delete_omp_context): Delete oacc_addressable_var_decls in old
            omp_context.
            (lower_oacc_reductions): Add PRIVATE_MARKER parameter.  Insert private
            marker before fork.
            (lower_oacc_head_tail): Add PRIVATE_MARKER parameter.  Modify private
            marker's gimple call arguments, and pass it to lower_oacc_reductions.
            (oacc_record_private_var_clauses, oacc_record_vars_in_bind,
            make_oacc_private_marker): New functions.
            (lower_omp_for): Call oacc_record_private_var_clauses with "for"
            clauses.  Call oacc_record_vars_in_bind for OpenACC contexts.  Create
            private marker and pass to lower_oacc_head_tail.
            (lower_omp_target): Create private marker and pass to
            lower_oacc_reductions.
            (lower_omp_1): Call oacc_record_vars_in_bind for OpenACC bind contexts.
            * omp-offload.c (convert.h): Include.
            (oacc_loop_xform_head_tail): Treat private-variable markers like
            fork/join when transforming head/tail sequences.
            (execute_oacc_device_lower): Use IFN_UNIQUE_OACC_PRIVATE to determine
            partitioning level of private variables, and process any found via
            adjust_private_decl target hook.
            * target.def (expand_accel_var, adjust_private_decl): New target hooks.
    
            libgomp/
            * testsuite/libgomp.oacc-c-c++-common/gang-private-1.c: New test.
            * testsuite/libgomp.oacc-c-c++-common/loop-gwv-2.c: New test.
            * testsuite/libgomp.oacc-fortran/gangprivate-attrib-1.f90: New test.
            * testsuite/libgomp.oacc-fortran/gangprivate-attrib-2.f90: New test.

diff --git a/gcc/config/gcn/gcn-protos.h b/gcc/config/gcn/gcn-protos.h
index da7faf29c70..714d51189d9 100644
--- a/gcc/config/gcn/gcn-protos.h
+++ b/gcc/config/gcn/gcn-protos.h
@@ -39,7 +39,7 @@ extern rtx gcn_gen_undef (machine_mode);
 extern bool gcn_global_address_p (rtx);
 extern tree gcn_goacc_adjust_propagation_record (tree record_type, bool sender,
 						 const char *name);
-extern void gcn_goacc_adjust_gangprivate_decl (tree var);
+extern void gcn_goacc_adjust_private_decl (tree var, int level);
 extern void gcn_goacc_reduction (gcall *call);
 extern bool gcn_hard_regno_rename_ok (unsigned int from_reg,
 				      unsigned int to_reg);
diff --git a/gcc/config/gcn/gcn-tree.c b/gcc/config/gcn/gcn-tree.c
index c6b6302e9ed..aa56e236134 100644
--- a/gcc/config/gcn/gcn-tree.c
+++ b/gcc/config/gcn/gcn-tree.c
@@ -697,8 +697,11 @@ gcn_goacc_adjust_propagation_record (tree record_type, bool sender,
 }
 
 void
-gcn_goacc_adjust_gangprivate_decl (tree var)
+gcn_goacc_adjust_private_decl (tree var, int level)
 {
+  if (level != GOMP_DIM_GANG)
+    return;
+
   tree type = TREE_TYPE (var);
   tree lds_type = build_qualified_type (type,
 		    TYPE_QUALS_NO_ADDR_SPACE (type)
diff --git a/gcc/config/gcn/gcn.c b/gcc/config/gcn/gcn.c
index b5f09da173c..e41023b335c 100644
--- a/gcc/config/gcn/gcn.c
+++ b/gcc/config/gcn/gcn.c
@@ -6033,8 +6033,8 @@ print_operand (FILE *file, rtx x, int code)
 #undef  TARGET_GOACC_ADJUST_PROPAGATION_RECORD
 #define TARGET_GOACC_ADJUST_PROPAGATION_RECORD \
   gcn_goacc_adjust_propagation_record
-#undef  TARGET_GOACC_ADJUST_GANGPRIVATE_DECL
-#define TARGET_GOACC_ADJUST_GANGPRIVATE_DECL gcn_goacc_adjust_gangprivate_decl
+#undef  TARGET_GOACC_ADJUST_PRIVATE_DECL
+#define TARGET_GOACC_ADJUST_PRIVATE_DECL gcn_goacc_adjust_private_decl
 #undef  TARGET_GOACC_FORK_JOIN
 #define TARGET_GOACC_FORK_JOIN gcn_fork_join
 #undef  TARGET_GOACC_REDUCTION
diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index 90171a95784..d16125aec8f 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -74,6 +74,8 @@
 #include "cfgloop.h"
 #include "fold-const.h"
 #include "intl.h"
+#include "tree-hash-traits.h"
+#include "tree-pretty-print.h"
 
 /* This file should be included last.  */
 #include "target-def.h"
@@ -166,6 +168,12 @@ static unsigned vector_red_align;
 static unsigned vector_red_partition;
 static GTY(()) rtx vector_red_sym;
 
+/* Shared memory block for gang-private variables.  */
+static unsigned gangprivate_shared_size;
+static unsigned gangprivate_shared_align;
+static GTY(()) rtx gangprivate_shared_sym;
+static hash_map<tree_decl_hash, unsigned int> gangprivate_shared_hmap;
+
 /* Global lock variable, needed for 128bit worker & gang reductions.  */
 static GTY(()) tree global_lock_var;
 
@@ -247,6 +255,10 @@ nvptx_option_override (void)
   vector_red_align = GET_MODE_ALIGNMENT (SImode) / BITS_PER_UNIT;
   vector_red_partition = 0;
 
+  gangprivate_shared_sym = gen_rtx_SYMBOL_REF (Pmode, "__gangprivate_shared");
+  SET_SYMBOL_DATA_AREA (gangprivate_shared_sym, DATA_AREA_SHARED);
+  gangprivate_shared_align = GET_MODE_ALIGNMENT (SImode) / BITS_PER_UNIT;
+
   diagnose_openacc_conflict (TARGET_GOMP, "-mgomp");
   diagnose_openacc_conflict (TARGET_SOFT_STACK, "-msoft-stack");
   diagnose_openacc_conflict (TARGET_UNIFORM_SIMT, "-muniform-simt");
@@ -5231,6 +5243,10 @@ nvptx_file_end (void)
     write_shared_buffer (asm_out_file, vector_red_sym,
 			 vector_red_align, vector_red_size);
 
+  if (gangprivate_shared_size)
+    write_shared_buffer (asm_out_file, gangprivate_shared_sym,
+			 gangprivate_shared_align, gangprivate_shared_size);
+
   if (need_softstack_decl)
     {
       write_var_marker (asm_out_file, false, true, "__nvptx_stacks");
@@ -6424,6 +6440,60 @@ nvptx_can_change_mode_class (machine_mode, machine_mode, reg_class_t)
   return false;
 }
 
+/* Implement TARGET_GOACC_ADJUST_PRIVATE_DECL.  Set "oacc gangprivate"
+   attribute for gang-private variable declarations.  */
+
+void
+nvptx_goacc_adjust_private_decl (tree decl, int level)
+{
+  if (level != GOMP_DIM_GANG)
+    return;
+
+  if (!lookup_attribute ("oacc gangprivate", DECL_ATTRIBUTES (decl)))
+    {
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	{
+	  fprintf (dump_file, "Setting 'oacc gangprivate' attribute for decl:");
+	  print_generic_decl (dump_file, decl, TDF_SLIM);
+	  fputc ('\n', dump_file);
+	}
+      tree id = get_identifier ("oacc gangprivate");
+      DECL_ATTRIBUTES (decl) = tree_cons (id, NULL, DECL_ATTRIBUTES (decl));
+    }
+}
+
+/* Implement TARGET_GOACC_EXPAND_ACCEL_VAR.  Place "oacc gangprivate"
+   variables in shared memory.  */
+
+static rtx
+nvptx_goacc_expand_accel_var (tree var)
+{
+  if (VAR_P (var)
+      && lookup_attribute ("oacc gangprivate", DECL_ATTRIBUTES (var)))
+    {
+      unsigned int offset, *poffset;
+      poffset = gangprivate_shared_hmap.get (var);
+      if (poffset)
+	offset = *poffset;
+      else
+	{
+	  unsigned HOST_WIDE_INT align = DECL_ALIGN (var);
+	  gangprivate_shared_size
+	    = (gangprivate_shared_size + align - 1) & ~(align - 1);
+	  if (gangprivate_shared_align < align)
+	    gangprivate_shared_align = align;
+
+	  offset = gangprivate_shared_size;
+	  bool existed = gangprivate_shared_hmap.put (var, offset);
+	  gcc_assert (!existed);
+	  gangprivate_shared_size += tree_to_uhwi (DECL_SIZE_UNIT (var));
+	}
+      rtx addr = plus_constant (Pmode, gangprivate_shared_sym, offset);
+      return gen_rtx_MEM (TYPE_MODE (TREE_TYPE (var)), addr);
+    }
+  return NULL_RTX;
+}
+
 static GTY(()) tree nvptx_previous_fndecl;
 
 static void
@@ -6432,6 +6502,7 @@ nvptx_set_current_function (tree fndecl)
   if (!fndecl || fndecl == nvptx_previous_fndecl)
     return;
 
+  gangprivate_shared_hmap.empty ();
   nvptx_previous_fndecl = fndecl;
   vector_red_partition = 0;
   oacc_bcast_partition = 0;
@@ -6573,6 +6644,12 @@ nvptx_set_current_function (tree fndecl)
 #undef TARGET_HAVE_SPECULATION_SAFE_VALUE
 #define TARGET_HAVE_SPECULATION_SAFE_VALUE speculation_safe_value_not_needed
 
+#undef TARGET_GOACC_ADJUST_PRIVATE_DECL
+#define TARGET_GOACC_ADJUST_PRIVATE_DECL nvptx_goacc_adjust_private_decl
+
+#undef TARGET_GOACC_EXPAND_ACCEL_VAR
+#define TARGET_GOACC_EXPAND_ACCEL_VAR nvptx_goacc_expand_accel_var
+
 #undef TARGET_SET_CURRENT_FUNCTION
 #define TARGET_SET_CURRENT_FUNCTION nvptx_set_current_function
 
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 915e9612208..db40f50b71c 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -6156,6 +6156,19 @@ like @code{cond_add@var{m}}.  The default implementation returns a zero
 constant of type @var{type}.
 @end deftypefn
 
+@deftypefn {Target Hook} rtx TARGET_GOACC_EXPAND_ACCEL_VAR (tree @var{var})
+This hook, if defined, is used by accelerator target back-ends to expand
+specially handled kinds of VAR_DECL expressions.  A particular use is to
+place variables with specific attributes inside special accelerator
+memories.  A return value of NULL indicates that the target does not
+handle this VAR_DECL, and normal RTL expanding is resumed.
+@end deftypefn
+
+@deftypefn {Target Hook} void TARGET_GOACC_ADJUST_PRIVATE_DECL (tree @var{var}, @var{int})
+Tweak variable declaration for a private variable at the specified
+parallelism level.
+@end deftypefn
+
 @node Anchored Addresses
 @section Anchored Addresses
 @cindex anchored addresses
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index ac0f0494992..743cf36dd00 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -4215,6 +4215,10 @@ address;  but often a machine-dependent strategy can generate better code.
 
 @hook TARGET_PREFERRED_ELSE_VALUE
 
+@hook TARGET_GOACC_EXPAND_ACCEL_VAR
+
+@hook TARGET_GOACC_ADJUST_PRIVATE_DECL
+
 @node Anchored Addresses
 @section Anchored Addresses
 @cindex anchored addresses
diff --git a/gcc/expr.c b/gcc/expr.c
index b54bf1d3dc5..165796b97d2 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -10043,8 +10043,19 @@ expand_expr_real_1 (tree exp, rtx target, machine_mode tmode,
       exp = SSA_NAME_VAR (ssa_name);
       goto expand_decl_rtl;
 
-    case PARM_DECL:
     case VAR_DECL:
+      /* Allow accel compiler to handle specific cases of variables,
+	 specifically those tagged with the "oacc gangprivate" attribute,
+	 which may be intended to be placed in special memory in GPUs.  */
+      if (flag_openacc && targetm.goacc.expand_accel_var)
+	{
+	  temp = targetm.goacc.expand_accel_var (exp);
+	  if (temp)
+	    return temp;
+	}
+      /* ... fall through ...  */
+
+    case PARM_DECL:
       /* If a static var's type was incomplete when the decl was written,
 	 but the type is complete now, lay out the decl now.  */
       if (DECL_SIZE (exp) == 0
diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
index 549d6f1153b..2c853047cdd 100644
--- a/gcc/internal-fn.c
+++ b/gcc/internal-fn.c
@@ -2618,6 +2618,8 @@ expand_UNIQUE (internal_fn, gcall *stmt)
       else
 	gcc_unreachable ();
       break;
+    case IFN_UNIQUE_OACC_PRIVATE:
+      break;
     }
 
   if (pattern)
diff --git a/gcc/internal-fn.h b/gcc/internal-fn.h
index 7164ee5cf3c..a2810edc1b4 100644
--- a/gcc/internal-fn.h
+++ b/gcc/internal-fn.h
@@ -36,7 +36,8 @@ along with GCC; see the file COPYING3.  If not see
 #define IFN_UNIQUE_CODES				  \
   DEF(UNSPEC),	\
     DEF(OACC_FORK), DEF(OACC_JOIN),		\
-    DEF(OACC_HEAD_MARK), DEF(OACC_TAIL_MARK)
+    DEF(OACC_HEAD_MARK), DEF(OACC_TAIL_MARK),	\
+    DEF(OACC_PRIVATE)
 
 enum ifn_unique_kind {
 #define DEF(X) IFN_UNIQUE_##X
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index d8f058fe475..6d821e64767 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -156,6 +156,9 @@ struct omp_context
 
   /* True if there is bind clause on the construct (i.e. a loop construct).  */
   bool loop_p;
+
+  /* Addressable variable decls in this context.  */
+  vec<tree> *oacc_addressable_var_decls;
 };
 
 static splay_tree all_contexts;
@@ -921,6 +924,8 @@ new_omp_context (gimple *stmt, omp_context *outer_ctx)
 
   ctx->cb.decl_map = new hash_map<tree, tree>;
 
+  ctx->oacc_addressable_var_decls = new vec<tree> ();
+
   return ctx;
 }
 
@@ -1002,6 +1007,7 @@ delete_omp_context (splay_tree_value value)
     }
 
   delete ctx->lastprivate_conditional_map;
+  delete ctx->oacc_addressable_var_decls;
 
   XDELETE (ctx);
 }
@@ -6550,8 +6556,9 @@ lower_lastprivate_clauses (tree clauses, tree predicate, gimple_seq *body_p,
 
 static void
 lower_oacc_reductions (location_t loc, tree clauses, tree level, bool inner,
-		       gcall *fork, gcall *join, gimple_seq *fork_seq,
-		       gimple_seq *join_seq, omp_context *ctx)
+		       gcall *fork, gcall *private_marker, gcall *join,
+		       gimple_seq *fork_seq, gimple_seq *join_seq,
+		       omp_context *ctx)
 {
   gimple_seq before_fork = NULL;
   gimple_seq after_fork = NULL;
@@ -6747,6 +6754,8 @@ lower_oacc_reductions (location_t loc, tree clauses, tree level, bool inner,
 
   /* Now stitch things together.  */
   gimple_seq_add_seq (fork_seq, before_fork);
+  if (private_marker)
+    gimple_seq_add_stmt (fork_seq, private_marker);
   if (fork)
     gimple_seq_add_stmt (fork_seq, fork);
   gimple_seq_add_seq (fork_seq, after_fork);
@@ -7462,7 +7471,7 @@ lower_oacc_loop_marker (location_t loc, tree ddvar, bool head,
    HEAD and TAIL.  */
 
 static void
-lower_oacc_head_tail (location_t loc, tree clauses,
+lower_oacc_head_tail (location_t loc, tree clauses, gcall *private_marker,
 		      gimple_seq *head, gimple_seq *tail, omp_context *ctx)
 {
   bool inner = false;
@@ -7470,6 +7479,14 @@ lower_oacc_head_tail (location_t loc, tree clauses,
   gimple_seq_add_stmt (head, gimple_build_assign (ddvar, integer_zero_node));
 
   unsigned count = lower_oacc_head_mark (loc, ddvar, clauses, head, ctx);
+
+  if (private_marker)
+    {
+      gimple_set_location (private_marker, loc);
+      gimple_call_set_lhs (private_marker, ddvar);
+      gimple_call_set_arg (private_marker, 1, ddvar);
+    }
+
   tree fork_kind = build_int_cst (unsigned_type_node, IFN_UNIQUE_OACC_FORK);
   tree join_kind = build_int_cst (unsigned_type_node, IFN_UNIQUE_OACC_JOIN);
 
@@ -7500,7 +7517,8 @@ lower_oacc_head_tail (location_t loc, tree clauses,
 			      &join_seq);
 
       lower_oacc_reductions (loc, clauses, place, inner,
-			     fork, join, &fork_seq, &join_seq,  ctx);
+			     fork, (count == 1) ? private_marker : NULL,
+			     join, &fork_seq, &join_seq,  ctx);
 
       /* Append this level to head. */
       gimple_seq_add_seq (head, fork_seq);
@@ -9465,6 +9483,32 @@ lower_omp_for_lastprivate (struct omp_for_data *fd, gimple_seq *body_p,
     }
 }
 
+/* Record vars listed in private clauses in CLAUSES in CTX.  This information
+   is used to mark up variables that should be made private per-gang.  */
+
+static void
+oacc_record_private_var_clauses (omp_context *ctx, tree clauses)
+{
+  for (tree c = clauses; c; c = OMP_CLAUSE_CHAIN (c))
+    if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_PRIVATE)
+      {
+	tree decl = OMP_CLAUSE_DECL (c);
+	if (VAR_P (decl) && TREE_ADDRESSABLE (decl))
+	  ctx->oacc_addressable_var_decls->safe_push (decl);
+      }
+}
+
+/* Record addressable vars declared in BINDVARS in CTX.  This information is
+   used to mark up variables that should be made private per-gang.  */
+
+static void
+oacc_record_vars_in_bind (omp_context *ctx, tree bindvars)
+{
+  for (tree v = bindvars; v; v = DECL_CHAIN (v))
+    if (VAR_P (v) && TREE_ADDRESSABLE (v))
+      ctx->oacc_addressable_var_decls->safe_push (v);
+}
+
 /* Callback for walk_gimple_seq.  Find #pragma omp scan statement.  */
 
 static tree
@@ -10295,6 +10339,57 @@ lower_omp_for_scan (gimple_seq *body_p, gimple_seq *dlist, gomp_for *stmt,
   *dlist = new_dlist;
 }
 
+/* Build an internal UNIQUE function with type IFN_UNIQUE_OACC_PRIVATE listing
+   the addresses of variables that should be made private at the surrounding
+   parallelism level.  Such functions appear in the gimple code stream in two
+   forms, e.g. for a partitioned loop:
+
+      .data_dep.6 = .UNIQUE (OACC_HEAD_MARK, .data_dep.6, 1, 68);
+      .data_dep.6 = .UNIQUE (OACC_PRIVATE, .data_dep.6, -1, &w);
+      .data_dep.6 = .UNIQUE (OACC_FORK, .data_dep.6, -1);
+      .data_dep.6 = .UNIQUE (OACC_HEAD_MARK, .data_dep.6);
+
+   or alternatively, OACC_PRIVATE can appear at the top level of a parallel,
+   not as part of a HEAD_MARK sequence:
+
+      .UNIQUE (OACC_PRIVATE, 0, 0, &w);
+
+   For such stand-alone appearances, the 3rd argument is always 0, denoting
+   gang partitioning.  */
+
+static gcall *
+make_oacc_private_marker (omp_context *ctx)
+{
+  int i;
+  tree decl;
+
+  if (ctx->oacc_addressable_var_decls->length () == 0)
+    return NULL;
+
+  auto_vec<tree, 5> args;
+
+  args.quick_push (build_int_cst (integer_type_node, IFN_UNIQUE_OACC_PRIVATE));
+  args.quick_push (integer_zero_node);
+  args.quick_push (integer_minus_one_node);
+
+  FOR_EACH_VEC_ELT (*ctx->oacc_addressable_var_decls, i, decl)
+    {
+      for (omp_context *thisctx = ctx; thisctx; thisctx = thisctx->outer)
+	{
+	  tree inner_decl = maybe_lookup_decl (decl, thisctx);
+	  if (inner_decl)
+	    {
+	      decl = inner_decl;
+	      break;
+	    }
+	}
+      tree addr = build_fold_addr_expr (decl);
+      args.safe_push (addr);
+    }
+
+  return gimple_build_call_internal_vec (IFN_UNIQUE, args);
+}
+
 /* Lower code for an OMP loop directive.  */
 
 static void
@@ -10311,6 +10406,8 @@ lower_omp_for (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 
   push_gimplify_context ();
 
+  oacc_record_private_var_clauses (ctx, gimple_omp_for_clauses (stmt));
+
   lower_omp (gimple_omp_for_pre_body_ptr (stmt), ctx);
 
   block = make_node (BLOCK);
@@ -10329,6 +10426,8 @@ lower_omp_for (gimple_stmt_iterator *gsi_p, omp_context *ctx)
       gbind *inner_bind
 	= as_a <gbind *> (gimple_seq_first_stmt (omp_for_body));
       tree vars = gimple_bind_vars (inner_bind);
+      if (is_gimple_omp_oacc (ctx->stmt))
+	oacc_record_vars_in_bind (ctx, vars);
       gimple_bind_append_vars (new_stmt, vars);
       /* bind_vars/BLOCK_VARS are being moved to new_stmt/block, don't
 	 keep them on the inner_bind and it's block.  */
@@ -10428,6 +10527,12 @@ lower_omp_for (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 
   lower_omp (gimple_omp_body_ptr (stmt), ctx);
 
+  gcall *private_marker = NULL;
+  if (is_gimple_omp_oacc (ctx->stmt)
+      && !gimple_seq_empty_p (omp_for_body)
+      && !gimple_seq_empty_p (omp_for_body))
+    private_marker = make_oacc_private_marker (ctx);
+
   /* Lower the header expressions.  At this point, we can assume that
      the header is of the form:
 
@@ -10464,7 +10569,7 @@ lower_omp_for (gimple_stmt_iterator *gsi_p, omp_context *ctx)
   if (is_gimple_omp_oacc (ctx->stmt)
       && !ctx_in_oacc_kernels_region (ctx))
     lower_oacc_head_tail (gimple_location (stmt),
-			  gimple_omp_for_clauses (stmt),
+			  gimple_omp_for_clauses (stmt), private_marker,
 			  &oacc_head, &oacc_tail, ctx);
 
   /* Add OpenACC partitioning and reduction markers just before the loop.  */
@@ -12289,8 +12394,14 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 	     them as a dummy GANG loop.  */
 	  tree level = build_int_cst (integer_type_node, GOMP_DIM_GANG);
 
+	  gcall *private_marker = make_oacc_private_marker (ctx);
+
+	  if (private_marker)
+	    gimple_call_set_arg (private_marker, 2, level);
+
 	  lower_oacc_reductions (gimple_location (ctx->stmt), clauses, level,
-				 false, NULL, NULL, &fork_seq, &join_seq, ctx);
+				 false, NULL, private_marker, NULL, &fork_seq,
+				 &join_seq, ctx);
 	}
 
       gimple_seq_add_seq (&new_body, fork_seq);
@@ -12546,6 +12657,9 @@ lower_omp_1 (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 		 ctx);
       break;
     case GIMPLE_BIND:
+      if (ctx && is_gimple_omp_oacc (ctx->stmt))
+	oacc_record_vars_in_bind (ctx,
+				  gimple_bind_vars (as_a <gbind *> (stmt)));
       lower_omp (gimple_bind_body_ptr (as_a <gbind *> (stmt)), ctx);
       maybe_remove_omp_member_access_dummy_vars (as_a <gbind *> (stmt));
       break;
diff --git a/gcc/omp-offload.c b/gcc/omp-offload.c
index 32eacf7863e..d8291125370 100644
--- a/gcc/omp-offload.c
+++ b/gcc/omp-offload.c
@@ -52,6 +52,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "stringpool.h"
 #include "attribs.h"
 #include "cfgloop.h"
+#include "convert.h"
 
 /* Describe the OpenACC looping structure of a function.  The entire
    function is held in a 'NULL' loop.  */
@@ -1082,7 +1083,9 @@ oacc_loop_xform_head_tail (gcall *from, int level)
 	    = ((enum ifn_unique_kind)
 	       TREE_INT_CST_LOW (gimple_call_arg (stmt, 0)));
 
-	  if (k == IFN_UNIQUE_OACC_FORK || k == IFN_UNIQUE_OACC_JOIN)
+	  if (k == IFN_UNIQUE_OACC_FORK
+	      || k == IFN_UNIQUE_OACC_JOIN
+	      || k == IFN_UNIQUE_OACC_PRIVATE)
 	    *gimple_call_arg_ptr (stmt, 2) = replacement;
 	  else if (k == kind && stmt != from)
 	    break;
@@ -1684,6 +1687,38 @@ execute_oacc_device_lower ()
 		case IFN_UNIQUE_OACC_TAIL_MARK:
 		  remove = true;
 		  break;
+
+		case IFN_UNIQUE_OACC_PRIVATE:
+		  {
+		    HOST_WIDE_INT level
+		      = TREE_INT_CST_LOW (gimple_call_arg (call, 2));
+		    if (level == -1)
+		      break;
+		    for (unsigned i = 3;
+			 i < gimple_call_num_args (call);
+			 i++)
+		      {
+			tree arg = gimple_call_arg (call, i);
+			gcc_assert (TREE_CODE (arg) == ADDR_EXPR);
+			tree decl = TREE_OPERAND (arg, 0);
+			if (dump_file && (dump_flags & TDF_DETAILS))
+			  {
+			    static char const *const axes[] =
+			      /* Must be kept in sync with GOMP_DIM
+				 enumeration.  */
+			      { "gang", "worker", "vector" };
+			    fprintf (dump_file, "Decl UID %u has %s "
+				     "partitioning:", DECL_UID (decl),
+				     axes[level]);
+			    print_generic_decl (dump_file, decl, TDF_SLIM);
+			    fputc ('\n', dump_file);
+			  }
+			if (targetm.goacc.adjust_private_decl)
+			  targetm.goacc.adjust_private_decl (decl, level);
+		      }
+		    remove = true;
+		  }
+		  break;
 		}
 	      break;
 	    }
diff --git a/gcc/target.def b/gcc/target.def
index 1f011edf88b..a046d6eddb3 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -1719,6 +1719,23 @@ for allocating any storage for reductions when necessary.",
 void, (gcall *call),
 default_goacc_reduction)
 
+DEFHOOK
+(expand_accel_var,
+"This hook, if defined, is used by accelerator target back-ends to expand\n\
+specially handled kinds of VAR_DECL expressions.  A particular use is to\n\
+place variables with specific attributes inside special accelerator\n\
+memories.  A return value of NULL indicates that the target does not\n\
+handle this VAR_DECL, and normal RTL expanding is resumed.",
+rtx, (tree var),
+NULL)
+
+DEFHOOK
+(adjust_private_decl,
+"Tweak variable declaration for a private variable at the specified\n\
+parallelism level.",
+void, (tree var, int),
+NULL)
+
 HOOK_VECTOR_END (goacc)
 
 /* Functions relating to vectorization.  */
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/gang-private-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/gang-private-1.c
new file mode 100644
index 00000000000..28222c25da3
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/gang-private-1.c
@@ -0,0 +1,38 @@
+#include <assert.h>
+
+int main (void)
+{
+  int ret;
+
+  #pragma acc parallel num_gangs(1) num_workers(32) copyout(ret)
+  {
+    int w = 0;
+
+    #pragma acc loop worker
+    for (int i = 0; i < 32; i++)
+      {
+	#pragma acc atomic update
+	w++;
+      }
+
+    ret = (w == 32);
+  }
+  assert (ret);
+
+  #pragma acc parallel num_gangs(1) vector_length(32) copyout(ret)
+  {
+    int v = 0;
+
+    #pragma acc loop vector
+    for (int i = 0; i < 32; i++)
+      {
+	#pragma acc atomic update
+	v++;
+      }
+
+    ret = (v == 32);
+  }
+  assert (ret);
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-gwv-2.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-gwv-2.c
new file mode 100644
index 00000000000..a4f81a39e24
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-gwv-2.c
@@ -0,0 +1,95 @@
+#include <stdio.h>
+#include <openacc.h>
+#include <alloca.h>
+#include <string.h>
+#include <gomp-constants.h>
+#include <stdlib.h>
+
+#if 0
+#define DEBUG(DIM, IDX, VAL) \
+  fprintf (stderr, "%sdist[%d] = %d\n", (DIM), (IDX), (VAL))
+#else
+#define DEBUG(DIM, IDX, VAL)
+#endif
+
+#define N (32*32*32)
+
+int
+check (const char *dim, int *dist, int dimsize)
+{
+  int ix;
+  int exit = 0;
+
+  for (ix = 0; ix < dimsize; ix++)
+    {
+      DEBUG(dim, ix, dist[ix]);
+      if (dist[ix] < (N) / (dimsize + 0.5)
+	  || dist[ix] > (N) / (dimsize - 0.5))
+	{
+	  fprintf (stderr, "did not distribute to %ss (%d not between %d "
+		   "and %d)\n", dim, dist[ix], (int) ((N) / (dimsize + 0.5)),
+		   (int) ((N) / (dimsize - 0.5)));
+	  exit |= 1;
+	}
+    }
+
+  return exit;
+}
+
+int main ()
+{
+  int ary[N];
+  int ix;
+  int exit = 0;
+  int gangsize = 0, workersize = 0, vectorsize = 0;
+  int *gangdist, *workerdist, *vectordist;
+
+  for (ix = 0; ix < N;ix++)
+    ary[ix] = -1;
+
+#pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
+	    copy(ary) copyout(gangsize, workersize, vectorsize)
+  {
+#pragma acc loop gang worker vector
+    for (unsigned ix = 0; ix < N; ix++)
+      {
+	int g, w, v;
+
+	g = __builtin_goacc_parlevel_id (GOMP_DIM_GANG);
+	w = __builtin_goacc_parlevel_id (GOMP_DIM_WORKER);
+	v = __builtin_goacc_parlevel_id (GOMP_DIM_VECTOR);
+
+	ary[ix] = (g << 16) | (w << 8) | v;
+      }
+
+    gangsize = __builtin_goacc_parlevel_size (GOMP_DIM_GANG);
+    workersize = __builtin_goacc_parlevel_size (GOMP_DIM_WORKER);
+    vectorsize = __builtin_goacc_parlevel_size (GOMP_DIM_VECTOR);
+  }
+
+  gangdist = (int *) alloca (gangsize * sizeof (int));
+  workerdist = (int *) alloca (workersize * sizeof (int));
+  vectordist = (int *) alloca (vectorsize * sizeof (int));
+  memset (gangdist, 0, gangsize * sizeof (int));
+  memset (workerdist, 0, workersize * sizeof (int));
+  memset (vectordist, 0, vectorsize * sizeof (int));
+
+  /* Test that work is shared approximately equally amongst each active
+     gang/worker/vector.  */
+  for (ix = 0; ix < N; ix++)
+    {
+      int g = (ary[ix] >> 16) & 255;
+      int w = (ary[ix] >> 8) & 255;
+      int v = ary[ix] & 255;
+
+      gangdist[g]++;
+      workerdist[w]++;
+      vectordist[v]++;
+    }
+
+  exit = check ("gang", gangdist, gangsize);
+  exit |= check ("worker", workerdist, workersize);
+  exit |= check ("vector", vectordist, vectorsize);
+
+  return exit;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/gangprivate-attrib-1.f90 b/libgomp/testsuite/libgomp.oacc-fortran/gangprivate-attrib-1.f90
new file mode 100644
index 00000000000..b9293e7d2a4
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-fortran/gangprivate-attrib-1.f90
@@ -0,0 +1,25 @@
+! Test for "oacc gangprivate" attribute on gang-private variables
+
+! { dg-do run }
+! { dg-additional-options "-fdump-tree-oaccdevlow-details" }
+! { dg-final { scan-tree-dump-times "Decl UID \[0-9\]+ has gang partitioning:  integer\\(kind=4\\) w;" 1 "oaccdevlow" } } */
+
+program main
+  integer :: w, arr(0:31)
+
+  !$acc parallel num_gangs(32) num_workers(32) copyout(arr)
+    !$acc loop gang private(w)
+    do j = 0, 31
+      w = 0
+      !$acc loop seq
+      do i = 0, 31
+        !$acc atomic update
+        w = w + 1
+        !$acc end atomic
+      end do
+      arr(j) = w
+    end do
+  !$acc end parallel
+
+  if (any (arr .ne. 32)) stop 1
+end program main
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/gangprivate-attrib-2.f90 b/libgomp/testsuite/libgomp.oacc-fortran/gangprivate-attrib-2.f90
new file mode 100644
index 00000000000..90e06be24ff
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-fortran/gangprivate-attrib-2.f90
@@ -0,0 +1,25 @@
+! Test for worker-private variables
+
+! { dg-do run }
+! { dg-additional-options "-fdump-tree-oaccdevlow-details" }
+! { dg-final { scan-tree-dump-times "Decl UID \[0-9\]+ has worker partitioning:  integer\\(kind=4\\) w;" 1 "oaccdevlow" } } */
+
+program main
+  integer :: w, arr(0:31)
+
+  !$acc parallel num_gangs(32) num_workers(32) copyout(arr)
+    !$acc loop gang worker private(w)
+    do j = 0, 31
+      w = 0
+      !$acc loop seq
+      do i = 0, 31
+        !$acc atomic update
+        w = w + 1
+        !$acc end atomic
+      end do
+      arr(j) = w
+    end do
+  !$acc end parallel
+
+  if (any (arr .ne. 32)) stop 1
+end program main


* Add 'libgomp.oacc-c-c++-common/loop-gwv-2.c' (was: [PATCH, OpenACC] Add support for gang local storage allocation in shared memory)
  2018-08-13 20:42     ` Julian Brown
@ 2021-05-19 12:10       ` Thomas Schwinge
  0 siblings, 0 replies; 26+ messages in thread
From: Thomas Schwinge @ 2021-05-19 12:10 UTC (permalink / raw)
  To: gcc-patches, Julian Brown

[-- Attachment #1: Type: text/plain, Size: 1854 bytes --]

Hi!

On 2018-08-13T21:41:50+0100, Julian Brown <julian@codesourcery.com> wrote:
> On Mon, 13 Aug 2018 11:42:26 -0700 Cesar Philippidis <cesar@codesourcery.com> wrote:
>> On 08/13/2018 09:21 AM, Julian Brown wrote:
>> > diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-gwv-2.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-gwv-2.c
>> > new file mode 100644
>> > index 0000000..2fa708a
>> > --- /dev/null
>> > +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-gwv-2.c
>> > @@ -0,0 +1,106 @@
>> > +/* { dg-xfail-run-if "gangprivate failure" { openacc_nvidia_accel_selected } { "-O0" } { "" } } */

>> is the above xfail still necessary? It seems to xpass
>> for me on nvptx. However, I see this regression on the host:
>>
>> FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/loop-gwv-2.c
>> -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1  -O2  execution test

> Oops, this was the version of the patch I meant to post (and the one I
> tested). The XFAIL on loop-gwv-2.c isn't necessary, plus that test
> needed some other fixes to make it pass for NVPTX (it was written for
> GCN to start with).

As I found out later, this testcase actually works without the code
changes (OpenACC privatization levels) that it accompanies -- and I don't
see anything in the testcase that would exercise those code changes.
Maybe it was written for some earlier revision of them?  Anyway, as it's
all-PASS on all systems that I've tested, I've now pushed
"Add 'libgomp.oacc-c-c++-common/loop-gwv-2.c'" to master branch in commit
5a16fb19e7c4274f8dd9bbdd30d7d06fe2eff8af, see attached.


Regards
 Thomas


-----------------
Mentor Graphics (Deutschland) GmbH, Arnulfstrasse 201, 80634 München Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Frank Thürauf

[-- Attachment #2: 0001-Add-libgomp.oacc-c-c-common-loop-gwv-2.c.patch --]
[-- Type: text/x-diff, Size: 3235 bytes --]

From 5a16fb19e7c4274f8dd9bbdd30d7d06fe2eff8af Mon Sep 17 00:00:00 2001
From: Julian Brown <julian@codesourcery.com>
Date: Mon, 13 Aug 2018 21:41:50 +0100
Subject: [PATCH] Add 'libgomp.oacc-c-c++-common/loop-gwv-2.c'

	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/loop-gwv-2.c: New.
---
 .../libgomp.oacc-c-c++-common/loop-gwv-2.c    | 95 +++++++++++++++++++
 1 file changed, 95 insertions(+)
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/loop-gwv-2.c

diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-gwv-2.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-gwv-2.c
new file mode 100644
index 00000000000..a4f81a39e24
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-gwv-2.c
@@ -0,0 +1,95 @@
+#include <stdio.h>
+#include <openacc.h>
+#include <alloca.h>
+#include <string.h>
+#include <gomp-constants.h>
+#include <stdlib.h>
+
+#if 0
+#define DEBUG(DIM, IDX, VAL) \
+  fprintf (stderr, "%sdist[%d] = %d\n", (DIM), (IDX), (VAL))
+#else
+#define DEBUG(DIM, IDX, VAL)
+#endif
+
+#define N (32*32*32)
+
+int
+check (const char *dim, int *dist, int dimsize)
+{
+  int ix;
+  int exit = 0;
+
+  for (ix = 0; ix < dimsize; ix++)
+    {
+      DEBUG(dim, ix, dist[ix]);
+      if (dist[ix] < (N) / (dimsize + 0.5)
+	  || dist[ix] > (N) / (dimsize - 0.5))
+	{
+	  fprintf (stderr, "did not distribute to %ss (%d not between %d "
+		   "and %d)\n", dim, dist[ix], (int) ((N) / (dimsize + 0.5)),
+		   (int) ((N) / (dimsize - 0.5)));
+	  exit |= 1;
+	}
+    }
+
+  return exit;
+}
+
+int main ()
+{
+  int ary[N];
+  int ix;
+  int exit = 0;
+  int gangsize = 0, workersize = 0, vectorsize = 0;
+  int *gangdist, *workerdist, *vectordist;
+
+  for (ix = 0; ix < N;ix++)
+    ary[ix] = -1;
+
+#pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
+	    copy(ary) copyout(gangsize, workersize, vectorsize)
+  {
+#pragma acc loop gang worker vector
+    for (unsigned ix = 0; ix < N; ix++)
+      {
+	int g, w, v;
+
+	g = __builtin_goacc_parlevel_id (GOMP_DIM_GANG);
+	w = __builtin_goacc_parlevel_id (GOMP_DIM_WORKER);
+	v = __builtin_goacc_parlevel_id (GOMP_DIM_VECTOR);
+
+	ary[ix] = (g << 16) | (w << 8) | v;
+      }
+
+    gangsize = __builtin_goacc_parlevel_size (GOMP_DIM_GANG);
+    workersize = __builtin_goacc_parlevel_size (GOMP_DIM_WORKER);
+    vectorsize = __builtin_goacc_parlevel_size (GOMP_DIM_VECTOR);
+  }
+
+  gangdist = (int *) alloca (gangsize * sizeof (int));
+  workerdist = (int *) alloca (workersize * sizeof (int));
+  vectordist = (int *) alloca (vectorsize * sizeof (int));
+  memset (gangdist, 0, gangsize * sizeof (int));
+  memset (workerdist, 0, workersize * sizeof (int));
+  memset (vectordist, 0, vectorsize * sizeof (int));
+
+  /* Test that work is shared approximately equally amongst each active
+     gang/worker/vector.  */
+  for (ix = 0; ix < N; ix++)
+    {
+      int g = (ary[ix] >> 16) & 255;
+      int w = (ary[ix] >> 8) & 255;
+      int v = ary[ix] & 255;
+
+      gangdist[g]++;
+      workerdist[w]++;
+      vectordist[v]++;
+    }
+
+  exit = check ("gang", gangdist, gangsize);
+  exit |= check ("worker", workerdist, workersize);
+  exit |= check ("vector", vectordist, vectorsize);
+
+  return exit;
+}
-- 
2.30.2



* Re: [PATCH, OpenACC] Add support for gang local storage allocation in shared memory
  2019-06-07 14:08                     ` Julian Brown
  2019-06-12 10:23                       ` Jakub Jelinek
  2019-06-12 11:57                       ` Thomas Schwinge
@ 2021-05-21 19:05                       ` Thomas Schwinge
  2 siblings, 0 replies; 26+ messages in thread
From: Thomas Schwinge @ 2021-05-21 19:05 UTC (permalink / raw)
  To: Julian Brown, Jakub Jelinek, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 1918 bytes --]

Hi!

On 2019-06-07T15:08:37+0100, Julian Brown <julian@codesourcery.com> wrote:
> Hi Jakub,
>
> Thanks for the review! I believe I've addressed all your comments in
> the attached version of the patch.
>
> On Mon, 3 Jun 2019 18:23:00 +0200
> Jakub Jelinek <jakub@redhat.com> wrote:
>> > +/* Record vars listed in private clauses in CLAUSES in CTX.  This information
>> > +   is used to mark up variables that should be made private per-gang.  */
>> > +
>> > +static void
>> > +oacc_record_private_var_clauses (omp_context *ctx, tree clauses)
>> > +{
>> > +  [...]
>> > +}
>>
>> You don't want to do this for all GOMP_FOR or GOMP_TARGET context,
>> I'd hope you only want to do that for OpenACC contexts.

> I've [...] fixed the patch to only call oacc_record_private_var_clauses in
> OpenACC contexts.

> commit 6c2a018b940d0b132395048b0600f7d897319ee2
> Author: Julian Brown <julian@codesourcery.com>
> Date:   Thu Aug 9 20:27:04 2018 -0700
>
>     [OpenACC] Add support for gang local storage allocation in shared memory

> --- a/gcc/omp-low.c
> +++ b/gcc/omp-low.c

> @@ -8599,6 +8681,9 @@ lower_omp_for (gimple_stmt_iterator *gsi_p, omp_context *ctx)
>
>    push_gimplify_context ();
>
> +  if (is_gimple_omp_oacc (ctx->stmt))
> +    oacc_record_private_var_clauses (ctx, gimple_omp_for_clauses (stmt));
> +
>    lower_omp (gimple_omp_for_pre_body_ptr (stmt), ctx);
>
>    block = make_node (BLOCK);

So, yes -- but then, apparently, that again got lost in a later version
of the patch.  ;-)

I've pushed "[OpenACC privatization] Don't evaluate OpenMP 'for' clauses
[PR90115]" to master branch in commit
3a285ebd0cf5ab762726018515d23280fa6dd445, see attached.


Regards
 Thomas



[-- Attachment #2: 0001-OpenACC-privatization-Don-t-evaluate-OpenMP-for-clau.patch --]
[-- Type: text/x-diff, Size: 925 bytes --]

From 3a285ebd0cf5ab762726018515d23280fa6dd445 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge <thomas@codesourcery.com>
Date: Thu, 20 May 2021 15:22:24 +0200
Subject: [PATCH] [OpenACC privatization] Don't evaluate OpenMP 'for' clauses
 [PR90115]

	gcc/
	PR middle-end/90115
	* omp-low.c (lower_omp_for): Don't evaluate OpenMP 'for' clauses.
---
 gcc/omp-low.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index da827ef2e34..a86c6c1e82c 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -11067,7 +11067,8 @@ lower_omp_for (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 
   push_gimplify_context ();
 
-  oacc_privatization_scan_clause_chain (ctx, gimple_omp_for_clauses (stmt));
+  if (is_gimple_omp_oacc (ctx->stmt))
+    oacc_privatization_scan_clause_chain (ctx, gimple_omp_for_clauses (stmt));
 
   lower_omp (gimple_omp_for_pre_body_ptr (stmt), ctx);
 
-- 
2.30.2



* Re: [PATCH, OpenACC] Add support for gang local storage allocation in shared memory
  2019-06-03 16:03                 ` Julian Brown
  2019-06-03 16:23                   ` Jakub Jelinek
@ 2022-02-14 15:56                   ` Thomas Schwinge
  2022-02-15 13:40                     ` Julian Brown
  2022-03-10 11:13                     ` Add 'gfortran.dg/goacc-gomp/pr102330-{1,2,3}.f90' [PR102330] Thomas Schwinge
  1 sibling, 2 replies; 26+ messages in thread
From: Thomas Schwinge @ 2022-02-14 15:56 UTC (permalink / raw)
  To: Julian Brown; +Cc: gcc-patches, Tom de Vries, Chung-Lin Tang, Jakub Jelinek

Hi Julian!

Two more questions here, in context of <https://gcc.gnu.org/PR102330>
"[12 Regression] ICE in expand_gimple_stmt_1, at cfgexpand.c:3932 since
r12-980-g29a2f51806c":

On 2019-06-03T17:02:45+0100, Julian Brown <julian@codesourcery.com> wrote:
> This is a new version of the patch, rebased

The code as we've now got it in master branch has changed some more, but
I think the behavior I'm seeing may have been introduced here:

> and with a couple of
> additional bugfixes, as follows:
>
> Firstly, in mark_oacc_gangprivate, each decl is looked up (using
> maybe_lookup_decl) to apply the "oacc gangprivate" attribute to the
> innermost-nested copy of the decl.

> --- a/gcc/omp-low.c
> +++ b/gcc/omp-low.c
> @@ -137,6 +137,12 @@ struct omp_context

> +  /* Addressable variable decls in this context.  */
> +  vec<tree> *oacc_addressable_var_decls;
>  };

> +/* Record vars listed in private clauses in CLAUSES in CTX.  This information
> +   is used to mark up variables that should be made private per-gang.  */
> +
> +static void
> +oacc_record_private_var_clauses (omp_context *ctx, tree clauses)
> +{
> +  tree c;
> +
> +  if (!ctx)
> +    return;
> +
> +  for (c = clauses; c; c = OMP_CLAUSE_CHAIN (c))
> +    if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_PRIVATE)
> +      {
> +     tree decl = OMP_CLAUSE_DECL (c);
> +     if (VAR_P (decl) && TREE_ADDRESSABLE (decl))
> +       ctx->oacc_addressable_var_decls->safe_push (decl);
> +      }
> +}

So, here we analyze 'OMP_CLAUSE_DECL (c)' (as is, without translation
through 'lookup_decl (decl, ctx)')...

> +/* Record addressable vars declared in BINDVARS in CTX.  This information is
> +   used to mark up variables that should be made private per-gang.  */
> +
> +static void
> +oacc_record_vars_in_bind (omp_context *ctx, tree bindvars)
> +{
> +  if (!ctx)
> +    return;
> +
> +  for (tree v = bindvars; v; v = DECL_CHAIN (v))
> +    if (VAR_P (v) && TREE_ADDRESSABLE (v))
> +      ctx->oacc_addressable_var_decls->safe_push (v);
> +}

..., and similarly here analyze 'v' (without 'lookup_decl (v, ctx)')...

> +/* Mark addressable variables which are declared implicitly or explicitly as
> +   gang private with a special attribute.  These may need to have their
> +   declarations altered later on in compilation (e.g. in
> +   execute_oacc_device_lower or the backend, depending on how the OpenACC
> +   execution model is implemented on a given target) to ensure that sharing
> +   semantics are correct.  */
> +
> +static void
> +mark_oacc_gangprivate (vec<tree> *decls, omp_context *ctx)
> +{
> +  int i;
> +  tree decl;
> +
> +  FOR_EACH_VEC_ELT (*decls, i, decl)
> +    {
> +      for (omp_context *thisctx = ctx; thisctx; thisctx = thisctx->outer)
> +     {
> +       tree inner_decl = maybe_lookup_decl (decl, thisctx);
> +       if (inner_decl)
> +         {
> +           decl = inner_decl;
> +           break;
> +         }
> +     }
> +      if (!lookup_attribute ("oacc gangprivate", DECL_ATTRIBUTES (decl)))
> +     {
> +       if (dump_file && (dump_flags & TDF_DETAILS))
> +         {
> +           fprintf (dump_file,
> +                    "Setting 'oacc gangprivate' attribute for decl:");
> +           print_generic_decl (dump_file, decl, TDF_SLIM);
> +           fputc ('\n', dump_file);
> +         }
> +       DECL_ATTRIBUTES (decl)
> +         = tree_cons (get_identifier ("oacc gangprivate"),
> +                      NULL, DECL_ATTRIBUTES (decl));
> +     }
> +    }
> +}

..., but here we act on the 'maybe_lookup_decl'-translated
'inner_decl', if applicable.  In certain cases that one may be different
from the original 'decl'.  (In particular (only?), when the OMP lowering
has made 'decl' "late 'TREE_ADDRESSABLE'".)  This asymmetry, as I
understand it, gives rise to <https://gcc.gnu.org/PR102330> "[12 Regression] ICE in
expand_gimple_stmt_1, at cfgexpand.c:3932 since r12-980-g29a2f51806c".

It makes sense to me that we do the OpenACC privatization on the
'lookup_decl' -- but shouldn't we then do that in the analysis phase,
too?  (This appears to work fine for OpenACC 'private' clauses (..., and
avoids marking a few as addressable/gang-private), and for those in
'gimple_bind_vars' it doesn't seem to make a difference (for the current
test cases and/or compiler transformations).)
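To make the asymmetry concrete, here is a minimal toy model, not GCC code.
All names ('toy_decl', 'toy_map', 'toy_lookup', 'candidate_*') are
hypothetical stand-ins, and the sketch assumes the situation described
above: the original decl carries the addressable flag, while its
context-local copy does not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins, not GCC types: a "decl" with an addressable flag, and a
   map that may translate one outer decl to a context-local copy.  */
struct toy_decl { const char *name; bool addressable; };
struct toy_map { const struct toy_decl *from; struct toy_decl *to; };

/* Hypothetical analogue of a decl lookup: return the context's copy of
   DECL, or DECL itself if the map has no entry for it.  */
static struct toy_decl *
toy_lookup (struct toy_decl *decl, const struct toy_map *map)
{
  assert (decl != NULL && map != NULL);
  return (map->from == decl) ? map->to : decl;
}

/* Asymmetric scheme: test the flag on the original decl, but hand back
   the translated one.  */
static struct toy_decl *
candidate_asymmetric (struct toy_decl *decl, const struct toy_map *map)
{
  return decl->addressable ? toy_lookup (decl, map) : NULL;
}

/* Symmetric scheme: translate first, then test and hand back the same
   decl.  */
static struct toy_decl *
candidate_symmetric (struct toy_decl *decl, const struct toy_map *map)
{
  struct toy_decl *d = toy_lookup (decl, map);
  return d->addressable ? d : NULL;
}
```

With an addressable original and a non-addressable copy, the asymmetric
variant selects the copy even though it never passed the addressable
test, whereas the symmetric variant selects nothing.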

And, second question: what case did you run into or foresee, that you
here need the 'thisctx' loop and 'maybe_lookup_decl', instead of a plain
'lookup_decl (decl, ctx)'?  Per my testing that's sufficient.
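For reference, the difference between the two lookup styles can be
sketched with a toy model.  This is not GCC code: 'toy_ctx',
'toy_maybe_lookup' and 'toy_lookup_walking' are hypothetical, decls are
reduced to integer ids, and the sketch assumes, purely for illustration,
that a per-context lookup consults only that context's own map.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins, not GCC types.  Each context maps at most one decl
   (identified by an int id; 0 means "no mapping") to a replacement id.  */
struct toy_ctx
{
  int from;			/* id of the decl this context remaps */
  int to;			/* id of the context-local replacement */
  const struct toy_ctx *outer;	/* enclosing context, or NULL */
};

/* Hypothetical per-context lookup: consult CTX only and return 0 when
   CTX has no mapping for DECL.  */
static int
toy_maybe_lookup (int decl, const struct toy_ctx *ctx)
{
  assert (ctx != NULL);
  return (ctx->from == decl) ? ctx->to : 0;
}

/* The 'thisctx'-style loop: walk outwards and take the first mapping
   found, falling back to DECL itself.  */
static int
toy_lookup_walking (int decl, const struct toy_ctx *ctx)
{
  for (const struct toy_ctx *c = ctx; c; c = c->outer)
    {
      int inner = toy_maybe_lookup (decl, c);
      if (inner)
	return inner;
    }
  return decl;
}
```

In this model the two styles differ only for a decl that is remapped in
an enclosing context but not in the innermost one.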

Unless you think this needs more consideration, I suggest making these two
changes.  (I have a WIP patch in testing.)


Regards
 Thomas
-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955


* Re: [PATCH, OpenACC] Add support for gang local storage allocation in shared memory
  2022-02-14 15:56                   ` Thomas Schwinge
@ 2022-02-15 13:40                     ` Julian Brown
  2022-03-10 11:28                       ` [OpenACC privatization] Analyze 'lookup_decl'-translated DECL [PR90115, PR102330, PR104774] Thomas Schwinge
  2022-03-10 11:13                     ` Add 'gfortran.dg/goacc-gomp/pr102330-{1,2,3}.f90' [PR102330] Thomas Schwinge
  1 sibling, 1 reply; 26+ messages in thread
From: Julian Brown @ 2022-02-15 13:40 UTC (permalink / raw)
  To: Thomas Schwinge; +Cc: gcc-patches, Tom de Vries, Chung-Lin Tang, Jakub Jelinek

On Mon, 14 Feb 2022 16:56:35 +0100
Thomas Schwinge <thomas@codesourcery.com> wrote:

> Hi Julian!
> 
> Two more questions here, in context of <https://gcc.gnu.org/PR102330>
> "[12 Regression] ICE in expand_gimple_stmt_1, at cfgexpand.c:3932
> since r12-980-g29a2f51806c":
> 
> On 2019-06-03T17:02:45+0100, Julian Brown <julian@codesourcery.com>
> wrote:
> > +/* Record vars listed in private clauses in CLAUSES in CTX.  This information
> > +   is used to mark up variables that should be made private per-gang.  */
> > +
> > +static void
> > +oacc_record_private_var_clauses (omp_context *ctx, tree clauses)
> > +{
> > +  tree c;
> > +
> > +  if (!ctx)
> > +    return;
> > +
> > +  for (c = clauses; c; c = OMP_CLAUSE_CHAIN (c))
> > +    if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_PRIVATE)
> > +      {
> > +	tree decl = OMP_CLAUSE_DECL (c);
> > +	if (VAR_P (decl) && TREE_ADDRESSABLE (decl))
> > +	  ctx->oacc_addressable_var_decls->safe_push (decl);
> > +      }
> > +}  
> 
> So, here we analyze 'OMP_CLAUSE_DECL (c)' (as is, without translation
> through 'lookup_decl (decl, ctx)')...

I think you're right that this one should be using lookup_decl, but...

> > +/* Record addressable vars declared in BINDVARS in CTX.  This information is
> > +   used to mark up variables that should be made private per-gang.  */
> > +
> > +static void
> > +oacc_record_vars_in_bind (omp_context *ctx, tree bindvars)
> > +{
> > +  if (!ctx)
> > +    return;
> > +
> > +  for (tree v = bindvars; v; v = DECL_CHAIN (v))
> > +    if (VAR_P (v) && TREE_ADDRESSABLE (v))
> > +      ctx->oacc_addressable_var_decls->safe_push (v);
> > +}  
> 
> ..., and similarly here analyze 'v' (without 'lookup_decl (v,
> ctx)')...

I'm not so sure about this one: if the variables are declared at a
particular binding level, I think they have to be in the current OMP
context (and thus shadow any definitions that might be present in the
parent context)? Maybe that can be confirmed via an assertion.

> > +/* Mark addressable variables which are declared implicitly or explicitly as
> > +   gang private with a special attribute.  These may need to have their
> > +   declarations altered later on in compilation (e.g. in
> > +   execute_oacc_device_lower or the backend, depending on how the OpenACC
> > +   execution model is implemented on a given target) to ensure that sharing
> > +   semantics are correct.  */
> > +
> > +static void
> > +mark_oacc_gangprivate (vec<tree> *decls, omp_context *ctx)
> > +{
> > +  int i;
> > +  tree decl;
> > +
> > +  FOR_EACH_VEC_ELT (*decls, i, decl)
> > +    {
> > +      for (omp_context *thisctx = ctx; thisctx; thisctx = thisctx->outer)
> > +	{
> > +	  tree inner_decl = maybe_lookup_decl (decl, thisctx);
> > +	  if (inner_decl)
> > +	    {
> > +	      decl = inner_decl;
> > +	      break;
> > +	    }
> > +	}
> > +      if (!lookup_attribute ("oacc gangprivate", DECL_ATTRIBUTES (decl)))
> > +	{
> > +	  if (dump_file && (dump_flags & TDF_DETAILS))
> > +	    {
> > +	      fprintf (dump_file,
> > +		       "Setting 'oacc gangprivate' attribute for decl:");
> > +	      print_generic_decl (dump_file, decl, TDF_SLIM);
> > +	      fputc ('\n', dump_file);
> > +	    }
> > +	  DECL_ATTRIBUTES (decl)
> > +	    = tree_cons (get_identifier ("oacc gangprivate"),
> > +			 NULL, DECL_ATTRIBUTES (decl));
> > +	}
> > +    }
> > +}  
> 
> ..., but here we act on the 'maybe_lookup_decl'-translated
> 'inner_decl', if applicable.  In certain cases that one may be
> different from the original 'decl'.  (In particular (only?), when the
> OMP lowering has made 'decl' "late 'TREE_ADDRESSABLE'".)  This
> asymmetry, as I understand it, gives rise to
> <https://gcc.gnu.org/PR102330> "[12 Regression] ICE in
> expand_gimple_stmt_1, at cfgexpand.c:3932 since r12-980-g29a2f51806c".
> 
> It makes sense to me that we do the OpenACC privatization on the
> 'lookup_decl' -- but shouldn't we then do that in the analysis phase,
> too?  (This appears to work fine for OpenACC 'private' clauses (...,
> and avoids marking a few as addressable/gang-private), and for those
> in 'gimple_bind_vars' it doesn't seem to make a difference (for the
> current test cases and/or compiler transformations).)

Yes, I think you're right.

> And, second question: what case did you run into or foresee, that you
> here need the 'thisctx' loop and 'maybe_lookup_decl', instead of a
> plain 'lookup_decl (decl, ctx)'?  Per my testing that's sufficient.

I'd probably misunderstood about lookup_decl walking up through parent
contexts itself... oops.

> Unless you think this needs more consideration, I suggest to do these
> two changes.  (I have a WIP patch in testing.)

Sounds good to me.

Thank you,

Julian


* Add 'gfortran.dg/goacc-gomp/pr102330-{1,2,3}.f90' [PR102330]
  2022-02-14 15:56                   ` Thomas Schwinge
  2022-02-15 13:40                     ` Julian Brown
@ 2022-03-10 11:13                     ` Thomas Schwinge
  2022-03-10 11:18                       ` Add 'c-c++-common/goacc/kernels-decompose-pr104774-1.c' [PR104774] Thomas Schwinge
  1 sibling, 1 reply; 26+ messages in thread
From: Thomas Schwinge @ 2022-03-10 11:13 UTC (permalink / raw)
  To: gcc-patches; +Cc: Julian Brown, Tom de Vries, Chung-Lin Tang, Jakub Jelinek

[-- Attachment #1: Type: text/plain, Size: 681 bytes --]

Hi!

On 2022-02-14T16:56:35+0100, I wrote:
> [...] give rise to <https://gcc.gnu.org/PR102330> "[12 Regression] ICE in
> expand_gimple_stmt_1, at cfgexpand.c:3932 since r12-980-g29a2f51806c".

Pushed to master branch commit 687091257820f4a6a005186437917270ecd27416
"Add 'gfortran.dg/goacc-gomp/pr102330-{1,2,3}.f90' [PR102330]", see
attached: currently XFAILed with 'dg-ice'.


Grüße
 Thomas


-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-Add-gfortran.dg-goacc-gomp-pr102330-1-2-3-.f90-PR102.patch --]
[-- Type: text/x-diff, Size: 4113 bytes --]

From 687091257820f4a6a005186437917270ecd27416 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge <thomas@codesourcery.com>
Date: Thu, 27 Jan 2022 14:17:28 +0100
Subject: [PATCH] Add 'gfortran.dg/goacc-gomp/pr102330-{1,2,3}.f90' [PR102330]

..., currently XFAILed with 'dg-ice'.

	PR middle-end/102330
	gcc/testsuite/
	* gfortran.dg/goacc-gomp/pr102330-1.f90: New file.
	* gfortran.dg/goacc-gomp/pr102330-2.f90: Likewise.
	* gfortran.dg/goacc-gomp/pr102330-3.f90: Likewise.
---
 .../gfortran.dg/goacc-gomp/pr102330-1.f90     | 20 +++++++++++++++++
 .../gfortran.dg/goacc-gomp/pr102330-2.f90     | 20 +++++++++++++++++
 .../gfortran.dg/goacc-gomp/pr102330-3.f90     | 22 +++++++++++++++++++
 3 files changed, 62 insertions(+)
 create mode 100644 gcc/testsuite/gfortran.dg/goacc-gomp/pr102330-1.f90
 create mode 100644 gcc/testsuite/gfortran.dg/goacc-gomp/pr102330-2.f90
 create mode 100644 gcc/testsuite/gfortran.dg/goacc-gomp/pr102330-3.f90

diff --git a/gcc/testsuite/gfortran.dg/goacc-gomp/pr102330-1.f90 b/gcc/testsuite/gfortran.dg/goacc-gomp/pr102330-1.f90
new file mode 100644
index 00000000000..fba8c718dc2
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/goacc-gomp/pr102330-1.f90
@@ -0,0 +1,20 @@
+! { dg-additional-options -fchecking }
+! { dg-ice TODO }
+
+! { dg-additional-options -fopt-info-omp-note }
+
+! { dg-additional-options --param=openacc-privatization=noisy }
+
+program p
+  !$omp master taskloop simd
+  do i = 1, 8
+  end do
+  !$acc parallel loop ! { dg-line l_compute1 }
+  ! { dg-note {variable 'i' in 'private' clause is candidate for adjusting OpenACC privatization level} {} { target *-*-* } l_compute1 }
+  do i = 1, 8
+  end do
+end
+! { dg-bogus {Error: non-register as LHS of binary operation} TODO { target { ! offloading_enabled } xfail *-*-* } .-1 }
+! { dg-bogus {error: non-register as LHS of binary operation} TODO { target offloading_enabled xfail *-*-* } .-2 }
+! TODO See PR101551 for 'offloading_enabled' differences.
+! { dg-excess-errors ICE }
diff --git a/gcc/testsuite/gfortran.dg/goacc-gomp/pr102330-2.f90 b/gcc/testsuite/gfortran.dg/goacc-gomp/pr102330-2.f90
new file mode 100644
index 00000000000..7a1ce8b088c
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/goacc-gomp/pr102330-2.f90
@@ -0,0 +1,20 @@
+! { dg-additional-options -fchecking }
+! { dg-ice TODO }
+
+! { dg-additional-options -fopt-info-omp-note }
+
+! { dg-additional-options --param=openacc-privatization=noisy }
+
+program p
+  !$omp taskloop lastprivate(i)
+  do i = 1, 8
+  end do
+  !$acc parallel loop ! { dg-line l_compute1 }
+  ! { dg-note {variable 'i' in 'private' clause is candidate for adjusting OpenACC privatization level} {} { target *-*-* } l_compute1 }
+  do i = 1, 8
+  end do
+end
+! { dg-bogus {Error: non-register as LHS of binary operation} TODO { target { ! offloading_enabled } xfail *-*-* } .-1 }
+! { dg-bogus {error: non-register as LHS of binary operation} TODO { target offloading_enabled xfail *-*-* } .-2 }
+! TODO See PR101551 for 'offloading_enabled' differences.
+! { dg-excess-errors ICE }
diff --git a/gcc/testsuite/gfortran.dg/goacc-gomp/pr102330-3.f90 b/gcc/testsuite/gfortran.dg/goacc-gomp/pr102330-3.f90
new file mode 100644
index 00000000000..b8b1479c7ea
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/goacc-gomp/pr102330-3.f90
@@ -0,0 +1,22 @@
+! { dg-additional-options -fchecking }
+! { dg-ice TODO }
+
+! { dg-additional-options -fopt-info-omp-note }
+
+! { dg-additional-options --param=openacc-privatization=noisy }
+
+program p
+  i = 0
+  !$omp task shared(i)
+  i = 1
+  !$omp end task
+  !$omp taskwait
+  !$acc parallel loop ! { dg-line l_compute1 }
+  ! { dg-note {variable 'i' in 'private' clause is candidate for adjusting OpenACC privatization level} {} { target *-*-* } l_compute1 }
+  do i = 1, 8
+  end do
+end
+! { dg-bogus {Error: non-register as LHS of binary operation} TODO { target { ! offloading_enabled } xfail *-*-* } .-1 }
+! { dg-bogus {error: non-register as LHS of binary operation} TODO { target offloading_enabled xfail *-*-* } .-2 }
+! TODO See PR101551 for 'offloading_enabled' differences.
+! { dg-excess-errors ICE }
-- 
2.34.1



* Add 'c-c++-common/goacc/kernels-decompose-pr104774-1.c' [PR104774]
  2022-03-10 11:13                     ` Add 'gfortran.dg/goacc-gomp/pr102330-{1,2,3}.f90' [PR102330] Thomas Schwinge
@ 2022-03-10 11:18                       ` Thomas Schwinge
  0 siblings, 0 replies; 26+ messages in thread
From: Thomas Schwinge @ 2022-03-10 11:18 UTC (permalink / raw)
  To: gcc-patches; +Cc: Julian Brown, Tom de Vries, Chung-Lin Tang, Jakub Jelinek

[-- Attachment #1: Type: text/plain, Size: 1192 bytes --]

Hi!

On 2022-03-10T12:13:29+0100, I wrote:
> On 2022-02-14T16:56:35+0100, I wrote:
>> [...] give rise to <https://gcc.gnu.org/PR102330> "[12 Regression] ICE in
>> expand_gimple_stmt_1, at cfgexpand.c:3932 since r12-980-g29a2f51806c".
>
> Pushed to master branch commit 687091257820f4a6a005186437917270ecd27416
> "Add 'gfortran.dg/goacc-gomp/pr102330-{1,2,3}.f90' [PR102330]", see
> attached: currently XFAILed with 'dg-ice'.

Well, and as I have since figured out, the very same problem/fix is what
causes/cures the recently filed <https://gcc.gnu.org/PR104774> "OpenACC
'kernels' decomposition: internal compiler error: 'verify_gimple' failed,
with 'loop' with explicit 'seq' or 'independent'"!

Pushed to master branch commit 448741533a75862ebf51d8e73eb1dd1f6a47eec5
"Add 'c-c++-common/goacc/kernels-decompose-pr104774-1.c' [PR104774]", see
attached: currently XFAILed with 'dg-ice'.


Grüße
 Thomas



[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-Add-c-c-common-goacc-kernels-decompose-pr104774-1.c-.patch --]
[-- Type: text/x-diff, Size: 3025 bytes --]

From 448741533a75862ebf51d8e73eb1dd1f6a47eec5 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge <thomas@codesourcery.com>
Date: Thu, 3 Mar 2022 18:00:52 +0100
Subject: [PATCH] Add 'c-c++-common/goacc/kernels-decompose-pr104774-1.c'
 [PR104774]

..., currently XFAILed with 'dg-ice'.

	PR middle-end/104774
	gcc/testsuite/
	* c-c++-common/goacc/kernels-decompose-pr104774-1.c: New file.
---
 .../goacc/kernels-decompose-pr104774-1.c      | 41 +++++++++++++++++++
 1 file changed, 41 insertions(+)
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr104774-1.c

diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr104774-1.c b/gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr104774-1.c
new file mode 100644
index 00000000000..776f4d6befa
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr104774-1.c
@@ -0,0 +1,41 @@
+/* { dg-additional-options "--param openacc-kernels=decompose" } */
+
+/* { dg-additional-options "-fchecking" }
+   { dg-ice TODO } */
+
+/* { dg-additional-options "-fopt-info-all-omp" } */
+
+/* { dg-additional-options "--param=openacc-privatization=noisy" }
+   Prune a few: uninteresting, and potentially varying depending on GCC configuration (data types):
+   { dg-prune-output {note: variable 'D\.[0-9]+' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} } */
+
+int arr_0;
+
+void
+foo (void)
+{
+#pragma acc kernels /* { dg-line l_compute1 } */
+  /* { dg-note {OpenACC 'kernels' decomposition: variable 'k' declared in block requested to be made addressable} {} { target *-*-* } l_compute1 } */
+  /* { dg-note {variable 'k' made addressable} {} { target *-*-* } l_compute1 } */
+  /* { dg-note {variable 'k' declared in block is candidate for adjusting OpenACC privatization level} {} { target *-*-* } l_compute1 } */
+  /* { dg-note {variable 'arr_0\.0' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_compute1 } */
+  {
+    int k;
+
+    /* { dg-note {parallelized loop nest in OpenACC 'kernels' region} {} { target *-*-* } .+1 } */
+#pragma acc loop seq /* { dg-line l_loop_k1 } */
+    /* { dg-note {variable 'k' in 'private' clause is candidate for adjusting OpenACC privatization level} {} { target *-*-* } l_loop_k1 } */
+    for (k = 0; k < 2; k++)
+      arr_0 = k;
+
+    /* { dg-note {parallelized loop nest in OpenACC 'kernels' region} {} { target *-*-* } .+1 } */
+#pragma acc loop independent reduction(+: arr_0) /* { dg-line l_loop_k2 } */
+    /* { dg-note {variable 'k' in 'private' clause is candidate for adjusting OpenACC privatization level} {} { target *-*-* } l_loop_k2 } */
+    for (k = 0; k < 2; k++)
+      arr_0 += k;
+  }
+}
+/* { dg-bogus {error: non-register as LHS of binary operation} {} { xfail *-*-* } .-1 }
+   { dg-bogus {error: invalid RHS for gimple memory store: 'var_decl'} {} { xfail *-*-* } .-2 }
+   { dg-allow-blank-lines-in-output 1 }
+   { dg-excess-errors ICE } */
-- 
2.34.1



* [OpenACC privatization] Analyze 'lookup_decl'-translated DECL [PR90115, PR102330, PR104774]
  2022-02-15 13:40                     ` Julian Brown
@ 2022-03-10 11:28                       ` Thomas Schwinge
  0 siblings, 0 replies; 26+ messages in thread
From: Thomas Schwinge @ 2022-03-10 11:28 UTC (permalink / raw)
  To: gcc-patches, Julian Brown; +Cc: Tom de Vries, Chung-Lin Tang, Jakub Jelinek

[-- Attachment #1: Type: text/plain, Size: 5612 bytes --]

Hi!

On 2022-02-15T13:40:09+0000, Julian Brown <julian@codesourcery.com> wrote:
> On Mon, 14 Feb 2022 16:56:35 +0100
> Thomas Schwinge <thomas@codesourcery.com> wrote:
>> Two more questions here, in context of <https://gcc.gnu.org/PR102330>
>> "[12 Regression] ICE in expand_gimple_stmt_1, at cfgexpand.c:3932
>> since r12-980-g29a2f51806c":
>>
>> On 2019-06-03T17:02:45+0100, Julian Brown <julian@codesourcery.com> wrote:
>> > +/* Record vars listed in private clauses in CLAUSES in CTX.  This information
>> > +   is used to mark up variables that should be made private per-gang.  */
>> > +
>> > +static void
>> > +oacc_record_private_var_clauses (omp_context *ctx, tree clauses)
>> > +{
>> > +  tree c;
>> > +
>> > +  if (!ctx)
>> > +    return;
>> > +
>> > +  for (c = clauses; c; c = OMP_CLAUSE_CHAIN (c))
>> > +    if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_PRIVATE)
>> > +      {
>> > +  tree decl = OMP_CLAUSE_DECL (c);
>> > +  if (VAR_P (decl) && TREE_ADDRESSABLE (decl))
>> > +    ctx->oacc_addressable_var_decls->safe_push (decl);
>> > +      }
>> > +}
>>
>> So, here we analyze 'OMP_CLAUSE_DECL (c)' (as is, without translation
>> through 'lookup_decl (decl, ctx)')...
>
> I think you're right that this one should be using lookup_decl, but...
>
>> > +/* Record addressable vars declared in BINDVARS in CTX.  This information is
>> > +   used to mark up variables that should be made private per-gang.  */
>> > +
>> > +static void
>> > +oacc_record_vars_in_bind (omp_context *ctx, tree bindvars)
>> > +{
>> > +  if (!ctx)
>> > +    return;
>> > +
>> > +  for (tree v = bindvars; v; v = DECL_CHAIN (v))
>> > +    if (VAR_P (v) && TREE_ADDRESSABLE (v))
>> > +      ctx->oacc_addressable_var_decls->safe_push (v);
>> > +}
>>
>> ..., and similarly here analyze 'v' (without 'lookup_decl (v, ctx)')...
>
> I'm not so sure about this one: if the variables are declared at a
> particular binding level, I think they have to be in the current OMP
> context (and thus shadow any definitions that might be present in the
> parent context)? Maybe that can be confirmed via an assertion.

Yes, I've added a 'gcc_checking_assert (lookup_decl (v, ctx) == v);'.
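
To spell out what that assertion checks, here is a minimal standalone
sketch.  It is not GCC code: 'Decl', 'Context' and
'bind_decls_translate_to_self' are hypothetical stand-ins for 'tree',
'omp_context' and the per-decl check, and the map is a plain
'std::unordered_map'.  The assumption it models is that a decl declared
at the current binding level translates to itself in the current context.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a GCC decl; not the real 'tree' type.
struct Decl
{
  std::string name;
};

// Hypothetical stand-in for 'omp_context': a map from a decl as seen
// outside the construct to the decl used inside it.
struct Context
{
  std::unordered_map<Decl *, Decl *> decl_map;

  // Strict lookup: the decl is expected to have an entry.
  Decl *lookup (Decl *d) const
  {
    auto it = decl_map.find (d);
    assert (it != decl_map.end ());
    return it->second;
  }
};

// True if every decl of a binding level translates to itself, that is,
// if translating before analysis would not change which decl is analyzed.
bool
bind_decls_translate_to_self (const Context &ctx,
                              const std::vector<Decl *> &bind_decls)
{
  for (Decl *d : bind_decls)
    if (ctx.lookup (d) != d)
      return false;
  return true;
}
```

In this toy model, a context whose map sends 'v' to 'v' passes the
check, and one that sends 'v' to a separate copy fails it; the latter is
the situation the checking assertion is meant to catch.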

>> > +/* Mark addressable variables which are declared implicitly or explicitly as
>> > +   gang private with a special attribute.  These may need to have their
>> > +   declarations altered later on in compilation (e.g. in
>> > +   execute_oacc_device_lower or the backend, depending on how the OpenACC
>> > +   execution model is implemented on a given target) to ensure that sharing
>> > +   semantics are correct.  */
>> > +
>> > +static void
>> > +mark_oacc_gangprivate (vec<tree> *decls, omp_context *ctx)
>> > +{
>> > +  int i;
>> > +  tree decl;
>> > +
>> > +  FOR_EACH_VEC_ELT (*decls, i, decl)
>> > +    {
>> > +      for (omp_context *thisctx = ctx; thisctx; thisctx = thisctx->outer)
>> > +  {
>> > +    tree inner_decl = maybe_lookup_decl (decl, thisctx);
>> > +    if (inner_decl)
>> > +      {
>> > +        decl = inner_decl;
>> > +        break;
>> > +      }
>> > +  }
>> > +      if (!lookup_attribute ("oacc gangprivate", DECL_ATTRIBUTES (decl)))
>> > +  {
>> > +    if (dump_file && (dump_flags & TDF_DETAILS))
>> > +      {
>> > +        fprintf (dump_file,
>> > +                 "Setting 'oacc gangprivate' attribute for decl:");
>> > +        print_generic_decl (dump_file, decl, TDF_SLIM);
>> > +        fputc ('\n', dump_file);
>> > +      }
>> > +    DECL_ATTRIBUTES (decl)
>> > +      = tree_cons (get_identifier ("oacc gangprivate"),
>> > +                   NULL, DECL_ATTRIBUTES (decl));
>> > +  }
>> > +    }
>> > +}
>>
>> ..., but here we action on the 'maybe_lookup_decl'-translated
>> 'inner_decl', if applicable.  In certain cases that one may be
>> different from the original 'decl'.  (In particular (only?), when the
>> OMP lowering has made 'decl' "late 'TREE_ADDRESSABLE'".)  This
>> asymmetry I understand to give rise to <https://gcc.gnu.org/PR102330>
>> "[12 Regression] ICE in expand_gimple_stmt_1, at cfgexpand.c:3932
>> since r12-980-g29a2f51806c".
>>
>> It makes sense to me that we do the OpenACC privatization on the
>> 'lookup_decl' -- but shouldn't we then do that in the analysis phase,
>> too?  (This appears to work fine for OpenACC 'private' clauses (...,
>> and avoids marking a few as addressable/gang-private), and for those
>> in 'gimple_bind_vars' it doesn't seem to make a difference (for the
>> current test cases and/or compiler transformations).)
>
> Yes, I think you're right.
>
>> And, second question: what case did you run into or foresee, that you
>> here need the 'thisctx' loop and 'maybe_lookup_decl', instead of a
>> plain 'lookup_decl (decl, ctx)'?  Per my testing that's sufficient.
>
> I'd probably misunderstood about lookup_decl walking up through parent
> contexts itself... oops.
>
>> Unless you think this needs more consideration, I suggest to do these
>> two changes.  (I have a WIP patch in testing.)
>
> Sounds good to me.

Thanks for your conceptual review.  Pushed to master branch
commit 7a5e036b61aa088e6b8564bc9383d37dfbb4801e "[OpenACC privatization]
Analyze 'lookup_decl'-translated DECL [PR90115, PR102330, PR104774]", see
attached.
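
For readers who want the analyze/action mismatch in isolation, here is a
minimal standalone sketch.  It is not the omp-low.cc code: 'Decl',
'Context', 'scan_untranslated', 'scan_translated' and
'count_unaddressable_after_translation' are hypothetical names, the
lookup falls back to the original decl when there is no mapping (a
simplification), and "addressable" is a plain bool.  It assumes the case
discussed above, where the original decl is addressable but its
translated copy is not.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a GCC decl; not the real 'tree' type.
struct Decl
{
  std::string name;
  bool addressable;
};

// Hypothetical stand-in for 'omp_context'.
struct Context
{
  std::unordered_map<Decl *, Decl *> decl_map;

  // Simplified lookup: returns the mapped decl, or the decl itself if
  // there is no mapping.
  Decl *lookup (Decl *d) const
  {
    auto it = decl_map.find (d);
    return it == decl_map.end () ? d : it->second;
  }
};

// Old scheme, analysis phase: test and record the original decl.
std::vector<Decl *>
scan_untranslated (const std::vector<Decl *> &decls)
{
  std::vector<Decl *> candidates;
  for (Decl *d : decls)
    if (d->addressable)
      candidates.push_back (d);
  return candidates;
}

// New scheme, analysis phase: translate first, then test and record the
// translated decl, so that analysis and action see the same decl.
std::vector<Decl *>
scan_translated (const Context &ctx, const std::vector<Decl *> &decls)
{
  std::vector<Decl *> candidates;
  for (Decl *d : decls)
    {
      Decl *new_d = ctx.lookup (d);
      if (new_d->addressable)
        candidates.push_back (new_d);
    }
  return candidates;
}

// Old scheme, action phase: translate each recorded candidate and count
// those whose translated decl is not addressable.  Each such decl is one
// whose address the action phase would nevertheless try to take.
int
count_unaddressable_after_translation (const Context &ctx,
                                       const std::vector<Decl *> &candidates)
{
  int n = 0;
  for (Decl *d : candidates)
    if (!ctx.lookup (d)->addressable)
      ++n;
  return n;
}
```

With an addressable outer 'i' mapped to a non-addressable inner 'i', the
old analysis records one candidate that the action phase then cannot
handle, whereas the new analysis records none.  That matches the
direction of the test suite changes, where several "is candidate" notes
become "isn't candidate [...]: not addressable".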


Grüße
 Thomas



[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-OpenACC-privatization-Analyze-lookup_decl-translated.patch --]
[-- Type: text/x-diff, Size: 79750 bytes --]

From 7a5e036b61aa088e6b8564bc9383d37dfbb4801e Mon Sep 17 00:00:00 2001
From: Thomas Schwinge <thomas@codesourcery.com>
Date: Mon, 14 Feb 2022 16:56:35 +0100
Subject: [PATCH] [OpenACC privatization] Analyze 'lookup_decl'-translated DECL
 [PR90115, PR102330, PR104774]

... so that it matches what we analyze and what we action on.
Fix-up for commit 29a2f51806c5b30e17a8d0e9ba7915a3c53c34ff "openacc:
Add support for gang local storage allocation in shared memory [PR90115]".

	PR middle-end/90115
	PR middle-end/102330
	PR middle-end/104774
	gcc/
	* omp-low.cc (oacc_privatization_candidate_p)
	(oacc_privatization_scan_clause_chain)
	(oacc_privatization_scan_decl_chain, lower_oacc_private_marker):
	Analyze 'lookup_decl'-translated DECL.
	gcc/testsuite/
	* c-c++-common/goacc/kernels-decompose-pr104061-1-3.c: Adjust.
	* c-c++-common/goacc/kernels-decompose-pr104061-1-4.c: Likewise.
	* c-c++-common/goacc/kernels-decompose-pr104132-1.c: Likewise.
	* c-c++-common/goacc/kernels-decompose-pr104133-1.c: Likewise.
	* c-c++-common/goacc/kernels-decompose-pr104774-1.c: Likewise.
	* c-c++-common/goacc/privatization-1-compute-loop.c: Likewise.
	* c-c++-common/goacc/privatization-1-compute.c: Likewise.
	* c-c++-common/goacc/privatization-1-routine_gang-loop.c:
	Likewise.
	* c-c++-common/goacc/privatization-1-routine_gang.c: Likewise.
	* gfortran.dg/goacc-gomp/pr102330-1.f90: Likewise, and subsume...
	* gfortran.dg/goacc-gomp/pr102330-2.f90: ... this file, and...
	* gfortran.dg/goacc-gomp/pr102330-3.f90: ... this file.
	* gfortran.dg/goacc/privatization-1-compute-loop.f90: Adjust.
	* gfortran.dg/goacc/privatization-1-compute.f90: Likewise.
	* gfortran.dg/goacc/privatization-1-routine_gang-loop.f90:
	Likewise.
	* gfortran.dg/goacc/privatization-1-routine_gang.f90: Likewise.
	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c:
	Enhance.
	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-1.c:
	Adjust.
	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-2.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-3.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-4.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-5.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-vector-1.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-vector-2.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-2.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-3.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-4.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-5.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-6.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-7.c:
	Likewise.
	* testsuite/libgomp.oacc-fortran/optional-private.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/privatized-ref-1.f95: Likewise.
	* testsuite/libgomp.oacc-fortran/privatized-ref-2.f90: Likewise.
---
 gcc/omp-low.cc                                | 37 +++++----
 .../goacc/kernels-decompose-pr104061-1-3.c    |  3 +-
 .../goacc/kernels-decompose-pr104061-1-4.c    |  3 +-
 .../goacc/kernels-decompose-pr104132-1.c      |  4 +-
 .../goacc/kernels-decompose-pr104133-1.c      |  4 +-
 .../goacc/kernels-decompose-pr104774-1.c      | 13 +--
 .../goacc/privatization-1-compute-loop.c      |  9 ++-
 .../goacc/privatization-1-compute.c           |  9 ++-
 .../goacc/privatization-1-routine_gang-loop.c |  9 ++-
 .../goacc/privatization-1-routine_gang.c      |  9 ++-
 .../gfortran.dg/goacc-gomp/pr102330-1.f90     | 33 +++++---
 .../gfortran.dg/goacc-gomp/pr102330-2.f90     | 20 -----
 .../gfortran.dg/goacc-gomp/pr102330-3.f90     | 22 -----
 .../goacc/privatization-1-compute-loop.f90    |  6 +-
 .../goacc/privatization-1-compute.f90         |  6 +-
 .../privatization-1-routine_gang-loop.f90     |  6 +-
 .../goacc/privatization-1-routine_gang.f90    |  6 +-
 .../kernels-decompose-1.c                     | 81 ++++++++++++++-----
 .../kernels-private-vars-local-worker-1.c     |  6 +-
 .../kernels-private-vars-local-worker-2.c     |  3 +-
 .../kernels-private-vars-local-worker-3.c     |  3 +-
 .../kernels-private-vars-local-worker-4.c     |  3 +-
 .../kernels-private-vars-local-worker-5.c     |  3 +-
 .../kernels-private-vars-loop-vector-1.c      |  3 +-
 .../kernels-private-vars-loop-vector-2.c      |  3 +-
 .../kernels-private-vars-loop-worker-2.c      |  3 +-
 .../kernels-private-vars-loop-worker-3.c      |  6 +-
 .../kernels-private-vars-loop-worker-4.c      |  3 +-
 .../kernels-private-vars-loop-worker-5.c      |  3 +-
 .../kernels-private-vars-loop-worker-6.c      |  3 +-
 .../kernels-private-vars-loop-worker-7.c      |  3 +-
 .../libgomp.oacc-fortran/optional-private.f90 |  6 +-
 .../libgomp.oacc-fortran/privatized-ref-1.f95 |  8 +-
 .../libgomp.oacc-fortran/privatized-ref-2.f90 | 18 +++--
 34 files changed, 187 insertions(+), 170 deletions(-)
 delete mode 100644 gcc/testsuite/gfortran.dg/goacc-gomp/pr102330-2.f90
 delete mode 100644 gcc/testsuite/gfortran.dg/goacc-gomp/pr102330-3.f90

diff --git a/gcc/omp-low.cc b/gcc/omp-low.cc
index 5ce3a50709a..d932d74cb03 100644
--- a/gcc/omp-low.cc
+++ b/gcc/omp-low.cc
@@ -10590,6 +10590,10 @@ oacc_privatization_candidate_p (const location_t loc, const tree c,
 
   if (res && !VAR_P (decl))
     {
+      /* A PARM_DECL (appearing in a 'private' clause) is expected to have been
+	 privatized into a new VAR_DECL.  */
+      gcc_checking_assert (TREE_CODE (decl) != PARM_DECL);
+
       res = false;
 
       if (dump_enabled_p ())
@@ -10670,11 +10674,15 @@ oacc_privatization_scan_clause_chain (omp_context *ctx, tree clauses)
       {
 	tree decl = OMP_CLAUSE_DECL (c);
 
-	if (!oacc_privatization_candidate_p (OMP_CLAUSE_LOCATION (c), c, decl))
+	tree new_decl = lookup_decl (decl, ctx);
+
+	if (!oacc_privatization_candidate_p (OMP_CLAUSE_LOCATION (c), c,
+					     new_decl))
 	  continue;
 
-	gcc_checking_assert (!ctx->oacc_privatization_candidates.contains (decl));
-	ctx->oacc_privatization_candidates.safe_push (decl);
+	gcc_checking_assert
+	  (!ctx->oacc_privatization_candidates.contains (new_decl));
+	ctx->oacc_privatization_candidates.safe_push (new_decl);
       }
 }
 
@@ -10686,11 +10694,16 @@ oacc_privatization_scan_decl_chain (omp_context *ctx, tree decls)
 {
   for (tree decl = decls; decl; decl = DECL_CHAIN (decl))
     {
-      if (!oacc_privatization_candidate_p (gimple_location (ctx->stmt), NULL, decl))
+      tree new_decl = lookup_decl (decl, ctx);
+      gcc_checking_assert (new_decl == decl);
+
+      if (!oacc_privatization_candidate_p (gimple_location (ctx->stmt), NULL,
+					   new_decl))
 	continue;
 
-      gcc_checking_assert (!ctx->oacc_privatization_candidates.contains (decl));
-      ctx->oacc_privatization_candidates.safe_push (decl);
+      gcc_checking_assert
+	(!ctx->oacc_privatization_candidates.contains (new_decl));
+      ctx->oacc_privatization_candidates.safe_push (new_decl);
     }
 }
 
@@ -11557,17 +11570,7 @@ lower_oacc_private_marker (omp_context *ctx)
   tree decl;
   FOR_EACH_VEC_ELT (ctx->oacc_privatization_candidates, i, decl)
     {
-      for (omp_context *thisctx = ctx; thisctx; thisctx = thisctx->outer)
-	{
-	  tree inner_decl = maybe_lookup_decl (decl, thisctx);
-	  if (inner_decl)
-	    {
-	      decl = inner_decl;
-	      break;
-	    }
-	}
-      gcc_checking_assert (decl);
-
+      gcc_checking_assert (TREE_ADDRESSABLE (decl));
       tree addr = build_fold_addr_expr (decl);
       args.safe_push (addr);
     }
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr104061-1-3.c b/gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr104061-1-3.c
index e106fc32c4f..28d26e566f6 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr104061-1-3.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr104061-1-3.c
@@ -29,8 +29,7 @@ foo (void)
     /* { dg-note {forwarded loop nest in OpenACC 'kernels' region to 'parloops' for analysis} {} { target *-*-* } .+1 } */
 #pragma acc loop /* { dg-line l_loop_k1 } */
     /* { dg-note {variable 'k' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_loop_k1 } */
-    /* { dg-note {variable 'k' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {w/o debug} { target *-*-* } l_loop_k1 }
-       { dg-bogus {note: variable 'k' in 'private' clause is candidate for adjusting OpenACC privatization level} {w/ debug} { xfail *-*-* } l_loop_k1 } */
+    /* { dg-note {variable 'k' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_loop_k1 }
     /* { dg-optimized {assigned OpenACC seq loop parallelism} {} { target *-*-* } l_loop_k1 } */
     for (k = 0; k < 2; k++)
       arr_0 += k;
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr104061-1-4.c b/gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr104061-1-4.c
index bedbb0a30eb..4d125b5db87 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr104061-1-4.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr104061-1-4.c
@@ -29,8 +29,7 @@ foo (void)
     /* { dg-note {forwarded loop nest in OpenACC 'kernels' region to 'parloops' for analysis} {} { target *-*-* } .+1 } */
 #pragma acc loop /* { dg-line l_loop_k1 } */
     /* { dg-note {variable 'k' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_loop_k1 } */
-    /* { dg-note {variable 'k' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {w/o debug} { target *-*-* } l_loop_k1 }
-       { dg-bogus {note: variable 'k' in 'private' clause is candidate for adjusting OpenACC privatization level} {w/ debug} { xfail *-*-* } l_loop_k1 } */
+    /* { dg-note {variable 'k' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_loop_k1 }
     /* { dg-optimized {assigned OpenACC seq loop parallelism} {} { target *-*-* } l_loop_k1 } */
     for (k = 0; k < 2; k++)
       arr_0 += k;
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr104132-1.c b/gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr104132-1.c
index 42ec4418e40..36a43ca6d1a 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr104132-1.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr104132-1.c
@@ -20,14 +20,14 @@ foo (void)
 
     /* { dg-note {forwarded loop nest in OpenACC 'kernels' region to 'parloops' for analysis} {} { target *-*-* } .+1 } */
 #pragma acc loop /* { dg-line l_loop_k1 } */
-    /* { dg-note {variable 'k' in 'private' clause is candidate for adjusting OpenACC privatization level} {} { target *-*-* } l_loop_k1 } */
+    /* { dg-note {variable 'k' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_loop_k1 } */
     /* { dg-optimized {assigned OpenACC seq loop parallelism} {} { target *-*-* } l_loop_k1 } */
     for (k = 0; k < 2; k++)
       arr_0 = k;
 
     /* { dg-note {forwarded loop nest in OpenACC 'kernels' region to 'parloops' for analysis} {} { target *-*-* } .+1 } */
 #pragma acc loop /* { dg-line l_loop_k2 } */
-    /* { dg-note {variable 'k' in 'private' clause is candidate for adjusting OpenACC privatization level} {} { target *-*-* } l_loop_k2 } */
+    /* { dg-note {variable 'k' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_loop_k2 } */
     /* { dg-optimized {assigned OpenACC seq loop parallelism} {} { target *-*-* } l_loop_k2 } */
     for (k = 0; k < 2; k++)
       arr_0 = k;
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr104133-1.c b/gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr104133-1.c
index 47ea2b92959..d9da9dae14c 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr104133-1.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr104133-1.c
@@ -22,14 +22,14 @@ foo (void)
 
     /* { dg-note {forwarded loop nest in OpenACC 'kernels' region to 'parloops' for analysis} {} { target *-*-* } .+1 } */
 #pragma acc loop /* { dg-line l_loop_k1 } */
-    /* { dg-note {variable 'k' in 'private' clause is candidate for adjusting OpenACC privatization level} {} { target *-*-* } l_loop_k1 } */
+    /* { dg-note {variable 'k' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_loop_k1 } */
     /* { dg-optimized {assigned OpenACC seq loop parallelism} {} { target *-*-* } l_loop_k1 } */
     for (k = 0; k < 2; k++)
       arr_0 += k;
 
     /* { dg-note {forwarded loop nest in OpenACC 'kernels' region to 'parloops' for analysis} {} { target *-*-* } .+1 } */
 #pragma acc loop /* { dg-line l_loop_k2 } */
-    /* { dg-note {variable 'k' in 'private' clause is candidate for adjusting OpenACC privatization level} {} { target *-*-* } l_loop_k2 } */
+    /* { dg-note {variable 'k' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_loop_k2 } */
     /* { dg-optimized {assigned OpenACC seq loop parallelism} {} { target *-*-* } l_loop_k2 } */
     for (k = 0; k < 2; k++)
       arr_0 += k;
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr104774-1.c b/gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr104774-1.c
index 776f4d6befa..42faa48f991 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr104774-1.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr104774-1.c
@@ -1,8 +1,5 @@
 /* { dg-additional-options "--param openacc-kernels=decompose" } */
 
-/* { dg-additional-options "-fchecking" }
-   { dg-ice TODO } */
-
 /* { dg-additional-options "-fopt-info-all-omp" } */
 
 /* { dg-additional-options "--param=openacc-privatization=noisy" }
@@ -24,18 +21,16 @@ foo (void)
 
     /* { dg-note {parallelized loop nest in OpenACC 'kernels' region} {} { target *-*-* } .+1 } */
 #pragma acc loop seq /* { dg-line l_loop_k1 } */
-    /* { dg-note {variable 'k' in 'private' clause is candidate for adjusting OpenACC privatization level} {} { target *-*-* } l_loop_k1 } */
+    /* { dg-note {variable 'k' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_loop_k1 } */
+    /* { dg-optimized {assigned OpenACC seq loop parallelism} {} { target *-*-* } l_loop_k1 } */
     for (k = 0; k < 2; k++)
       arr_0 = k;
 
     /* { dg-note {parallelized loop nest in OpenACC 'kernels' region} {} { target *-*-* } .+1 } */
 #pragma acc loop independent reduction(+: arr_0) /* { dg-line l_loop_k2 } */
-    /* { dg-note {variable 'k' in 'private' clause is candidate for adjusting OpenACC privatization level} {} { target *-*-* } l_loop_k2 } */
+    /* { dg-note {variable 'k' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_loop_k2 } */
+    /* { dg-optimized {assigned OpenACC gang vector loop parallelism} {} { target *-*-* } l_loop_k2 } */
     for (k = 0; k < 2; k++)
       arr_0 += k;
   }
 }
-/* { dg-bogus {error: non-register as LHS of binary operation} {} { xfail *-*-* } .-1 }
-   { dg-bogus {error: invalid RHS for gimple memory store: 'var_decl'} {} { xfail *-*-* } .-2 }
-   { dg-allow-blank-lines-in-output 1 }
-   { dg-excess-errors ICE } */
diff --git a/gcc/testsuite/c-c++-common/goacc/privatization-1-compute-loop.c b/gcc/testsuite/c-c++-common/goacc/privatization-1-compute-loop.c
index 43b39c2042f..52d5598c28b 100644
--- a/gcc/testsuite/c-c++-common/goacc/privatization-1-compute-loop.c
+++ b/gcc/testsuite/c-c++-common/goacc/privatization-1-compute-loop.c
@@ -74,11 +74,14 @@ f (int i, int j, int a)
      { dg-note {variable 's' ought to be adjusted for OpenACC privatization level: 'vector'} "" { target *-*-* } l_loop$c_loop } */
   /* { dg-note {variable 'e' in 'private' clause is candidate for adjusting OpenACC privatization level} "" { target *-*-* } l_loop$c_loop }
      { dg-note {variable 'e' ought to be adjusted for OpenACC privatization level: 'vector'} "" { target *-*-* } l_loop$c_loop } */
-  /* { dg-note {variable 'a' in 'private' clause potentially has improper OpenACC privatization level: 'parm_decl'} "TODO" { target *-*-* } l_loop$c_loop } */
+  /* { dg-note {variable 'a' in 'private' clause is candidate for adjusting OpenACC privatization level} "" { target *-*-* } l_loop$c_loop }
+     { dg-note {variable 'a' ought to be adjusted for OpenACC privatization level: 'vector'} "" { target *-*-* } l_loop$c_loop } */
   /* { dg-note {variable 'j\.1' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_loop$c_loop } */
   /* { dg-note {variable 'i\.0' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_loop$c_loop } */
-  /* { dg-note {variable 'j' in 'private' clause potentially has improper OpenACC privatization level: 'parm_decl'} "TODO" { target *-*-* } l_loop$c_loop } */
-  /* { dg-note {variable 'i' in 'private' clause potentially has improper OpenACC privatization level: 'parm_decl'} "TODO" { target *-*-* } l_loop$c_loop } */
+  /* { dg-note {variable 'j' in 'private' clause is candidate for adjusting OpenACC privatization level} "" { target *-*-* } l_loop$c_loop }
+     { dg-note {variable 'j' ought to be adjusted for OpenACC privatization level: 'vector'} "" { target *-*-* } l_loop$c_loop } */
+  /* { dg-note {variable 'i' in 'private' clause is candidate for adjusting OpenACC privatization level} "" { target *-*-* } l_loop$c_loop }
+     { dg-note {variable 'i' ought to be adjusted for OpenACC privatization level: 'vector'} "" { target *-*-* } l_loop$c_loop } */
   /* { dg-note {variable 'll' declared in block potentially has improper OpenACC privatization level: 'label_decl'} "TODO" { target *-*-* } l_loop$c_loop } */
   /* { dg-note {variable 'struct struct s_ss' declared in block potentially has improper OpenACC privatization level: 'type_decl'} "TODO" { target c } l_loop$c_loop }
      { dg-note {variable 's_ss' declared in block potentially has improper OpenACC privatization level: 'type_decl'} "TODO" { target c++ } l_loop$c_loop } */
diff --git a/gcc/testsuite/c-c++-common/goacc/privatization-1-compute.c b/gcc/testsuite/c-c++-common/goacc/privatization-1-compute.c
index b7c7bff64d9..ae0d3ab8498 100644
--- a/gcc/testsuite/c-c++-common/goacc/privatization-1-compute.c
+++ b/gcc/testsuite/c-c++-common/goacc/privatization-1-compute.c
@@ -71,9 +71,12 @@ f (int i, int j, int a)
      { dg-note {variable 's' ought to be adjusted for OpenACC privatization level: 'gang'} "TODO" { xfail *-*-* } l_compute$c_compute } */
   /* { dg-note {variable 'e' in 'private' clause is candidate for adjusting OpenACC privatization level} "TODO" { xfail *-*-* } l_compute$c_compute }
      { dg-note {variable 'e' ought to be adjusted for OpenACC privatization level: 'gang'} "TODO" { xfail *-*-* } l_compute$c_compute } */
-  /* { dg-note {variable 'a' in 'private' clause potentially has improper OpenACC privatization level: 'parm_decl'} "TODO" { xfail *-*-* } l_compute$c_compute } */
-  /* { dg-note {variable 'j' in 'private' clause potentially has improper OpenACC privatization level: 'parm_decl'} "TODO" { xfail *-*-* } l_compute$c_compute } */
-  /* { dg-note {variable 'i' in 'private' clause potentially has improper OpenACC privatization level: 'parm_decl'} "TODO" { xfail *-*-* } l_compute$c_compute } */
+  /* { dg-note {variable 'a' in 'private' clause is candidate for adjusting OpenACC privatization level} "TODO" { xfail *-*-* } l_compute$c_compute }
+     { dg-note {variable 'a' ought to be adjusted for OpenACC privatization level: 'gang'} "TODO" { xfail *-*-* } l_compute$c_compute } */
+  /* { dg-note {variable 'j' in 'private' clause is candidate for adjusting OpenACC privatization level} "TODO" { xfail *-*-* } l_compute$c_compute }
+     { dg-note {variable 'j' ought to be adjusted for OpenACC privatization level: 'gang'} "TODO" { xfail *-*-* } l_compute$c_compute } */
+  /* { dg-note {variable 'i' in 'private' clause is candidate for adjusting OpenACC privatization level} "TODO" { xfail *-*-* } l_compute$c_compute }
+     { dg-note {variable 'i' ought to be adjusted for OpenACC privatization level: 'gang'} "TODO" { xfail *-*-* } l_compute$c_compute } */
   /* { dg-note {variable 'll' declared in block potentially has improper OpenACC privatization level: 'label_decl'} "TODO" { target *-*-* } l_compute$c_compute } */
   /* { dg-note {variable 'struct struct s_ss' declared in block potentially has improper OpenACC privatization level: 'type_decl'} "TODO" { target c } l_compute$c_compute }
      { dg-note {variable 's_ss' declared in block potentially has improper OpenACC privatization level: 'type_decl'} "TODO" { target c++ } l_compute$c_compute } */
diff --git a/gcc/testsuite/c-c++-common/goacc/privatization-1-routine_gang-loop.c b/gcc/testsuite/c-c++-common/goacc/privatization-1-routine_gang-loop.c
index 816e4306437..d394d058bf7 100644
--- a/gcc/testsuite/c-c++-common/goacc/privatization-1-routine_gang-loop.c
+++ b/gcc/testsuite/c-c++-common/goacc/privatization-1-routine_gang-loop.c
@@ -74,11 +74,14 @@ f (int i, int j, int a)
      { dg-note {variable 's' ought to be adjusted for OpenACC privatization level: 'vector'} "" { target *-*-* } l_loop$c_loop } */
   /* { dg-note {variable 'e' in 'private' clause is candidate for adjusting OpenACC privatization level} "" { target *-*-* } l_loop$c_loop }
      { dg-note {variable 'e' ought to be adjusted for OpenACC privatization level: 'vector'} "" { target *-*-* } l_loop$c_loop } */
-  /* { dg-note {variable 'a' in 'private' clause potentially has improper OpenACC privatization level: 'parm_decl'} "TODO" { target *-*-* } l_loop$c_loop } */
+  /* { dg-note {variable 'a' in 'private' clause is candidate for adjusting OpenACC privatization level} "" { target *-*-* } l_loop$c_loop }
+     { dg-note {variable 'a' ought to be adjusted for OpenACC privatization level: 'vector'} "" { target *-*-* } l_loop$c_loop } */
   /* { dg-note {variable 'j\.1' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_loop$c_loop } */
   /* { dg-note {variable 'i\.0' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_loop$c_loop } */
-  /* { dg-note {variable 'j' in 'private' clause potentially has improper OpenACC privatization level: 'parm_decl'} "TODO" { target *-*-* } l_loop$c_loop } */
-  /* { dg-note {variable 'i' in 'private' clause potentially has improper OpenACC privatization level: 'parm_decl'} "TODO" { target *-*-* } l_loop$c_loop } */
+  /* { dg-note {variable 'j' in 'private' clause is candidate for adjusting OpenACC privatization level} "" { target *-*-* } l_loop$c_loop }
+     { dg-note {variable 'j' ought to be adjusted for OpenACC privatization level: 'vector'} "" { target *-*-* } l_loop$c_loop } */
+  /* { dg-note {variable 'i' in 'private' clause is candidate for adjusting OpenACC privatization level} "" { target *-*-* } l_loop$c_loop }
+     { dg-note {variable 'i' ought to be adjusted for OpenACC privatization level: 'vector'} "" { target *-*-* } l_loop$c_loop } */
   /* { dg-note {variable 'll' declared in block potentially has improper OpenACC privatization level: 'label_decl'} "TODO" { target *-*-* } l_loop$c_loop } */
   /* { dg-note {variable 'struct struct s_ss' declared in block potentially has improper OpenACC privatization level: 'type_decl'} "TODO" { target c } l_loop$c_loop }
      { dg-note {variable 's_ss' declared in block potentially has improper OpenACC privatization level: 'type_decl'} "TODO" { target c++ } l_loop$c_loop } */
diff --git a/gcc/testsuite/c-c++-common/goacc/privatization-1-routine_gang.c b/gcc/testsuite/c-c++-common/goacc/privatization-1-routine_gang.c
index f9f316e4ff9..1aef803b9a7 100644
--- a/gcc/testsuite/c-c++-common/goacc/privatization-1-routine_gang.c
+++ b/gcc/testsuite/c-c++-common/goacc/privatization-1-routine_gang.c
@@ -75,9 +75,12 @@ f (int i, int j, int a)
      { dg-note {variable 's' ought to be adjusted for OpenACC privatization level: 'gang'} "TODO" { xfail *-*-* } l_routine$c_routine } */
   /* { dg-note {variable 'e' declared in block is candidate for adjusting OpenACC privatization level} "TODO" { xfail *-*-* } l_routine$c_routine }
      { dg-note {variable 'e' ought to be adjusted for OpenACC privatization level: 'gang'} "TODO" { xfail *-*-* } l_routine$c_routine } */
-  /* { dg-note {variable 'a' declared in block potentially has improper OpenACC privatization level: 'parm_decl'} "TODO" { xfail *-*-* } l_routine$c_routine } */
-  /* { dg-note {variable 'j' declared in block potentially has improper OpenACC privatization level: 'parm_decl'} "TODO" { xfail *-*-* } l_routine$c_routine } */
-  /* { dg-note {variable 'i' declared in block potentially has improper OpenACC privatization level: 'parm_decl'} "TODO" { xfail *-*-* } l_routine$c_routine } */
+  /* { dg-note {variable 'a' declared in block is candidate for adjusting OpenACC privatization level} "TODO" { xfail *-*-* } l_routine$c_routine }
+     { dg-note {variable 'a' ought to be adjusted for OpenACC privatization level: 'gang'} "TODO" { xfail *-*-* } l_routine$c_routine } */
+  /* { dg-note {variable 'j' declared in block is candidate for adjusting OpenACC privatization level} "TODO" { xfail *-*-* } l_routine$c_routine }
+     { dg-note {variable 'j' ought to be adjusted for OpenACC privatization level: 'gang'} "TODO" { xfail *-*-* } l_routine$c_routine } */
+  /* { dg-note {variable 'i' declared in block is candidate for adjusting OpenACC privatization level} "TODO" { xfail *-*-* } l_routine$c_routine }
+     { dg-note {variable 'i' ought to be adjusted for OpenACC privatization level: 'gang'} "TODO" { xfail *-*-* } l_routine$c_routine } */
   /* { dg-note {variable 'll' declared in block potentially has improper OpenACC privatization level: 'label_decl'} "TODO" { xfail *-*-* } l_routine$c_routine } */
   /* { dg-note {variable 'struct struct s_ss' declared in block potentially has improper OpenACC privatization level: 'type_decl'} "TODO" { target c xfail *-*-* } l_routine$c_routine }
      { dg-note {variable 's_ss' declared in block potentially has improper OpenACC privatization level: 'type_decl'} "TODO" { target c++ xfail *-*-* } l_routine$c_routine } */
diff --git a/gcc/testsuite/gfortran.dg/goacc-gomp/pr102330-1.f90 b/gcc/testsuite/gfortran.dg/goacc-gomp/pr102330-1.f90
index fba8c718dc2..025bcbf881e 100644
--- a/gcc/testsuite/gfortran.dg/goacc-gomp/pr102330-1.f90
+++ b/gcc/testsuite/gfortran.dg/goacc-gomp/pr102330-1.f90
@@ -1,20 +1,35 @@
-! { dg-additional-options -fchecking }
-! { dg-ice TODO }
-
 ! { dg-additional-options -fopt-info-omp-note }
 
 ! { dg-additional-options --param=openacc-privatization=noisy }
 
-program p
+subroutine r1
   !$omp master taskloop simd
   do i = 1, 8
   end do
   !$acc parallel loop ! { dg-line l_compute1 }
-  ! { dg-note {variable 'i' in 'private' clause is candidate for adjusting OpenACC privatization level} {} { target *-*-* } l_compute1 }
+  ! { dg-note {variable 'i' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_compute1 }
+  do i = 1, 8
+  end do
+end
+
+subroutine r2
+  !$omp taskloop lastprivate(i)
+  do i = 1, 8
+  end do
+  !$acc parallel loop ! { dg-line l_compute2 }
+  ! { dg-note {variable 'i' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_compute2 }
+  do i = 1, 8
+  end do
+end
+
+subroutine r3
+  i = 0
+  !$omp task shared(i)
+  i = 1
+  !$omp end task
+  !$omp taskwait
+  !$acc parallel loop ! { dg-line l_compute3 }
+  ! { dg-note {variable 'i' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_compute3 }
   do i = 1, 8
   end do
 end
-! { dg-bogus {Error: non-register as LHS of binary operation} TODO { target { ! offloading_enabled } xfail *-*-* } .-1 }
-! { dg-bogus {error: non-register as LHS of binary operation} TODO { target offloading_enabled xfail *-*-* } .-2 }
-! TODO See PR101551 for 'offloading_enabled' differences.
-! { dg-excess-errors ICE }
diff --git a/gcc/testsuite/gfortran.dg/goacc-gomp/pr102330-2.f90 b/gcc/testsuite/gfortran.dg/goacc-gomp/pr102330-2.f90
deleted file mode 100644
index 7a1ce8b088c..00000000000
--- a/gcc/testsuite/gfortran.dg/goacc-gomp/pr102330-2.f90
+++ /dev/null
@@ -1,20 +0,0 @@
-! { dg-additional-options -fchecking }
-! { dg-ice TODO }
-
-! { dg-additional-options -fopt-info-omp-note }
-
-! { dg-additional-options --param=openacc-privatization=noisy }
-
-program p
-  !$omp taskloop lastprivate(i)
-  do i = 1, 8
-  end do
-  !$acc parallel loop ! { dg-line l_compute1 }
-  ! { dg-note {variable 'i' in 'private' clause is candidate for adjusting OpenACC privatization level} {} { target *-*-* } l_compute1 }
-  do i = 1, 8
-  end do
-end
-! { dg-bogus {Error: non-register as LHS of binary operation} TODO { target { ! offloading_enabled } xfail *-*-* } .-1 }
-! { dg-bogus {error: non-register as LHS of binary operation} TODO { target offloading_enabled xfail *-*-* } .-2 }
-! TODO See PR101551 for 'offloading_enabled' differences.
-! { dg-excess-errors ICE }
diff --git a/gcc/testsuite/gfortran.dg/goacc-gomp/pr102330-3.f90 b/gcc/testsuite/gfortran.dg/goacc-gomp/pr102330-3.f90
deleted file mode 100644
index b8b1479c7ea..00000000000
--- a/gcc/testsuite/gfortran.dg/goacc-gomp/pr102330-3.f90
+++ /dev/null
@@ -1,22 +0,0 @@
-! { dg-additional-options -fchecking }
-! { dg-ice TODO }
-
-! { dg-additional-options -fopt-info-omp-note }
-
-! { dg-additional-options --param=openacc-privatization=noisy }
-
-program p
-  i = 0
-  !$omp task shared(i)
-  i = 1
-  !$omp end task
-  !$omp taskwait
-  !$acc parallel loop ! { dg-line l_compute1 }
-  ! { dg-note {variable 'i' in 'private' clause is candidate for adjusting OpenACC privatization level} {} { target *-*-* } l_compute1 }
-  do i = 1, 8
-  end do
-end
-! { dg-bogus {Error: non-register as LHS of binary operation} TODO { target { ! offloading_enabled } xfail *-*-* } .-1 }
-! { dg-bogus {error: non-register as LHS of binary operation} TODO { target offloading_enabled xfail *-*-* } .-2 }
-! TODO See PR101551 for 'offloading_enabled' differences.
-! { dg-excess-errors ICE }
diff --git a/gcc/testsuite/gfortran.dg/goacc/privatization-1-compute-loop.f90 b/gcc/testsuite/gfortran.dg/goacc/privatization-1-compute-loop.f90
index c825a958e9b..4dfeb7e07a2 100644
--- a/gcc/testsuite/gfortran.dg/goacc/privatization-1-compute-loop.f90
+++ b/gcc/testsuite/gfortran.dg/goacc/privatization-1-compute-loop.f90
@@ -47,9 +47,9 @@ contains
        end do
     end do
     ! { dg-note {variable 'count\.[0-9]+' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_loop$c_loop }
-    ! { dg-note {variable 'i' in 'private' clause potentially has improper OpenACC privatization level: 'parm_decl'} "TODO" { target *-*-* } l_loop$c_loop }
-    ! { dg-note {variable 'j' in 'private' clause potentially has improper OpenACC privatization level: 'parm_decl'} "TODO" { target *-*-* } l_loop$c_loop }
-    ! { dg-note {variable 'a' in 'private' clause potentially has improper OpenACC privatization level: 'parm_decl'} "TODO" { target *-*-* } l_loop$c_loop }
+    ! { dg-note {variable 'i' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_loop$c_loop }
+    ! { dg-note {variable 'j' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_loop$c_loop }
+    ! { dg-note {variable 'a' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_loop$c_loop }
     ! { dg-note {variable 'x' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_loop$c_loop }
     ! { dg-note {variable 'y' in 'private' clause is candidate for adjusting OpenACC privatization level} "" { target *-*-* } l_loop$c_loop }
     ! { dg-note {variable 'C\.[0-9]+' declared in block potentially has improper OpenACC privatization level: 'const_decl'} "TODO" { target *-*-* } l_loop$c_loop }
diff --git a/gcc/testsuite/gfortran.dg/goacc/privatization-1-compute.f90 b/gcc/testsuite/gfortran.dg/goacc/privatization-1-compute.f90
index a88203e48d5..68d084dd492 100644
--- a/gcc/testsuite/gfortran.dg/goacc/privatization-1-compute.f90
+++ b/gcc/testsuite/gfortran.dg/goacc/privatization-1-compute.f90
@@ -47,9 +47,9 @@ contains
           !$acc atomic write ! ... to force 'TREE_ADDRESSABLE'.
           y = a
     !$acc end parallel
-    ! { dg-note {variable 'i' in 'private' clause potentially has improper OpenACC privatization level: 'parm_decl'} "TODO2" { xfail *-*-* } l_compute$c_compute }
-    ! { dg-note {variable 'j' in 'private' clause potentially has improper OpenACC privatization level: 'parm_decl'} "TODO3" { xfail *-*-* } l_compute$c_compute }
-    ! { dg-note {variable 'a' in 'private' clause potentially has improper OpenACC privatization level: 'parm_decl'} "TODO4" { xfail *-*-* } l_compute$c_compute }
+    ! { dg-note {variable 'i' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} "TODO" { xfail *-*-* } l_compute$c_compute }
+    ! { dg-note {variable 'j' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} "TODO" { xfail *-*-* } l_compute$c_compute }
+    ! { dg-note {variable 'a' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} "TODO" { xfail *-*-* } l_compute$c_compute }
     ! { dg-note {variable 'C\.[0-9]+' declared in block potentially has improper OpenACC privatization level: 'const_decl'} "TODO" { target *-*-* } l_compute$c_compute }
   end subroutine f
 end module m
diff --git a/gcc/testsuite/gfortran.dg/goacc/privatization-1-routine_gang-loop.f90 b/gcc/testsuite/gfortran.dg/goacc/privatization-1-routine_gang-loop.f90
index 74c740f0493..6878d856919 100644
--- a/gcc/testsuite/gfortran.dg/goacc/privatization-1-routine_gang-loop.f90
+++ b/gcc/testsuite/gfortran.dg/goacc/privatization-1-routine_gang-loop.f90
@@ -47,9 +47,9 @@ contains
        end do
     end do
     ! { dg-note {variable 'count\.[0-9]+' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_loop$c_loop }
-    ! { dg-note {variable 'i' in 'private' clause potentially has improper OpenACC privatization level: 'parm_decl'} "TODO" { target *-*-* } l_loop$c_loop }
-    ! { dg-note {variable 'j' in 'private' clause potentially has improper OpenACC privatization level: 'parm_decl'} "TODO" { target *-*-* } l_loop$c_loop }
-    ! { dg-note {variable 'a' in 'private' clause potentially has improper OpenACC privatization level: 'parm_decl'} "TODO" { target *-*-* } l_loop$c_loop }
+    ! { dg-note {variable 'i' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_loop$c_loop }
+    ! { dg-note {variable 'j' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_loop$c_loop }
+    ! { dg-note {variable 'a' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_loop$c_loop }
     ! { dg-note {variable 'x' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_loop$c_loop }
     ! { dg-note {variable 'y' in 'private' clause is candidate for adjusting OpenACC privatization level} "" { target *-*-* } l_loop$c_loop }
     ! { dg-note {variable 'C\.[0-9]+' declared in block potentially has improper OpenACC privatization level: 'const_decl'} "TODO" { target *-*-* } l_loop$c_loop }
diff --git a/gcc/testsuite/gfortran.dg/goacc/privatization-1-routine_gang.f90 b/gcc/testsuite/gfortran.dg/goacc/privatization-1-routine_gang.f90
index 59bd43e4070..2bde97db6f7 100644
--- a/gcc/testsuite/gfortran.dg/goacc/privatization-1-routine_gang.f90
+++ b/gcc/testsuite/gfortran.dg/goacc/privatization-1-routine_gang.f90
@@ -41,8 +41,8 @@ contains
           !$acc atomic write ! ... to force 'TREE_ADDRESSABLE'.
           y = a
   end subroutine f
-    ! { dg-note {variable 'i' in 'private' clause potentially has improper OpenACC privatization level: 'parm_decl'} "TODO" { xfail *-*-* } l_routine$c_routine }
-    ! { dg-note {variable 'j' in 'private' clause potentially has improper OpenACC privatization level: 'parm_decl'} "TODO" { xfail *-*-* } l_routine$c_routine }
-    ! { dg-note {variable 'a' in 'private' clause potentially has improper OpenACC privatization level: 'parm_decl'} "TODO" { xfail *-*-* } l_routine$c_routine }
+    ! { dg-note {variable 'i' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} "TODO" { xfail *-*-* } l_routine$c_routine }
+    ! { dg-note {variable 'j' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} "TODO" { xfail *-*-* } l_routine$c_routine }
+    ! { dg-note {variable 'a' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} "TODO" { xfail *-*-* } l_routine$c_routine }
     ! { dg-note {variable 'C\.[0-9]+' declared in block potentially has improper OpenACC privatization level: 'const_decl'} "TODO" { xfail *-*-* } l_routine$c_routine }
 end module m
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c
index 985a547d381..40786c750d1 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c
@@ -36,15 +36,12 @@ int main()
   (volatile void *) &f1;
 
 #pragma acc kernels /* { dg-line l_compute[incr c_compute] } */
-  /* { dg-note {variable 'g2\.0' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_compute$c_compute } */
-  /* { dg-note {variable 'f1\.1' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_compute$c_compute } */
-  /* { dg-note {variable 'f1\.2' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_compute$c_compute } */
   {
     /* { dg-note {beginning 'gang-single' part in OpenACC 'kernels' region} {} { target *-*-* } .+1 } */
     int c = 234;
-    /* { dg-note {OpenACC 'kernels' decomposition: variable 'c' declared in block requested to be made addressable} "" { target *-*-* } l_compute$c_compute } */
-    /* { dg-note {variable 'c' made addressable} {} { target *-*-* } l_compute$c_compute } */
-    /* { dg-note {variable 'c' declared in block is candidate for adjusting OpenACC privatization level} "" { target *-*-* } l_compute$c_compute } */
+    /* { dg-note {OpenACC 'kernels' decomposition: variable 'c' declared in block requested to be made addressable} {} { target *-*-* } l_compute$c_compute }
+       { dg-note {variable 'c' made addressable} {} { target *-*-* } l_compute$c_compute }
+       { dg-note {variable 'c' declared in block is candidate for adjusting OpenACC privatization level} {} { target *-*-* } l_compute$c_compute } */
 
 #pragma acc loop independent gang /* { dg-line l_loop_i[incr c_loop_i] } */
     /* { dg-note {parallelized loop nest in OpenACC 'kernels' region} {} { target *-*-* } l_loop_i$c_loop_i } */
@@ -57,41 +54,89 @@ int main()
     /* { dg-note {beginning 'gang-single' part in OpenACC 'kernels' region} {} { target *-*-* } .+1 } */
     a = c;
 
-    /* PR104132, PR104133 */
+    /* PR104132, PR104133, PR104774 */
     {
       /* Use the 'kernels'-top-level 'int c' as loop variable.  */
 
-      /* { dg-note {forwarded loop nest in OpenACC 'kernels' region to 'parloops' for analysis} {} { target *-*-* } .+1 } */
 #pragma acc loop /* { dg-line l_loop_c[incr c_loop_c] } */
-      /* { dg-note {variable 'c' in 'private' clause is candidate for adjusting OpenACC privatization level} {} { target *-*-* } l_loop_c$c_loop_c } */
+      /* { dg-note {forwarded loop nest in OpenACC 'kernels' region to 'parloops' for analysis} {} { target *-*-* } l_loop_c$c_loop_c } */
+      /* { dg-note {variable 'c' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_loop_c$c_loop_c } */
       /* { dg-optimized {assigned OpenACC seq loop parallelism} {} { target *-*-* } l_loop_c$c_loop_c } */
       for (c = 0; c < N / 2; c++)
 	b[c] -= 10;
 
-      /* { dg-note {forwarded loop nest in OpenACC 'kernels' region to 'parloops' for analysis} {} { target *-*-* } .+1 } */
 #pragma acc loop /* { dg-line l_loop_c[incr c_loop_c] } */
-      /* { dg-note {variable 'c' in 'private' clause is candidate for adjusting OpenACC privatization level} {} { target *-*-* } l_loop_c$c_loop_c } */
+      /* { dg-note {forwarded loop nest in OpenACC 'kernels' region to 'parloops' for analysis} {} { target *-*-* } l_loop_c$c_loop_c } */
+      /* { dg-note {variable 'c' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_loop_c$c_loop_c } */
       /* { dg-optimized {assigned OpenACC seq loop parallelism} {} { target *-*-* } l_loop_c$c_loop_c } */
       for (c = 0; c < N / 2; c++)
 	g1 = c;
 
-      /* { dg-note {forwarded loop nest in OpenACC 'kernels' region to 'parloops' for analysis} {} { target *-*-* } .+1 } */
 #pragma acc loop /* { dg-line l_loop_c[incr c_loop_c] } */
-      /* { dg-note {variable 'c' in 'private' clause is candidate for adjusting OpenACC privatization level} {} { target *-*-* } l_loop_c$c_loop_c } */
+      /* { dg-note {forwarded loop nest in OpenACC 'kernels' region to 'parloops' for analysis} {} { target *-*-* } l_loop_c$c_loop_c } */
+      /* { dg-note {variable 'c' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_loop_c$c_loop_c } */
       /* { dg-optimized {assigned OpenACC seq loop parallelism} {} { target *-*-* } l_loop_c$c_loop_c } */
       for (c = 0; c <= N; c++)
 	g2 += c;
+	/* { dg-note {variable 'g2\.0' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_compute$c_compute } */
 
-    /* { dg-note {beginning 'gang-single' part in OpenACC 'kernels' region} {} { target *-*-* } .+1 } */
+      /* { dg-note {beginning 'gang-single' part in OpenACC 'kernels' region} {} { target *-*-* } .+1 } */
       f1 = 1;
-      /* { dg-note {forwarded loop nest in OpenACC 'kernels' region to 'parloops' for analysis} {} { target *-*-* } .+1 } */
+      /* { dg-note {variable 'f1\.1' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_compute$c_compute } */
 #pragma acc loop /* { dg-line l_loop_c[incr c_loop_c] } */
-      /* { dg-note {variable 'c' in 'private' clause is candidate for adjusting OpenACC privatization level} {} { target *-*-* } l_loop_c$c_loop_c } */
+      /* { dg-note {forwarded loop nest in OpenACC 'kernels' region to 'parloops' for analysis} {} { target *-*-* } l_loop_c$c_loop_c } */
+      /* { dg-note {variable 'c' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_loop_c$c_loop_c } */
       /* { dg-optimized {assigned OpenACC seq loop parallelism} {} { target *-*-* } l_loop_c$c_loop_c } */
       for (c = 20; c > 0; --c)
 	f1 *= c;
-
-      /* { dg-note {beginning 'parloops' part in OpenACC 'kernels' region} {} { target *-*-* } .+1 } */
+	/* { dg-note {variable 'f1\.2' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_compute$c_compute } */
+
+      {
+	/* { dg-note {beginning 'gang-single' part in OpenACC 'kernels' region} {} { target *-*-* } .+1 } */
+	unsigned long long f2 = 1;
+	/* { dg-note {OpenACC 'kernels' decomposition: variable 'f2' declared in block requested to be made addressable} {} { target *-*-* } l_compute$c_compute }
+	   { dg-note {variable 'f2' made addressable} {} { target *-*-* } l_compute$c_compute }
+	   { dg-note {variable 'f2' declared in block is candidate for adjusting OpenACC privatization level} {} { target *-*-* } l_compute$c_compute } */
+#pragma acc loop independent reduction(*: f2) /* { dg-line l_loop_c[incr c_loop_c] } */
+	/* { dg-note {parallelized loop nest in OpenACC 'kernels' region} {} { target *-*-* } l_loop_c$c_loop_c } */
+	/* { dg-note {variable 'c' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_loop_c$c_loop_c } */
+	/* { dg-optimized {assigned OpenACC gang vector loop parallelism} {} { target *-*-* } l_loop_c$c_loop_c } */
+	for (c = 20; c > 0; --c)
+	  f2 *= c;
+
+	{
+	  /* { dg-note {beginning 'parloops' part in OpenACC 'kernels' region} {} { target *-*-* } .+1 } */
+	  if (f2 != f1)
+	    /* { dg-note {variable 'f1\.3' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target { ! __OPTIMIZE__ } } l_compute$c_compute } */
+	    __builtin_abort ();
+
+	  /* As this is still in the preceding 'parloops' part:
+	     { dg-bogus {note: beginning 'gang-single' part in OpenACC 'kernels' region} {} { target *-*-* } .+1 } */
+	  unsigned long long f3 = f2;
+	  /* { dg-note {OpenACC 'kernels' decomposition: variable 'f3' declared in block requested to be made addressable} {} { target *-*-* } l_compute$c_compute }
+	     { dg-note {variable 'f3' made addressable} {} { target *-*-* } l_compute$c_compute }
+	     { dg-note {variable 'f3' declared in block is candidate for adjusting OpenACC privatization level} {} { target *-*-* } l_compute$c_compute } */
+#pragma acc loop seq /* { dg-line l_loop_c[incr c_loop_c] } */
+	  /* { dg-note {parallelized loop nest in OpenACC 'kernels' region} {} { target *-*-* } l_loop_c$c_loop_c } */
+	  /* { dg-note {variable 'c' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_loop_c$c_loop_c } */
+	  /* { dg-optimized {assigned OpenACC seq loop parallelism} {} { target *-*-* } l_loop_c$c_loop_c } */
+	  for (c = 20; c > 0; --c)
+	    f3 /= c;
+
+	  /* { dg-note {beginning 'parloops' part in OpenACC 'kernels' region} {} { target *-*-* } .+1 } */
+	  if (f3 != 1)
+	    __builtin_abort ();
+	}
+
+	/* As this is still in the preceding 'parloops' part:
+	   { dg-bogus {note: beginning 'parloops' part in OpenACC 'kernels' region} {} { target *-*-* } .+1 } */
+	if (f2 != f1)
+	  /* { dg-note {variable 'f1\.4' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target { ! __OPTIMIZE__ } } l_compute$c_compute } */
+	  __builtin_abort ();
+      }
+
+      /* As this is still in the preceding 'parloops' part:
+	 { dg-bogus {note: beginning 'parloops' part in OpenACC 'kernels' region} {} { target *-*-* } .+1 } */
       if (c != 234)
 	__builtin_abort ();
       /* { dg-optimized {assigned OpenACC seq loop parallelism} {} { target *-*-* } l_compute$c_compute } */
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-1.c
index 43bfaf331ab..f4b09fded2d 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-1.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-1.c
@@ -46,8 +46,7 @@ main (int argc, char* argv[])
     for (i = 0; i < 32; i++)
       {
 	#pragma acc loop worker(num:32) /* { dg-line l_loop_j[incr c_loop_j] } */
-	/* { dg-note {variable 'j' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target c } l_loop_j$c_loop_j }
-	   { dg-note {variable 'j' in 'private' clause is candidate for adjusting OpenACC privatization level} {} { target c++ } l_loop_j$c_loop_j } */
+	/* { dg-note {variable 'j' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_loop_j$c_loop_j } */
 	/* { dg-note {variable 'k' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_loop_j$c_loop_j } */
 	/* { dg-note {variable 'x' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_loop_j$c_loop_j } */
 	for (j = 0; j < 32; j++)
@@ -62,8 +61,7 @@ main (int argc, char* argv[])
 	  }
 
 	#pragma acc loop worker(num:32) /* { dg-line l_loop_j[incr c_loop_j] } */
-	/* { dg-note {variable 'j' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target c } l_loop_j$c_loop_j }
-	   { dg-note {variable 'j' in 'private' clause is candidate for adjusting OpenACC privatization level} {} { target c++ } l_loop_j$c_loop_j } */
+	/* { dg-note {variable 'j' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_loop_j$c_loop_j } */
 	/* { dg-note {variable 'k' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_loop_j$c_loop_j } */
 	/* { dg-note {variable 'x' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_loop_j$c_loop_j } */
 	for (j = 0; j < 32; j++)
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-2.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-2.c
index c40c2ab33c5..52a7e1af24e 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-2.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-2.c
@@ -46,8 +46,7 @@ main (int argc, char* argv[])
     for (i = 0; i < 32; i++)
       {
         #pragma acc loop worker(num:32) /* { dg-line l_loop_j[incr c_loop_j] } */
-	/* { dg-note {variable 'j' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target c } l_loop_j$c_loop_j }
-	   { dg-note {variable 'j' in 'private' clause is candidate for adjusting OpenACC privatization level} {} { target c++ } l_loop_j$c_loop_j } */
+	/* { dg-note {variable 'j' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_loop_j$c_loop_j } */
 	/* { dg-note {variable 'k' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_loop_j$c_loop_j } */
 	/* { dg-note {variable 'x' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_loop_j$c_loop_j } */
 	for (j = 0; j < 32; j++)
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-3.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-3.c
index bd04dcc7b02..0cbbef77d45 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-3.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-3.c
@@ -51,8 +51,7 @@ main (int argc, char* argv[])
     for (i = 0; i < 32; i++)
       {
         #pragma acc loop worker(num:32) /* { dg-line l_loop_j[incr c_loop_j] } */
-	/* { dg-note {variable 'j' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target c } l_loop_j$c_loop_j }
-	   { dg-note {variable 'j' in 'private' clause is candidate for adjusting OpenACC privatization level} {} { target c++ } l_loop_j$c_loop_j } */
+	/* { dg-note {variable 'j' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_loop_j$c_loop_j } */
 	/* { dg-note {variable 'k' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_loop_j$c_loop_j } */
 	/* { dg-note {variable 'pt' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_loop_j$c_loop_j } */
 	for (j = 0; j < 32; j++)
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-4.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-4.c
index 4303ab848ea..a908ee580f6 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-4.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-4.c
@@ -51,8 +51,7 @@ main (int argc, char* argv[])
     for (i = 0; i < 32; i++)
       {
         #pragma acc loop worker(num:32) /* { dg-line l_loop_j[incr c_loop_j] } */
-	/* { dg-note {variable 'j' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target c } l_loop_j$c_loop_j }
-	   { dg-note {variable 'j' in 'private' clause is candidate for adjusting OpenACC privatization level} {} { target c++ } l_loop_j$c_loop_j } */
+	/* { dg-note {variable 'j' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_loop_j$c_loop_j } */
 	/* { dg-note {variable 'k' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_loop_j$c_loop_j } */
 	/* { dg-note {variable 'pt' declared in block is candidate for adjusting OpenACC privatization level} {} { target *-*-* } l_loop_j$c_loop_j } */
 	/* { dg-note {variable 'ptp' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_loop_j$c_loop_j } */
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-5.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-5.c
index 8d0e846ce23..713f2d9e38c 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-5.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-5.c
@@ -46,8 +46,7 @@ main (int argc, char* argv[])
     for (i = 0; i < 32; i++)
       {
         #pragma acc loop worker(num:32) /* { dg-line l_loop_j[incr c_loop_j] } */
-	/* { dg-note {variable 'j' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target c } l_loop_j$c_loop_j }
-	   { dg-note {variable 'j' in 'private' clause is candidate for adjusting OpenACC privatization level} {} { target c++ } l_loop_j$c_loop_j } */
+	/* { dg-note {variable 'j' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_loop_j$c_loop_j } */
 	/* { dg-note {variable 'k' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_loop_j$c_loop_j } */
 	/* { dg-note {variable 'pt' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_loop_j$c_loop_j } */
 	for (j = 0; j < 32; j++)
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-vector-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-vector-1.c
index 5a70bb880e5..511a31dd628 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-vector-1.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-vector-1.c
@@ -45,8 +45,7 @@ main (int argc, char* argv[])
     for (i = 0; i < 32; i++)
       {
         #pragma acc loop worker(num:32) /* { dg-line l_loop_j[incr c_loop_j] } */
-	/* { dg-note {variable 'j' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target c } l_loop_j$c_loop_j }
-	   { dg-note {variable 'j' in 'private' clause is candidate for adjusting OpenACC privatization level} {} { target c++ } l_loop_j$c_loop_j } */
+	/* { dg-note {variable 'j' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_loop_j$c_loop_j } */
 	/* { dg-note {variable 'k' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_loop_j$c_loop_j } */
 	for (j = 0; j < 32; j++)
 	  {
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-vector-2.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-vector-2.c
index f5bccabbe6f..0f14e5f434f 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-vector-2.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-vector-2.c
@@ -45,8 +45,7 @@ main (int argc, char* argv[])
     for (i = 0; i < 32; i++)
       {
         #pragma acc loop worker(num:32) /* { dg-line l_loop_j[incr c_loop_j] } */
-	/* { dg-note {variable 'j' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target c } l_loop_j$c_loop_j }
-	   { dg-note {variable 'j' in 'private' clause is candidate for adjusting OpenACC privatization level} {} { target c++ } l_loop_j$c_loop_j } */
+	/* { dg-note {variable 'j' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_loop_j$c_loop_j } */
 	/* { dg-note {variable 'k' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_loop_j$c_loop_j } */
 	for (j = 0; j < 32; j++)
 	  {
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-2.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-2.c
index 40baae34562..c15797124c3 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-2.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-2.c
@@ -47,8 +47,7 @@ main (int argc, char* argv[])
       {
         #pragma acc loop worker(num:32) private(x) /* { dg-line l_loop_j[incr c_loop_j] } */
 	/* { dg-note {variable 'x' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_loop_j$c_loop_j } */
-	/* { dg-note {variable 'j' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target c } l_loop_j$c_loop_j }
-	   { dg-note {variable 'j' in 'private' clause is candidate for adjusting OpenACC privatization level} {} { target c++ } l_loop_j$c_loop_j } */
+	/* { dg-note {variable 'j' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_loop_j$c_loop_j } */
 	/* { dg-note {variable 'k' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_loop_j$c_loop_j } */
 	for (j = 0; j < 32; j++)
 	  {
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-3.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-3.c
index c8b089c0f59..5ae73ff286e 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-3.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-3.c
@@ -47,8 +47,7 @@ main (int argc, char* argv[])
       {
         #pragma acc loop worker(num:32) private(x) /* { dg-line l_loop_j[incr c_loop_j] } */
 	/* { dg-note {variable 'x' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_loop_j$c_loop_j } */
-	/* { dg-note {variable 'j' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target c } l_loop_j$c_loop_j }
-	   { dg-note {variable 'j' in 'private' clause is candidate for adjusting OpenACC privatization level} {} { target c++ } l_loop_j$c_loop_j } */
+	/* { dg-note {variable 'j' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_loop_j$c_loop_j } */
 	/* { dg-note {variable 'k' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_loop_j$c_loop_j } */
 	for (j = 0; j < 32; j++)
 	  {
@@ -63,8 +62,7 @@ main (int argc, char* argv[])
 
 	#pragma acc loop worker(num:32) private(x) /* { dg-line l_loop_j[incr c_loop_j] } */
 	/* { dg-note {variable 'x' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_loop_j$c_loop_j } */
-	/* { dg-note {variable 'j' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target c } l_loop_j$c_loop_j }
-	   { dg-note {variable 'j' in 'private' clause is candidate for adjusting OpenACC privatization level} {} { target c++ } l_loop_j$c_loop_j } */
+	/* { dg-note {variable 'j' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_loop_j$c_loop_j } */
 	/* { dg-note {variable 'k' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_loop_j$c_loop_j } */
 	for (j = 0; j < 32; j++)
 	  {
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-4.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-4.c
index c1819d27179..e7babe4da04 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-4.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-4.c
@@ -47,8 +47,7 @@ main (int argc, char* argv[])
       {
         #pragma acc loop worker(num:32) private(x) /* { dg-line l_loop_j[incr c_loop_j] } */
 	/* { dg-note {variable 'x' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_loop_j$c_loop_j } */
-	/* { dg-note {variable 'j' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target c } l_loop_j$c_loop_j }
-	   { dg-note {variable 'j' in 'private' clause is candidate for adjusting OpenACC privatization level} {} { target c++ } l_loop_j$c_loop_j } */
+	/* { dg-note {variable 'j' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_loop_j$c_loop_j } */
 	/* { dg-note {variable 'k' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_loop_j$c_loop_j } */
 	for (j = 0; j < 32; j++)
 	  {
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-5.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-5.c
index 90955aa8f6e..bb8fb2e472c 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-5.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-5.c
@@ -48,8 +48,7 @@ main (int argc, char* argv[])
       {
         #pragma acc loop worker(num:32) private(x) /* { dg-line l_loop_j[incr c_loop_j] } */
 	/* { dg-note {variable 'x' in 'private' clause is candidate for adjusting OpenACC privatization level} {} { target *-*-* } l_loop_j$c_loop_j } */
-	/* { dg-note {variable 'j' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target c } l_loop_j$c_loop_j }
-	   { dg-note {variable 'j' in 'private' clause is candidate for adjusting OpenACC privatization level} {} { target c++ } l_loop_j$c_loop_j } */
+	/* { dg-note {variable 'j' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_loop_j$c_loop_j } */
 	/* { dg-note {variable 'k' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_loop_j$c_loop_j } */
 	/* { dg-note {variable 'p' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_loop_j$c_loop_j } */
 	for (j = 0; j < 32; j++)
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-6.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-6.c
index f093cfe630b..e4eec7b4247 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-6.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-6.c
@@ -53,8 +53,7 @@ main (int argc, char* argv[])
       {
         #pragma acc loop worker(num:32) private(pt) /* { dg-line l_loop_j[incr c_loop_j] } */
 	/* { dg-note {variable 'pt' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_loop_j$c_loop_j } */
-	/* { dg-note {variable 'j' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target c } l_loop_j$c_loop_j }
-	   { dg-note {variable 'j' in 'private' clause is candidate for adjusting OpenACC privatization level} {} { target c++ } l_loop_j$c_loop_j } */
+	/* { dg-note {variable 'j' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_loop_j$c_loop_j } */
 	/* { dg-note {variable 'k' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_loop_j$c_loop_j } */
 	for (j = 0; j < 32; j++)
 	  {
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-7.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-7.c
index 906119caf24..b52595ac702 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-7.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-7.c
@@ -51,8 +51,7 @@ main (int argc, char* argv[])
         /* But here, it is made private per-worker.  */
         #pragma acc loop worker(num:32) private(pt) /* { dg-line l_loop_j[incr c_loop_j] } */
 	/* { dg-note {variable 'pt' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_loop_j$c_loop_j } */
-	/* { dg-note {variable 'j' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target c } l_loop_j$c_loop_j }
-	   { dg-note {variable 'j' in 'private' clause is candidate for adjusting OpenACC privatization level} {} { target c++ } l_loop_j$c_loop_j } */
+	/* { dg-note {variable 'j' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_loop_j$c_loop_j } */
 	/* { dg-note {variable 'k' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_loop_j$c_loop_j } */
 	for (j = 0; j < 32; j++)
 	  {
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/optional-private.f90 b/libgomp/testsuite/libgomp.oacc-fortran/optional-private.f90
index 4e67809f769..df693628c96 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/optional-private.f90
+++ b/libgomp/testsuite/libgomp.oacc-fortran/optional-private.f90
@@ -44,7 +44,7 @@ contains
     ! { dg-warning "region is vector partitioned but does not contain vector partitioned code" "" { target *-*-* } .-2 }
     !$acc loop gang private(x)
     ! { dg-note {variable 'i' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } .-1 }
-    ! { dg-note {variable 'x' in 'private' clause potentially has improper OpenACC privatization level: 'parm_decl'} "TODO" { target *-*-* } .-2 }
+    ! { dg-note {variable 'x' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } .-2 }
     do i = 1, 32
        x = i * 2;
        arr(i) = arr(i) + x
@@ -72,7 +72,7 @@ contains
     ! { dg-warning "region is worker partitioned but does not contain worker partitioned code" "" { target *-*-* } .-1 }
     !$acc loop gang private(pt)
     ! { dg-note {variable 'i' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } .-1 }
-    ! { dg-note {variable 'pt' in 'private' clause potentially has improper OpenACC privatization level: 'parm_decl'} "TODO" { target *-*-* } .-2 }
+    ! { dg-note {variable 'pt' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } .-2 }
     do i = 0, 31
        pt%x = i
        pt%y = i * 2
@@ -111,7 +111,7 @@ contains
        do j = 0, 31
           !$acc loop vector private(pt)
           ! { dg-note {variable 'k' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } .-1 }
-          ! { dg-note {variable 'pt' in 'private' clause potentially has improper OpenACC privatization level: 'parm_decl'} "TODO" { target *-*-* } .-2 }
+          ! { dg-note {variable 'pt' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } .-2 }
           do k = 0, 31
              pt(1) = ieor(i, j * 3)
              pt(2) = ior(i, j * 5)
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/privatized-ref-1.f95 b/libgomp/testsuite/libgomp.oacc-fortran/privatized-ref-1.f95
index 906c93010cf..b027d14e7f5 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/privatized-ref-1.f95
+++ b/libgomp/testsuite/libgomp.oacc-fortran/privatized-ref-1.f95
@@ -78,7 +78,7 @@ contains
     !$acc loop collapse(2) gang private(t1) ! { dg-line l_loop[incr c_loop] }
     ! { dg-note {variable 'i' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_loop$c_loop }
     ! { dg-note {variable 'j' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_loop$c_loop }
-    ! { dg-note {variable 't1' in 'private' clause potentially has improper OpenACC privatization level: 'parm_decl'} "" { target *-*-* } l_loop$c_loop }
+    ! { dg-note {variable 't1' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_loop$c_loop }
     do i=0,255
       do j=1,256
         t1 = (i * 256 + j) * 97
@@ -103,7 +103,7 @@ contains
     do i=0,255
       !$acc loop worker private(t1) ! { dg-line l_loop[incr c_loop] }
       ! { dg-note {variable 'j' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_loop$c_loop }
-      ! { dg-note {variable 't1' in 'private' clause potentially has improper OpenACC privatization level: 'parm_decl'} "" { target *-*-* } l_loop$c_loop }
+      ! { dg-note {variable 't1' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_loop$c_loop }
       do j=1,256
         t1 = (i * 256 + j) * 99
         res(i * 256 + j) = t1
@@ -127,7 +127,7 @@ contains
     do i=0,255
       !$acc loop vector private(t1) ! { dg-line l_loop[incr c_loop] }
       ! { dg-note {variable 'j' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_loop$c_loop }
-      ! { dg-note {variable 't1' in 'private' clause potentially has improper OpenACC privatization level: 'parm_decl'} "" { target *-*-* } l_loop$c_loop }
+      ! { dg-note {variable 't1' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_loop$c_loop }
       do j=1,256
         t1 = (i * 256 + j) * 101
         res(i * 256 + j) = t1
@@ -149,7 +149,7 @@ contains
     !$acc loop collapse(2) gang worker vector private(t1) ! { dg-line l_loop[incr c_loop] }
     ! { dg-note {variable 'i' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_loop$c_loop }
     ! { dg-note {variable 'j' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_loop$c_loop }
-    ! { dg-note {variable 't1' in 'private' clause potentially has improper OpenACC privatization level: 'parm_decl'} "" { target *-*-* } l_loop$c_loop }
+    ! { dg-note {variable 't1' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_loop$c_loop }
     do i=0,255
       do j=1,256
         t1 = (i * 256 + j) * 103
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/privatized-ref-2.f90 b/libgomp/testsuite/libgomp.oacc-fortran/privatized-ref-2.f90
index 6bd17148911..1d91e115d9f 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/privatized-ref-2.f90
+++ b/libgomp/testsuite/libgomp.oacc-fortran/privatized-ref-2.f90
@@ -59,7 +59,9 @@ contains
     !$acc parallel copy(array)
     !$acc loop gang private(array) ! { dg-line l_loop[incr c_loop] }
     ! { dg-note {variable 'i' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_loop$c_loop }
-    ! { dg-note {variable 'array' in 'private' clause potentially has improper OpenACC privatization level: 'parm_decl'} "" { target *-*-* } l_loop$c_loop }
+    ! { dg-note {variable 'array' in 'private' clause is candidate for adjusting OpenACC privatization level} "" { target *-*-* } l_loop$c_loop }
+    ! { dg-note {variable 'array' ought to be adjusted for OpenACC privatization level: 'gang'} "" { target *-*-* } l_loop$c_loop }
+    ! { dg-note {variable 'array' adjusted for OpenACC privatization level: 'gang'} "" { target { ! { openacc_host_selected || { openacc_nvidia_accel_selected && __OPTIMIZE__ } } } } l_loop$c_loop }
     ! { dg-message {sorry, unimplemented: target cannot support alloca} PR65181 { target openacc_nvidia_accel_selected } l_loop$c_loop }
     do i = 1, 10
       array(i) = i
@@ -87,7 +89,7 @@ contains
     ! { dg-note {variable 'i' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_loop$c_loop }
     ! { dg-note {variable 'array\.[0-9]+' in 'private' clause is candidate for adjusting OpenACC privatization level} "" { target *-*-* } l_loop$c_loop }
     ! { dg-note {variable 'array\.[0-9]+' ought to be adjusted for OpenACC privatization level: 'gang'} "" { target *-*-* } l_loop$c_loop }
-    ! { dg-note {variable 'array\.[0-9]+' adjusted for OpenACC privatization level: 'gang'} "" { target { ! { openacc_host_selected || openacc_nvidia_accel_selected } } } l_loop$c_loop }
+    ! { dg-note {variable 'array\.[0-9]+' adjusted for OpenACC privatization level: 'gang'} "" { target { ! { openacc_host_selected || { openacc_nvidia_accel_selected && __OPTIMIZE__ } } } } l_loop$c_loop }
     ! { dg-message {sorry, unimplemented: target cannot support alloca} PR65181 { target openacc_nvidia_accel_selected } l_loop$c_loop }
     do i = 1, 10
       array(i) = 9*i
@@ -110,10 +112,12 @@ contains
     !$acc parallel copy(str)
     !$acc loop gang private(str) ! { dg-line l_loop[incr c_loop] }
     ! { dg-note {variable 'i' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_loop$c_loop }
-    ! { dg-note {variable 'str' in 'private' clause potentially has improper OpenACC privatization level: 'parm_decl'} "" { target *-*-* } l_loop$c_loop }
+    ! { dg-note {variable 'str' in 'private' clause is candidate for adjusting OpenACC privatization level} "" { target *-*-* } l_loop$c_loop }
+    ! { dg-note {variable 'str' ought to be adjusted for OpenACC privatization level: 'gang'} "" { target *-*-* } l_loop$c_loop }
+    ! { dg-note {variable 'str' adjusted for OpenACC privatization level: 'gang'} "" { target { ! { openacc_host_selected || { openacc_nvidia_accel_selected && __OPTIMIZE__ } } } } l_loop$c_loop }
     ! { dg-note {variable 'char\.[0-9]+' declared in block is candidate for adjusting OpenACC privatization level} "" { target *-*-* } l_loop$c_loop }
     ! { dg-note {variable 'char\.[0-9]+' ought to be adjusted for OpenACC privatization level: 'gang'} "" { target *-*-* } l_loop$c_loop }
-    ! { dg-note {variable 'char\.[0-9]+' adjusted for OpenACC privatization level: 'gang'} "" { target { ! { openacc_host_selected || openacc_nvidia_accel_selected } } } l_loop$c_loop }
+    ! { dg-note {variable 'char\.[0-9]+' adjusted for OpenACC privatization level: 'gang'} "" { target { ! { openacc_host_selected || { openacc_nvidia_accel_selected && __OPTIMIZE__ } } } } l_loop$c_loop }
     ! { dg-message {sorry, unimplemented: target cannot support alloca} PR65181 { target openacc_nvidia_accel_selected } l_loop$c_loop }
     do i = 1, 10
       str(i:i) = achar(ichar('A') + i)
@@ -153,10 +157,12 @@ contains
     !$acc parallel copy(scalar)
     !$acc loop gang private(scalar) ! { dg-line l_loop[incr c_loop] }
     ! { dg-note {variable 'i' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_loop$c_loop }
-    ! { dg-note {variable 'scalar' in 'private' clause potentially has improper OpenACC privatization level: 'parm_decl'} "" { target *-*-* } l_loop$c_loop }
+    ! { dg-note {variable 'scalar' in 'private' clause is candidate for adjusting OpenACC privatization level} "" { target *-*-* } l_loop$c_loop }
+    ! { dg-note {variable 'scalar' ought to be adjusted for OpenACC privatization level: 'gang'} "" { target *-*-* } l_loop$c_loop }
+    ! { dg-note {variable 'scalar' adjusted for OpenACC privatization level: 'gang'} "" { target { ! { openacc_host_selected || { openacc_nvidia_accel_selected && __OPTIMIZE__ } } } } l_loop$c_loop }
     ! { dg-note {variable 'char\.[0-9]+' declared in block is candidate for adjusting OpenACC privatization level} "" { target *-*-* } l_loop$c_loop }
     ! { dg-note {variable 'char\.[0-9]+' ought to be adjusted for OpenACC privatization level: 'gang'} "" { target *-*-* } l_loop$c_loop }
-    ! { dg-note {variable 'char\.[0-9]+' adjusted for OpenACC privatization level: 'gang'} "" { target { ! { openacc_host_selected || openacc_nvidia_accel_selected } } } l_loop$c_loop }
+    ! { dg-note {variable 'char\.[0-9]+' adjusted for OpenACC privatization level: 'gang'} "" { target { ! { openacc_host_selected || { openacc_nvidia_accel_selected && __OPTIMIZE__ } } } } l_loop$c_loop }
     do i = 1, 15
       scalar(i:i) = achar(ichar('A') + i)
     end do
-- 
2.34.1



end of thread, other threads:[~2022-03-10 11:29 UTC | newest]

Thread overview: 26+ messages
2017-02-27 16:21 [gomp4] add support for gang local storage allocation in shared memory Cesar Philippidis
2018-08-13 16:22 ` [PATCH, OpenACC] Add " Julian Brown
2018-08-13 18:42   ` Cesar Philippidis
2018-08-13 19:06     ` Cesar Philippidis
2018-08-15 16:46       ` Julian Brown
2018-08-15 19:57         ` Bernhard Reutner-Fischer
2018-08-16 15:47           ` Julian Brown
2018-08-17 16:39             ` Bernhard Reutner-Fischer
2018-12-11 15:08               ` Julian Brown
2019-06-03 16:03                 ` Julian Brown
2019-06-03 16:23                   ` Jakub Jelinek
2019-06-07 14:08                     ` Julian Brown
2019-06-12 10:23                       ` Jakub Jelinek
2019-06-12 10:32                         ` Tom de Vries
2019-06-12 11:57                       ` Thomas Schwinge
2019-06-12 19:43                         ` Julian Brown
2019-11-06 22:59                           ` Julian Brown
2021-05-21 19:05                       ` Thomas Schwinge
2022-02-14 15:56                   ` Thomas Schwinge
2022-02-15 13:40                     ` Julian Brown
2022-03-10 11:28                       ` [OpenACC privatization] Analyze 'lookup_decl'-translated DECL [PR90115, PR102330, PR104774] Thomas Schwinge
2022-03-10 11:13                     ` Add 'gfortran.dg/goacc-gomp/pr102330-{1,2,3}.f90' [PR102330] Thomas Schwinge
2022-03-10 11:18                       ` Add 'c-c++-common/goacc/kernels-decompose-pr104774-1.c' [PR104774] Thomas Schwinge
2018-10-05 14:07             ` [PATCH, OpenACC] Add support for gang local storage allocation in shared memory Tom de Vries
2018-08-13 20:42     ` Julian Brown
2021-05-19 12:10       ` Add 'libgomp.oacc-c-c++-common/loop-gwv-2.c' (was: [PATCH, OpenACC] Add support for gang local storage allocation in shared memory) Thomas Schwinge
