public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [gomp4.1] Add new versions of GOMP_target{,_data,_update} and GOMP_target_enter_exit_data
@ 2015-06-15 12:23 Ilya Verbin
  2015-06-15 14:30 ` Jakub Jelinek
  0 siblings, 1 reply; 49+ messages in thread
From: Ilya Verbin @ 2015-06-15 12:23 UTC (permalink / raw)
  To: Jakub Jelinek, Thomas Schwinge, gcc-patches; +Cc: Kirill Yukhin

Hi,

This patch introduces new versions of GOMP_target{,_data,_update} for OpenMP 4.1
with unsigned short for map kinds, but without new async arguments yet.

make check-target-libgomp and bootstrap passed, ok for gomp-4_1-branch?


gcc/
	* builtin-types.def (BT_FN_VOID_INT_PTR_SIZE_PTR_PTR_PTR): Remove.
	(BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR): New.
	(BT_FN_VOID_INT_OMPFN_PTR_SIZE_PTR_PTR_PTR): Remove.
	* omp-builtins.def (BUILT_IN_GOMP_TARGET): Replace GOMP_target with
	GOMP_target1.
	(BUILT_IN_GOMP_TARGET_DATA): Replace GOMP_target_data with
	GOMP_target_data1.
	(BUILT_IN_GOMP_TARGET_UPDATE): Replace GOMP_target_update with
	GOMP_target_update1.
	(BUILT_IN_GOMP_TARGET_ENTER_EXIT_DATA): New.
	* omp-low.c (expand_omp_target): Use
	BUILT_IN_GOMP_TARGET_ENTER_EXIT_DATA for GF_OMP_TARGET_KIND_ENTER_DATA
	and GF_OMP_TARGET_KIND_EXIT_DATA.
	Do not pass obsolete pointer to new builtins.
	(lower_omp_target): Always use unsigned short for map kinds.
gcc/fortran/
	* types.def (BT_FN_VOID_INT_PTR_SIZE_PTR_PTR_PTR): Remove.
	(BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR): New.
	(BT_FN_VOID_INT_OMPFN_PTR_SIZE_PTR_PTR_PTR): Remove.
libgomp/
	* libgomp.map (GOMP_4.1): Add GOMP_target1, GOMP_target_data1,
	GOMP_target_update1, GOMP_target_enter_exit_data.
	* libgomp_g.h: Declare GOMP_target1, GOMP_target_data1,
	GOMP_target_update1, GOMP_target_enter_exit_data.
	* target.c (resolve_device): Call gomp_init_device here instead of
	GOMP_target*.
	(get_kind): Rename is_openacc to short_mapkind.
	(gomp_map_vars): Likewise.
	(gomp_unmap_vars): Likewise.
	(gomp_update): Likewise.
	(gomp_target_fallback): New static function.
	(gomp_get_target_fn_addr): New static function.
	(GOMP_target): Move host fallback and fn lookup to the new functions.
	(GOMP_target1): New function.
	(gomp_target_data_fallback): New static function.
	(GOMP_target_data): Move host fallback to the new function.
	(GOMP_target_data1): New function.
	(GOMP_target_update): Do not call gomp_init_device.
	(GOMP_target_update1): New function.
	(GOMP_target_enter_exit_data): New function.


diff --git a/gcc/builtin-types.def b/gcc/builtin-types.def
index 492ca63..3c4b9e3 100644
--- a/gcc/builtin-types.def
+++ b/gcc/builtin-types.def
@@ -524,8 +524,9 @@ DEF_FUNCTION_TYPE_6 (BT_FN_BOOL_VPTR_PTR_I16_BOOL_INT_INT,
 		     BT_INT)
 DEF_FUNCTION_TYPE_6 (BT_FN_BOOL_SIZE_VPTR_PTR_PTR_INT_INT, BT_BOOL, BT_SIZE,
 		     BT_VOLATILE_PTR, BT_PTR, BT_PTR, BT_INT, BT_INT)
-DEF_FUNCTION_TYPE_6 (BT_FN_VOID_INT_PTR_SIZE_PTR_PTR_PTR,
-		     BT_VOID, BT_INT, BT_PTR, BT_SIZE, BT_PTR, BT_PTR, BT_PTR)
+DEF_FUNCTION_TYPE_6 (BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR,
+		     BT_VOID, BT_INT, BT_PTR_FN_VOID_PTR, BT_SIZE, BT_PTR,
+		     BT_PTR, BT_PTR)
 
 DEF_FUNCTION_TYPE_7 (BT_FN_VOID_OMPFN_PTR_UINT_LONG_LONG_LONG_UINT,
 		     BT_VOID, BT_PTR_FN_VOID_PTR, BT_PTR, BT_UINT,
@@ -534,9 +535,6 @@ DEF_FUNCTION_TYPE_7 (BT_FN_BOOL_BOOL_ULL_ULL_ULL_ULL_ULLPTR_ULLPTR,
 		     BT_BOOL, BT_BOOL, BT_ULONGLONG, BT_ULONGLONG,
 		     BT_ULONGLONG, BT_ULONGLONG,
 		     BT_PTR_ULONGLONG, BT_PTR_ULONGLONG)
-DEF_FUNCTION_TYPE_7 (BT_FN_VOID_INT_OMPFN_PTR_SIZE_PTR_PTR_PTR,
-		     BT_VOID, BT_INT, BT_PTR_FN_VOID_PTR, BT_PTR, BT_SIZE,
-		     BT_PTR, BT_PTR, BT_PTR)
 
 DEF_FUNCTION_TYPE_8 (BT_FN_VOID_OMPFN_PTR_UINT_LONG_LONG_LONG_LONG_UINT,
 		     BT_VOID, BT_PTR_FN_VOID_PTR, BT_PTR, BT_UINT,
diff --git a/gcc/fortran/types.def b/gcc/fortran/types.def
index c0d3989..18f81e6 100644
--- a/gcc/fortran/types.def
+++ b/gcc/fortran/types.def
@@ -189,8 +189,9 @@ DEF_FUNCTION_TYPE_6 (BT_FN_BOOL_VPTR_PTR_I16_BOOL_INT_INT,
 		     BT_INT)
 DEF_FUNCTION_TYPE_6 (BT_FN_BOOL_SIZE_VPTR_PTR_PTR_INT_INT, BT_BOOL, BT_SIZE,
 		     BT_VOLATILE_PTR, BT_PTR, BT_PTR, BT_INT, BT_INT)
-DEF_FUNCTION_TYPE_6 (BT_FN_VOID_INT_PTR_SIZE_PTR_PTR_PTR,
-		     BT_VOID, BT_INT, BT_PTR, BT_SIZE, BT_PTR, BT_PTR, BT_PTR)
+DEF_FUNCTION_TYPE_6 (BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR,
+		     BT_VOID, BT_INT, BT_PTR_FN_VOID_PTR, BT_SIZE, BT_PTR,
+		     BT_PTR, BT_PTR)
 
 DEF_FUNCTION_TYPE_7 (BT_FN_VOID_OMPFN_PTR_UINT_LONG_LONG_LONG_UINT,
                      BT_VOID, BT_PTR_FN_VOID_PTR, BT_PTR, BT_UINT,
@@ -199,9 +200,6 @@ DEF_FUNCTION_TYPE_7 (BT_FN_BOOL_BOOL_ULL_ULL_ULL_ULL_ULLPTR_ULLPTR,
 		     BT_BOOL, BT_BOOL, BT_ULONGLONG, BT_ULONGLONG,
 		     BT_ULONGLONG, BT_ULONGLONG,
 		     BT_PTR_ULONGLONG, BT_PTR_ULONGLONG)
-DEF_FUNCTION_TYPE_7 (BT_FN_VOID_INT_OMPFN_PTR_SIZE_PTR_PTR_PTR,
-		     BT_VOID, BT_INT, BT_PTR_FN_VOID_PTR, BT_PTR, BT_SIZE,
-		     BT_PTR, BT_PTR, BT_PTR)
 
 DEF_FUNCTION_TYPE_8 (BT_FN_VOID_OMPFN_PTR_UINT_LONG_LONG_LONG_LONG_UINT,
 		     BT_VOID, BT_PTR_FN_VOID_PTR, BT_PTR, BT_UINT,
diff --git a/gcc/omp-builtins.def b/gcc/omp-builtins.def
index 749def4..b8623af 100644
--- a/gcc/omp-builtins.def
+++ b/gcc/omp-builtins.def
@@ -262,14 +262,16 @@ DEF_GOMP_BUILTIN (BUILT_IN_GOMP_SINGLE_COPY_START, "GOMP_single_copy_start",
 		  BT_FN_PTR, ATTR_NOTHROW_LEAF_LIST)
 DEF_GOMP_BUILTIN (BUILT_IN_GOMP_SINGLE_COPY_END, "GOMP_single_copy_end",
 		  BT_FN_VOID_PTR, ATTR_NOTHROW_LEAF_LIST)
-DEF_GOMP_BUILTIN (BUILT_IN_GOMP_TARGET, "GOMP_target",
-		  BT_FN_VOID_INT_OMPFN_PTR_SIZE_PTR_PTR_PTR,
-		  ATTR_NOTHROW_LIST)
-DEF_GOMP_BUILTIN (BUILT_IN_GOMP_TARGET_DATA, "GOMP_target_data",
-		  BT_FN_VOID_INT_PTR_SIZE_PTR_PTR_PTR, ATTR_NOTHROW_LIST)
+DEF_GOMP_BUILTIN (BUILT_IN_GOMP_TARGET, "GOMP_target1",
+		  BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR, ATTR_NOTHROW_LIST)
+DEF_GOMP_BUILTIN (BUILT_IN_GOMP_TARGET_DATA, "GOMP_target_data1",
+		  BT_FN_VOID_INT_SIZE_PTR_PTR_PTR, ATTR_NOTHROW_LIST)
 DEF_GOMP_BUILTIN (BUILT_IN_GOMP_TARGET_END_DATA, "GOMP_target_end_data",
 		  BT_FN_VOID, ATTR_NOTHROW_LIST)
-DEF_GOMP_BUILTIN (BUILT_IN_GOMP_TARGET_UPDATE, "GOMP_target_update",
-		  BT_FN_VOID_INT_PTR_SIZE_PTR_PTR_PTR, ATTR_NOTHROW_LIST)
+DEF_GOMP_BUILTIN (BUILT_IN_GOMP_TARGET_UPDATE, "GOMP_target_update1",
+		  BT_FN_VOID_INT_SIZE_PTR_PTR_PTR, ATTR_NOTHROW_LIST)
+DEF_GOMP_BUILTIN (BUILT_IN_GOMP_TARGET_ENTER_EXIT_DATA,
+		  "GOMP_target_enter_exit_data",
+		  BT_FN_VOID_INT_SIZE_PTR_PTR_PTR, ATTR_NOTHROW_LIST)
 DEF_GOMP_BUILTIN (BUILT_IN_GOMP_TEAMS, "GOMP_teams",
 		  BT_FN_VOID_UINT_UINT, ATTR_NOTHROW_LIST)
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index e4f5566..3f39e1b 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -10226,8 +10226,7 @@ expand_omp_target (struct omp_region *region)
       break;
     case GF_OMP_TARGET_KIND_ENTER_DATA:
     case GF_OMP_TARGET_KIND_EXIT_DATA:
-      /* FIXME */
-      start_ix = BUILT_IN_GOMP_TARGET_UPDATE;
+      start_ix = BUILT_IN_GOMP_TARGET_ENTER_EXIT_DATA;
       break;
     case GF_OMP_TARGET_KIND_OACC_PARALLEL:
     case GF_OMP_TARGET_KIND_OACC_KERNELS:
@@ -10264,7 +10263,8 @@ expand_omp_target (struct omp_region *region)
 	 defined/used for the OpenMP target ones.  */
       gcc_checking_assert (start_ix == BUILT_IN_GOMP_TARGET
 			   || start_ix == BUILT_IN_GOMP_TARGET_DATA
-			   || start_ix == BUILT_IN_GOMP_TARGET_UPDATE);
+			   || start_ix == BUILT_IN_GOMP_TARGET_UPDATE
+			   || start_ix == BUILT_IN_GOMP_TARGET_ENTER_EXIT_DATA);
 
       device = OMP_CLAUSE_DEVICE_ID (c);
       clause_loc = OMP_CLAUSE_LOCATION (c);
@@ -10351,23 +10351,6 @@ expand_omp_target (struct omp_region *region)
   args.quick_push (device);
   if (offloaded)
     args.quick_push (build_fold_addr_expr (child_fn));
-  switch (start_ix)
-    {
-    case BUILT_IN_GOMP_TARGET:
-    case BUILT_IN_GOMP_TARGET_DATA:
-    case BUILT_IN_GOMP_TARGET_UPDATE:
-      /* This const void * is part of the current ABI, but we're not actually
-	 using it.  */
-      args.quick_push (build_zero_cst (ptr_type_node));
-      break;
-    case BUILT_IN_GOACC_DATA_START:
-    case BUILT_IN_GOACC_ENTER_EXIT_DATA:
-    case BUILT_IN_GOACC_PARALLEL:
-    case BUILT_IN_GOACC_UPDATE:
-      break;
-    default:
-      gcc_unreachable ();
-    }
   args.quick_push (t1);
   args.quick_push (t2);
   args.quick_push (t3);
@@ -10378,6 +10361,7 @@ expand_omp_target (struct omp_region *region)
     case BUILT_IN_GOMP_TARGET:
     case BUILT_IN_GOMP_TARGET_DATA:
     case BUILT_IN_GOMP_TARGET_UPDATE:
+    case BUILT_IN_GOMP_TARGET_ENTER_EXIT_DATA:
       break;
     case BUILT_IN_GOACC_PARALLEL:
       {
@@ -12631,18 +12615,8 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx)
       DECL_NAMELESS (TREE_VEC_ELT (t, 1)) = 1;
       TREE_ADDRESSABLE (TREE_VEC_ELT (t, 1)) = 1;
       TREE_STATIC (TREE_VEC_ELT (t, 1)) = 1;
-      tree tkind_type;
-      int talign_shift;
-      if (is_gimple_omp_oacc (stmt))
-	{
-	  tkind_type = short_unsigned_type_node;
-	  talign_shift = 8;
-	}
-      else
-	{
-	  tkind_type = unsigned_char_type_node;
-	  talign_shift = 3;
-	}
+      tree tkind_type = short_unsigned_type_node;
+      int talign_shift = 8;
       TREE_VEC_ELT (t, 2)
 	= create_tmp_var (build_array_type_nelts (tkind_type, map_cnt),
 			  ".omp_data_kinds");
@@ -12782,9 +12756,6 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 	      default:
 		gcc_unreachable ();
 	      }
-	    /* FIXME: Temporary hack.  */
-	    if (talign_shift == 3)
-	      tkind &= ~GOMP_MAP_FLAG_FORCE;
 	    gcc_checking_assert (tkind
 				 < (HOST_WIDE_INT_C (1U) << talign_shift));
 	    talign = ceil_log2 (talign);
diff --git a/libgomp/libgomp.map b/libgomp/libgomp.map
index 36c0bb5..d15b9ba 100644
--- a/libgomp/libgomp.map
+++ b/libgomp/libgomp.map
@@ -242,6 +242,10 @@ GOMP_4.0.1 {
 
 GOMP_4.1 {
   global:
+	GOMP_target1;
+	GOMP_target_data1;
+	GOMP_target_update1;
+	GOMP_target_enter_exit_data;
 	GOMP_taskloop;
 	GOMP_taskloop_ull;
 } GOMP_4.0.1;
diff --git a/libgomp/libgomp_g.h b/libgomp/libgomp_g.h
index 5e88d45..2536ee4 100644
--- a/libgomp/libgomp_g.h
+++ b/libgomp/libgomp_g.h
@@ -208,11 +208,19 @@ extern void GOMP_single_copy_end (void *);
 
 extern void GOMP_target (int, void (*) (void *), const void *,
 			 size_t, void **, size_t *, unsigned char *);
+extern void GOMP_target1 (int, void (*) (void *), size_t, void **, size_t *,
+			  unsigned short *);
 extern void GOMP_target_data (int, const void *,
 			      size_t, void **, size_t *, unsigned char *);
+extern void GOMP_target_data1 (int, size_t, void **, size_t *,
+			       unsigned short *);
 extern void GOMP_target_end_data (void);
 extern void GOMP_target_update (int, const void *,
 				size_t, void **, size_t *, unsigned char *);
+extern void GOMP_target_update1 (int, size_t, void **, size_t *,
+				 unsigned short *);
+extern void GOMP_target_enter_exit_data (int, size_t, void **, size_t *,
+					 unsigned short *);
 extern void GOMP_teams (unsigned int, unsigned int);
 
 /* oacc-parallel.c */
diff --git a/libgomp/target.c b/libgomp/target.c
index d8da783..bedc95a 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -132,6 +132,11 @@ resolve_device (int device_id)
   if (device_id < 0 || device_id >= gomp_get_num_devices ())
     return NULL;
 
+  gomp_mutex_lock (&devices[device_id].lock);
+  if (!devices[device_id].is_initialized)
+    gomp_init_device (&devices[device_id]);
+  gomp_mutex_unlock (&devices[device_id].lock);
+
   return &devices[device_id];
 }
 
@@ -157,20 +162,20 @@ gomp_map_vars_existing (struct gomp_device_descr *devicep, splay_tree_key oldn,
 }
 
 static int
-get_kind (bool is_openacc, void *kinds, int idx)
+get_kind (bool short_mapkind, void *kinds, int idx)
 {
-  return is_openacc ? ((unsigned short *) kinds)[idx]
-		    : ((unsigned char *) kinds)[idx];
+  return short_mapkind ? ((unsigned short *) kinds)[idx]
+		       : ((unsigned char *) kinds)[idx];
 }
 
 attribute_hidden struct target_mem_desc *
 gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
 	       void **hostaddrs, void **devaddrs, size_t *sizes, void *kinds,
-	       bool is_openacc, bool is_target)
+	       bool short_mapkind, bool is_target)
 {
   size_t i, tgt_align, tgt_size, not_found_cnt = 0;
-  const int rshift = is_openacc ? 8 : 3;
-  const int typemask = is_openacc ? 0xff : 0x7;
+  const int rshift = short_mapkind ? 8 : 3;
+  const int typemask = short_mapkind ? 0xff : 0x7;
   struct splay_tree_s *mem_map = &devicep->mem_map;
   struct splay_tree_key_s cur_node;
   struct target_mem_desc *tgt
@@ -195,7 +200,7 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
 
   for (i = 0; i < mapnum; i++)
     {
-      int kind = get_kind (is_openacc, kinds, i);
+      int kind = get_kind (short_mapkind, kinds, i);
       if (hostaddrs[i] == NULL)
 	{
 	  tgt->list[i] = NULL;
@@ -226,7 +231,7 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
 	    {
 	      size_t j;
 	      for (j = i + 1; j < mapnum; j++)
-		if (!GOMP_MAP_POINTER_P (get_kind (is_openacc, kinds, j)
+		if (!GOMP_MAP_POINTER_P (get_kind (short_mapkind, kinds, j)
 					 & typemask))
 		  break;
 		else if ((uintptr_t) hostaddrs[j] < cur_node.host_start
@@ -285,7 +290,7 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
       for (i = 0; i < mapnum; i++)
 	if (tgt->list[i] == NULL)
 	  {
-	    int kind = get_kind (is_openacc, kinds, i);
+	    int kind = get_kind (short_mapkind, kinds, i);
 	    if (hostaddrs[i] == NULL)
 	      continue;
 	    splay_tree_key k = &array->key;
@@ -394,7 +399,8 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
 					    k->host_end - k->host_start);
 
 		    for (j = i + 1; j < mapnum; j++)
-		      if (!GOMP_MAP_POINTER_P (get_kind (is_openacc, kinds, j)
+		      if (!GOMP_MAP_POINTER_P (get_kind (short_mapkind, kinds,
+							 j)
 					       & typemask))
 			break;
 		      else if ((uintptr_t) hostaddrs[j] < k->host_start
@@ -613,11 +619,11 @@ gomp_unmap_vars (struct target_mem_desc *tgt, bool do_copyfrom)
 
 static void
 gomp_update (struct gomp_device_descr *devicep, size_t mapnum, void **hostaddrs,
-	     size_t *sizes, void *kinds, bool is_openacc)
+	     size_t *sizes, void *kinds, bool short_mapkind)
 {
   size_t i;
   struct splay_tree_key_s cur_node;
-  const int typemask = is_openacc ? 0xff : 0x7;
+  const int typemask = short_mapkind ? 0xff : 0x7;
 
   if (!devicep)
     return;
@@ -634,7 +640,7 @@ gomp_update (struct gomp_device_descr *devicep, size_t mapnum, void **hostaddrs,
 	splay_tree_key n = splay_tree_lookup (&devicep->mem_map, &cur_node);
 	if (n)
 	  {
-	    int kind = get_kind (is_openacc, kinds, i);
+	    int kind = get_kind (short_mapkind, kinds, i);
 	    if (n->host_start > cur_node.host_start
 		|| n->host_end < cur_node.host_end)
 	      {
@@ -931,6 +937,47 @@ gomp_fini_device (struct gomp_device_descr *devicep)
   devicep->is_initialized = false;
 }
 
+/* Host fallback for GOMP_target[1] routines.  */
+
+static void
+gomp_target_fallback (void (*fn) (void *), void **hostaddrs)
+{
+  struct gomp_thread old_thr, *thr = gomp_thread ();
+  old_thr = *thr;
+  memset (thr, '\0', sizeof (*thr));
+  if (gomp_places_list)
+    {
+      thr->place = old_thr.place;
+      thr->ts.place_partition_len = gomp_places_list_len;
+    }
+  fn (hostaddrs);
+  gomp_free_thread (thr);
+  *thr = old_thr;
+}
+
+/* Helper function of GOMP_target[1] routines.  */
+
+static void *
+gomp_get_target_fn_addr (struct gomp_device_descr *devicep,
+			 void (*host_fn) (void *))
+{
+  if (devicep->capabilities & GOMP_OFFLOAD_CAP_NATIVE_EXEC)
+    return (void *) host_fn;
+  else
+    {
+      gomp_mutex_lock (&devicep->lock);
+      struct splay_tree_key_s k;
+      k.host_start = (uintptr_t) host_fn;
+      k.host_end = k.host_start + 1;
+      splay_tree_key tgt_fn = splay_tree_lookup (&devicep->mem_map, &k);
+      gomp_mutex_unlock (&devicep->lock);
+      if (tgt_fn == NULL)
+	gomp_fatal ("Target function wasn't mapped");
+
+      return (void *) tgt_fn->tgt_offset;
+    }
+}
+
 /* Called when encountering a target directive.  If DEVICE
    is GOMP_DEVICE_ICV, it means use device-var ICV.  If it is
    GOMP_DEVICE_HOST_FALLBACK (or any value
@@ -950,50 +997,41 @@ GOMP_target (int device, void (*fn) (void *), const void *unused,
 
   if (devicep == NULL
       || !(devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400))
+    return gomp_target_fallback (fn, hostaddrs);
+
+  void *fn_addr = gomp_get_target_fn_addr (devicep, fn);
+
+  struct target_mem_desc *tgt_vars
+    = gomp_map_vars (devicep, mapnum, hostaddrs, NULL, sizes, kinds, false,
+		     true);
+  struct gomp_thread old_thr, *thr = gomp_thread ();
+  old_thr = *thr;
+  memset (thr, '\0', sizeof (*thr));
+  if (gomp_places_list)
     {
-      /* Host fallback.  */
-      struct gomp_thread old_thr, *thr = gomp_thread ();
-      old_thr = *thr;
-      memset (thr, '\0', sizeof (*thr));
-      if (gomp_places_list)
-	{
-	  thr->place = old_thr.place;
-	  thr->ts.place_partition_len = gomp_places_list_len;
-	}
-      fn (hostaddrs);
-      gomp_free_thread (thr);
-      *thr = old_thr;
-      return;
+      thr->place = old_thr.place;
+      thr->ts.place_partition_len = gomp_places_list_len;
     }
+  devicep->run_func (devicep->target_id, fn_addr, (void *) tgt_vars->tgt_start);
+  gomp_free_thread (thr);
+  *thr = old_thr;
+  gomp_unmap_vars (tgt_vars, true);
+}
 
-  gomp_mutex_lock (&devicep->lock);
-  if (!devicep->is_initialized)
-    gomp_init_device (devicep);
-  gomp_mutex_unlock (&devicep->lock);
-
-  void *fn_addr;
+void
+GOMP_target1 (int device, void (*fn) (void *), size_t mapnum, void **hostaddrs,
+	      size_t *sizes, unsigned short *kinds)
+{
+  struct gomp_device_descr *devicep = resolve_device (device);
 
-  if (devicep->capabilities & GOMP_OFFLOAD_CAP_NATIVE_EXEC)
-    fn_addr = (void *) fn;
-  else
-    {
-      gomp_mutex_lock (&devicep->lock);
-      struct splay_tree_key_s k;
-      k.host_start = (uintptr_t) fn;
-      k.host_end = k.host_start + 1;
-      splay_tree_key tgt_fn = splay_tree_lookup (&devicep->mem_map, &k);
-      if (tgt_fn == NULL)
-	{
-	  gomp_mutex_unlock (&devicep->lock);
-	  gomp_fatal ("Target function wasn't mapped");
-	}
-      gomp_mutex_unlock (&devicep->lock);
+  if (devicep == NULL
+      || !(devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400))
+    return gomp_target_fallback (fn, hostaddrs);
 
-      fn_addr = (void *) tgt_fn->tgt_offset;
-    }
+  void *fn_addr = gomp_get_target_fn_addr (devicep, fn);
 
   struct target_mem_desc *tgt_vars
-    = gomp_map_vars (devicep, mapnum, hostaddrs, NULL, sizes, kinds, false,
+    = gomp_map_vars (devicep, mapnum, hostaddrs, NULL, sizes, kinds, true,
 		     true);
   struct gomp_thread old_thr, *thr = gomp_thread ();
   old_thr = *thr;
@@ -1009,6 +1047,25 @@ GOMP_target (int device, void (*fn) (void *), const void *unused,
   gomp_unmap_vars (tgt_vars, true);
 }
 
+/* Host fallback for GOMP_target_data[1] routines.  */
+
+static void
+gomp_target_data_fallback (void)
+{
+  struct gomp_task_icv *icv = gomp_icv (false);
+  if (icv->target_data)
+    {
+      /* Even when doing a host fallback, if there are any active
+         #pragma omp target data constructs, need to remember the
+         new #pragma omp target data, otherwise GOMP_target_end_data
+         would get out of sync.  */
+      struct target_mem_desc *tgt
+	= gomp_map_vars (NULL, 0, NULL, NULL, NULL, NULL, false, false);
+      tgt->prev = icv->target_data;
+      icv->target_data = tgt;
+    }
+}
+
 void
 GOMP_target_data (int device, const void *unused, size_t mapnum,
 		  void **hostaddrs, size_t *sizes, unsigned char *kinds)
@@ -1017,27 +1074,7 @@ GOMP_target_data (int device, const void *unused, size_t mapnum,
 
   if (devicep == NULL
       || !(devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400))
-    {
-      /* Host fallback.  */
-      struct gomp_task_icv *icv = gomp_icv (false);
-      if (icv->target_data)
-	{
-	  /* Even when doing a host fallback, if there are any active
-	     #pragma omp target data constructs, need to remember the
-	     new #pragma omp target data, otherwise GOMP_target_end_data
-	     would get out of sync.  */
-	  struct target_mem_desc *tgt
-	    = gomp_map_vars (NULL, 0, NULL, NULL, NULL, NULL, false, false);
-	  tgt->prev = icv->target_data;
-	  icv->target_data = tgt;
-	}
-      return;
-    }
-
-  gomp_mutex_lock (&devicep->lock);
-  if (!devicep->is_initialized)
-    gomp_init_device (devicep);
-  gomp_mutex_unlock (&devicep->lock);
+    return gomp_target_data_fallback ();
 
   struct target_mem_desc *tgt
     = gomp_map_vars (devicep, mapnum, hostaddrs, NULL, sizes, kinds, false,
@@ -1048,6 +1085,24 @@ GOMP_target_data (int device, const void *unused, size_t mapnum,
 }
 
 void
+GOMP_target_data1 (int device, size_t mapnum, void **hostaddrs, size_t *sizes,
+		   unsigned short *kinds)
+{
+  struct gomp_device_descr *devicep = resolve_device (device);
+
+  if (devicep == NULL
+      || !(devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400))
+    return gomp_target_data_fallback ();
+
+  struct target_mem_desc *tgt
+    = gomp_map_vars (devicep, mapnum, hostaddrs, NULL, sizes, kinds, true,
+		     false);
+  struct gomp_task_icv *icv = gomp_icv (true);
+  tgt->prev = icv->target_data;
+  icv->target_data = tgt;
+}
+
+void
 GOMP_target_end_data (void)
 {
   struct gomp_task_icv *icv = gomp_icv (false);
@@ -1069,15 +1124,71 @@ GOMP_target_update (int device, const void *unused, size_t mapnum,
       || !(devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400))
     return;
 
-  gomp_mutex_lock (&devicep->lock);
-  if (!devicep->is_initialized)
-    gomp_init_device (devicep);
-  gomp_mutex_unlock (&devicep->lock);
-
   gomp_update (devicep, mapnum, hostaddrs, sizes, kinds, false);
 }
 
 void
+GOMP_target_update1 (int device, size_t mapnum, void **hostaddrs, size_t *sizes,
+		     unsigned short *kinds)
+{
+  struct gomp_device_descr *devicep = resolve_device (device);
+
+  if (devicep == NULL
+      || !(devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400))
+    return;
+
+  gomp_update (devicep, mapnum, hostaddrs, sizes, kinds, true);
+}
+
+void
+GOMP_target_enter_exit_data (int device, size_t mapnum, void **hostaddrs,
+			     size_t *sizes, unsigned short *kinds)
+{
+  struct gomp_device_descr *devicep = resolve_device (device);
+
+  if (devicep == NULL
+      || !(devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400))
+    return;
+
+  /* Determine if this is an "omp target enter data".  */
+  const int typemask = 0xff;
+  bool is_enter_data = false;
+  size_t i;
+  for (i = 0; i < mapnum; i++)
+    {
+      unsigned char kind = kinds[i] & typemask;
+
+      if (kind == GOMP_MAP_POINTER || kind == GOMP_MAP_TO_PSET)
+	continue;
+
+      if (kind == GOMP_MAP_ALLOC
+	  || kind == GOMP_MAP_TO
+	  || kind == GOMP_MAP_ALWAYS_TO)
+	{
+	  is_enter_data = true;
+	  break;
+	}
+
+      if (kind == GOMP_MAP_FROM
+	  || kind == GOMP_MAP_ALWAYS_FROM
+	  || kind == GOMP_MAP_DELETE
+	  || kind == GOMP_MAP_RELEASE)
+	break;
+
+      gomp_fatal ("GOMP_target_enter_exit_data unhandled kind 0x%.2x", kind);
+    }
+
+  if (is_enter_data)
+    {
+      /* TODO  */
+    }
+  else
+    {
+      /* TODO  */
+    }
+}
+
+void
 GOMP_teams (unsigned int num_teams, unsigned int thread_limit)
 {
   if (thread_limit)


  -- Ilya

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [gomp4.1] Add new versions of GOMP_target{,_data,_update} and GOMP_target_enter_exit_data
  2015-06-15 12:23 [gomp4.1] Add new versions of GOMP_target{,_data,_update} and GOMP_target_enter_exit_data Ilya Verbin
@ 2015-06-15 14:30 ` Jakub Jelinek
  2015-06-15 16:22   ` Ilya Verbin
  0 siblings, 1 reply; 49+ messages in thread
From: Jakub Jelinek @ 2015-06-15 14:30 UTC (permalink / raw)
  To: Ilya Verbin; +Cc: Thomas Schwinge, gcc-patches, Kirill Yukhin

On Mon, Jun 15, 2015 at 03:20:37PM +0300, Ilya Verbin wrote:
> This patch introduces new versions of GOMP_target{,_data,_update} for OpenMP 4.1
> with unsigned short for map kinds, but without new async arguments yet.

I think I'd prefer (for now) to suffix the functions with _41 instead of 1
(and we'll see if we can come up with better names when async support is
added).  Do we need to change GOMP_target_update though (at least right
now)?  I mean, the construct only allows to and from clauses, not the map
clause, and those don't really have an always modifier, nor release/delete
semantics etc., so at least for now I think using the current
GOMP_target_update should be ok.

	Jakub

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [gomp4.1] Add new versions of GOMP_target{,_data,_update} and GOMP_target_enter_exit_data
  2015-06-15 14:30 ` Jakub Jelinek
@ 2015-06-15 16:22   ` Ilya Verbin
  2015-06-15 16:30     ` Jakub Jelinek
  0 siblings, 1 reply; 49+ messages in thread
From: Ilya Verbin @ 2015-06-15 16:22 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Thomas Schwinge, gcc-patches, Kirill Yukhin

On Mon, Jun 15, 2015 at 15:06:09 +0200, Jakub Jelinek wrote:
> On Mon, Jun 15, 2015 at 03:20:37PM +0300, Ilya Verbin wrote:
> > This patch introduces new versions of GOMP_target{,_data,_update} for OpenMP 4.1
> > with unsigned short for map kinds, but without new async arguments yet.
> 
> I think I'd prefer (for now) to suffix the functions with _41 instead of 1
> (and we'll see if we can come up with better names when async support is
> added).

OK.

> Do we need to change GOMP_target_update though (at least right
> now)?  I mean, the construct only allows to and from clauses, not the map
> clause, and those don't really have an always modifier, nor release/delete
> semantics etc., so at least for now I think using the current
> GOMP_target_update should be ok.

I thought that it wouldn't look good, since without GOMP_target_update_41 we
will need to keep this obsolete parts:

-  switch (start_ix)
-    {
-    case BUILT_IN_GOMP_TARGET_UPDATE:
-      /* This const void * is part of the current ABI, but we're not actually
-	 using it.  */
-      args.quick_push (build_zero_cst (ptr_type_node));
-      break;
-    case BUILT_IN_GOMP_TARGET:
-    case BUILT_IN_GOMP_TARGET_DATA:
-    case BUILT_IN_GOACC_DATA_START:
-    case BUILT_IN_GOACC_ENTER_EXIT_DATA:
-    case BUILT_IN_GOACC_PARALLEL:
-    case BUILT_IN_GOACC_UPDATE:
-      break;
-    default:
-      gcc_unreachable ();
-    }

and

-      tree tkind_type;
-      int talign_shift;
-      if (is_gimple_omp_oacc (stmt))
-	{
-	  tkind_type = short_unsigned_type_node;
-	  talign_shift = 8;
-	}
-      else
-	{
-	  tkind_type = unsigned_char_type_node;
-	  talign_shift = 3;
-	}
+      tree tkind_type = short_unsigned_type_node;
+      int talign_shift = 8;

  -- Ilya

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [gomp4.1] Add new versions of GOMP_target{,_data,_update} and GOMP_target_enter_exit_data
  2015-06-15 16:22   ` Ilya Verbin
@ 2015-06-15 16:30     ` Jakub Jelinek
  2015-06-15 19:54       ` Ilya Verbin
  0 siblings, 1 reply; 49+ messages in thread
From: Jakub Jelinek @ 2015-06-15 16:30 UTC (permalink / raw)
  To: Ilya Verbin; +Cc: Thomas Schwinge, gcc-patches, Kirill Yukhin

On Mon, Jun 15, 2015 at 07:18:27PM +0300, Ilya Verbin wrote:
> On Mon, Jun 15, 2015 at 15:06:09 +0200, Jakub Jelinek wrote:
> > On Mon, Jun 15, 2015 at 03:20:37PM +0300, Ilya Verbin wrote:
> > > This patch introduces new versions of GOMP_target{,_data,_update} for OpenMP 4.1
> > > with unsigned short for map kinds, but without new async arguments yet.
> > 
> > I think I'd prefer (for now) to suffix the functions with _41 instead of 1
> > (and we'll see if we can come up with better names when async support is
> > added).
> 
> OK.

Thanks.

> > Do we need to change GOMP_target_update though (at least right
> > now)?  I mean, the construct only allows to and from clauses, not the map
> > clause, and those don't really have an always modifier, nor release/delete
> > semantics etc., so at least for now I think using the current
> > GOMP_target_update should be ok.
> 
> I thought that it wouldn't look good, since without GOMP_target_update_41 we
> will need to keep this obsolete parts:

I'd prefer to keep it for now, perhaps later on we'll switch to 16-bit kinds
even for that, but better figure out first what to do with the async stuff,
handle the enter/exit data correctly, change the library for OpenMP 4.1 to
do the fully refcounted model.

	Jakub

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [gomp4.1] Add new versions of GOMP_target{,_data,_update} and GOMP_target_enter_exit_data
  2015-06-15 16:30     ` Jakub Jelinek
@ 2015-06-15 19:54       ` Ilya Verbin
  2015-06-15 19:58         ` Jakub Jelinek
  2015-10-13 14:50         ` Ilya Verbin
  0 siblings, 2 replies; 49+ messages in thread
From: Ilya Verbin @ 2015-06-15 19:54 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Thomas Schwinge, gcc-patches, Kirill Yukhin

On Mon, Jun 15, 2015 at 18:25:28 +0200, Jakub Jelinek wrote:
> On Mon, Jun 15, 2015 at 07:18:27PM +0300, Ilya Verbin wrote:
> > On Mon, Jun 15, 2015 at 15:06:09 +0200, Jakub Jelinek wrote:
> > > On Mon, Jun 15, 2015 at 03:20:37PM +0300, Ilya Verbin wrote:
> > > > This patch introduces new versions of GOMP_target{,_data,_update} for OpenMP 4.1
> > > > with unsigned short for map kinds, but without new async arguments yet.
> > > 
> > > I think I'd prefer (for now) to suffix the functions with _41 instead of 1
> > > (and we'll see if we can come up with better names when async support is
> > > added).
> > 
> > OK.
> 
> Thanks.
> 
> > > Do we need to change GOMP_target_update though (at least right
> > > now)?  I mean, the construct only allows to and from clauses, not the map
> > > clause, and those don't really have an always modifier, nor release/delete
> > > semantics etc., so at least for now I think using the current
> > > GOMP_target_update should be ok.
> > 
> > I thought that it wouldn't look good, since without GOMP_target_update_41 we
> > will need to keep this obsolete parts:
> 
> I'd prefer to keep it for now, perhaps later on we'll switch to 16-bit kinds
> even for that, but better figure out first what to do with the async stuff,
> handle the enter/exit data correctly, change the library for OpenMP 4.1 to
> do the fully refcounted model.

Here is the new patch.  OK to commit?


gcc/
	* builtin-types.def (BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR): New.
	(BT_FN_VOID_INT_OMPFN_PTR_SIZE_PTR_PTR_PTR): Remove.
	* omp-builtins.def (BUILT_IN_GOMP_TARGET): Replace GOMP_target with
	GOMP_target_41.
	(BUILT_IN_GOMP_TARGET_DATA): Replace GOMP_target_data with
	GOMP_target_data_41.
	(BUILT_IN_GOMP_TARGET_ENTER_EXIT_DATA): New.
	* omp-low.c (expand_omp_target): Use
	BUILT_IN_GOMP_TARGET_ENTER_EXIT_DATA for GF_OMP_TARGET_KIND_ENTER_DATA
	and GF_OMP_TARGET_KIND_EXIT_DATA.
	Do not pass obsolete pointer to new builtins.
	(lower_omp_target): Use unsigned short for map kinds, except
	BUILT_IN_GOMP_TARGET_UPDATE.
gcc/fortran/
	* types.def (BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR): New.
	(BT_FN_VOID_INT_OMPFN_PTR_SIZE_PTR_PTR_PTR): Remove.
libgomp/
	* libgomp.map (GOMP_4.1): Add GOMP_target_41, GOMP_target_data_41,
	GOMP_target_enter_exit_data.
	* libgomp_g.h: Declare GOMP_target_41, GOMP_target_data_41,
	GOMP_target_enter_exit_data.
	* target.c (resolve_device): Call gomp_init_device here instead of
	GOMP_target*.
	(get_kind): Rename is_openacc to short_mapkind.
	(gomp_map_vars): Likewise.
	(gomp_unmap_vars): Likewise.
	(gomp_update): Likewise.
	(gomp_target_fallback): New static function.
	(gomp_get_target_fn_addr): New static function.
	(GOMP_target): Move host fallback and fn lookup to the new functions.
	(GOMP_target_41): New function.
	(gomp_target_data_fallback): New static function.
	(GOMP_target_data): Move host fallback to the new function.
	(GOMP_target_data_41): New function.
	(GOMP_target_update): Do not call gomp_init_device.
	(GOMP_target_enter_exit_data): New function.


diff --git a/gcc/builtin-types.def b/gcc/builtin-types.def
index 492ca63..870c957 100644
--- a/gcc/builtin-types.def
+++ b/gcc/builtin-types.def
@@ -526,6 +526,9 @@ DEF_FUNCTION_TYPE_6 (BT_FN_BOOL_SIZE_VPTR_PTR_PTR_INT_INT, BT_BOOL, BT_SIZE,
 		     BT_VOLATILE_PTR, BT_PTR, BT_PTR, BT_INT, BT_INT)
 DEF_FUNCTION_TYPE_6 (BT_FN_VOID_INT_PTR_SIZE_PTR_PTR_PTR,
 		     BT_VOID, BT_INT, BT_PTR, BT_SIZE, BT_PTR, BT_PTR, BT_PTR)
+DEF_FUNCTION_TYPE_6 (BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR,
+		     BT_VOID, BT_INT, BT_PTR_FN_VOID_PTR, BT_SIZE, BT_PTR,
+		     BT_PTR, BT_PTR)
 
 DEF_FUNCTION_TYPE_7 (BT_FN_VOID_OMPFN_PTR_UINT_LONG_LONG_LONG_UINT,
 		     BT_VOID, BT_PTR_FN_VOID_PTR, BT_PTR, BT_UINT,
@@ -534,9 +537,6 @@ DEF_FUNCTION_TYPE_7 (BT_FN_BOOL_BOOL_ULL_ULL_ULL_ULL_ULLPTR_ULLPTR,
 		     BT_BOOL, BT_BOOL, BT_ULONGLONG, BT_ULONGLONG,
 		     BT_ULONGLONG, BT_ULONGLONG,
 		     BT_PTR_ULONGLONG, BT_PTR_ULONGLONG)
-DEF_FUNCTION_TYPE_7 (BT_FN_VOID_INT_OMPFN_PTR_SIZE_PTR_PTR_PTR,
-		     BT_VOID, BT_INT, BT_PTR_FN_VOID_PTR, BT_PTR, BT_SIZE,
-		     BT_PTR, BT_PTR, BT_PTR)
 
 DEF_FUNCTION_TYPE_8 (BT_FN_VOID_OMPFN_PTR_UINT_LONG_LONG_LONG_LONG_UINT,
 		     BT_VOID, BT_PTR_FN_VOID_PTR, BT_PTR, BT_UINT,
diff --git a/gcc/fortran/types.def b/gcc/fortran/types.def
index c0d3989..a830235 100644
--- a/gcc/fortran/types.def
+++ b/gcc/fortran/types.def
@@ -189,6 +189,9 @@ DEF_FUNCTION_TYPE_6 (BT_FN_BOOL_VPTR_PTR_I16_BOOL_INT_INT,
 		     BT_INT)
 DEF_FUNCTION_TYPE_6 (BT_FN_BOOL_SIZE_VPTR_PTR_PTR_INT_INT, BT_BOOL, BT_SIZE,
 		     BT_VOLATILE_PTR, BT_PTR, BT_PTR, BT_INT, BT_INT)
+DEF_FUNCTION_TYPE_6 (BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR,
+		     BT_VOID, BT_INT, BT_PTR_FN_VOID_PTR, BT_SIZE, BT_PTR,
+		     BT_PTR, BT_PTR)
 DEF_FUNCTION_TYPE_6 (BT_FN_VOID_INT_PTR_SIZE_PTR_PTR_PTR,
 		     BT_VOID, BT_INT, BT_PTR, BT_SIZE, BT_PTR, BT_PTR, BT_PTR)
 
@@ -199,9 +202,6 @@ DEF_FUNCTION_TYPE_7 (BT_FN_BOOL_BOOL_ULL_ULL_ULL_ULL_ULLPTR_ULLPTR,
 		     BT_BOOL, BT_BOOL, BT_ULONGLONG, BT_ULONGLONG,
 		     BT_ULONGLONG, BT_ULONGLONG,
 		     BT_PTR_ULONGLONG, BT_PTR_ULONGLONG)
-DEF_FUNCTION_TYPE_7 (BT_FN_VOID_INT_OMPFN_PTR_SIZE_PTR_PTR_PTR,
-		     BT_VOID, BT_INT, BT_PTR_FN_VOID_PTR, BT_PTR, BT_SIZE,
-		     BT_PTR, BT_PTR, BT_PTR)
 
 DEF_FUNCTION_TYPE_8 (BT_FN_VOID_OMPFN_PTR_UINT_LONG_LONG_LONG_LONG_UINT,
 		     BT_VOID, BT_PTR_FN_VOID_PTR, BT_PTR, BT_UINT,
diff --git a/gcc/omp-builtins.def b/gcc/omp-builtins.def
index 749def4..470f038 100644
--- a/gcc/omp-builtins.def
+++ b/gcc/omp-builtins.def
@@ -262,14 +262,16 @@ DEF_GOMP_BUILTIN (BUILT_IN_GOMP_SINGLE_COPY_START, "GOMP_single_copy_start",
 		  BT_FN_PTR, ATTR_NOTHROW_LEAF_LIST)
 DEF_GOMP_BUILTIN (BUILT_IN_GOMP_SINGLE_COPY_END, "GOMP_single_copy_end",
 		  BT_FN_VOID_PTR, ATTR_NOTHROW_LEAF_LIST)
-DEF_GOMP_BUILTIN (BUILT_IN_GOMP_TARGET, "GOMP_target",
-		  BT_FN_VOID_INT_OMPFN_PTR_SIZE_PTR_PTR_PTR,
-		  ATTR_NOTHROW_LIST)
-DEF_GOMP_BUILTIN (BUILT_IN_GOMP_TARGET_DATA, "GOMP_target_data",
-		  BT_FN_VOID_INT_PTR_SIZE_PTR_PTR_PTR, ATTR_NOTHROW_LIST)
+DEF_GOMP_BUILTIN (BUILT_IN_GOMP_TARGET, "GOMP_target_41",
+		  BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR, ATTR_NOTHROW_LIST)
+DEF_GOMP_BUILTIN (BUILT_IN_GOMP_TARGET_DATA, "GOMP_target_data_41",
+		  BT_FN_VOID_INT_SIZE_PTR_PTR_PTR, ATTR_NOTHROW_LIST)
 DEF_GOMP_BUILTIN (BUILT_IN_GOMP_TARGET_END_DATA, "GOMP_target_end_data",
 		  BT_FN_VOID, ATTR_NOTHROW_LIST)
 DEF_GOMP_BUILTIN (BUILT_IN_GOMP_TARGET_UPDATE, "GOMP_target_update",
 		  BT_FN_VOID_INT_PTR_SIZE_PTR_PTR_PTR, ATTR_NOTHROW_LIST)
+DEF_GOMP_BUILTIN (BUILT_IN_GOMP_TARGET_ENTER_EXIT_DATA,
+		  "GOMP_target_enter_exit_data",
+		  BT_FN_VOID_INT_SIZE_PTR_PTR_PTR, ATTR_NOTHROW_LIST)
 DEF_GOMP_BUILTIN (BUILT_IN_GOMP_TEAMS, "GOMP_teams",
 		  BT_FN_VOID_UINT_UINT, ATTR_NOTHROW_LIST)
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index e4f5566..3e27f8a 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -10226,8 +10226,7 @@ expand_omp_target (struct omp_region *region)
       break;
     case GF_OMP_TARGET_KIND_ENTER_DATA:
     case GF_OMP_TARGET_KIND_EXIT_DATA:
-      /* FIXME */
-      start_ix = BUILT_IN_GOMP_TARGET_UPDATE;
+      start_ix = BUILT_IN_GOMP_TARGET_ENTER_EXIT_DATA;
       break;
     case GF_OMP_TARGET_KIND_OACC_PARALLEL:
     case GF_OMP_TARGET_KIND_OACC_KERNELS:
@@ -10264,7 +10263,8 @@ expand_omp_target (struct omp_region *region)
 	 defined/used for the OpenMP target ones.  */
       gcc_checking_assert (start_ix == BUILT_IN_GOMP_TARGET
 			   || start_ix == BUILT_IN_GOMP_TARGET_DATA
-			   || start_ix == BUILT_IN_GOMP_TARGET_UPDATE);
+			   || start_ix == BUILT_IN_GOMP_TARGET_UPDATE
+			   || start_ix == BUILT_IN_GOMP_TARGET_ENTER_EXIT_DATA);
 
       device = OMP_CLAUSE_DEVICE_ID (c);
       clause_loc = OMP_CLAUSE_LOCATION (c);
@@ -10351,23 +10351,10 @@ expand_omp_target (struct omp_region *region)
   args.quick_push (device);
   if (offloaded)
     args.quick_push (build_fold_addr_expr (child_fn));
-  switch (start_ix)
-    {
-    case BUILT_IN_GOMP_TARGET:
-    case BUILT_IN_GOMP_TARGET_DATA:
-    case BUILT_IN_GOMP_TARGET_UPDATE:
-      /* This const void * is part of the current ABI, but we're not actually
-	 using it.  */
-      args.quick_push (build_zero_cst (ptr_type_node));
-      break;
-    case BUILT_IN_GOACC_DATA_START:
-    case BUILT_IN_GOACC_ENTER_EXIT_DATA:
-    case BUILT_IN_GOACC_PARALLEL:
-    case BUILT_IN_GOACC_UPDATE:
-      break;
-    default:
-      gcc_unreachable ();
-    }
+  /* This const void * is part of the current ABI, but we're not actually using
+     it.  */
+  if (start_ix == BUILT_IN_GOMP_TARGET_UPDATE)
+    args.quick_push (build_zero_cst (ptr_type_node));
   args.quick_push (t1);
   args.quick_push (t2);
   args.quick_push (t3);
@@ -10378,6 +10365,7 @@ expand_omp_target (struct omp_region *region)
     case BUILT_IN_GOMP_TARGET:
     case BUILT_IN_GOMP_TARGET_DATA:
     case BUILT_IN_GOMP_TARGET_UPDATE:
+    case BUILT_IN_GOMP_TARGET_ENTER_EXIT_DATA:
       break;
     case BUILT_IN_GOACC_PARALLEL:
       {
@@ -12633,7 +12621,8 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx)
       TREE_STATIC (TREE_VEC_ELT (t, 1)) = 1;
       tree tkind_type;
       int talign_shift;
-      if (is_gimple_omp_oacc (stmt))
+      if (is_gimple_omp_oacc (stmt)
+	  || gimple_omp_target_kind (stmt) != GF_OMP_TARGET_KIND_UPDATE)
 	{
 	  tkind_type = short_unsigned_type_node;
 	  talign_shift = 8;
@@ -12782,9 +12771,6 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 	      default:
 		gcc_unreachable ();
 	      }
-	    /* FIXME: Temporary hack.  */
-	    if (talign_shift == 3)
-	      tkind &= ~GOMP_MAP_FLAG_FORCE;
 	    gcc_checking_assert (tkind
 				 < (HOST_WIDE_INT_C (1U) << talign_shift));
 	    talign = ceil_log2 (talign);
diff --git a/libgomp/libgomp.map b/libgomp/libgomp.map
index 36c0bb5..a77f1e3 100644
--- a/libgomp/libgomp.map
+++ b/libgomp/libgomp.map
@@ -242,6 +242,9 @@ GOMP_4.0.1 {
 
 GOMP_4.1 {
   global:
+	GOMP_target_41;
+	GOMP_target_data_41;
+	GOMP_target_enter_exit_data;
 	GOMP_taskloop;
 	GOMP_taskloop_ull;
 } GOMP_4.0.1;
diff --git a/libgomp/libgomp_g.h b/libgomp/libgomp_g.h
index 5e88d45..ef7dc0d 100644
--- a/libgomp/libgomp_g.h
+++ b/libgomp/libgomp_g.h
@@ -208,11 +208,17 @@ extern void GOMP_single_copy_end (void *);
 
 extern void GOMP_target (int, void (*) (void *), const void *,
 			 size_t, void **, size_t *, unsigned char *);
+extern void GOMP_target_41 (int, void (*) (void *), size_t, void **, size_t *,
+			  unsigned short *);
 extern void GOMP_target_data (int, const void *,
 			      size_t, void **, size_t *, unsigned char *);
+extern void GOMP_target_data_41 (int, size_t, void **, size_t *,
+			       unsigned short *);
 extern void GOMP_target_end_data (void);
 extern void GOMP_target_update (int, const void *,
 				size_t, void **, size_t *, unsigned char *);
+extern void GOMP_target_enter_exit_data (int, size_t, void **, size_t *,
+					 unsigned short *);
 extern void GOMP_teams (unsigned int, unsigned int);
 
 /* oacc-parallel.c */
diff --git a/libgomp/target.c b/libgomp/target.c
index d8da783..218b1a4 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -132,6 +132,11 @@ resolve_device (int device_id)
   if (device_id < 0 || device_id >= gomp_get_num_devices ())
     return NULL;
 
+  gomp_mutex_lock (&devices[device_id].lock);
+  if (!devices[device_id].is_initialized)
+    gomp_init_device (&devices[device_id]);
+  gomp_mutex_unlock (&devices[device_id].lock);
+
   return &devices[device_id];
 }
 
@@ -157,20 +162,20 @@ gomp_map_vars_existing (struct gomp_device_descr *devicep, splay_tree_key oldn,
 }
 
 static int
-get_kind (bool is_openacc, void *kinds, int idx)
+get_kind (bool short_mapkind, void *kinds, int idx)
 {
-  return is_openacc ? ((unsigned short *) kinds)[idx]
-		    : ((unsigned char *) kinds)[idx];
+  return short_mapkind ? ((unsigned short *) kinds)[idx]
+		       : ((unsigned char *) kinds)[idx];
 }
 
 attribute_hidden struct target_mem_desc *
 gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
 	       void **hostaddrs, void **devaddrs, size_t *sizes, void *kinds,
-	       bool is_openacc, bool is_target)
+	       bool short_mapkind, bool is_target)
 {
   size_t i, tgt_align, tgt_size, not_found_cnt = 0;
-  const int rshift = is_openacc ? 8 : 3;
-  const int typemask = is_openacc ? 0xff : 0x7;
+  const int rshift = short_mapkind ? 8 : 3;
+  const int typemask = short_mapkind ? 0xff : 0x7;
   struct splay_tree_s *mem_map = &devicep->mem_map;
   struct splay_tree_key_s cur_node;
   struct target_mem_desc *tgt
@@ -195,7 +200,7 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
 
   for (i = 0; i < mapnum; i++)
     {
-      int kind = get_kind (is_openacc, kinds, i);
+      int kind = get_kind (short_mapkind, kinds, i);
       if (hostaddrs[i] == NULL)
 	{
 	  tgt->list[i] = NULL;
@@ -226,7 +231,7 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
 	    {
 	      size_t j;
 	      for (j = i + 1; j < mapnum; j++)
-		if (!GOMP_MAP_POINTER_P (get_kind (is_openacc, kinds, j)
+		if (!GOMP_MAP_POINTER_P (get_kind (short_mapkind, kinds, j)
 					 & typemask))
 		  break;
 		else if ((uintptr_t) hostaddrs[j] < cur_node.host_start
@@ -285,7 +290,7 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
       for (i = 0; i < mapnum; i++)
 	if (tgt->list[i] == NULL)
 	  {
-	    int kind = get_kind (is_openacc, kinds, i);
+	    int kind = get_kind (short_mapkind, kinds, i);
 	    if (hostaddrs[i] == NULL)
 	      continue;
 	    splay_tree_key k = &array->key;
@@ -394,7 +399,8 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
 					    k->host_end - k->host_start);
 
 		    for (j = i + 1; j < mapnum; j++)
-		      if (!GOMP_MAP_POINTER_P (get_kind (is_openacc, kinds, j)
+		      if (!GOMP_MAP_POINTER_P (get_kind (short_mapkind, kinds,
+							 j)
 					       & typemask))
 			break;
 		      else if ((uintptr_t) hostaddrs[j] < k->host_start
@@ -613,11 +619,11 @@ gomp_unmap_vars (struct target_mem_desc *tgt, bool do_copyfrom)
 
 static void
 gomp_update (struct gomp_device_descr *devicep, size_t mapnum, void **hostaddrs,
-	     size_t *sizes, void *kinds, bool is_openacc)
+	     size_t *sizes, void *kinds, bool short_mapkind)
 {
   size_t i;
   struct splay_tree_key_s cur_node;
-  const int typemask = is_openacc ? 0xff : 0x7;
+  const int typemask = short_mapkind ? 0xff : 0x7;
 
   if (!devicep)
     return;
@@ -634,7 +640,7 @@ gomp_update (struct gomp_device_descr *devicep, size_t mapnum, void **hostaddrs,
 	splay_tree_key n = splay_tree_lookup (&devicep->mem_map, &cur_node);
 	if (n)
 	  {
-	    int kind = get_kind (is_openacc, kinds, i);
+	    int kind = get_kind (short_mapkind, kinds, i);
 	    if (n->host_start > cur_node.host_start
 		|| n->host_end < cur_node.host_end)
 	      {
@@ -931,6 +937,47 @@ gomp_fini_device (struct gomp_device_descr *devicep)
   devicep->is_initialized = false;
 }
 
+/* Host fallback for GOMP_target{,_41} routines.  */
+
+static void
+gomp_target_fallback (void (*fn) (void *), void **hostaddrs)
+{
+  struct gomp_thread old_thr, *thr = gomp_thread ();
+  old_thr = *thr;
+  memset (thr, '\0', sizeof (*thr));
+  if (gomp_places_list)
+    {
+      thr->place = old_thr.place;
+      thr->ts.place_partition_len = gomp_places_list_len;
+    }
+  fn (hostaddrs);
+  gomp_free_thread (thr);
+  *thr = old_thr;
+}
+
+/* Helper function of GOMP_target{,_41} routines.  */
+
+static void *
+gomp_get_target_fn_addr (struct gomp_device_descr *devicep,
+			 void (*host_fn) (void *))
+{
+  if (devicep->capabilities & GOMP_OFFLOAD_CAP_NATIVE_EXEC)
+    return (void *) host_fn;
+  else
+    {
+      gomp_mutex_lock (&devicep->lock);
+      struct splay_tree_key_s k;
+      k.host_start = (uintptr_t) host_fn;
+      k.host_end = k.host_start + 1;
+      splay_tree_key tgt_fn = splay_tree_lookup (&devicep->mem_map, &k);
+      gomp_mutex_unlock (&devicep->lock);
+      if (tgt_fn == NULL)
+	gomp_fatal ("Target function wasn't mapped");
+
+      return (void *) tgt_fn->tgt_offset;
+    }
+}
+
 /* Called when encountering a target directive.  If DEVICE
    is GOMP_DEVICE_ICV, it means use device-var ICV.  If it is
    GOMP_DEVICE_HOST_FALLBACK (or any value
@@ -950,50 +997,41 @@ GOMP_target (int device, void (*fn) (void *), const void *unused,
 
   if (devicep == NULL
       || !(devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400))
+    return gomp_target_fallback (fn, hostaddrs);
+
+  void *fn_addr = gomp_get_target_fn_addr (devicep, fn);
+
+  struct target_mem_desc *tgt_vars
+    = gomp_map_vars (devicep, mapnum, hostaddrs, NULL, sizes, kinds, false,
+		     true);
+  struct gomp_thread old_thr, *thr = gomp_thread ();
+  old_thr = *thr;
+  memset (thr, '\0', sizeof (*thr));
+  if (gomp_places_list)
     {
-      /* Host fallback.  */
-      struct gomp_thread old_thr, *thr = gomp_thread ();
-      old_thr = *thr;
-      memset (thr, '\0', sizeof (*thr));
-      if (gomp_places_list)
-	{
-	  thr->place = old_thr.place;
-	  thr->ts.place_partition_len = gomp_places_list_len;
-	}
-      fn (hostaddrs);
-      gomp_free_thread (thr);
-      *thr = old_thr;
-      return;
+      thr->place = old_thr.place;
+      thr->ts.place_partition_len = gomp_places_list_len;
     }
+  devicep->run_func (devicep->target_id, fn_addr, (void *) tgt_vars->tgt_start);
+  gomp_free_thread (thr);
+  *thr = old_thr;
+  gomp_unmap_vars (tgt_vars, true);
+}
 
-  gomp_mutex_lock (&devicep->lock);
-  if (!devicep->is_initialized)
-    gomp_init_device (devicep);
-  gomp_mutex_unlock (&devicep->lock);
-
-  void *fn_addr;
+void
+GOMP_target_41 (int device, void (*fn) (void *), size_t mapnum,
+		void **hostaddrs, size_t *sizes, unsigned short *kinds)
+{
+  struct gomp_device_descr *devicep = resolve_device (device);
 
-  if (devicep->capabilities & GOMP_OFFLOAD_CAP_NATIVE_EXEC)
-    fn_addr = (void *) fn;
-  else
-    {
-      gomp_mutex_lock (&devicep->lock);
-      struct splay_tree_key_s k;
-      k.host_start = (uintptr_t) fn;
-      k.host_end = k.host_start + 1;
-      splay_tree_key tgt_fn = splay_tree_lookup (&devicep->mem_map, &k);
-      if (tgt_fn == NULL)
-	{
-	  gomp_mutex_unlock (&devicep->lock);
-	  gomp_fatal ("Target function wasn't mapped");
-	}
-      gomp_mutex_unlock (&devicep->lock);
+  if (devicep == NULL
+      || !(devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400))
+    return gomp_target_fallback (fn, hostaddrs);
 
-      fn_addr = (void *) tgt_fn->tgt_offset;
-    }
+  void *fn_addr = gomp_get_target_fn_addr (devicep, fn);
 
   struct target_mem_desc *tgt_vars
-    = gomp_map_vars (devicep, mapnum, hostaddrs, NULL, sizes, kinds, false,
+    = gomp_map_vars (devicep, mapnum, hostaddrs, NULL, sizes, kinds, true,
 		     true);
   struct gomp_thread old_thr, *thr = gomp_thread ();
   old_thr = *thr;
@@ -1009,6 +1047,25 @@ GOMP_target (int device, void (*fn) (void *), const void *unused,
   gomp_unmap_vars (tgt_vars, true);
 }
 
+/* Host fallback for GOMP_target_data{,_41} routines.  */
+
+static void
+gomp_target_data_fallback (void)
+{
+  struct gomp_task_icv *icv = gomp_icv (false);
+  if (icv->target_data)
+    {
+      /* Even when doing a host fallback, if there are any active
+         #pragma omp target data constructs, need to remember the
+         new #pragma omp target data, otherwise GOMP_target_end_data
+         would get out of sync.  */
+      struct target_mem_desc *tgt
+	= gomp_map_vars (NULL, 0, NULL, NULL, NULL, NULL, false, false);
+      tgt->prev = icv->target_data;
+      icv->target_data = tgt;
+    }
+}
+
 void
 GOMP_target_data (int device, const void *unused, size_t mapnum,
 		  void **hostaddrs, size_t *sizes, unsigned char *kinds)
@@ -1017,27 +1074,7 @@ GOMP_target_data (int device, const void *unused, size_t mapnum,
 
   if (devicep == NULL
       || !(devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400))
-    {
-      /* Host fallback.  */
-      struct gomp_task_icv *icv = gomp_icv (false);
-      if (icv->target_data)
-	{
-	  /* Even when doing a host fallback, if there are any active
-	     #pragma omp target data constructs, need to remember the
-	     new #pragma omp target data, otherwise GOMP_target_end_data
-	     would get out of sync.  */
-	  struct target_mem_desc *tgt
-	    = gomp_map_vars (NULL, 0, NULL, NULL, NULL, NULL, false, false);
-	  tgt->prev = icv->target_data;
-	  icv->target_data = tgt;
-	}
-      return;
-    }
-
-  gomp_mutex_lock (&devicep->lock);
-  if (!devicep->is_initialized)
-    gomp_init_device (devicep);
-  gomp_mutex_unlock (&devicep->lock);
+    return gomp_target_data_fallback ();
 
   struct target_mem_desc *tgt
     = gomp_map_vars (devicep, mapnum, hostaddrs, NULL, sizes, kinds, false,
@@ -1048,6 +1085,24 @@ GOMP_target_data (int device, const void *unused, size_t mapnum,
 }
 
 void
+GOMP_target_data_41 (int device, size_t mapnum, void **hostaddrs, size_t *sizes,
+		     unsigned short *kinds)
+{
+  struct gomp_device_descr *devicep = resolve_device (device);
+
+  if (devicep == NULL
+      || !(devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400))
+    return gomp_target_data_fallback ();
+
+  struct target_mem_desc *tgt
+    = gomp_map_vars (devicep, mapnum, hostaddrs, NULL, sizes, kinds, true,
+		     false);
+  struct gomp_task_icv *icv = gomp_icv (true);
+  tgt->prev = icv->target_data;
+  icv->target_data = tgt;
+}
+
+void
 GOMP_target_end_data (void)
 {
   struct gomp_task_icv *icv = gomp_icv (false);
@@ -1069,15 +1124,58 @@ GOMP_target_update (int device, const void *unused, size_t mapnum,
       || !(devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400))
     return;
 
-  gomp_mutex_lock (&devicep->lock);
-  if (!devicep->is_initialized)
-    gomp_init_device (devicep);
-  gomp_mutex_unlock (&devicep->lock);
-
   gomp_update (devicep, mapnum, hostaddrs, sizes, kinds, false);
 }
 
 void
+GOMP_target_enter_exit_data (int device, size_t mapnum, void **hostaddrs,
+			     size_t *sizes, unsigned short *kinds)
+{
+  struct gomp_device_descr *devicep = resolve_device (device);
+
+  if (devicep == NULL
+      || !(devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400))
+    return;
+
+  /* Determine if this is an "omp target enter data".  */
+  const int typemask = 0xff;
+  bool is_enter_data = false;
+  size_t i;
+  for (i = 0; i < mapnum; i++)
+    {
+      unsigned char kind = kinds[i] & typemask;
+
+      if (kind == GOMP_MAP_POINTER || kind == GOMP_MAP_TO_PSET)
+	continue;
+
+      if (kind == GOMP_MAP_ALLOC
+	  || kind == GOMP_MAP_TO
+	  || kind == GOMP_MAP_ALWAYS_TO)
+	{
+	  is_enter_data = true;
+	  break;
+	}
+
+      if (kind == GOMP_MAP_FROM
+	  || kind == GOMP_MAP_ALWAYS_FROM
+	  || kind == GOMP_MAP_DELETE
+	  || kind == GOMP_MAP_RELEASE)
+	break;
+
+      gomp_fatal ("GOMP_target_enter_exit_data unhandled kind 0x%.2x", kind);
+    }
+
+  if (is_enter_data)
+    {
+      /* TODO  */
+    }
+  else
+    {
+      /* TODO  */
+    }
+}
+
+void
 GOMP_teams (unsigned int num_teams, unsigned int thread_limit)
 {
   if (thread_limit)


  -- Ilya

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [gomp4.1] Add new versions of GOMP_target{,_data,_update} and GOMP_target_enter_exit_data
  2015-06-15 19:54       ` Ilya Verbin
@ 2015-06-15 19:58         ` Jakub Jelinek
  2015-06-19 22:36           ` Ilya Verbin
  2015-10-13 14:50         ` Ilya Verbin
  1 sibling, 1 reply; 49+ messages in thread
From: Jakub Jelinek @ 2015-06-15 19:58 UTC (permalink / raw)
  To: Ilya Verbin; +Cc: Thomas Schwinge, gcc-patches, Kirill Yukhin

On Mon, Jun 15, 2015 at 10:48:50PM +0300, Ilya Verbin wrote:
> Here is the new patch.  OK to commit?
> 
> 
> gcc/
> 	* builtin-types.def (BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR): New.
> 	(BT_FN_VOID_INT_OMPFN_PTR_SIZE_PTR_PTR_PTR): Remove.
> 	* omp-builtins.def (BUILT_IN_GOMP_TARGET): Replace GOMP_target with
> 	GOMP_target_41.
> 	(BUILT_IN_GOMP_TARGET_DATA): Replace GOMP_target_data with
> 	GOMP_target_data_41.
> 	(BUILT_IN_GOMP_TARGET_ENTER_EXIT_DATA): New.
> 	* omp-low.c (expand_omp_target): Use
> 	BUILT_IN_GOMP_TARGET_ENTER_EXIT_DATA for GF_OMP_TARGET_KIND_ENTER_DATA
> 	and GF_OMP_TARGET_KIND_EXIT_DATA.
> 	Do not pass obsolete pointer to new builtins.
> 	(lower_omp_target): Use unsigned short for map kinds, except
> 	BUILT_IN_GOMP_TARGET_UPDATE.
> gcc/fortran/
> 	* types.def (BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR): New.
> 	(BT_FN_VOID_INT_OMPFN_PTR_SIZE_PTR_PTR_PTR): Remove.
> libgomp/
> 	* libgomp.map (GOMP_4.1): Add GOMP_target_41, GOMP_target_data_41,
> 	GOMP_target_enter_exit_data.
> 	* libgomp_g.h: Declare GOMP_target_41, GOMP_target_data_41,
> 	GOMP_target_enter_exit_data.
> 	* target.c (resolve_device): Call gomp_init_device here instead of
> 	GOMP_target*.
> 	(get_kind): Rename is_openacc to short_mapkind.
> 	(gomp_map_vars): Likewise.
> 	(gomp_unmap_vars): Likewise.
> 	(gomp_update): Likewise.
> 	(gomp_target_fallback): New static function.
> 	(gomp_get_target_fn_addr): New static function.
> 	(GOMP_target): Move host fallback and fn lookup to the new functions.
> 	(GOMP_target_41): New function.
> 	(gomp_target_data_fallback): New static function.
> 	(GOMP_target_data): Move host fallback to the new function.
> 	(GOMP_target_data_41): New function.
> 	(GOMP_target_update): Do not call gomp_init_device.
> 	(GOMP_target_enter_exit_data): New function.

Ok, thanks.

	Jakub

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [gomp4.1] Add new versions of GOMP_target{,_data,_update} and GOMP_target_enter_exit_data
  2015-06-15 19:58         ` Jakub Jelinek
@ 2015-06-19 22:36           ` Ilya Verbin
  2015-06-23 11:51             ` Ilya Verbin
  0 siblings, 1 reply; 49+ messages in thread
From: Ilya Verbin @ 2015-06-19 22:36 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Thomas Schwinge, gcc-patches, Kirill Yukhin

Given that a mapped variable in 4.1 can have different kinds across nested data
regions, we need to store map-type not only for each var, but also for each
structured mapping.  Here is my WIP patch, is it sane? :)
Attached testcase works OK on the device with non-shared memory.


diff --git a/include/gomp-constants.h b/include/gomp-constants.h
index f8efbdd..88623ac 100644
--- a/include/gomp-constants.h
+++ b/include/gomp-constants.h
@@ -107,6 +107,12 @@ enum gomp_map_kind
 #define GOMP_MAP_POINTER_P(X) \
   ((X) == GOMP_MAP_POINTER)
 
+#define GOMP_MAP_ALWAYS_TO_P(X) \
+  (((X) == GOMP_MAP_ALWAYS_TO) || ((X) == GOMP_MAP_ALWAYS_TOFROM))
+
+#define GOMP_MAP_ALWAYS_FROM_P(X) \
+  (((X) == GOMP_MAP_ALWAYS_FROM) || ((X) == GOMP_MAP_ALWAYS_TOFROM))
+
 
 /* Asynchronous behavior.  Keep in sync with
    libgomp/{openacc.h,openacc.f90,openacc_lib.h}:acc_async_t.  */
diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
index 87d6c40..8e6d4ac 100644
--- a/libgomp/libgomp.h
+++ b/libgomp/libgomp.h
@@ -636,6 +636,15 @@ typedef struct splay_tree_node_s *splay_tree_node;
 typedef struct splay_tree_s *splay_tree;
 typedef struct splay_tree_key_s *splay_tree_key;
 
+struct target_var_desc {
+  /* Splay key.  */
+  splay_tree_key key;
+  /* True if data should be copied from device to host at the end.  */
+  bool copy_from;
+  /* True if data always should be copied from device to host at the end.  */
+  bool always_copy_from;
+};
+
 struct target_mem_desc {
   /* Reference count.  */
   uintptr_t refcount;
@@ -655,9 +664,9 @@ struct target_mem_desc {
   /* Corresponding target device descriptor.  */
   struct gomp_device_descr *device_descr;
 
-  /* List of splay keys to remove (or decrease refcount)
+  /* List of target items to remove (or decrease refcount)
      at the end of region.  */
-  splay_tree_key list[];
+  struct target_var_desc list[];
 };
 
 struct splay_tree_key_s {
@@ -673,8 +682,6 @@ struct splay_tree_key_s {
   uintptr_t refcount;
   /* Asynchronous reference count.  */
   uintptr_t async_refcount;
-  /* True if data should be copied from device to host at the end.  */
-  bool copy_from;
 };
 
 #include "splay-tree.h"
diff --git a/libgomp/oacc-mem.c b/libgomp/oacc-mem.c
index 90d43eb..c0fcb07 100644
--- a/libgomp/oacc-mem.c
+++ b/libgomp/oacc-mem.c
@@ -651,7 +651,7 @@ gomp_acc_remove_pointer (void *h, bool force_copyfrom, int async, int mapnum)
     }
 
   if (force_copyfrom)
-    t->list[0]->copy_from = 1;
+    t->list[0].copy_from = 1;
 
   gomp_mutex_unlock (&acc_dev->lock);
 
diff --git a/libgomp/oacc-parallel.c b/libgomp/oacc-parallel.c
index d899946..8ea3dd1 100644
--- a/libgomp/oacc-parallel.c
+++ b/libgomp/oacc-parallel.c
@@ -135,8 +135,8 @@ GOACC_parallel (int device, void (*fn) (void *),
 
   devaddrs = gomp_alloca (sizeof (void *) * mapnum);
   for (i = 0; i < mapnum; i++)
-    devaddrs[i] = (void *) (tgt->list[i]->tgt->tgt_start
-			    + tgt->list[i]->tgt_offset);
+    devaddrs[i] = (void *) (tgt->list[i].key->tgt->tgt_start
+			    + tgt->list[i].key->tgt_offset);
 
   acc_dev->openacc.exec_func (tgt_fn, mapnum, hostaddrs, devaddrs, sizes, kinds,
 			      num_gangs, num_workers, vector_length, async,
diff --git a/libgomp/target.c b/libgomp/target.c
index fb8487a..6829ff4 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -161,6 +161,12 @@ gomp_map_vars_existing (struct gomp_device_descr *devicep, splay_tree_key oldn,
 		  (void *) newn->host_start, (void *) newn->host_end,
 		  (void *) oldn->host_start, (void *) oldn->host_end);
     }
+
+  if (GOMP_MAP_ALWAYS_TO_P (kind))
+    devicep->host2dev_func (devicep->target_id,
+			    (void *) (oldn->tgt->tgt_start + oldn->tgt_offset),
+			    (void *) newn->host_start,
+			    newn->host_end - newn->host_start);
   oldn->refcount++;
 }
 
@@ -260,7 +266,7 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
       int kind = get_kind (short_mapkind, kinds, i);
       if (hostaddrs[i] == NULL)
 	{
-	  tgt->list[i] = NULL;
+	  tgt->list[i].key = NULL;
 	  continue;
 	}
       cur_node.host_start = (uintptr_t) hostaddrs[i];
@@ -271,12 +277,15 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
       splay_tree_key n = splay_tree_lookup (mem_map, &cur_node);
       if (n)
 	{
-	  tgt->list[i] = n;
+	  tgt->list[i].key = n;
+	  tgt->list[i].copy_from = GOMP_MAP_COPY_FROM_P (kind & typemask);
+	  tgt->list[i].always_copy_from
+	    = GOMP_MAP_ALWAYS_FROM_P (kind & typemask);
 	  gomp_map_vars_existing (devicep, n, &cur_node, kind & typemask);
 	}
       else
 	{
-	  tgt->list[i] = NULL;
+	  tgt->list[i].key = NULL;
 
 	  size_t align = (size_t) 1 << (kind >> rshift);
 	  not_found_cnt++;
@@ -297,7 +306,7 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
 		  break;
 		else
 		  {
-		    tgt->list[j] = NULL;
+		    tgt->list[j].key = NULL;
 		    i++;
 		  }
 	    }
@@ -345,7 +354,7 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
       size_t j;
 
       for (i = 0; i < mapnum; i++)
-	if (tgt->list[i] == NULL)
+	if (tgt->list[i].key == NULL)
 	  {
 	    int kind = get_kind (short_mapkind, kinds, i);
 	    if (hostaddrs[i] == NULL)
@@ -359,18 +368,23 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
 	    splay_tree_key n = splay_tree_lookup (mem_map, k);
 	    if (n)
 	      {
-		tgt->list[i] = n;
+		tgt->list[i].key = n;
+		tgt->list[i].copy_from = GOMP_MAP_COPY_FROM_P (kind & typemask);
+		tgt->list[i].always_copy_from
+		  = GOMP_MAP_ALWAYS_FROM_P (kind & typemask);
 		gomp_map_vars_existing (devicep, n, k, kind & typemask);
 	      }
 	    else
 	      {
 		size_t align = (size_t) 1 << (kind >> rshift);
-		tgt->list[i] = k;
+		tgt->list[i].key = k;
 		tgt_size = (tgt_size + align - 1) & ~(align - 1);
 		k->tgt = tgt;
 		k->tgt_offset = tgt_size;
 		tgt_size += k->host_end - k->host_start;
-		k->copy_from = GOMP_MAP_COPY_FROM_P (kind & typemask);
+		tgt->list[i].copy_from = GOMP_MAP_COPY_FROM_P (kind & typemask);
+		tgt->list[i].always_copy_from
+		  = GOMP_MAP_ALWAYS_FROM_P (kind & typemask);
 		k->refcount = 1;
 		k->async_refcount = 0;
 		tgt->refcount++;
@@ -388,6 +402,8 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
 		  case GOMP_MAP_TOFROM:
 		  case GOMP_MAP_FORCE_TO:
 		  case GOMP_MAP_FORCE_TOFROM:
+		  case GOMP_MAP_ALWAYS_TO:
+		  case GOMP_MAP_ALWAYS_TOFROM:
 		    /* FIXME: Perhaps add some smarts, like if copying
 		       several adjacent fields from host to target, use some
 		       host buffer to avoid sending each var individually.  */
@@ -420,7 +436,7 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
 			break;
 		      else
 			{
-			  tgt->list[j] = k;
+			  tgt->list[j].key = k;
 			  k->refcount++;
 			  gomp_map_pointer (tgt,
 					    (uintptr_t) *(void **) hostaddrs[j],
@@ -472,11 +488,11 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
     {
       for (i = 0; i < mapnum; i++)
 	{
-	  if (tgt->list[i] == NULL)
+	  if (tgt->list[i].key == NULL)
 	    cur_node.tgt_offset = (uintptr_t) NULL;
 	  else
-	    cur_node.tgt_offset = tgt->list[i]->tgt->tgt_start
-				  + tgt->list[i]->tgt_offset;
+	    cur_node.tgt_offset = tgt->list[i].key->tgt->tgt_start
+				  + tgt->list[i].key->tgt_offset;
 	  /* FIXME: see above FIXME comment.  */
 	  devicep->host2dev_func (devicep->target_id,
 				  (void *) (tgt->tgt_start
@@ -516,17 +532,17 @@ gomp_copy_from_async (struct target_mem_desc *tgt)
   gomp_mutex_lock (&devicep->lock);
 
   for (i = 0; i < tgt->list_count; i++)
-    if (tgt->list[i] == NULL)
+    if (tgt->list[i].key == NULL)
       ;
-    else if (tgt->list[i]->refcount > 1)
+    else if (tgt->list[i].key->refcount > 1)
       {
-	tgt->list[i]->refcount--;
-	tgt->list[i]->async_refcount++;
+	tgt->list[i].key->refcount--;
+	tgt->list[i].key->async_refcount++;
       }
     else
       {
-	splay_tree_key k = tgt->list[i];
-	if (k->copy_from)
+	splay_tree_key k = tgt->list[i].key;
+	if (tgt->list[i].copy_from)
 	  devicep->dev2host_func (devicep->target_id, (void *) k->host_start,
 				  (void *) (k->tgt->tgt_start + k->tgt_offset),
 				  k->host_end - k->host_start);
@@ -554,25 +570,33 @@ gomp_unmap_vars (struct target_mem_desc *tgt, bool do_copyfrom)
 
   size_t i;
   for (i = 0; i < tgt->list_count; i++)
-    if (tgt->list[i] == NULL)
-      ;
-    else if (tgt->list[i]->refcount > 1)
-      tgt->list[i]->refcount--;
-    else if (tgt->list[i]->async_refcount > 0)
-      tgt->list[i]->async_refcount--;
-    else
-      {
-	splay_tree_key k = tgt->list[i];
-	if (k->copy_from && do_copyfrom)
-	  devicep->dev2host_func (devicep->target_id, (void *) k->host_start,
-				  (void *) (k->tgt->tgt_start + k->tgt_offset),
-				  k->host_end - k->host_start);
-	splay_tree_remove (&devicep->mem_map, k);
-	if (k->tgt->refcount > 1)
-	  k->tgt->refcount--;
-	else
-	  gomp_unmap_tgt (k->tgt);
-      }
+    {
+      splay_tree_key k = tgt->list[i].key;
+      if (k == NULL)
+	continue;
+
+      bool do_unmap = false;
+      if (k->refcount > 1)
+	k->refcount--;
+      else if (k->async_refcount > 0)
+	k->async_refcount--;
+      else
+	do_unmap = true;
+
+      if ((do_unmap && do_copyfrom && tgt->list[i].copy_from)
+	  || tgt->list[i].always_copy_from)
+	devicep->dev2host_func (devicep->target_id, (void *) k->host_start,
+				(void *) (k->tgt->tgt_start + k->tgt_offset),
+				k->host_end - k->host_start);
+      if (do_unmap)
+	{
+	  splay_tree_remove (&devicep->mem_map, k);
+	  if (k->tgt->refcount > 1)
+	    k->tgt->refcount--;
+	  else
+	    gomp_unmap_tgt (k->tgt);
+	}
+    }
 
   if (tgt->refcount > 1)
     tgt->refcount--;
@@ -699,7 +723,6 @@ gomp_offload_image_to_device (struct gomp_device_descr *devicep,
       k->tgt_offset = target_table[i].start;
       k->refcount = 1;
       k->async_refcount = 0;
-      k->copy_from = false;
       array->left = NULL;
       array->right = NULL;
       splay_tree_insert (&devicep->mem_map, array);
@@ -725,7 +748,6 @@ gomp_offload_image_to_device (struct gomp_device_descr *devicep,
       k->tgt_offset = target_var->start;
       k->refcount = 1;
       k->async_refcount = 0;
-      k->copy_from = false;
       array->left = NULL;
       array->right = NULL;
       splay_tree_insert (&devicep->mem_map, array);
diff --git a/libgomp/testsuite/libgomp.c/target-11.c b/libgomp/testsuite/libgomp.c/target-11.c
new file mode 100644
index 0000000..4562d88
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/target-11.c
@@ -0,0 +1,51 @@
+/* { dg-require-effective-target offload_device } */
+
+#include <assert.h>
+
+int main ()
+{
+  int aa = 0, bb = 0, cc = 0, dd = 0;
+
+  #pragma omp target data map(tofrom: aa) map(to: bb) map(from: cc, dd)
+    {
+      int ok;
+      aa = bb = cc = 1;
+
+      /* Set dd on target to 0 for the further check.  */
+      #pragma omp target map(always to: dd)
+	{ dd; }
+
+      dd = 1;
+      #pragma omp target map(tofrom: aa) map(always to: bb) \
+	map(always from: cc) map(to: dd) map(from: ok)
+	{
+	  /* bb is always to, aa and dd are not.  */
+	  ok = (aa == 0) && (bb == 1) && (dd == 0);
+	  aa = bb = cc = dd = 2;
+	}
+
+      assert (ok);
+      assert (aa == 1);
+      assert (bb == 1);
+      assert (cc == 2); /* cc is always from.  */
+      assert (dd == 1);
+
+      dd = 3;
+      #pragma omp target map(from: cc) map(always to: dd) map(from: ok)
+	{
+	  ok = (dd == 3); /* dd is always to.  */
+	  cc = dd = 4;
+	}
+
+      assert (ok);
+      assert (cc == 2);
+      assert (dd == 3);
+    }
+
+  assert (aa == 2);
+  assert (bb == 1);
+  assert (cc == 4);
+  assert (dd == 4);
+
+  return 0;
+}


  -- Ilya

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [gomp4.1] Add new versions of GOMP_target{,_data,_update} and GOMP_target_enter_exit_data
  2015-06-19 22:36           ` Ilya Verbin
@ 2015-06-23 11:51             ` Ilya Verbin
  2015-06-23 12:10               ` Jakub Jelinek
  2015-06-24 11:43               ` [gomp4.1] Add new versions of GOMP_target{,_data,_update} and GOMP_target_enter_exit_data Jakub Jelinek
  0 siblings, 2 replies; 49+ messages in thread
From: Ilya Verbin @ 2015-06-23 11:51 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Thomas Schwinge, gcc-patches, Kirill Yukhin

On Sat, Jun 20, 2015 at 00:35:14 +0300, Ilya Verbin wrote:
> Given that a mapped variable in 4.1 can have different kinds across nested data
> regions, we need to store map-type not only for each var, but also for each
> structured mapping.  Here is my WIP patch, is it sane? :)
> Attached testcase works OK on the device with non-shared memory.

A bit updated version with a fix for GOMP_MAP_TO_PSET.
make check-target-libgomp passed.


include/gcc/
	* gomp-constants.h (GOMP_MAP_ALWAYS_TO_P,
	GOMP_MAP_ALWAYS_FROM_P): Define.
libgomp/
	* libgomp.h (struct target_var_desc): New.
	(struct target_mem_desc): Replace array of splay_tree_key with array of
	target_var_desc.
	(struct splay_tree_key_s): Move copy_from to target_var_desc.
	* oacc-mem.c (gomp_acc_remove_pointer): Use copy_from from
	target_var_desc.
	* oacc-parallel.c (GOACC_parallel): Use copy_from from target_var_desc.
	* target.c (gomp_map_vars_existing): Copy data to device if map-type is
	'always to' or 'always tofrom'.
	(gomp_map_vars): Use key from target_var_desc.  Set copy_from and
	always_copy_from.
	(gomp_copy_from_async): Use key and copy_from from target_var_desc.
	(gomp_unmap_vars): Copy data from device if always_copy_from is set.
	(gomp_offload_image_to_device): Do not use copy_from.
	* testsuite/libgomp.c/target-11.c: New test.


diff --git a/include/gomp-constants.h b/include/gomp-constants.h
index 1849478..42bec04 100644
--- a/include/gomp-constants.h
+++ b/include/gomp-constants.h
@@ -107,6 +107,12 @@ enum gomp_map_kind
 #define GOMP_MAP_POINTER_P(X) \
   ((X) == GOMP_MAP_POINTER)
 
+#define GOMP_MAP_ALWAYS_TO_P(X) \
+  (((X) == GOMP_MAP_ALWAYS_TO) || ((X) == GOMP_MAP_ALWAYS_TOFROM))
+
+#define GOMP_MAP_ALWAYS_FROM_P(X) \
+  (((X) == GOMP_MAP_ALWAYS_FROM) || ((X) == GOMP_MAP_ALWAYS_TOFROM))
+
 
 /* Asynchronous behavior.  Keep in sync with
    libgomp/{openacc.h,openacc.f90,openacc_lib.h}:acc_async_t.  */
diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
index 87d6c40..8e6d4ac 100644
--- a/libgomp/libgomp.h
+++ b/libgomp/libgomp.h
@@ -636,6 +636,15 @@ typedef struct splay_tree_node_s *splay_tree_node;
 typedef struct splay_tree_s *splay_tree;
 typedef struct splay_tree_key_s *splay_tree_key;
 
+struct target_var_desc {
+  /* Splay key.  */
+  splay_tree_key key;
+  /* True if data should be copied from device to host at the end.  */
+  bool copy_from;
+  /* True if data always should be copied from device to host at the end.  */
+  bool always_copy_from;
+};
+
 struct target_mem_desc {
   /* Reference count.  */
   uintptr_t refcount;
@@ -655,9 +664,9 @@ struct target_mem_desc {
   /* Corresponding target device descriptor.  */
   struct gomp_device_descr *device_descr;
 
-  /* List of splay keys to remove (or decrease refcount)
+  /* List of target items to remove (or decrease refcount)
      at the end of region.  */
-  splay_tree_key list[];
+  struct target_var_desc list[];
 };
 
 struct splay_tree_key_s {
@@ -673,8 +682,6 @@ struct splay_tree_key_s {
   uintptr_t refcount;
   /* Asynchronous reference count.  */
   uintptr_t async_refcount;
-  /* True if data should be copied from device to host at the end.  */
-  bool copy_from;
 };
 
 #include "splay-tree.h"
diff --git a/libgomp/oacc-mem.c b/libgomp/oacc-mem.c
index 90d43eb..c0fcb07 100644
--- a/libgomp/oacc-mem.c
+++ b/libgomp/oacc-mem.c
@@ -651,7 +651,7 @@ gomp_acc_remove_pointer (void *h, bool force_copyfrom, int async, int mapnum)
     }
 
   if (force_copyfrom)
-    t->list[0]->copy_from = 1;
+    t->list[0].copy_from = 1;
 
   gomp_mutex_unlock (&acc_dev->lock);
 
diff --git a/libgomp/oacc-parallel.c b/libgomp/oacc-parallel.c
index d899946..8ea3dd1 100644
--- a/libgomp/oacc-parallel.c
+++ b/libgomp/oacc-parallel.c
@@ -135,8 +135,8 @@ GOACC_parallel (int device, void (*fn) (void *),
 
   devaddrs = gomp_alloca (sizeof (void *) * mapnum);
   for (i = 0; i < mapnum; i++)
-    devaddrs[i] = (void *) (tgt->list[i]->tgt->tgt_start
-			    + tgt->list[i]->tgt_offset);
+    devaddrs[i] = (void *) (tgt->list[i].key->tgt->tgt_start
+			    + tgt->list[i].key->tgt_offset);
 
   acc_dev->openacc.exec_func (tgt_fn, mapnum, hostaddrs, devaddrs, sizes, kinds,
 			      num_gangs, num_workers, vector_length, async,
diff --git a/libgomp/target.c b/libgomp/target.c
index fb8487a..b1640c1 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -161,6 +161,12 @@ gomp_map_vars_existing (struct gomp_device_descr *devicep, splay_tree_key oldn,
 		  (void *) newn->host_start, (void *) newn->host_end,
 		  (void *) oldn->host_start, (void *) oldn->host_end);
     }
+
+  if (GOMP_MAP_ALWAYS_TO_P (kind))
+    devicep->host2dev_func (devicep->target_id,
+			    (void *) (oldn->tgt->tgt_start + oldn->tgt_offset),
+			    (void *) newn->host_start,
+			    newn->host_end - newn->host_start);
   oldn->refcount++;
 }
 
@@ -260,7 +266,7 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
       int kind = get_kind (short_mapkind, kinds, i);
       if (hostaddrs[i] == NULL)
 	{
-	  tgt->list[i] = NULL;
+	  tgt->list[i].key = NULL;
 	  continue;
 	}
       cur_node.host_start = (uintptr_t) hostaddrs[i];
@@ -271,12 +277,15 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
       splay_tree_key n = splay_tree_lookup (mem_map, &cur_node);
       if (n)
 	{
-	  tgt->list[i] = n;
+	  tgt->list[i].key = n;
+	  tgt->list[i].copy_from = GOMP_MAP_COPY_FROM_P (kind & typemask);
+	  tgt->list[i].always_copy_from
+	    = GOMP_MAP_ALWAYS_FROM_P (kind & typemask);
 	  gomp_map_vars_existing (devicep, n, &cur_node, kind & typemask);
 	}
       else
 	{
-	  tgt->list[i] = NULL;
+	  tgt->list[i].key = NULL;
 
 	  size_t align = (size_t) 1 << (kind >> rshift);
 	  not_found_cnt++;
@@ -297,7 +306,7 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
 		  break;
 		else
 		  {
-		    tgt->list[j] = NULL;
+		    tgt->list[j].key = NULL;
 		    i++;
 		  }
 	    }
@@ -345,7 +354,7 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
       size_t j;
 
       for (i = 0; i < mapnum; i++)
-	if (tgt->list[i] == NULL)
+	if (tgt->list[i].key == NULL)
 	  {
 	    int kind = get_kind (short_mapkind, kinds, i);
 	    if (hostaddrs[i] == NULL)
@@ -359,18 +368,23 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
 	    splay_tree_key n = splay_tree_lookup (mem_map, k);
 	    if (n)
 	      {
-		tgt->list[i] = n;
+		tgt->list[i].key = n;
+		tgt->list[i].copy_from = GOMP_MAP_COPY_FROM_P (kind & typemask);
+		tgt->list[i].always_copy_from
+		  = GOMP_MAP_ALWAYS_FROM_P (kind & typemask);
 		gomp_map_vars_existing (devicep, n, k, kind & typemask);
 	      }
 	    else
 	      {
 		size_t align = (size_t) 1 << (kind >> rshift);
-		tgt->list[i] = k;
+		tgt->list[i].key = k;
 		tgt_size = (tgt_size + align - 1) & ~(align - 1);
 		k->tgt = tgt;
 		k->tgt_offset = tgt_size;
 		tgt_size += k->host_end - k->host_start;
-		k->copy_from = GOMP_MAP_COPY_FROM_P (kind & typemask);
+		tgt->list[i].copy_from = GOMP_MAP_COPY_FROM_P (kind & typemask);
+		tgt->list[i].always_copy_from
+		  = GOMP_MAP_ALWAYS_FROM_P (kind & typemask);
 		k->refcount = 1;
 		k->async_refcount = 0;
 		tgt->refcount++;
@@ -388,6 +402,8 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
 		  case GOMP_MAP_TOFROM:
 		  case GOMP_MAP_FORCE_TO:
 		  case GOMP_MAP_FORCE_TOFROM:
+		  case GOMP_MAP_ALWAYS_TO:
+		  case GOMP_MAP_ALWAYS_TOFROM:
 		    /* FIXME: Perhaps add some smarts, like if copying
 		       several adjacent fields from host to target, use some
 		       host buffer to avoid sending each var individually.  */
@@ -420,7 +436,9 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
 			break;
 		      else
 			{
-			  tgt->list[j] = k;
+			  tgt->list[j].key = k;
+			  tgt->list[j].copy_from = false;
+			  tgt->list[j].always_copy_from = false;
 			  k->refcount++;
 			  gomp_map_pointer (tgt,
 					    (uintptr_t) *(void **) hostaddrs[j],
@@ -472,11 +490,11 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
     {
       for (i = 0; i < mapnum; i++)
 	{
-	  if (tgt->list[i] == NULL)
+	  if (tgt->list[i].key == NULL)
 	    cur_node.tgt_offset = (uintptr_t) NULL;
 	  else
-	    cur_node.tgt_offset = tgt->list[i]->tgt->tgt_start
-				  + tgt->list[i]->tgt_offset;
+	    cur_node.tgt_offset = tgt->list[i].key->tgt->tgt_start
+				  + tgt->list[i].key->tgt_offset;
 	  /* FIXME: see above FIXME comment.  */
 	  devicep->host2dev_func (devicep->target_id,
 				  (void *) (tgt->tgt_start
@@ -516,17 +534,17 @@ gomp_copy_from_async (struct target_mem_desc *tgt)
   gomp_mutex_lock (&devicep->lock);
 
   for (i = 0; i < tgt->list_count; i++)
-    if (tgt->list[i] == NULL)
+    if (tgt->list[i].key == NULL)
       ;
-    else if (tgt->list[i]->refcount > 1)
+    else if (tgt->list[i].key->refcount > 1)
       {
-	tgt->list[i]->refcount--;
-	tgt->list[i]->async_refcount++;
+	tgt->list[i].key->refcount--;
+	tgt->list[i].key->async_refcount++;
       }
     else
       {
-	splay_tree_key k = tgt->list[i];
-	if (k->copy_from)
+	splay_tree_key k = tgt->list[i].key;
+	if (tgt->list[i].copy_from)
 	  devicep->dev2host_func (devicep->target_id, (void *) k->host_start,
 				  (void *) (k->tgt->tgt_start + k->tgt_offset),
 				  k->host_end - k->host_start);
@@ -554,25 +572,33 @@ gomp_unmap_vars (struct target_mem_desc *tgt, bool do_copyfrom)
 
   size_t i;
   for (i = 0; i < tgt->list_count; i++)
-    if (tgt->list[i] == NULL)
-      ;
-    else if (tgt->list[i]->refcount > 1)
-      tgt->list[i]->refcount--;
-    else if (tgt->list[i]->async_refcount > 0)
-      tgt->list[i]->async_refcount--;
-    else
-      {
-	splay_tree_key k = tgt->list[i];
-	if (k->copy_from && do_copyfrom)
-	  devicep->dev2host_func (devicep->target_id, (void *) k->host_start,
-				  (void *) (k->tgt->tgt_start + k->tgt_offset),
-				  k->host_end - k->host_start);
-	splay_tree_remove (&devicep->mem_map, k);
-	if (k->tgt->refcount > 1)
-	  k->tgt->refcount--;
-	else
-	  gomp_unmap_tgt (k->tgt);
-      }
+    {
+      splay_tree_key k = tgt->list[i].key;
+      if (k == NULL)
+	continue;
+
+      bool do_unmap = false;
+      if (k->refcount > 1)
+	k->refcount--;
+      else if (k->async_refcount > 0)
+	k->async_refcount--;
+      else
+	do_unmap = true;
+
+      if ((do_unmap && do_copyfrom && tgt->list[i].copy_from)
+	  || tgt->list[i].always_copy_from)
+	devicep->dev2host_func (devicep->target_id, (void *) k->host_start,
+				(void *) (k->tgt->tgt_start + k->tgt_offset),
+				k->host_end - k->host_start);
+      if (do_unmap)
+	{
+	  splay_tree_remove (&devicep->mem_map, k);
+	  if (k->tgt->refcount > 1)
+	    k->tgt->refcount--;
+	  else
+	    gomp_unmap_tgt (k->tgt);
+	}
+    }
 
   if (tgt->refcount > 1)
     tgt->refcount--;
@@ -699,7 +725,6 @@ gomp_offload_image_to_device (struct gomp_device_descr *devicep,
       k->tgt_offset = target_table[i].start;
       k->refcount = 1;
       k->async_refcount = 0;
-      k->copy_from = false;
       array->left = NULL;
       array->right = NULL;
       splay_tree_insert (&devicep->mem_map, array);
@@ -725,7 +750,6 @@ gomp_offload_image_to_device (struct gomp_device_descr *devicep,
       k->tgt_offset = target_var->start;
       k->refcount = 1;
       k->async_refcount = 0;
-      k->copy_from = false;
       array->left = NULL;
       array->right = NULL;
       splay_tree_insert (&devicep->mem_map, array);
diff --git a/libgomp/testsuite/libgomp.c/target-11.c b/libgomp/testsuite/libgomp.c/target-11.c
new file mode 100644
index 0000000..4562d88
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/target-11.c
@@ -0,0 +1,51 @@
+/* { dg-require-effective-target offload_device } */
+
+#include <assert.h>
+
+int main ()
+{
+  int aa = 0, bb = 0, cc = 0, dd = 0;
+
+  #pragma omp target data map(tofrom: aa) map(to: bb) map(from: cc, dd)
+    {
+      int ok;
+      aa = bb = cc = 1;
+
+      /* Set dd on target to 0 for the further check.  */
+      #pragma omp target map(always to: dd)
+	{ dd; }
+
+      dd = 1;
+      #pragma omp target map(tofrom: aa) map(always to: bb) \
+	map(always from: cc) map(to: dd) map(from: ok)
+	{
+	  /* bb is always to, aa and dd are not.  */
+	  ok = (aa == 0) && (bb == 1) && (dd == 0);
+	  aa = bb = cc = dd = 2;
+	}
+
+      assert (ok);
+      assert (aa == 1);
+      assert (bb == 1);
+      assert (cc == 2); /* cc is always from.  */
+      assert (dd == 1);
+
+      dd = 3;
+      #pragma omp target map(from: cc) map(always to: dd) map(from: ok)
+	{
+	  ok = (dd == 3); /* dd is always to.  */
+	  cc = dd = 4;
+	}
+
+      assert (ok);
+      assert (cc == 2);
+      assert (dd == 3);
+    }
+
+  assert (aa == 2);
+  assert (bb == 1);
+  assert (cc == 4);
+  assert (dd == 4);
+
+  return 0;
+}


  -- Ilya

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [gomp4.1] Add new versions of GOMP_target{,_data,_update} and GOMP_target_enter_exit_data
  2015-06-23 11:51             ` Ilya Verbin
@ 2015-06-23 12:10               ` Jakub Jelinek
  2015-06-23 14:55                 ` Ilya Verbin
  2015-10-19 16:33                 ` OpenACC async clause regressions (was: [gomp4.1] Add new versions of GOMP_target{,_data,_update} and GOMP_target_enter_exit_data) Thomas Schwinge
  2015-06-24 11:43               ` [gomp4.1] Add new versions of GOMP_target{,_data,_update} and GOMP_target_enter_exit_data Jakub Jelinek
  1 sibling, 2 replies; 49+ messages in thread
From: Jakub Jelinek @ 2015-06-23 12:10 UTC (permalink / raw)
  To: Ilya Verbin; +Cc: Thomas Schwinge, gcc-patches, Kirill Yukhin

On Tue, Jun 23, 2015 at 02:40:43PM +0300, Ilya Verbin wrote:
> On Sat, Jun 20, 2015 at 00:35:14 +0300, Ilya Verbin wrote:
> > Given that a mapped variable in 4.1 can have different kinds across nested data
> > regions, we need to store map-type not only for each var, but also for each
> > structured mapping.  Here is my WIP patch, is it sane? :)
> > Attached testcase works OK on the device with non-shared memory.
> 
> A bit updated version with a fix for GOMP_MAP_TO_PSET.
> make check-target-libgomp passed.

Ok, thanks.

> include/gcc/
> 	* gomp-constants.h (GOMP_MAP_ALWAYS_TO_P,
> 	GOMP_MAP_ALWAYS_FROM_P): Define.
> libgomp/
> 	* libgomp.h (struct target_var_desc): New.
> 	(struct target_mem_desc): Replace array of splay_tree_key with array of
> 	target_var_desc.
> 	(struct splay_tree_key_s): Move copy_from to target_var_desc.
> 	* oacc-mem.c (gomp_acc_remove_pointer): Use copy_from from
> 	target_var_desc.
> 	* oacc-parallel.c (GOACC_parallel): Use copy_from from target_var_desc.
> 	* target.c (gomp_map_vars_existing): Copy data to device if map-type is
> 	'always to' or 'always tofrom'.
> 	(gomp_map_vars): Use key from target_var_desc.  Set copy_from and
> 	always_copy_from.
> 	(gomp_copy_from_async): Use key and copy_from from target_var_desc.
> 	(gomp_unmap_vars): Copy data from device if always_copy_from is set.
> 	(gomp_offload_image_to_device): Do not use copy_from.
> 	* testsuite/libgomp.c/target-11.c: New test.

> +      /* Set dd on target to 0 for the further check.  */
> +      #pragma omp target map(always to: dd)
> +	{ dd; }

This reminds me that:
          if (ctx->region_type == ORT_TARGET && !(n->value & GOVD_SEEN))
            remove = true;
in gimplify.c is not what we want, if it is has GOMP_MAP_KIND_ALWAYS,
then we shouldn't remove it even when it is not mentioned inside of the
region's body, because it then has side-effects.

	Jakub

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [gomp4.1] Add new versions of GOMP_target{,_data,_update} and GOMP_target_enter_exit_data
  2015-06-23 12:10               ` Jakub Jelinek
@ 2015-06-23 14:55                 ` Ilya Verbin
  2015-06-23 15:04                   ` Jakub Jelinek
  2015-10-19 16:33                 ` OpenACC async clause regressions (was: [gomp4.1] Add new versions of GOMP_target{,_data,_update} and GOMP_target_enter_exit_data) Thomas Schwinge
  1 sibling, 1 reply; 49+ messages in thread
From: Ilya Verbin @ 2015-06-23 14:55 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Thomas Schwinge, gcc-patches, Kirill Yukhin

On Tue, Jun 23, 2015 at 13:51:39 +0200, Jakub Jelinek wrote:
> > +      /* Set dd on target to 0 for the further check.  */
> > +      #pragma omp target map(always to: dd)
> > +	{ dd; }
> 
> This reminds me that:
>           if (ctx->region_type == ORT_TARGET && !(n->value & GOVD_SEEN))
>             remove = true;
> in gimplify.c is not what we want, if it is has GOMP_MAP_KIND_ALWAYS,
> then we shouldn't remove it even when it is not mentioned inside of the
> region's body, because it then has side-effects.

OK for gomp-4_1-branch?


gcc/
	* gimplify.c (gimplify_adjust_omp_clauses): Don't remove map clause if
	it has map-type-modifier always.
libgomp/
	* testsuite/libgomp.c/target-11.c (main): Remove dd from target region.


diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index 9b2347a..74fe60b 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -6870,7 +6870,8 @@ gimplify_adjust_omp_clauses (gimple_seq *pre_p, tree *list_p)
 	  if (!DECL_P (decl))
 	    break;
 	  n = splay_tree_lookup (ctx->variables, (splay_tree_key) decl);
-	  if (ctx->region_type == ORT_TARGET && !(n->value & GOVD_SEEN))
+	  if (ctx->region_type == ORT_TARGET && !(n->value & GOVD_SEEN)
+	      && !(OMP_CLAUSE_MAP_KIND (c) & GOMP_MAP_FLAG_ALWAYS))
 	    remove = true;
 	  else if (DECL_SIZE (decl)
 		   && TREE_CODE (DECL_SIZE (decl)) != INTEGER_CST
diff --git a/libgomp/testsuite/libgomp.c/target-11.c b/libgomp/testsuite/libgomp.c/target-11.c
index 4562d88..0fd183b 100644
--- a/libgomp/testsuite/libgomp.c/target-11.c
+++ b/libgomp/testsuite/libgomp.c/target-11.c
@@ -13,7 +13,7 @@ int main ()
 
       /* Set dd on target to 0 for the further check.  */
       #pragma omp target map(always to: dd)
-	{ dd; }
+	;
 
       dd = 1;
       #pragma omp target map(tofrom: aa) map(always to: bb) \


  -- Ilya

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [gomp4.1] Add new versions of GOMP_target{,_data,_update} and GOMP_target_enter_exit_data
  2015-06-23 14:55                 ` Ilya Verbin
@ 2015-06-23 15:04                   ` Jakub Jelinek
  0 siblings, 0 replies; 49+ messages in thread
From: Jakub Jelinek @ 2015-06-23 15:04 UTC (permalink / raw)
  To: Ilya Verbin; +Cc: Thomas Schwinge, gcc-patches, Kirill Yukhin

On Tue, Jun 23, 2015 at 05:54:48PM +0300, Ilya Verbin wrote:
> On Tue, Jun 23, 2015 at 13:51:39 +0200, Jakub Jelinek wrote:
> > > +      /* Set dd on target to 0 for the further check.  */
> > > +      #pragma omp target map(always to: dd)
> > > +	{ dd; }
> > 
> > This reminds me that:
> >           if (ctx->region_type == ORT_TARGET && !(n->value & GOVD_SEEN))
> >             remove = true;
> > in gimplify.c is not what we want, if it is has GOMP_MAP_KIND_ALWAYS,
> > then we shouldn't remove it even when it is not mentioned inside of the
> > region's body, because it then has side-effects.
> 
> OK for gomp-4_1-branch?
> 
> 
> gcc/
> 	* gimplify.c (gimplify_adjust_omp_clauses): Don't remove map clause if
> 	it has map-type-modifier always.
> libgomp/
> 	* testsuite/libgomp.c/target-11.c (main): Remove dd from target region.

GOMP_MAP_RELEASE uses the GOMP_MAP_FLAG_ALWAYS for something different from
always, because always release and always delete is not meaningful.
But as neither release nor delete can appear on map clause in target region,
it doesn't matter (at least for now).
So the patch is ok, thanks.

	Jakub

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [gomp4.1] Add new versions of GOMP_target{,_data,_update} and GOMP_target_enter_exit_data
  2015-06-23 11:51             ` Ilya Verbin
  2015-06-23 12:10               ` Jakub Jelinek
@ 2015-06-24 11:43               ` Jakub Jelinek
  2015-06-24 20:14                 ` Ilya Verbin
  1 sibling, 1 reply; 49+ messages in thread
From: Jakub Jelinek @ 2015-06-24 11:43 UTC (permalink / raw)
  To: Ilya Verbin; +Cc: Thomas Schwinge, gcc-patches, Kirill Yukhin

On Tue, Jun 23, 2015 at 02:40:43PM +0300, Ilya Verbin wrote:
> On Sat, Jun 20, 2015 at 00:35:14 +0300, Ilya Verbin wrote:
> > Given that a mapped variable in 4.1 can have different kinds across nested data
> > regions, we need to store map-type not only for each var, but also for each
> > structured mapping.  Here is my WIP patch, is it sane? :)
> > Attached testcase works OK on the device with non-shared memory.
> 
> A bit updated version with a fix for GOMP_MAP_TO_PSET.
> make check-target-libgomp passed.

Thinking about this more, for always modifier this isn't really sufficient.
Consider:
void
foo (int *p)
{
  #pragma omp target data (alloc:p[0:32])
  {
    #pragma omp target data (always, from:p[7:9])
    {
      ...
    }
  }
}
If all we record is the corresponding splay_tree and the flags
(from/always_from), then this would try to copy from the device
the whole array section, rather than just the small portion of it.
So, supposedly in addition to the splay_tree for always from case we also
need to remember e.g. [relative offset, length] within the splay tree
object.

	Jakub

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [gomp4.1] Add new versions of GOMP_target{,_data,_update} and GOMP_target_enter_exit_data
  2015-06-24 11:43               ` [gomp4.1] Add new versions of GOMP_target{,_data,_update} and GOMP_target_enter_exit_data Jakub Jelinek
@ 2015-06-24 20:14                 ` Ilya Verbin
  2015-06-24 20:21                   ` Jakub Jelinek
  0 siblings, 1 reply; 49+ messages in thread
From: Ilya Verbin @ 2015-06-24 20:14 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Thomas Schwinge, gcc-patches, Kirill Yukhin

On Wed, Jun 24, 2015 at 13:39:03 +0200, Jakub Jelinek wrote:
> Thinking about this more, for always modifier this isn't really sufficient.
> Consider:
> void
> foo (int *p)
> {
>   #pragma omp target data (alloc:p[0:32])
>   {
>     #pragma omp target data (always, from:p[7:9])
>     {
>       ...
>     }
>   }
> }
> If all we record is the corresponding splay_tree and the flags
> (from/always_from), then this would try to copy from the device
> the whole array section, rather than just the small portion of it.
> So, supposedly in addition to the splay_tree for always from case we also
> need to remember e.g. [relative offset, length] within the splay tree
> object.

Indeed, here is the fix, make check-target-libgomp passed.


libgomp/
	* libgomp.h (struct target_var_desc): Add offset and length.
	* target.c (gomp_map_vars_existing): New argument tgt_var, fill it.
	(gomp_map_vars): Move filling of tgt->list[i] into
	gomp_map_vars_existing.  Add missed case GOMP_MAP_ALWAYS_FROM.
	(gomp_unmap_vars): Add list[i].offset to host and target addresses,
	use list[i].length instead of k->host_end - k->host_start.
	* testsuite/libgomp.c/target-11.c: Extend for testing array sections.


diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
index bd17828..c48e708 100644
--- a/libgomp/libgomp.h
+++ b/libgomp/libgomp.h
@@ -644,6 +644,12 @@ struct target_var_desc {
   bool copy_from;
   /* True if data always should be copied from device to host at the end.  */
   bool always_copy_from;
+  /* Used for unmapping of array sections, can be nonzero only when
+     always_copy_from is true.  */
+  uintptr_t offset;
+  /* Used for unmapping of array sections, can be less than the size of the
+     whole object only when always_copy_from is true.  */
+  uintptr_t length;
 };
 
 struct target_mem_desc {
diff --git a/libgomp/target.c b/libgomp/target.c
index b1640c1..a394e95 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -149,8 +149,15 @@ resolve_device (int device_id)
 
 static inline void
 gomp_map_vars_existing (struct gomp_device_descr *devicep, splay_tree_key oldn,
-			splay_tree_key newn, unsigned char kind)
+			splay_tree_key newn, struct target_var_desc *tgt_var,
+			unsigned char kind)
 {
+  tgt_var->key = oldn;
+  tgt_var->copy_from = GOMP_MAP_COPY_FROM_P (kind);
+  tgt_var->always_copy_from = GOMP_MAP_ALWAYS_FROM_P (kind);
+  tgt_var->offset = newn->host_start - oldn->host_start;
+  tgt_var->length = newn->host_end - newn->host_start;
+
   if ((kind & GOMP_MAP_FLAG_FORCE)
       || oldn->host_start > newn->host_start
       || oldn->host_end < newn->host_end)
@@ -276,13 +283,8 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
 	cur_node.host_end = cur_node.host_start + sizeof (void *);
       splay_tree_key n = splay_tree_lookup (mem_map, &cur_node);
       if (n)
-	{
-	  tgt->list[i].key = n;
-	  tgt->list[i].copy_from = GOMP_MAP_COPY_FROM_P (kind & typemask);
-	  tgt->list[i].always_copy_from
-	    = GOMP_MAP_ALWAYS_FROM_P (kind & typemask);
-	  gomp_map_vars_existing (devicep, n, &cur_node, kind & typemask);
-	}
+	gomp_map_vars_existing (devicep, n, &cur_node, &tgt->list[i],
+				kind & typemask);
       else
 	{
 	  tgt->list[i].key = NULL;
@@ -367,13 +369,8 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
 	      k->host_end = k->host_start + sizeof (void *);
 	    splay_tree_key n = splay_tree_lookup (mem_map, k);
 	    if (n)
-	      {
-		tgt->list[i].key = n;
-		tgt->list[i].copy_from = GOMP_MAP_COPY_FROM_P (kind & typemask);
-		tgt->list[i].always_copy_from
-		  = GOMP_MAP_ALWAYS_FROM_P (kind & typemask);
-		gomp_map_vars_existing (devicep, n, k, kind & typemask);
-	      }
+	      gomp_map_vars_existing (devicep, n, k, &tgt->list[i],
+				      kind & typemask);
 	    else
 	      {
 		size_t align = (size_t) 1 << (kind >> rshift);
@@ -385,6 +382,8 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
 		tgt->list[i].copy_from = GOMP_MAP_COPY_FROM_P (kind & typemask);
 		tgt->list[i].always_copy_from
 		  = GOMP_MAP_ALWAYS_FROM_P (kind & typemask);
+		tgt->list[i].offset = 0;
+		tgt->list[i].length = k->host_end - k->host_start;
 		k->refcount = 1;
 		k->async_refcount = 0;
 		tgt->refcount++;
@@ -397,6 +396,7 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
 		  case GOMP_MAP_FROM:
 		  case GOMP_MAP_FORCE_ALLOC:
 		  case GOMP_MAP_FORCE_FROM:
+		  case GOMP_MAP_ALWAYS_FROM:
 		    break;
 		  case GOMP_MAP_TO:
 		  case GOMP_MAP_TOFROM:
@@ -587,9 +587,11 @@ gomp_unmap_vars (struct target_mem_desc *tgt, bool do_copyfrom)
 
       if ((do_unmap && do_copyfrom && tgt->list[i].copy_from)
 	  || tgt->list[i].always_copy_from)
-	devicep->dev2host_func (devicep->target_id, (void *) k->host_start,
-				(void *) (k->tgt->tgt_start + k->tgt_offset),
-				k->host_end - k->host_start);
+	devicep->dev2host_func (devicep->target_id,
+				(void *) (k->host_start + tgt->list[i].offset),
+				(void *) (k->tgt->tgt_start + k->tgt_offset
+					  + tgt->list[i].offset),
+				tgt->list[i].length);
       if (do_unmap)
 	{
 	  splay_tree_remove (&devicep->mem_map, k);
diff --git a/libgomp/testsuite/libgomp.c/target-11.c b/libgomp/testsuite/libgomp.c/target-11.c
index 0fd183b..b86097a 100644
--- a/libgomp/testsuite/libgomp.c/target-11.c
+++ b/libgomp/testsuite/libgomp.c/target-11.c
@@ -1,7 +1,20 @@
 /* { dg-require-effective-target offload_device } */
 
+#include <stdlib.h>
 #include <assert.h>
 
+#define N 32
+
+void test_array_section (int *p)
+{
+  #pragma omp target data map(alloc: p[0:N])
+    {
+      #pragma omp target map(always from:p[7:9])
+	for (int i = 0; i < N; i++)
+	  p[i] = i;
+    }
+}
+
 int main ()
 {
   int aa = 0, bb = 0, cc = 0, dd = 0;
@@ -47,5 +60,16 @@ int main ()
   assert (cc == 4);
   assert (dd == 4);
 
+  int *array = calloc (N, sizeof (int));
+  test_array_section (array);
+
+  for (int i = 0; i < 7; i++)
+    assert (array[i] == 0);
+  for (int i = 7; i < 7 + 9; i++)
+    assert (array[i] == i);
+  for (int i = 7 + 9; i < N; i++)
+    assert (array[i] == 0);
+
+  free (array);
   return 0;
 }


  -- Ilya

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [gomp4.1] Add new versions of GOMP_target{,_data,_update} and GOMP_target_enter_exit_data
  2015-06-24 20:14                 ` Ilya Verbin
@ 2015-06-24 20:21                   ` Jakub Jelinek
  0 siblings, 0 replies; 49+ messages in thread
From: Jakub Jelinek @ 2015-06-24 20:21 UTC (permalink / raw)
  To: Ilya Verbin; +Cc: Thomas Schwinge, gcc-patches, Kirill Yukhin

On Wed, Jun 24, 2015 at 11:11:12PM +0300, Ilya Verbin wrote:
> Indeed, here is the fix, make check-target-libgomp passed.
> 
> 
> libgomp/
> 	* libgomp.h (struct target_var_desc): Add offset and length.
> 	* target.c (gomp_map_vars_existing): New argument tgt_var, fill it.
> 	(gomp_map_vars): Move filling of tgt->list[i] into
> 	gomp_map_vars_existing.  Add missed case GOMP_MAP_ALWAYS_FROM.
> 	(gomp_unmap_vars): Add list[i].offset to host and target addresses,
> 	use list[i].length instead of k->host_end - k->host_start.
> 	* testsuite/libgomp.c/target-11.c: Extend for testing array sections.

Ok, thanks.

	Jakub

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [gomp4.1] Add new versions of GOMP_target{,_data,_update} and GOMP_target_enter_exit_data
  2015-06-15 19:54       ` Ilya Verbin
  2015-06-15 19:58         ` Jakub Jelinek
@ 2015-10-13 14:50         ` Ilya Verbin
  2015-10-13 19:27           ` Jakub Jelinek
  1 sibling, 1 reply; 49+ messages in thread
From: Ilya Verbin @ 2015-10-13 14:50 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: gcc-patches, Kirill Yukhin

On Mon, Jun 15, 2015 at 22:48:50 +0300, Ilya Verbin wrote:
> @@ -950,50 +997,41 @@ GOMP_target (int device, void (*fn) (void *), const void *unused,
> ...
> +  devicep->run_func (devicep->target_id, fn_addr, (void *) tgt_vars->tgt_start);

If mapnum is 0, tgt_vars->tgt_start is uninitialized.  This is not a big bug,
because in this case the target function doesn't use this pointer, however
valgrind warns about sending uninitialized data to target.
OK for gomp-4_1-branch?


libgomp/
	* target.c (gomp_map_vars): Zero tgt->tgt_start when mapnum is 0.


diff --git a/libgomp/target.c b/libgomp/target.c
index 95360d1..c4e3323 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -323,6 +323,7 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
   struct splay_tree_key_s cur_node;
   struct target_mem_desc *tgt
     = gomp_malloc (sizeof (*tgt) + sizeof (tgt->list[0]) * mapnum);
+  tgt->tgt_start = 0;
   tgt->list_count = mapnum;
   tgt->refcount = pragma_kind == GOMP_MAP_VARS_ENTER_DATA ? 0 : 1;
   tgt->device_descr = devicep;


  -- Ilya

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [gomp4.1] Add new versions of GOMP_target{,_data,_update} and GOMP_target_enter_exit_data
  2015-10-13 14:50         ` Ilya Verbin
@ 2015-10-13 19:27           ` Jakub Jelinek
  0 siblings, 0 replies; 49+ messages in thread
From: Jakub Jelinek @ 2015-10-13 19:27 UTC (permalink / raw)
  To: Ilya Verbin; +Cc: gcc-patches, Kirill Yukhin

On Tue, Oct 13, 2015 at 05:48:11PM +0300, Ilya Verbin wrote:
> On Mon, Jun 15, 2015 at 22:48:50 +0300, Ilya Verbin wrote:
> > @@ -950,50 +997,41 @@ GOMP_target (int device, void (*fn) (void *), const void *unused,
> > ...
> > +  devicep->run_func (devicep->target_id, fn_addr, (void *) tgt_vars->tgt_start);
> 
> If mapnum is 0, tgt_vars->tgt_start is uninitialized.  This is not a big bug,
> because in this case the target function doesn't use this pointer, however
> valgrind warns about sending uninitialized data to target.
> OK for gomp-4_1-branch?
> 
> 
> libgomp/
> 	* target.c (gomp_map_vars): Zero tgt->tgt_start when mapnum is 0.

gomp-4_1-branch is frozen.  I'd prefer to initialize tgt_start and tgt_end
to 0 just in the
  if (mapnum == 0)
    return tgt;
case.  With that change it is ok for trunk.

> diff --git a/libgomp/target.c b/libgomp/target.c
> index 95360d1..c4e3323 100644
> --- a/libgomp/target.c
> +++ b/libgomp/target.c
> @@ -323,6 +323,7 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
>    struct splay_tree_key_s cur_node;
>    struct target_mem_desc *tgt
>      = gomp_malloc (sizeof (*tgt) + sizeof (tgt->list[0]) * mapnum);
> +  tgt->tgt_start = 0;
>    tgt->list_count = mapnum;
>    tgt->refcount = pragma_kind == GOMP_MAP_VARS_ENTER_DATA ? 0 : 1;
>    tgt->device_descr = devicep;

	Jakub

^ permalink raw reply	[flat|nested] 49+ messages in thread

* OpenACC async clause regressions (was: [gomp4.1] Add new versions of GOMP_target{,_data,_update} and GOMP_target_enter_exit_data)
  2015-06-23 12:10               ` Jakub Jelinek
  2015-06-23 14:55                 ` Ilya Verbin
@ 2015-10-19 16:33                 ` Thomas Schwinge
  2015-10-19 16:48                   ` Ilya Verbin
                                     ` (2 more replies)
  1 sibling, 3 replies; 49+ messages in thread
From: Thomas Schwinge @ 2015-10-19 16:33 UTC (permalink / raw)
  To: Chung-Lin Tang; +Cc: gcc-patches, Kirill Yukhin, Jakub Jelinek, Ilya Verbin


[-- Attachment #1.1: Type: text/plain, Size: 5227 bytes --]

Hi!

Chung-Lin, would you please have a look at the following (on
gomp-4_0-branch)?  Also, anyone else got any ideas off-hand?

On Tue, 23 Jun 2015 13:51:39 +0200, Jakub Jelinek <jakub@redhat.com> wrote:
> On Tue, Jun 23, 2015 at 02:40:43PM +0300, Ilya Verbin wrote:
> > On Sat, Jun 20, 2015 at 00:35:14 +0300, Ilya Verbin wrote:
> > > Given that a mapped variable in 4.1 can have different kinds across nested data
> > > regions, we need to store map-type not only for each var, but also for each
> > > structured mapping.  Here is my WIP patch, is it sane? :)
> > > Attached testcase works OK on the device with non-shared memory.
> > 
> > A bit updated version with a fix for GOMP_MAP_TO_PSET.
> > make check-target-libgomp passed.
> 
> Ok, thanks.
> 
> > include/gcc/
> > 	* gomp-constants.h (GOMP_MAP_ALWAYS_TO_P,
> > 	GOMP_MAP_ALWAYS_FROM_P): Define.
> > libgomp/
> > 	* libgomp.h (struct target_var_desc): New.
> > 	(struct target_mem_desc): Replace array of splay_tree_key with array of
> > 	target_var_desc.
> > 	(struct splay_tree_key_s): Move copy_from to target_var_desc.
> > 	* oacc-mem.c (gomp_acc_remove_pointer): Use copy_from from
> > 	target_var_desc.
> > 	* oacc-parallel.c (GOACC_parallel): Use copy_from from target_var_desc.
> > 	* target.c (gomp_map_vars_existing): Copy data to device if map-type is
> > 	'always to' or 'always tofrom'.
> > 	(gomp_map_vars): Use key from target_var_desc.  Set copy_from and
> > 	always_copy_from.
> > 	(gomp_copy_from_async): Use key and copy_from from target_var_desc.
> > 	(gomp_unmap_vars): Copy data from device if always_copy_from is set.
> > 	(gomp_offload_image_to_device): Do not use copy_from.
> > 	* testsuite/libgomp.c/target-11.c: New test.

(That's gomp-4_1-branch r224838.  The attached
gomp-4_1-branch-r224838.patch is a variant that applies on top of
gomp-4_0-branch r228972.)  This change introduces regressions in OpenACC
async clause handling.

Testing on gomp-4_1-branch r224838:

    PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/data-2.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 (test for excess errors)
    [-PASS:-]{+FAIL:+} libgomp.oacc-c/../libgomp.oacc-c-c++-common/data-2.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 execution test
    PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/data-3.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 (test for excess errors)
    [-PASS:-]{+FAIL:+} libgomp.oacc-c/../libgomp.oacc-c-c++-common/data-3.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 execution test

Same for C++.

Testing on gomp-4_0-branch r228972 plus the attached
gomp-4_1-branch-r224838.patch:

    PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/asyncwait-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none (test for excess errors)
    [-PASS:-]{+FAIL:+} libgomp.oacc-c/../libgomp.oacc-c-c++-common/asyncwait-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none execution test

Same for C++.

As I mentioned in
<http://news.gmane.org/find-root.php?message_id=%3C87lhb3b11q.fsf%40kepler.schwinge.homeip.net%3E>,
all three regressions are visible when testing on trunk r228777.  I have
not analyzed why the three different branches show different sets of
regressions -- I'm hoping they're all manifestations of the same
underlying problem: they're all using the OpenACC async clause.

Looking at gomp-4_0-branch r228972 plus the attached
gomp-4_1-branch-r224838.patch, clearly there is "some kind of data
corruption":

    $ gdb -q a.out 
    Reading symbols from a.out...done.
    (gdb) start
    [...]
    25          a = (float *) malloc (nbytes);
    (gdb) n
    26          b = (float *) malloc (nbytes);
    (gdb) print a
    $1 = (float *) 0xab12c0
    (gdb) c
    Continuing.
    
    Program received signal SIGSEGV, Segmentation fault.
    0x00000000004015d2 in main (argc=1, argv=0x7fffffffd408) at source-gcc/libgomp/testsuite/libgomp.oacc-c-c++-common/asyncwait-1.c:133
    133             if (a[i] != 3.0)
    (gdb) print a
    $2 = (float *) 0x500680620

0x500680620 looks like a nvptx device pointer to me, which is a) wrong
(after the "malloc", "a" shouldn't change its value throughout program
execution), and b) that "explains" the segmentation fault (device pointer
dereferenced in host code).

So, maybe data is erroneously being copied back to the host from device,
or from libgomp internal data structures.  Maybe some copy_from flag
handling needs to be adjusted or added in the OpenACC code in libgomp?


I have no idea whether that's related, but I noticed that currently we're
not in any way handling async_refcount in libgomp/oacc-*.c -- do we have
to?  (Its name certainly makes me believe it's related to asynchronous
data (un-)mapping.)  Should we be able to drop some of the
OpenACC-specific async implementation in libgomp, and use new/generic
target.c code instead?


Please note that there will be further libgomp changes (target.c, and
other files) coming in later merges from gomp-4_1-branch, so please for
now just work on identifying/resolving the regression, and let any code
refactoring wait for later.


Grüße
 Thomas



[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1.2: gomp-4_1-branch-r224838.patch --]
[-- Type: text/x-diff, Size: 11797 bytes --]

diff --git include/gomp-constants.h include/gomp-constants.h
index b55f68b..540a31e 100644
--- include/gomp-constants.h
+++ include/gomp-constants.h
@@ -111,6 +111,12 @@ enum gomp_map_kind
 #define GOMP_MAP_POINTER_P(X) \
   ((X) == GOMP_MAP_POINTER)
 
+#define GOMP_MAP_ALWAYS_TO_P(X) \
+  (((X) == GOMP_MAP_ALWAYS_TO) || ((X) == GOMP_MAP_ALWAYS_TOFROM))
+
+#define GOMP_MAP_ALWAYS_FROM_P(X) \
+  (((X) == GOMP_MAP_ALWAYS_FROM) || ((X) == GOMP_MAP_ALWAYS_TOFROM))
+
 
 /* Asynchronous behavior.  Keep in sync with
    libgomp/{openacc.h,openacc.f90,openacc_lib.h}:acc_async_t.  */
diff --git libgomp/libgomp.h libgomp/libgomp.h
index d86da7d..8fd7d08 100644
--- libgomp/libgomp.h
+++ libgomp/libgomp.h
@@ -641,6 +641,15 @@ typedef struct splay_tree_node_s *splay_tree_node;
 typedef struct splay_tree_s *splay_tree;
 typedef struct splay_tree_key_s *splay_tree_key;
 
+struct target_var_desc {
+  /* Splay key.  */
+  splay_tree_key key;
+  /* True if data should be copied from device to host at the end.  */
+  bool copy_from;
+  /* True if data always should be copied from device to host at the end.  */
+  bool always_copy_from;
+};
+
 struct target_mem_desc {
   /* Reference count.  */
   uintptr_t refcount;
@@ -660,9 +669,9 @@ struct target_mem_desc {
   /* Corresponding target device descriptor.  */
   struct gomp_device_descr *device_descr;
 
-  /* List of splay keys to remove (or decrease refcount)
+  /* List of target items to remove (or decrease refcount)
      at the end of region.  */
-  splay_tree_key list[];
+  struct target_var_desc list[];
 };
 
 struct splay_tree_key_s {
@@ -678,8 +687,6 @@ struct splay_tree_key_s {
   uintptr_t refcount;
   /* Asynchronous reference count.  */
   uintptr_t async_refcount;
-  /* True if data should be copied from device to host at the end.  */
-  bool copy_from;
 };
 
 #include "splay-tree.h"
diff --git libgomp/oacc-mem.c libgomp/oacc-mem.c
index 7fcf199..a90c912 100644
--- libgomp/oacc-mem.c
+++ libgomp/oacc-mem.c
@@ -685,7 +685,7 @@ gomp_acc_remove_pointer (void *h, bool force_copyfrom, int async, int mapnum)
 	}
     }
 
-  t->list[0]->copy_from = force_copyfrom ? 1 : 0;
+  t->list[0].copy_from = force_copyfrom ? 1 : 0;
 
   gomp_mutex_unlock (&acc_dev->lock);
 
diff --git libgomp/oacc-parallel.c libgomp/oacc-parallel.c
index 2b90c9f..e4ecc87 100644
--- libgomp/oacc-parallel.c
+++ libgomp/oacc-parallel.c
@@ -262,9 +262,9 @@ GOACC_parallel_keyed (int device, void (*fn) (void *),
   devaddrs = gomp_alloca (sizeof (void *) * mapnum);
   for (i = 0; i < mapnum; i++)
     {
-      if (tgt->list[i] != NULL)
-	devaddrs[i] = (void *) (tgt->list[i]->tgt->tgt_start
-				+ tgt->list[i]->tgt_offset);
+      if (tgt->list[i].key != NULL)
+	devaddrs[i] = (void *) (tgt->list[i].key->tgt->tgt_start
+				+ tgt->list[i].key->tgt_offset);
       else
 	devaddrs[i] = NULL;
     }
diff --git libgomp/target.c libgomp/target.c
index 4587361..c2e1996 100644
--- libgomp/target.c
+++ libgomp/target.c
@@ -168,6 +168,12 @@ gomp_map_vars_existing (struct gomp_device_descr *devicep, splay_tree_key oldn,
 		  (void *) newn->host_start, (void *) newn->host_end,
 		  (void *) oldn->host_start, (void *) oldn->host_end);
     }
+
+  if (GOMP_MAP_ALWAYS_TO_P (kind))
+    devicep->host2dev_func (devicep->target_id,
+			    (void *) (oldn->tgt->tgt_start + oldn->tgt_offset),
+			    (void *) newn->host_start,
+			    newn->host_end - newn->host_start);
   oldn->refcount++;
 }
 
@@ -267,7 +273,7 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
       int kind = get_kind (short_mapkind, kinds, i);
       if (hostaddrs[i] == NULL)
 	{
-	  tgt->list[i] = NULL;
+	  tgt->list[i].key = NULL;
 	  continue;
 	}
       cur_node.host_start = (uintptr_t) hostaddrs[i];
@@ -278,12 +284,15 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
       splay_tree_key n = splay_tree_lookup (mem_map, &cur_node);
       if (n)
 	{
-	  tgt->list[i] = n;
+	  tgt->list[i].key = n;
+	  tgt->list[i].copy_from = GOMP_MAP_COPY_FROM_P (kind & typemask);
+	  tgt->list[i].always_copy_from
+	    = GOMP_MAP_ALWAYS_FROM_P (kind & typemask);
 	  gomp_map_vars_existing (devicep, n, &cur_node, kind & typemask);
 	}
       else
 	{
-	  tgt->list[i] = NULL;
+	  tgt->list[i].key = NULL;
 
 	  size_t align = (size_t) 1 << (kind >> rshift);
 	  not_found_cnt++;
@@ -304,7 +313,7 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
 		  break;
 		else
 		  {
-		    tgt->list[j] = NULL;
+		    tgt->list[j].key = NULL;
 		    i++;
 		  }
 	    }
@@ -352,7 +361,7 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
       size_t j;
 
       for (i = 0; i < mapnum; i++)
-	if (tgt->list[i] == NULL)
+	if (tgt->list[i].key == NULL)
 	  {
 	    int kind = get_kind (short_mapkind, kinds, i);
 	    if (hostaddrs[i] == NULL)
@@ -366,18 +375,23 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
 	    splay_tree_key n = splay_tree_lookup (mem_map, k);
 	    if (n)
 	      {
-		tgt->list[i] = n;
+		tgt->list[i].key = n;
+		tgt->list[i].copy_from = GOMP_MAP_COPY_FROM_P (kind & typemask);
+		tgt->list[i].always_copy_from
+		  = GOMP_MAP_ALWAYS_FROM_P (kind & typemask);
 		gomp_map_vars_existing (devicep, n, k, kind & typemask);
 	      }
 	    else
 	      {
 		size_t align = (size_t) 1 << (kind >> rshift);
-		tgt->list[i] = k;
+		tgt->list[i].key = k;
 		tgt_size = (tgt_size + align - 1) & ~(align - 1);
 		k->tgt = tgt;
 		k->tgt_offset = tgt_size;
 		tgt_size += k->host_end - k->host_start;
-		k->copy_from = GOMP_MAP_COPY_FROM_P (kind & typemask);
+		tgt->list[i].copy_from = GOMP_MAP_COPY_FROM_P (kind & typemask);
+		tgt->list[i].always_copy_from
+		  = GOMP_MAP_ALWAYS_FROM_P (kind & typemask);
 		k->refcount = 1;
 		k->async_refcount = 0;
 		tgt->refcount++;
@@ -395,6 +409,8 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
 		  case GOMP_MAP_TOFROM:
 		  case GOMP_MAP_FORCE_TO:
 		  case GOMP_MAP_FORCE_TOFROM:
+		  case GOMP_MAP_ALWAYS_TO:
+		  case GOMP_MAP_ALWAYS_TOFROM:
 		    /* FIXME: Perhaps add some smarts, like if copying
 		       several adjacent fields from host to target, use some
 		       host buffer to avoid sending each var individually.  */
@@ -427,7 +443,9 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
 			break;
 		      else
 			{
-			  tgt->list[j] = k;
+			  tgt->list[j].key = k;
+			  tgt->list[j].copy_from = false;
+			  tgt->list[j].always_copy_from = false;
 			  k->refcount++;
 			  gomp_map_pointer (tgt,
 					    (uintptr_t) *(void **) hostaddrs[j],
@@ -479,11 +497,11 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
     {
       for (i = 0; i < mapnum; i++)
 	{
-	  if (tgt->list[i] == NULL)
+	  if (tgt->list[i].key == NULL)
 	    cur_node.tgt_offset = (uintptr_t) NULL;
 	  else
-	    cur_node.tgt_offset = tgt->list[i]->tgt->tgt_start
-				  + tgt->list[i]->tgt_offset;
+	    cur_node.tgt_offset = tgt->list[i].key->tgt->tgt_start
+				  + tgt->list[i].key->tgt_offset;
 	  /* FIXME: see above FIXME comment.  */
 	  devicep->host2dev_func (devicep->target_id,
 				  (void *) (tgt->tgt_start
@@ -523,17 +541,17 @@ gomp_copy_from_async (struct target_mem_desc *tgt)
   gomp_mutex_lock (&devicep->lock);
 
   for (i = 0; i < tgt->list_count; i++)
-    if (tgt->list[i] == NULL)
+    if (tgt->list[i].key == NULL)
       ;
-    else if (tgt->list[i]->refcount > 1)
+    else if (tgt->list[i].key->refcount > 1)
       {
-	tgt->list[i]->refcount--;
-	tgt->list[i]->async_refcount++;
+	tgt->list[i].key->refcount--;
+	tgt->list[i].key->async_refcount++;
       }
     else
       {
-	splay_tree_key k = tgt->list[i];
-	if (k->copy_from)
+	splay_tree_key k = tgt->list[i].key;
+	if (tgt->list[i].copy_from)
 	  devicep->dev2host_func (devicep->target_id, (void *) k->host_start,
 				  (void *) (k->tgt->tgt_start + k->tgt_offset),
 				  k->host_end - k->host_start);
@@ -561,25 +579,33 @@ gomp_unmap_vars (struct target_mem_desc *tgt, bool do_copyfrom)
 
   size_t i;
   for (i = 0; i < tgt->list_count; i++)
-    if (tgt->list[i] == NULL)
-      ;
-    else if (tgt->list[i]->refcount > 1)
-      tgt->list[i]->refcount--;
-    else if (tgt->list[i]->async_refcount > 0)
-      tgt->list[i]->async_refcount--;
-    else
-      {
-	splay_tree_key k = tgt->list[i];
-	if (k->copy_from && do_copyfrom)
-	  devicep->dev2host_func (devicep->target_id, (void *) k->host_start,
-				  (void *) (k->tgt->tgt_start + k->tgt_offset),
-				  k->host_end - k->host_start);
-	splay_tree_remove (&devicep->mem_map, k);
-	if (k->tgt->refcount > 1)
-	  k->tgt->refcount--;
-	else
-	  gomp_unmap_tgt (k->tgt);
-      }
+    {
+      splay_tree_key k = tgt->list[i].key;
+      if (k == NULL)
+	continue;
+
+      bool do_unmap = false;
+      if (k->refcount > 1)
+	k->refcount--;
+      else if (k->async_refcount > 0)
+	k->async_refcount--;
+      else
+	do_unmap = true;
+
+      if ((do_unmap && do_copyfrom && tgt->list[i].copy_from)
+	  || tgt->list[i].always_copy_from)
+	devicep->dev2host_func (devicep->target_id, (void *) k->host_start,
+				(void *) (k->tgt->tgt_start + k->tgt_offset),
+				k->host_end - k->host_start);
+      if (do_unmap)
+	{
+	  splay_tree_remove (&devicep->mem_map, k);
+	  if (k->tgt->refcount > 1)
+	    k->tgt->refcount--;
+	  else
+	    gomp_unmap_tgt (k->tgt);
+	}
+    }
 
   if (tgt->refcount > 1)
     tgt->refcount--;
@@ -714,8 +740,7 @@ gomp_load_image_to_device (struct gomp_device_descr *devicep, unsigned version,
       k->tgt_offset = target_table[i].start;
       k->refcount = 1;
       k->async_refcount = 0;
-      k->copy_from = false;
-      tgt->list[i] = k;
+      tgt->list[i].key = k;
       tgt->refcount++;
       array->left = NULL;
       array->right = NULL;
@@ -742,8 +767,7 @@ gomp_load_image_to_device (struct gomp_device_descr *devicep, unsigned version,
       k->tgt_offset = target_var->start;
       k->refcount = 1;
       k->async_refcount = 0;
-      k->copy_from = false;
-      tgt->list[i] = k;
+      tgt->list[i].key = k;
       tgt->refcount++;
       array->left = NULL;
       array->right = NULL;
diff --git libgomp/testsuite/libgomp.c/target-11.c libgomp/testsuite/libgomp.c/target-11.c
new file mode 100644
index 0000000..4562d88
--- /dev/null
+++ libgomp/testsuite/libgomp.c/target-11.c
@@ -0,0 +1,51 @@
+/* { dg-require-effective-target offload_device } */
+
+#include <assert.h>
+
+int main ()
+{
+  int aa = 0, bb = 0, cc = 0, dd = 0;
+
+  #pragma omp target data map(tofrom: aa) map(to: bb) map(from: cc, dd)
+    {
+      int ok;
+      aa = bb = cc = 1;
+
+      /* Set dd on target to 0 for the further check.  */
+      #pragma omp target map(always to: dd)
+	{ dd; }
+
+      dd = 1;
+      #pragma omp target map(tofrom: aa) map(always to: bb) \
+	map(always from: cc) map(to: dd) map(from: ok)
+	{
+	  /* bb is always to, aa and dd are not.  */
+	  ok = (aa == 0) && (bb == 1) && (dd == 0);
+	  aa = bb = cc = dd = 2;
+	}
+
+      assert (ok);
+      assert (aa == 1);
+      assert (bb == 1);
+      assert (cc == 2); /* cc is always from.  */
+      assert (dd == 1);
+
+      dd = 3;
+      #pragma omp target map(from: cc) map(always to: dd) map(from: ok)
+	{
+	  ok = (dd == 3); /* dd is always to.  */
+	  cc = dd = 4;
+	}
+
+      assert (ok);
+      assert (cc == 2);
+      assert (dd == 3);
+    }
+
+  assert (aa == 2);
+  assert (bb == 1);
+  assert (cc == 4);
+  assert (dd == 4);
+
+  return 0;
+}

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 472 bytes --]

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: OpenACC async clause regressions (was: [gomp4.1] Add new versions of GOMP_target{,_data,_update} and GOMP_target_enter_exit_data)
  2015-10-19 16:33                 ` OpenACC async clause regressions (was: [gomp4.1] Add new versions of GOMP_target{,_data,_update} and GOMP_target_enter_exit_data) Thomas Schwinge
@ 2015-10-19 16:48                   ` Ilya Verbin
  2015-10-20  7:37                     ` Jakub Jelinek
  2015-10-22 18:46                   ` [gomp4] " Thomas Schwinge
  2015-11-24 10:32                   ` [PATCH, libgomp] Rewire OpenACC async Chung-Lin Tang
  2 siblings, 1 reply; 49+ messages in thread
From: Ilya Verbin @ 2015-10-19 16:48 UTC (permalink / raw)
  To: Thomas Schwinge; +Cc: Chung-Lin Tang, gcc-patches, Kirill Yukhin, Jakub Jelinek

On Mon, Oct 19, 2015 at 18:24:35 +0200, Thomas Schwinge wrote:
> Chung-Lin, would you please have a look at the following (on
> gomp-4_0-branch)?  Also, anyone else got any ideas off-hand?
> 
>     PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/data-2.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 (test for excess errors)
>     [-PASS:-]{+FAIL:+} libgomp.oacc-c/../libgomp.oacc-c-c++-common/data-2.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 execution test
>     PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/data-3.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 (test for excess errors)
>     [-PASS:-]{+FAIL:+} libgomp.oacc-c/../libgomp.oacc-c-c++-common/data-3.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 execution test

Maybe it was caused by this change in gomp_unmap_vars?
https://gcc.gnu.org/ml/gcc-patches/2015-06/msg01376.html

Looking at the code, I don't see any difference in async_refcount handling, but
I was unable to test it without having hardware :(

  -- Ilya

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: OpenACC async clause regressions (was: [gomp4.1] Add new versions of GOMP_target{,_data,_update} and GOMP_target_enter_exit_data)
  2015-10-19 16:48                   ` Ilya Verbin
@ 2015-10-20  7:37                     ` Jakub Jelinek
  0 siblings, 0 replies; 49+ messages in thread
From: Jakub Jelinek @ 2015-10-20  7:37 UTC (permalink / raw)
  To: Ilya Verbin; +Cc: Thomas Schwinge, Chung-Lin Tang, gcc-patches, Kirill Yukhin

On Mon, Oct 19, 2015 at 07:43:59PM +0300, Ilya Verbin wrote:
> On Mon, Oct 19, 2015 at 18:24:35 +0200, Thomas Schwinge wrote:
> > Chung-Lin, would you please have a look at the following (on
> > gomp-4_0-branch)?  Also, anyone else got any ideas off-hand?
> > 
> >     PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/data-2.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 (test for excess errors)
> >     [-PASS:-]{+FAIL:+} libgomp.oacc-c/../libgomp.oacc-c-c++-common/data-2.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 execution test
> >     PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/data-3.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 (test for excess errors)
> >     [-PASS:-]{+FAIL:+} libgomp.oacc-c/../libgomp.oacc-c-c++-common/data-3.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 execution test
> 
> Maybe it was caused by this change in gomp_unmap_vars?
> https://gcc.gnu.org/ml/gcc-patches/2015-06/msg01376.html
> 
> Looking at the code, I don't see any difference in async_refcount handling, but
> I was unable to test it without having hardware :(

I think that is the only patch that could have affected it.
The copy_from change is from the old behavior, where basically all
concurrent mappings ored into the copy_from flag and when refcount went to
0, if there were any mappings with from or tofrom, it copied back,
the OpenMP 4.5 behavior is that whether data is copied from the device
is determined solely by the mapping kind of the mapping that performs the
refcount decrease to 0.  Plus there is the always flag which requests
the data copying operation always, no matter what the refcount is (either on
the mapping/refcount increase side, or unmapping/refcount decrease size).

	Jakub

^ permalink raw reply	[flat|nested] 49+ messages in thread

* [gomp4] OpenACC async clause regressions (was: [gomp4.1] Add new versions of GOMP_target{,_data,_update} and GOMP_target_enter_exit_data)
  2015-10-19 16:33                 ` OpenACC async clause regressions (was: [gomp4.1] Add new versions of GOMP_target{,_data,_update} and GOMP_target_enter_exit_data) Thomas Schwinge
  2015-10-19 16:48                   ` Ilya Verbin
@ 2015-10-22 18:46                   ` Thomas Schwinge
  2015-11-18 15:18                     ` [gomp4] OpenACC async clause regressions Tom de Vries
  2015-11-24 10:32                   ` [PATCH, libgomp] Rewire OpenACC async Chung-Lin Tang
  2 siblings, 1 reply; 49+ messages in thread
From: Thomas Schwinge @ 2015-10-22 18:46 UTC (permalink / raw)
  To: Chung-Lin Tang, gcc-patches; +Cc: Kirill Yukhin, Jakub Jelinek, Ilya Verbin

[-- Attachment #1: Type: text/plain, Size: 6745 bytes --]

Hi!

On Mon, 19 Oct 2015 18:24:35 +0200, I wrote:
> Chung-Lin, would you please have a look at the following (on
> gomp-4_0-branch)?  Also, anyone else got any ideas off-hand?

Ilya, Jakub, thanks for your comments!

> On Tue, 23 Jun 2015 13:51:39 +0200, Jakub Jelinek <jakub@redhat.com> wrote:
> > On Tue, Jun 23, 2015 at 02:40:43PM +0300, Ilya Verbin wrote:
> > > On Sat, Jun 20, 2015 at 00:35:14 +0300, Ilya Verbin wrote:
> > > > Given that a mapped variable in 4.1 can have different kinds across nested data
> > > > regions, we need to store map-type not only for each var, but also for each
> > > > structured mapping.  Here is my WIP patch, is it sane? :)
> > > > Attached testcase works OK on the device with non-shared memory.
> > > 
> > > A bit updated version with a fix for GOMP_MAP_TO_PSET.
> > > make check-target-libgomp passed.
> > 
> > Ok, thanks.
> > 
> > > include/gcc/
> > > 	* gomp-constants.h (GOMP_MAP_ALWAYS_TO_P,
> > > 	GOMP_MAP_ALWAYS_FROM_P): Define.
> > > libgomp/
> > > 	* libgomp.h (struct target_var_desc): New.
> > > 	(struct target_mem_desc): Replace array of splay_tree_key with array of
> > > 	target_var_desc.
> > > 	(struct splay_tree_key_s): Move copy_from to target_var_desc.
> > > 	* oacc-mem.c (gomp_acc_remove_pointer): Use copy_from from
> > > 	target_var_desc.
> > > 	* oacc-parallel.c (GOACC_parallel): Use copy_from from target_var_desc.
> > > 	* target.c (gomp_map_vars_existing): Copy data to device if map-type is
> > > 	'always to' or 'always tofrom'.
> > > 	(gomp_map_vars): Use key from target_var_desc.  Set copy_from and
> > > 	always_copy_from.
> > > 	(gomp_copy_from_async): Use key and copy_from from target_var_desc.
> > > 	(gomp_unmap_vars): Copy data from device if always_copy_from is set.
> > > 	(gomp_offload_image_to_device): Do not use copy_from.
> > > 	* testsuite/libgomp.c/target-11.c: New test.
> 
> (That's gomp-4_1-branch r224838.  The attached
> gomp-4_1-branch-r224838.patch is a variant that applies on top of
> gomp-4_0-branch r228972.)  This change introduces regressions in OpenACC
> async clause handling.

> Testing on gomp-4_0-branch r228972 plus the attached
> gomp-4_1-branch-r224838.patch:
> 
>     PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/asyncwait-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none (test for excess errors)
>     [-PASS:-]{+FAIL:+} libgomp.oacc-c/../libgomp.oacc-c-c++-common/asyncwait-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none execution test
> 
> Same for C++.

With an XFAIL added (Chung-Lin, please remove that one once you come up
with a fix), and merge conflicts resolved as follows, I have now merged
gomp-4_1-branch r224838 in gomp-4_0-branch r229178:

commit cbef8ef8e3b6bf7ea3705b1fae5462be9e619a56
Merge: 3596aeb a568354
Author: tschwinge <tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4>
Date:   Thu Oct 22 17:50:08 2015 +0000

    svn merge -r 224607:224838 svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_1-branch
    
    
    git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@229178 138bc75d-0d04-0410-961f-82ee72b054a4

 include/ChangeLog.gomp41                           |   5 +
 include/gomp-constants.h                           |   6 ++
 libgomp/ChangeLog.gomp41                           |  18 ++++
 libgomp/libgomp.h                                  |  15 ++-
 libgomp/oacc-mem.c                                 |   2 +-
 libgomp/oacc-parallel.c                            |   6 +-
 libgomp/target.c                                   | 106 +++++++++++++--------
 libgomp/testsuite/libgomp.c/target-11.c            |  51 ++++++++++
 .../libgomp.oacc-c-c++-common/asyncwait-1.c        |   2 +
 9 files changed, 162 insertions(+), 49 deletions(-)

diff --cc libgomp/oacc-mem.c
index 7fcf199,c0fcb07..a90c912
--- libgomp/oacc-mem.c
+++ libgomp/oacc-mem.c
@@@ -685,7 -650,8 +685,7 @@@ gomp_acc_remove_pointer (void *h, bool 
  	}
      }
  
-   t->list[0]->copy_from = force_copyfrom ? 1 : 0;
 -  if (force_copyfrom)
 -    t->list[0].copy_from = 1;
++  t->list[0].copy_from = force_copyfrom ? 1 : 0;
  
    gomp_mutex_unlock (&acc_dev->lock);
  
diff --cc libgomp/oacc-parallel.c
index 2b90c9f,8ea3dd1..e4ecc87
--- libgomp/oacc-parallel.c
+++ libgomp/oacc-parallel.c
@@@ -261,16 -135,12 +261,16 @@@ GOACC_parallel_keyed (int device, void 
  
    devaddrs = gomp_alloca (sizeof (void *) * mapnum);
    for (i = 0; i < mapnum; i++)
 -    devaddrs[i] = (void *) (tgt->list[i].key->tgt->tgt_start
 -			    + tgt->list[i].key->tgt_offset);
 +    {
-       if (tgt->list[i] != NULL)
- 	devaddrs[i] = (void *) (tgt->list[i]->tgt->tgt_start
- 				+ tgt->list[i]->tgt_offset);
++      if (tgt->list[i].key != NULL)
++	devaddrs[i] = (void *) (tgt->list[i].key->tgt->tgt_start
++				+ tgt->list[i].key->tgt_offset);
 +      else
 +	devaddrs[i] = NULL;
 +    }
  
 -  acc_dev->openacc.exec_func (tgt_fn, mapnum, hostaddrs, devaddrs, sizes, kinds,
 -			      num_gangs, num_workers, vector_length, async,
 -			      tgt);
 +  acc_dev->openacc.exec_func (tgt_fn, mapnum, hostaddrs, devaddrs,
 +			      async, dims, tgt);
  
    /* If running synchronously, unmap immediately.  */
    if (async < acc_async_noval)
diff --cc libgomp/target.c
index 4587361,05c9b71..c2e1996
--- libgomp/target.c
+++ libgomp/target.c
@@@ -714,9 -721,6 +740,8 @@@ gomp_load_image_to_device (struct gomp_
        k->tgt_offset = target_table[i].start;
        k->refcount = 1;
        k->async_refcount = 0;
-       k->copy_from = false;
-       tgt->list[i] = k;
++      tgt->list[i].key = k;
 +      tgt->refcount++;
        array->left = NULL;
        array->right = NULL;
        splay_tree_insert (&devicep->mem_map, array);
@@@ -742,9 -746,6 +767,8 @@@
        k->tgt_offset = target_var->start;
        k->refcount = 1;
        k->async_refcount = 0;
-       k->copy_from = false;
-       tgt->list[i] = k;
++      tgt->list[i].key = k;
 +      tgt->refcount++;
        array->left = NULL;
        array->right = NULL;
        splay_tree_insert (&devicep->mem_map, array);
diff --cc libgomp/testsuite/libgomp.oacc-c-c++-common/asyncwait-1.c
index d478ce2,22cef6d..f3b490a
--- libgomp/testsuite/libgomp.oacc-c-c++-common/asyncwait-1.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/asyncwait-1.c
@@@ -1,4 -1,4 +1,6 @@@
  /* { dg-do run { target openacc_nvidia_accel_selected } } */
++/* <http://news.gmane.org/find-root.php?message_id=%3C87pp0aaksc.fsf%40kepler.schwinge.homeip.net%3E>.
++   { dg-xfail-run-if "TODO" { *-*-* } } */
  /* { dg-additional-options "-lcuda" } */
  
  #include <openacc.h>


Grüße
 Thomas

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 472 bytes --]

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [gomp4] OpenACC async clause regressions
  2015-10-22 18:46                   ` [gomp4] " Thomas Schwinge
@ 2015-11-18 15:18                     ` Tom de Vries
  2016-03-30 16:32                       ` Thomas Schwinge
  0 siblings, 1 reply; 49+ messages in thread
From: Tom de Vries @ 2015-11-18 15:18 UTC (permalink / raw)
  To: Thomas Schwinge, Chung-Lin Tang, gcc-patches
  Cc: Kirill Yukhin, Jakub Jelinek, Ilya Verbin

On 22/10/15 20:27, Thomas Schwinge wrote:
> diff --cc libgomp/testsuite/libgomp.oacc-c-c++-common/asyncwait-1.c
> index d478ce2,22cef6d..f3b490a
> --- libgomp/testsuite/libgomp.oacc-c-c++-common/asyncwait-1.c
> +++ libgomp/testsuite/libgomp.oacc-c-c++-common/asyncwait-1.c
> @@@ -1,4 -1,4 +1,6 @@@
>    /* { dg-do run { target openacc_nvidia_accel_selected } } */
> ++/*<http://news.gmane.org/find-root.php?message_id=%3C87pp0aaksc.fsf%40kepler.schwinge.homeip.net%3E>.
> ++   { dg-xfail-run-if "TODO" { *-*-* } } */
>    /* { dg-additional-options "-lcuda" } */
>
>    #include <openacc.h>

This failure shows up on trunk. Should it also be xfailed there?

Thanks,
- Tom

^ permalink raw reply	[flat|nested] 49+ messages in thread

* [gomp4] Adjust Fortran OACC async lib test
@ 2015-11-23 11:13                         ` Chung-Lin Tang
  2015-12-02  9:21                           ` Chung-Lin Tang
  2015-12-08 11:46                           ` Thomas Schwinge
  0 siblings, 2 replies; 49+ messages in thread
From: Chung-Lin Tang @ 2015-11-23 11:13 UTC (permalink / raw)
  To: gcc-patches; +Cc: Thomas Schwinge

[-- Attachment #1: Type: text/plain, Size: 609 bytes --]

Hi Thomas,
this fix adds more acc_wait's to libgomp.oacc-fortran/lib-1[13].f90.

For lib-12.f90, it's sort of a fix before we can resolve the issue
of intended semantics for "wait+async".

As for lib-13.f90, I believe these added acc_wait calls seem
reasonable, since we can't immediately assume the async-launched parallels
already completed there.

Does this seem reasonable?

Thanks,
Chung-Lin

	* testsuite/libgomp.oacc-fortran/lib-12.f90 (main): Add acc_wait()
	after async parallel construct.
	* testsuite/libgomp.oacc-fortran/lib-13.f90 (main): Add acc_wait()
	calls after parallel construct launches.

[-- Attachment #2: f.diff --]
[-- Type: text/plain, Size: 811 bytes --]

Index: libgomp.oacc-fortran/lib-12.f90
===================================================================
--- libgomp.oacc-fortran/lib-12.f90	(revision 230719)
+++ libgomp.oacc-fortran/lib-12.f90	(working copy)
@@ -15,6 +15,8 @@ program main
     end do
   !$acc end parallel
 
+  call acc_wait (0)
+
   call acc_wait_async (0, 1)
 
   if (acc_async_test (0) .neqv. .TRUE.) call abort
Index: libgomp.oacc-fortran/lib-13.f90
===================================================================
--- libgomp.oacc-fortran/lib-13.f90	(revision 230719)
+++ libgomp.oacc-fortran/lib-13.f90	(working copy)
@@ -21,6 +21,9 @@ program main
     end do
   !$acc end data
 
+  call acc_wait (1)
+  call acc_wait (2)
+
   if (acc_async_test (1) .neqv. .TRUE.) call abort
   if (acc_async_test (2) .neqv. .TRUE.) call abort
 

^ permalink raw reply	[flat|nested] 49+ messages in thread

* [PATCH, C++] Wrap OpenACC wait in EXPR_STMT
@ 2015-11-23 13:21                     ` Chung-Lin Tang
       [not found]                       ` <56628C72.9040802@codesourcery.com>
                                         ` (3 more replies)
  0 siblings, 4 replies; 49+ messages in thread
From: Chung-Lin Tang @ 2015-11-23 13:21 UTC (permalink / raw)
  To: gcc-patches; +Cc: Jason Merrill, Joseph S. Myers, Thomas Schwinge

[-- Attachment #1: Type: text/plain, Size: 809 bytes --]

The OpenACC wait directive is represented as a call to the runtime
function "GOACC_wait" instead of a tree code.  I am seeing when
'#pragma acc wait' is using inside a template function, the CALL_EXPR
to GOACC_wait is being silently ignored/removed during tsubst_expr().

I think the correct way to organize this is that the call should be inside
an EXPR_STMT, so here's a patch to do that; basically remove the
add_stmt() call from the shared c_finish_oacc_wait() code, and add
add_stmt()/finish_expr_stmt() in the corresponding C/C++ parts.

Tested with no regressions on trunk, okay to commit?

Thanks,
Chung-Lin

	* c-family/c-omp.c (c_finish_oacc_wait): Remove add_stmt() call.
	* c/c-parser.c (c_parser_oacc_wait): Add add_stmt() call.
	* cp/parser.c (cp_parser_oacc_wait): Add finish_expr_stmt() call.

[-- Attachment #2: t.diff --]
[-- Type: text/plain, Size: 1148 bytes --]

Index: c-family/c-omp.c
===================================================================
--- c-family/c-omp.c	(revision 230703)
+++ c-family/c-omp.c	(working copy)
@@ -63,7 +63,6 @@ c_finish_oacc_wait (location_t loc, tree parms, tr
     }
 
   stmt = build_call_expr_loc_vec (loc, stmt, args);
-  add_stmt (stmt);
 
   vec_free (args);
 
Index: c/c-parser.c
===================================================================
--- c/c-parser.c	(revision 230703)
+++ c/c-parser.c	(working copy)
@@ -13886,6 +13886,7 @@ c_parser_oacc_wait (location_t loc, c_parser *pars
   strcpy (p_name, " wait");
   clauses = c_parser_oacc_all_clauses (parser, OACC_WAIT_CLAUSE_MASK, p_name);
   stmt = c_finish_oacc_wait (loc, list, clauses);
+  add_stmt (stmt);
 
   return stmt;
 }
Index: cp/parser.c
===================================================================
--- cp/parser.c	(revision 230703)
+++ cp/parser.c	(working copy)
@@ -34930,6 +34930,7 @@ cp_parser_oacc_wait (cp_parser *parser, cp_token *
 					"#pragma acc wait", pragma_tok);
 
   stmt = c_finish_oacc_wait (loc, list, clauses);
+  stmt = finish_expr_stmt (stmt);
 
   return stmt;
 }

^ permalink raw reply	[flat|nested] 49+ messages in thread

* [PATCH, libgomp] Rewire OpenACC async
@ 2015-11-24 10:32                   ` Chung-Lin Tang
  2015-11-23 13:21                     ` [PATCH, C++] Wrap OpenACC wait in EXPR_STMT Chung-Lin Tang
                                       ` (2 more replies)
  0 siblings, 3 replies; 49+ messages in thread
From: Chung-Lin Tang @ 2015-11-24 10:32 UTC (permalink / raw)
  To: gcc-patches
  Cc: Jakub Jelinek, Thomas Schwinge, Cesar Philippidis, Julian Brown

[-- Attachment #1: Type: text/plain, Size: 2667 bytes --]

Hi, this patch reworks some of the way that asynchronous copyouts are
implemented for OpenACC in libgomp.

Before this patch, we had a somewhat confusing way of implementing this
by having two refcounts for each mapping: refcount and async_refcount,
which I never got working again after the last wave of async regressions
showed up.

So this patch implements what I believe to be a simplification: async_refcount
is removed, and instead of trying to queue the async copyouts during unmapping
we actually do that during the plugin event handling. This requires a addition
of the async stream integer as an argument to the register_async_cleanup
plugin hook, but overall I think this should be more elegant than before.

This patch fixes the libgomp.oacc-c-c++-common/asyncwait-1.c regression.
It also fixed data-[23].c regressions before, but some other recent check-in
happened to already fixed those.

Tested without regressions, is this okay for trunk?

Thanks,
Chung-Lin

2015-11-24  Chung-Lin Tang  <cltang@codesourcery.com>

        * oacc-plugin.h (GOMP_PLUGIN_async_unmap_vars): Add int parameter.
        * oacc-plugin.c (GOMP_PLUGIN_async_unmap_vars): Add 'int async'
        parameter, use to set async stream around call to gomp_unmap_vars,
        call gomp_unmap_vars() with 'do_copyfrom' set to true.
        * plugin/plugin-nvptx.c (struct ptx_event): Add 'int val' field.
        (event_gc): Adjust event handling loop, collect PTX_EVT_ASYNC_CLEANUP
        events and call GOMP_PLUGIN_async_unmap_vars() for each of them.
        (event_add): Add int parameter, initialize 'val' field when
        adding new ptx_event struct.
        (nvptx_evec): Adjust event_add() call arguments.
        (nvptx_host2dev): Likewise.
        (nvptx_dev2host): Likewise.
        (nvptx_wait_async): Likewise.
        (nvptx_wait_all_async): Likewise.
        (GOMP_OFFLOAD_openacc_register_async_cleanup): Add async parameter,
        pass to event_add() call.
        * oacc-host.c (host_openacc_register_async_cleanup): Add 'int async'
        parameter.
        * oacc-mem.c (gomp_acc_remove_pointer): Adjust async case to
        call openacc.register_async_cleanup_func() hook.
        * oacc-parallel.c (GOACC_parallel_keyed): Likewise.
        * target.c (gomp_copy_from_async): Delete function.
        (gomp_map_vars): Remove async_refcount.
        (gomp_unmap_vars): Likewise.
        (gomp_load_image_to_device): Likewise.
        (omp_target_associate_ptr): Likewise.
        * libgomp.h (struct splay_tree_key_s): Remove async_refcount.
        (acc_dispatch_t.register_async_cleanup_func): Add int parameter.
        (gomp_copy_from_async): Remove.

[-- Attachment #2: x.diff --]
[-- Type: text/plain, Size: 11048 bytes --]

Index: plugin/plugin-nvptx.c
===================================================================
--- plugin/plugin-nvptx.c	(revision 230796)
+++ plugin/plugin-nvptx.c	(working copy)
@@ -310,6 +310,7 @@ struct ptx_event
   int type;
   void *addr;
   int ord;
+  int val;
 
   struct ptx_event *next;
 };
@@ -786,6 +787,7 @@ static void
 event_gc (bool memmap_lockable)
 {
   struct ptx_event *ptx_event = ptx_events;
+  struct ptx_event *async_cleanups = NULL;
   struct nvptx_thread *nvthd = nvptx_thread ();
 
   pthread_mutex_lock (&ptx_event_lock);
@@ -803,6 +805,7 @@ event_gc (bool memmap_lockable)
       r = cuEventQuery (*e->evt);
       if (r == CUDA_SUCCESS)
 	{
+	  bool append_async = false;
 	  CUevent *te;
 
 	  te = e->evt;
@@ -827,7 +830,7 @@ event_gc (bool memmap_lockable)
 		if (!memmap_lockable)
 		  continue;
 
-		GOMP_PLUGIN_async_unmap_vars (e->addr);
+		append_async = true;
 	      }
 	      break;
 	    }
@@ -835,6 +838,7 @@ event_gc (bool memmap_lockable)
 	  cuEventDestroy (*te);
 	  free ((void *)te);
 
+	  /* Unlink 'e' from ptx_events list.  */
 	  if (ptx_events == e)
 	    ptx_events = ptx_events->next;
 	  else
@@ -845,15 +849,31 @@ event_gc (bool memmap_lockable)
 	      e_->next = e_->next->next;
 	    }
 
-	  free (e);
+	  if (append_async)
+	    {
+	      e->next = async_cleanups;
+	      async_cleanups = e;
+	    }
+	  else
+	    free (e);
 	}
     }
 
   pthread_mutex_unlock (&ptx_event_lock);
+
+  /* We have to do these here, after ptx_event_lock is released.  */
+  while (async_cleanups)
+    {
+      struct ptx_event *e = async_cleanups;
+      async_cleanups = async_cleanups->next;
+
+      GOMP_PLUGIN_async_unmap_vars (e->addr, e->val);
+      free (e);
+    }
 }
 
 static void
-event_add (enum ptx_event_type type, CUevent *e, void *h)
+event_add (enum ptx_event_type type, CUevent *e, void *h, int val)
 {
   struct ptx_event *ptx_event;
   struct nvptx_thread *nvthd = nvptx_thread ();
@@ -866,6 +886,7 @@ static void
   ptx_event->evt = e;
   ptx_event->addr = h;
   ptx_event->ord = nvthd->ptx_dev->ord;
+  ptx_event->val = val;
 
   pthread_mutex_lock (&ptx_event_lock);
 
@@ -966,7 +987,7 @@ nvptx_exec (void (*fn), size_t mapnum, void **host
       if (r != CUDA_SUCCESS)
         GOMP_PLUGIN_fatal ("cuEventRecord error: %s", cuda_error (r));
 
-      event_add (PTX_EVT_KNL, e, (void *)dev_str);
+      event_add (PTX_EVT_KNL, e, (void *)dev_str, 0);
     }
 #else
   r = cuCtxSynchronize ();
@@ -1073,7 +1094,7 @@ nvptx_host2dev (void *d, const void *h, size_t s)
       if (r != CUDA_SUCCESS)
         GOMP_PLUGIN_fatal ("cuEventRecord error: %s", cuda_error (r));
 
-      event_add (PTX_EVT_MEM, e, (void *)h);
+      event_add (PTX_EVT_MEM, e, (void *)h, 0);
     }
   else
 #endif
@@ -1138,7 +1159,7 @@ nvptx_dev2host (void *h, const void *d, size_t s)
       if (r != CUDA_SUCCESS)
         GOMP_PLUGIN_fatal ("cuEventRecord error: %s", cuda_error (r));
 
-      event_add (PTX_EVT_MEM, e, (void *)h);
+      event_add (PTX_EVT_MEM, e, (void *)h, 0);
     }
   else
 #endif
@@ -1264,7 +1285,7 @@ nvptx_wait_async (int async1, int async2)
   if (r != CUDA_SUCCESS)
     GOMP_PLUGIN_fatal ("cuEventRecord error: %s", cuda_error (r));
 
-  event_add (PTX_EVT_SYNC, e, NULL);
+  event_add (PTX_EVT_SYNC, e, NULL, 0);
 
   r = cuStreamWaitEvent (s2->stream, *e, 0);
   if (r != CUDA_SUCCESS)
@@ -1346,7 +1367,7 @@ nvptx_wait_all_async (int async)
       if (r != CUDA_SUCCESS)
 	GOMP_PLUGIN_fatal ("cuEventRecord error: %s", cuda_error (r));
 
-      event_add (PTX_EVT_SYNC, e, NULL);
+      event_add (PTX_EVT_SYNC, e, NULL, 0);
 
       r = cuStreamWaitEvent (waiting_stream->stream, *e, 0);
       if (r != CUDA_SUCCESS)
@@ -1658,7 +1679,7 @@ GOMP_OFFLOAD_openacc_parallel (void (*fn) (void *)
 }
 
 void
-GOMP_OFFLOAD_openacc_register_async_cleanup (void *targ_mem_desc)
+GOMP_OFFLOAD_openacc_register_async_cleanup (void *targ_mem_desc, int async)
 {
   CUevent *e;
   CUresult r;
@@ -1674,7 +1695,7 @@ void
   if (r != CUDA_SUCCESS)
     GOMP_PLUGIN_fatal ("cuEventRecord error: %s", cuda_error (r));
 
-  event_add (PTX_EVT_ASYNC_CLEANUP, e, targ_mem_desc);
+  event_add (PTX_EVT_ASYNC_CLEANUP, e, targ_mem_desc, async);
 }
 
 int
Index: oacc-mem.c
===================================================================
--- oacc-mem.c	(revision 230796)
+++ oacc-mem.c	(working copy)
@@ -659,10 +659,7 @@ gomp_acc_remove_pointer (void *h, bool force_copyf
   if (async < acc_async_noval)
     gomp_unmap_vars (t, true);
   else
-    {
-      gomp_copy_from_async (t);
-      acc_dev->openacc.register_async_cleanup_func (t);
-    }
+    t->device_descr->openacc.register_async_cleanup_func (t, async);
 
   gomp_debug (0, "  %s: mappings restored\n", __FUNCTION__);
 }
Index: libgomp.h
===================================================================
--- libgomp.h	(revision 230796)
+++ libgomp.h	(working copy)
@@ -829,8 +829,6 @@ struct splay_tree_key_s {
   uintptr_t tgt_offset;
   /* Reference count.  */
   uintptr_t refcount;
-  /* Asynchronous reference count.  */
-  uintptr_t async_refcount;
 };
 
 /* The comparison function.  */
@@ -864,7 +862,7 @@ typedef struct acc_dispatch_t
 		     unsigned *, void *);
 
   /* Async cleanup callback registration.  */
-  void (*register_async_cleanup_func) (void *);
+  void (*register_async_cleanup_func) (void *, int);
 
   /* Asynchronous routines.  */
   int (*async_test_func) (int);
@@ -958,7 +956,6 @@ extern struct target_mem_desc *gomp_map_vars (stru
 					      size_t, void **, void **,
 					      size_t *, void *, bool,
 					      enum gomp_map_vars_kind);
-extern void gomp_copy_from_async (struct target_mem_desc *);
 extern void gomp_unmap_vars (struct target_mem_desc *, bool);
 extern void gomp_init_device (struct gomp_device_descr *);
 extern void gomp_free_memmap (struct splay_tree_s *);
Index: oacc-plugin.c
===================================================================
--- oacc-plugin.c	(revision 230796)
+++ oacc-plugin.c	(working copy)
@@ -31,11 +31,14 @@
 #include "oacc-int.h"
 
 void
-GOMP_PLUGIN_async_unmap_vars (void *ptr)
+GOMP_PLUGIN_async_unmap_vars (void *ptr, int async)
 {
   struct target_mem_desc *tgt = ptr;
+  struct gomp_device_descr *devicep = tgt->device_descr;
 
-  gomp_unmap_vars (tgt, false);
+  devicep->openacc.async_set_async_func (async);
+  gomp_unmap_vars (tgt, true);
+  devicep->openacc.async_set_async_func (acc_async_sync);
 }
 
 /* Return the target-specific part of the TLS data for the current thread.  */
Index: oacc-plugin.h
===================================================================
--- oacc-plugin.h	(revision 230796)
+++ oacc-plugin.h	(working copy)
@@ -27,7 +27,7 @@
 #ifndef OACC_PLUGIN_H
 #define OACC_PLUGIN_H 1
 
-extern void GOMP_PLUGIN_async_unmap_vars (void *);
+extern void GOMP_PLUGIN_async_unmap_vars (void *, int);
 extern void *GOMP_PLUGIN_acc_thread (void);
 
 #endif
Index: oacc-host.c
===================================================================
--- oacc-host.c	(revision 230796)
+++ oacc-host.c	(working copy)
@@ -143,7 +143,8 @@ host_openacc_exec (void (*fn) (void *),
 }
 
 static void
-host_openacc_register_async_cleanup (void *targ_mem_desc __attribute__ ((unused)))
+host_openacc_register_async_cleanup (void *targ_mem_desc __attribute__ ((unused)),
+				     int async __attribute__ ((unused)))
 {
 }
 
Index: target.c
===================================================================
--- target.c	(revision 230796)
+++ target.c	(working copy)
@@ -644,7 +644,6 @@ gomp_map_vars (struct gomp_device_descr *devicep,
 		tgt->list[i].offset = 0;
 		tgt->list[i].length = k->host_end - k->host_start;
 		k->refcount = 1;
-		k->async_refcount = 0;
 		tgt->refcount++;
 		array->left = NULL;
 		array->right = NULL;
@@ -784,40 +783,6 @@ gomp_unmap_tgt (struct target_mem_desc *tgt)
   free (tgt);
 }
 
-/* Decrease the refcount for a set of mapped variables, and queue asychronous
-   copies from the device back to the host after any work that has been issued.
-   Because the regions are still "live", increment an asynchronous reference
-   count to indicate that they should not be unmapped from host-side data
-   structures until the asynchronous copy has completed.  */
-
-attribute_hidden void
-gomp_copy_from_async (struct target_mem_desc *tgt)
-{
-  struct gomp_device_descr *devicep = tgt->device_descr;
-  size_t i;
-
-  gomp_mutex_lock (&devicep->lock);
-
-  for (i = 0; i < tgt->list_count; i++)
-    if (tgt->list[i].key == NULL)
-      ;
-    else if (tgt->list[i].key->refcount > 1)
-      {
-	tgt->list[i].key->refcount--;
-	tgt->list[i].key->async_refcount++;
-      }
-    else
-      {
-	splay_tree_key k = tgt->list[i].key;
-	if (tgt->list[i].copy_from)
-	  devicep->dev2host_func (devicep->target_id, (void *) k->host_start,
-				  (void *) (k->tgt->tgt_start + k->tgt_offset),
-				  k->host_end - k->host_start);
-      }
-
-  gomp_mutex_unlock (&devicep->lock);
-}
-
 /* Unmap variables described by TGT.  If DO_COPYFROM is true, copy relevant
    variables back from device to host: if it is false, it is assumed that this
    has been done already, i.e. by gomp_copy_from_async above.  */
@@ -847,13 +812,8 @@ gomp_unmap_vars (struct target_mem_desc *tgt, bool
 	k->refcount--;
       else if (k->refcount == 1)
 	{
-	  if (k->async_refcount > 0)
-	    k->async_refcount--;
-	  else
-	    {
-	      k->refcount--;
-	      do_unmap = true;
-	    }
+	  k->refcount--;
+	  do_unmap = true;
 	}
 
       if ((do_unmap && do_copyfrom && tgt->list[i].copy_from)
@@ -995,7 +955,6 @@ gomp_load_image_to_device (struct gomp_device_desc
       k->tgt = tgt;
       k->tgt_offset = target_table[i].start;
       k->refcount = REFCOUNT_INFINITY;
-      k->async_refcount = 0;
       array->left = NULL;
       array->right = NULL;
       splay_tree_insert (&devicep->mem_map, array);
@@ -1020,7 +979,6 @@ gomp_load_image_to_device (struct gomp_device_desc
       k->tgt = tgt;
       k->tgt_offset = target_var->start;
       k->refcount = REFCOUNT_INFINITY;
-      k->async_refcount = 0;
       array->left = NULL;
       array->right = NULL;
       splay_tree_insert (&devicep->mem_map, array);
@@ -2120,7 +2078,6 @@ omp_target_associate_ptr (void *host_ptr, void *de
       k->tgt = tgt;
       k->tgt_offset = (uintptr_t) device_ptr + device_offset;
       k->refcount = REFCOUNT_INFINITY;
-      k->async_refcount = 0;
       array->left = NULL;
       array->right = NULL;
       splay_tree_insert (&devicep->mem_map, array);
Index: oacc-parallel.c
===================================================================
--- oacc-parallel.c	(revision 230796)
+++ oacc-parallel.c	(working copy)
@@ -182,10 +182,7 @@ GOACC_parallel_keyed (int device, void (*fn) (void
   if (async < acc_async_noval)
     gomp_unmap_vars (tgt, true);
   else
-    {
-      gomp_copy_from_async (tgt);
-      acc_dev->openacc.register_async_cleanup_func (tgt);
-    }
+    tgt->device_descr->openacc.register_async_cleanup_func (tgt, async);
 
   acc_dev->openacc.async_set_async_func (acc_async_sync);
 }

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH, libgomp] Rewire OpenACC async
  2015-11-24 10:32                   ` [PATCH, libgomp] Rewire OpenACC async Chung-Lin Tang
  2015-11-23 13:21                     ` [PATCH, C++] Wrap OpenACC wait in EXPR_STMT Chung-Lin Tang
@ 2015-12-01 12:01                     ` Julian Brown
  2015-12-05  9:23                       ` Chung-Lin Tang
  2015-12-22  8:59                     ` Chung-Lin Tang
  2 siblings, 1 reply; 49+ messages in thread
From: Julian Brown @ 2015-12-01 12:01 UTC (permalink / raw)
  To: Chung-Lin Tang
  Cc: gcc-patches, Jakub Jelinek, Thomas Schwinge, Cesar Philippidis

On Tue, 24 Nov 2015 18:27:24 +0800
Chung-Lin Tang <cltang@codesourcery.com> wrote:

> Hi, this patch reworks some of the way that asynchronous copyouts are
> implemented for OpenACC in libgomp.
> 
> Before this patch, we had a somewhat confusing way of implementing
> this by having two refcounts for each mapping: refcount and
> async_refcount, which I never got working again after the last wave
> of async regressions showed up.
> 
> So this patch implements what I believe to be a simplification:
> async_refcount is removed, and instead of trying to queue the async
> copyouts during unmapping we actually do that during the plugin event
> handling. This requires a addition of the async stream integer as an
> argument to the register_async_cleanup plugin hook, but overall I
> think this should be more elegant than before.

This looks OK to me I think (I've only looked fairly briefly). I vaguely
remember trying something along these lines in an earlier iteration of
the async support -- maybe hitting problems with locking (I see you
have code to mitigate problems with that, and locking generally has
probably evolved a bit since I last looked at the code in detail
anyway).

Can event_gc ever be called when the *device* lock is held?

I'm slightly concerned that pushing async unmapping into event_gc means
that program-level semantics are deferred to the backend, which is
arguably the wrong place. But then I don't understand what went wrong
with the dual-refcount implementation, so maybe it's unavoidable for
some reason.

HTH,

Julian

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [gomp4] Adjust Fortran OACC async lib test
  2015-11-23 11:13                         ` [gomp4] Adjust Fortran OACC async lib test Chung-Lin Tang
@ 2015-12-02  9:21                           ` Chung-Lin Tang
  2015-12-08 11:46                           ` Thomas Schwinge
  1 sibling, 0 replies; 49+ messages in thread
From: Chung-Lin Tang @ 2015-12-02  9:21 UTC (permalink / raw)
  To: gcc-patches; +Cc: Thomas Schwinge

Ping.

Hi Thomas, this is only for gomp4 ATM, okay to commit?

Thanks,
Chung-Lin

On 2015/11/23 7:09 PM, Chung-Lin Tang wrote:
> Hi Thomas,
> this fix adds more acc_wait's to libgomp.oacc-fortran/lib-1[13].f90.
> 
> For lib-12.f90, it's sort of a fix before we can resolve the issue
> of intended semantics for "wait+async".
> 
> As for lib-13.f90, I believe these added acc_wait calls seem
> reasonable, since we can't immediately assume the async-launched parallels
> already completed there.
> 
> Does this seem reasonable?
> 
> Thanks,
> Chung-Lin
> 
> 	* testsuite/libgomp.oacc-fortran/lib-12.f90 (main): Add acc_wait()
> 	after async parallel construct.
> 	* testsuite/libgomp.oacc-fortran/lib-13.f90 (main): Add acc_wait()
> 	calls after parallel construct launches.
> 

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH, C++] Wrap OpenACC wait in EXPR_STMT
  2015-11-23 13:21                     ` [PATCH, C++] Wrap OpenACC wait in EXPR_STMT Chung-Lin Tang
       [not found]                       ` <56628C72.9040802@codesourcery.com>
@ 2015-12-02  9:22                       ` Chung-Lin Tang
  2015-12-03  8:51                       ` Thomas Schwinge
  2015-12-07  5:00                       ` Jason Merrill
  3 siblings, 0 replies; 49+ messages in thread
From: Chung-Lin Tang @ 2015-12-02  9:22 UTC (permalink / raw)
  To: gcc-patches; +Cc: Jason Merrill, Joseph S. Myers, Thomas Schwinge

Ping.

On 2015/11/23 9:15 PM, Chung-Lin Tang wrote:
> The OpenACC wait directive is represented as a call to the runtime
> function "GOACC_wait" instead of a tree code.  I am seeing when
> '#pragma acc wait' is using inside a template function, the CALL_EXPR
> to GOACC_wait is being silently ignored/removed during tsubst_expr().
> 
> I think the correct way to organize this is that the call should be inside
> an EXPR_STMT, so here's a patch to do that; basically remove the
> add_stmt() call from the shared c_finish_oacc_wait() code, and add
> add_stmt()/finish_expr_stmt() in the corresponding C/C++ parts.
> 
> Tested with no regressions on trunk, okay to commit?
> 
> Thanks,
> Chung-Lin
> 
> 	* c-family/c-omp.c (c_finish_oacc_wait): Remove add_stmt() call.
> 	* c/c-parser.c (c_parser_oacc_wait): Add add_stmt() call.
> 	* cp/parser.c (cp_parser_oacc_wait): Add finish_expr_stmt() call.
> 

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH, C++] Wrap OpenACC wait in EXPR_STMT
  2015-11-23 13:21                     ` [PATCH, C++] Wrap OpenACC wait in EXPR_STMT Chung-Lin Tang
       [not found]                       ` <56628C72.9040802@codesourcery.com>
  2015-12-02  9:22                       ` [PATCH, C++] Wrap OpenACC wait in EXPR_STMT Chung-Lin Tang
@ 2015-12-03  8:51                       ` Thomas Schwinge
  2015-12-03  8:59                         ` Thomas Schwinge
  2015-12-07  5:00                       ` Jason Merrill
  3 siblings, 1 reply; 49+ messages in thread
From: Thomas Schwinge @ 2015-12-03  8:51 UTC (permalink / raw)
  To: Chung-Lin Tang, Jakub Jelinek; +Cc: Jason Merrill, Joseph S. Myers, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 2097 bytes --]

Hi Chung-Lin!

On Mon, 23 Nov 2015 21:15:00 +0800, Chung-Lin Tang <cltang@codesourcery.com> wrote:
> The OpenACC wait directive is represented as a call to the runtime
> function "GOACC_wait" instead of a tree code.  I am seeing when
> '#pragma acc wait' is using inside a template function, the CALL_EXPR
> to GOACC_wait is being silently ignored/removed during tsubst_expr().

Uh.

> I think the correct way to organize this is that the call should be inside
> an EXPR_STMT, so here's a patch to do that; basically remove the
> add_stmt() call from the shared c_finish_oacc_wait() code, and add
> add_stmt()/finish_expr_stmt() in the corresponding C/C++ parts.
> 
> Tested with no regressions on trunk, okay to commit?

> --- c-family/c-omp.c	(revision 230703)
> +++ c-family/c-omp.c	(working copy)
> @@ -63,7 +63,6 @@ c_finish_oacc_wait (location_t loc, tree parms, tr
>      }
>  
>    stmt = build_call_expr_loc_vec (loc, stmt, args);
> -  add_stmt (stmt);
>  
>    vec_free (args);
|  
|    return stmt;
|  }

I see in gcc/c/c-omp.c that several other c_finish_omp_* functions that
build builtin calls instead of tree nodes, do similar things like
c_finish_oacc_wait; I'd like to understand why it's -- presumably -- not
a problem for these: c_finish_omp_barrier, c_finish_omp_taskwait,
c_finish_omp_taskyield, c_finish_omp_flush?  (Jakub?)

> --- c/c-parser.c	(revision 230703)
> +++ c/c-parser.c	(working copy)
> @@ -13886,6 +13886,7 @@ c_parser_oacc_wait (location_t loc, c_parser *pars
>    strcpy (p_name, " wait");
>    clauses = c_parser_oacc_all_clauses (parser, OACC_WAIT_CLAUSE_MASK, p_name);
>    stmt = c_finish_oacc_wait (loc, list, clauses);
> +  add_stmt (stmt);
>  
>    return stmt;
>  }
> --- cp/parser.c	(revision 230703)
> +++ cp/parser.c	(working copy)
> @@ -34930,6 +34930,7 @@ cp_parser_oacc_wait (cp_parser *parser, cp_token *
>  					"#pragma acc wait", pragma_tok);
>  
>    stmt = c_finish_oacc_wait (loc, list, clauses);
> +  stmt = finish_expr_stmt (stmt);
>  
>    return stmt;
>  }


Grüße
 Thomas

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 472 bytes --]

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH, C++] Wrap OpenACC wait in EXPR_STMT
  2015-12-03  8:51                       ` Thomas Schwinge
@ 2015-12-03  8:59                         ` Thomas Schwinge
  2015-12-03 10:05                           ` Chung-Lin Tang
  0 siblings, 1 reply; 49+ messages in thread
From: Thomas Schwinge @ 2015-12-03  8:59 UTC (permalink / raw)
  To: Chung-Lin Tang, Jakub Jelinek; +Cc: Jason Merrill, Joseph S. Myers, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 2799 bytes --]

Hi!

On Thu, 03 Dec 2015 09:51:31 +0100, I wrote:
> On Mon, 23 Nov 2015 21:15:00 +0800, Chung-Lin Tang <cltang@codesourcery.com> wrote:
> > The OpenACC wait directive is represented as a call to the runtime
> > function "GOACC_wait" instead of a tree code.  I am seeing when
> > '#pragma acc wait' is using inside a template function, the CALL_EXPR
> > to GOACC_wait is being silently ignored/removed during tsubst_expr().
> 
> Uh.
> 
> > I think the correct way to organize this is that the call should be inside
> > an EXPR_STMT, so here's a patch to do that; basically remove the
> > add_stmt() call from the shared c_finish_oacc_wait() code, and add
> > add_stmt()/finish_expr_stmt() in the corresponding C/C++ parts.
> > 
> > Tested with no regressions on trunk, okay to commit?
> 
> > --- c-family/c-omp.c	(revision 230703)
> > +++ c-family/c-omp.c	(working copy)
> > @@ -63,7 +63,6 @@ c_finish_oacc_wait (location_t loc, tree parms, tr
> >      }
> >  
> >    stmt = build_call_expr_loc_vec (loc, stmt, args);
> > -  add_stmt (stmt);
> >  
> >    vec_free (args);
> |  
> |    return stmt;
> |  }
> 
> I see in gcc/c/c-omp.c that several other c_finish_omp_* functions that
> build builtin calls instead of tree nodes, do similar things like
> c_finish_oacc_wait; I'd like to understand why it's -- presumably -- not
> a problem for these: c_finish_omp_barrier, c_finish_omp_taskwait,
> c_finish_omp_taskyield, c_finish_omp_flush?  (Jakub?)

Oh wait, it looks like the C++ front end is not actually using the
functions defined in the C/C++-shared gcc/c-family/c-omp.c, but has its
own implementations in gcc/cp/semantics.c, without "c_" prefixes?  In
addition to finish_expr_stmt calls, I see it's also using
finish_call_expr instead of build_call_expr_loc/build_call_expr_loc_vec.
So I guess we'll want to model this the same way for OpenACC support
functions, and then (later) we should clean this up, to move the
C-specific code from gcc/c-family/c-omp.c into the C front end?  (Jakub?)

> > --- c/c-parser.c	(revision 230703)
> > +++ c/c-parser.c	(working copy)
> > @@ -13886,6 +13886,7 @@ c_parser_oacc_wait (location_t loc, c_parser *pars
> >    strcpy (p_name, " wait");
> >    clauses = c_parser_oacc_all_clauses (parser, OACC_WAIT_CLAUSE_MASK, p_name);
> >    stmt = c_finish_oacc_wait (loc, list, clauses);
> > +  add_stmt (stmt);
> >  
> >    return stmt;
> >  }
> > --- cp/parser.c	(revision 230703)
> > +++ cp/parser.c	(working copy)
> > @@ -34930,6 +34930,7 @@ cp_parser_oacc_wait (cp_parser *parser, cp_token *
> >  					"#pragma acc wait", pragma_tok);
> >  
> >    stmt = c_finish_oacc_wait (loc, list, clauses);
> > +  stmt = finish_expr_stmt (stmt);
> >  
> >    return stmt;
> >  }


Grüße
 Thomas


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 472 bytes --]

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH, C++] Wrap OpenACC wait in EXPR_STMT
  2015-12-03  8:59                         ` Thomas Schwinge
@ 2015-12-03 10:05                           ` Chung-Lin Tang
  2015-12-03 10:11                             ` Jakub Jelinek
  0 siblings, 1 reply; 49+ messages in thread
From: Chung-Lin Tang @ 2015-12-03 10:05 UTC (permalink / raw)
  To: Thomas Schwinge, Jakub Jelinek
  Cc: Jason Merrill, Joseph S. Myers, gcc-patches

On 2015/12/3 4:59 PM, Thomas Schwinge wrote:
> Hi!
> 
> On Thu, 03 Dec 2015 09:51:31 +0100, I wrote:
>> On Mon, 23 Nov 2015 21:15:00 +0800, Chung-Lin Tang <cltang@codesourcery.com> wrote:
>>> The OpenACC wait directive is represented as a call to the runtime
>>> function "GOACC_wait" instead of a tree code.  I am seeing when
>>> '#pragma acc wait' is using inside a template function, the CALL_EXPR
>>> to GOACC_wait is being silently ignored/removed during tsubst_expr().
>>
>> Uh.
>>
>>> I think the correct way to organize this is that the call should be inside
>>> an EXPR_STMT, so here's a patch to do that; basically remove the
>>> add_stmt() call from the shared c_finish_oacc_wait() code, and add
>>> add_stmt()/finish_expr_stmt() in the corresponding C/C++ parts.
>>>
>>> Tested with no regressions on trunk, okay to commit?
>>
>>> --- c-family/c-omp.c	(revision 230703)
>>> +++ c-family/c-omp.c	(working copy)
>>> @@ -63,7 +63,6 @@ c_finish_oacc_wait (location_t loc, tree parms, tr
>>>      }
>>>  
>>>    stmt = build_call_expr_loc_vec (loc, stmt, args);
>>> -  add_stmt (stmt);
>>>  
>>>    vec_free (args);
>> |  
>> |    return stmt;
>> |  }
>>
>> I see in gcc/c/c-omp.c that several other c_finish_omp_* functions that
>> build builtin calls instead of tree nodes, do similar things like
>> c_finish_oacc_wait; I'd like to understand why it's -- presumably -- not
>> a problem for these: c_finish_omp_barrier, c_finish_omp_taskwait,
>> c_finish_omp_taskyield, c_finish_omp_flush?  (Jakub?)
> 
> Oh wait, it looks like the C++ front end is not actually using the
> functions defined in the C/C++-shared gcc/c-family/c-omp.c, but has its
> own implementations in gcc/cp/semantics.c, without "c_" prefixes?  In
> addition to finish_expr_stmt calls, I see it's also using
> finish_call_expr instead of build_call_expr_loc/build_call_expr_loc_vec.
> So I guess we'll want to model this the same way for OpenACC support
> functions, and then (later) we should clean this up, to move the
> C-specific code from gcc/c-family/c-omp.c into the C front end?  (Jakub?)

I see most OpenACC/OpenMP constructs are represented by special statement codes,
so they should be a different case. I so far only see the OpenACC wait directive
being represented as a CALL_EXPR (maybe there are others, haven't exhaustively searched).

Chung-Lin


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH, C++] Wrap OpenACC wait in EXPR_STMT
  2015-12-03 10:05                           ` Chung-Lin Tang
@ 2015-12-03 10:11                             ` Jakub Jelinek
  2015-12-03 10:33                               ` Chung-Lin Tang
  0 siblings, 1 reply; 49+ messages in thread
From: Jakub Jelinek @ 2015-12-03 10:11 UTC (permalink / raw)
  To: Chung-Lin Tang
  Cc: Thomas Schwinge, Jason Merrill, Joseph S. Myers, gcc-patches

On Thu, Dec 03, 2015 at 06:05:36PM +0800, Chung-Lin Tang wrote:
> > Oh wait, it looks like the C++ front end is not actually using the
> > functions defined in the C/C++-shared gcc/c-family/c-omp.c, but has its
> > own implementations in gcc/cp/semantics.c, without "c_" prefixes?  In
> > addition to finish_expr_stmt calls, I see it's also using
> > finish_call_expr instead of build_call_expr_loc/build_call_expr_loc_vec.
> > So I guess we'll want to model this the same way for OpenACC support
> > functions, and then (later) we should clean this up, to move the
> > C-specific code from gcc/c-family/c-omp.c into the C front end?  (Jakub?)
> 
> I see most OpenACC/OpenMP constructs are represented by special statement codes,
> so they should be a different case. I so far only see the OpenACC wait directive
> being represented as a CALL_EXPR (maybe there are others, haven't exhaustively searched).

No, Thomas is right, just look at
finish_omp_{barrier,flush,taskwait,taskyield,cancel,cancellation_point},
all those are represented as CALL_EXPRs.

	Jakub

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH, C++] Wrap OpenACC wait in EXPR_STMT
  2015-12-03 10:11                             ` Jakub Jelinek
@ 2015-12-03 10:33                               ` Chung-Lin Tang
  2015-12-05  9:57                                 ` Chung-Lin Tang
  0 siblings, 1 reply; 49+ messages in thread
From: Chung-Lin Tang @ 2015-12-03 10:33 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: Thomas Schwinge, Jason Merrill, Joseph S. Myers, gcc-patches

On 2015/12/3 6:11 PM, Jakub Jelinek wrote:
> On Thu, Dec 03, 2015 at 06:05:36PM +0800, Chung-Lin Tang wrote:
>>> Oh wait, it looks like the C++ front end is not actually using the
>>> functions defined in the C/C++-shared gcc/c-family/c-omp.c, but has its
>>> own implementations in gcc/cp/semantics.c, without "c_" prefixes?  In
>>> addition to finish_expr_stmt calls, I see it's also using
>>> finish_call_expr instead of build_call_expr_loc/build_call_expr_loc_vec.
>>> So I guess we'll want to model this the same way for OpenACC support
>>> functions, and then (later) we should clean this up, to move the
>>> C-specific code from gcc/c-family/c-omp.c into the C front end?  (Jakub?)
>>
>> I see most OpenACC/OpenMP constructs are represented by special statement codes,
>> so they should be a different case. I so far only see the OpenACC wait directive
>> being represented as a CALL_EXPR (maybe there are others, haven't exhaustively searched).
> 
> No, Thomas is right, just look at
> finish_omp_{barrier,flush,taskwait,taskyield,cancel,cancellation_point},
> all those are represented as CALL_EXPRs.
> 
> 	Jakub
> 

Okay, I guess my impression was only for some OpenACC constructs.

Overall, OpenACC wait seems one of the few cases of using c_finish_* in cp/parser.c.
Whether other cases should move towards/away from that kind of style is a larger question,
I was only trying to fix a libgomp.oacc-c++/template-reduction.C regression (testcase currently still in gomp4 branch)

Chung-Lin

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH, libgomp] Rewire OpenACC async
  2015-12-01 12:01                     ` [PATCH, libgomp] Rewire OpenACC async Julian Brown
@ 2015-12-05  9:23                       ` Chung-Lin Tang
  0 siblings, 0 replies; 49+ messages in thread
From: Chung-Lin Tang @ 2015-12-05  9:23 UTC (permalink / raw)
  To: Julian Brown
  Cc: gcc-patches, Jakub Jelinek, Thomas Schwinge, Cesar Philippidis

On 2015/12/1 08:01 PM, Julian Brown wrote:
> On Tue, 24 Nov 2015 18:27:24 +0800
> Chung-Lin Tang <cltang@codesourcery.com> wrote:
> 
>> Hi, this patch reworks some of the way that asynchronous copyouts are
>> implemented for OpenACC in libgomp.
>>
>> Before this patch, we had a somewhat confusing way of implementing
>> this by having two refcounts for each mapping: refcount and
>> async_refcount, which I never got working again after the last wave
>> of async regressions showed up.
>>
>> So this patch implements what I believe to be a simplification:
>> async_refcount is removed, and instead of trying to queue the async
>> copyouts during unmapping we actually do that during the plugin event
>> handling. This requires a addition of the async stream integer as an
>> argument to the register_async_cleanup plugin hook, but overall I
>> think this should be more elegant than before.
> 
> This looks OK to me I think (I've only looked fairly briefly). I vaguely
> remember trying something along these lines in an earlier iteration of
> the async support -- maybe hitting problems with locking (I see you
> have code to mitigate problems with that, and locking generally has
> probably evolved a bit since I last looked at the code in detail
> anyway).
> 
> Can event_gc ever be called when the *device* lock is held?

It only matters when the memmap_lockable argument is true, and for those
cases, no the device lock is never held.

> I'm slightly concerned that pushing async unmapping into event_gc means
> that program-level semantics are deferred to the backend, which is
> arguably the wrong place. But then I don't understand what went wrong
> with the dual-refcount implementation, so maybe it's unavoidable for
> some reason.

I got the dual-refcounting to work again (after the regressions first showed up)
in some cases briefly, but regressed in other testcases, which I don't recall
the full details now.
Indeed the copyout is now triggered inside the plugin, but it is still wrapped
inside GOMP_PLUGIN_async_unmap_vars(), so it's probably not too ugly.

Per our earlier internal discussion, I'm committing this to the gomp4 branch first.
Trunk will need to wait for Jakub's approval.

Thanks,
Chung-Lin

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH, C++] Wrap OpenACC wait in EXPR_STMT
  2015-12-03 10:33                               ` Chung-Lin Tang
@ 2015-12-05  9:57                                 ` Chung-Lin Tang
  0 siblings, 0 replies; 49+ messages in thread
From: Chung-Lin Tang @ 2015-12-05  9:57 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: Thomas Schwinge, Jason Merrill, Joseph S. Myers, gcc-patches

On 2015/12/3 06:32 PM, Chung-Lin Tang wrote:
> On 2015/12/3 6:11 PM, Jakub Jelinek wrote:
>> On Thu, Dec 03, 2015 at 06:05:36PM +0800, Chung-Lin Tang wrote:
>>>> Oh wait, it looks like the C++ front end is not actually using the
>>>> functions defined in the C/C++-shared gcc/c-family/c-omp.c, but has its
>>>> own implementations in gcc/cp/semantics.c, without "c_" prefixes?  In
>>>> addition to finish_expr_stmt calls, I see it's also using
>>>> finish_call_expr instead of build_call_expr_loc/build_call_expr_loc_vec.
>>>> So I guess we'll want to model this the same way for OpenACC support
>>>> functions, and then (later) we should clean this up, to move the
>>>> C-specific code from gcc/c-family/c-omp.c into the C front end?  (Jakub?)
>>>
>>> I see most OpenACC/OpenMP constructs are represented by special statement codes,
>>> so they should be a different case. I so far only see the OpenACC wait directive
>>> being represented as a CALL_EXPR (maybe there are others, haven't exhaustively searched).
>>
>> No, Thomas is right, just look at
>> finish_omp_{barrier,flush,taskwait,taskyield,cancel,cancellation_point},
>> all those are represented as CALL_EXPRs.
>>
>> 	Jakub
>>
> 
> Okay, I guess my impression was only for some OpenACC constructs.
> 
> Overall, OpenACC wait seems one of the few cases of using c_finish_* in cp/parser.c.
> Whether other cases should move towards/away from that kind of style is a larger question,
> I was only trying to fix a libgomp.oacc-c++/template-reduction.C regression (testcase currently still in gomp4 branch)
> 
> Chung-Lin
> 

Per our internal discussion, I will commit this patch first to the gomp4 branch,
while awaiting trunk approval.

Thanks,
Chung-Lin

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH, C++] Wrap OpenACC wait in EXPR_STMT
  2015-11-23 13:21                     ` [PATCH, C++] Wrap OpenACC wait in EXPR_STMT Chung-Lin Tang
                                         ` (2 preceding siblings ...)
  2015-12-03  8:51                       ` Thomas Schwinge
@ 2015-12-07  5:00                       ` Jason Merrill
  3 siblings, 0 replies; 49+ messages in thread
From: Jason Merrill @ 2015-12-07  5:00 UTC (permalink / raw)
  To: Chung-Lin Tang, gcc-patches; +Cc: Joseph S. Myers, Thomas Schwinge

OK.

Jason

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [gomp4] Adjust Fortran OACC async lib test
  2015-11-23 11:13                         ` [gomp4] Adjust Fortran OACC async lib test Chung-Lin Tang
  2015-12-02  9:21                           ` Chung-Lin Tang
@ 2015-12-08 11:46                           ` Thomas Schwinge
  1 sibling, 0 replies; 49+ messages in thread
From: Thomas Schwinge @ 2015-12-08 11:46 UTC (permalink / raw)
  To: Chung-Lin Tang, gcc-patches, James Norris

Hi!

On Mon, 23 Nov 2015 19:09:36 +0800, Chung-Lin Tang <cltang@codesourcery.com> wrote:
> this fix adds more acc_wait's to libgomp.oacc-fortran/lib-1[13].f90.

It has not been obvious to me that these test cases would regress (PASS
-> FAIL, at least for some optimization levels) when your recent "[PATCH,
libgomp] Rewire OpenACC async" is applied.  Likewise for the testcase
affected ("repaired") by "[PATCH, C++] Wrap OpenACC wait in EXPR_STMT",
libgomp.oacc-c++/template-reduction.C.  Aside from verbally noting such
dependencies between patches and test cases, that latter patch submission
(for trunk) could have included (something like) this gomp-4_0-branch
libgomp.oacc-c++/template-reduction.C test case, to motivate the code
change, for example.


> For lib-12.f90, it's sort of a fix before we can resolve the issue
> of intended semantics for "wait+async".
> 
> As for lib-13.f90, I believe these added acc_wait calls seem
> reasonable, since we can't immediately assume the async-launched parallels
> already completed there.
> 
> Does this seem reasonable?

I think (and Jim, original author of these tests, copied to correct me if
I'm wrong) the intention of these tests is to launch a kernel
asynchronously, running long enough so that in the following we can test
that it's still running, enqueue asynchronous waits, and so on
(acc_sync_test, acc_wait_async, and so on).  Adding acc_wait calls
renders any such testing void?

However, isn't currently the logic written the wrong way round?  That is,
currently we abort if there are still asynchronous operations running,
which would explain why your change to add acc_wait calls does fix these
tests...

(And, of course a simple "do i = 1, 1000000: j = j + 1" loop is not
really a bullet-proof way to achieve such a long-running kernel...)

Given these problems, I suggest indeed you do commit your patch, and I'll
make a note that these test cases need to be revisited.

When committing this, please also remove the XFAIL directives from
libgomp.oacc-c-c++-common/asyncwait-1.c, which you forgot to do in your
gomp-4_0-branch "[PATCH, libgomp] Rewire OpenACC async" commit.

> Index: libgomp.oacc-fortran/lib-12.f90
> ===================================================================
> --- libgomp.oacc-fortran/lib-12.f90	(revision 230719)
> +++ libgomp.oacc-fortran/lib-12.f90	(working copy)
> @@ -15,6 +15,8 @@ program main
>      end do
>    !$acc end parallel
>  
> +  call acc_wait (0)
> +
>    call acc_wait_async (0, 1)
>  
>    if (acc_async_test (0) .neqv. .TRUE.) call abort
> Index: libgomp.oacc-fortran/lib-13.f90
> ===================================================================
> --- libgomp.oacc-fortran/lib-13.f90	(revision 230719)
> +++ libgomp.oacc-fortran/lib-13.f90	(working copy)
> @@ -21,6 +21,9 @@ program main
>      end do
>    !$acc end data
>  
> +  call acc_wait (1)
> +  call acc_wait (2)
> +
>    if (acc_async_test (1) .neqv. .TRUE.) call abort
>    if (acc_async_test (2) .neqv. .TRUE.) call abort


Grüße
 Thomas

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH, libgomp] Rewire OpenACC async
  2015-11-24 10:32                   ` [PATCH, libgomp] Rewire OpenACC async Chung-Lin Tang
  2015-11-23 13:21                     ` [PATCH, C++] Wrap OpenACC wait in EXPR_STMT Chung-Lin Tang
  2015-12-01 12:01                     ` [PATCH, libgomp] Rewire OpenACC async Julian Brown
@ 2015-12-22  8:59                     ` Chung-Lin Tang
  2016-03-29 10:15                       ` Chung-Lin Tang
  2 siblings, 1 reply; 49+ messages in thread
From: Chung-Lin Tang @ 2015-12-22  8:59 UTC (permalink / raw)
  To: gcc-patches
  Cc: Jakub Jelinek, Thomas Schwinge, Cesar Philippidis, Julian Brown

Ping.

On 2015/11/24 6:27 PM, Chung-Lin Tang wrote:
> Hi, this patch reworks some of the way that asynchronous copyouts are
> implemented for OpenACC in libgomp.
> 
> Before this patch, we had a somewhat confusing way of implementing this
> by having two refcounts for each mapping: refcount and async_refcount,
> which I never got working again after the last wave of async regressions
> showed up.
> 
> So this patch implements what I believe to be a simplification: async_refcount
> is removed, and instead of trying to queue the async copyouts during unmapping
> we actually do that during the plugin event handling. This requires a addition
> of the async stream integer as an argument to the register_async_cleanup
> plugin hook, but overall I think this should be more elegant than before.
> 
> This patch fixes the libgomp.oacc-c-c++-common/asyncwait-1.c regression.
> It also fixed data-[23].c regressions before, but some other recent check-in
> happened to already fixed those.
> 
> Tested without regressions, is this okay for trunk?
> 
> Thanks,
> Chung-Lin
> 
> 2015-11-24  Chung-Lin Tang  <cltang@codesourcery.com>
> 
>         * oacc-plugin.h (GOMP_PLUGIN_async_unmap_vars): Add int parameter.
>         * oacc-plugin.c (GOMP_PLUGIN_async_unmap_vars): Add 'int async'
>         parameter, use to set async stream around call to gomp_unmap_vars,
>         call gomp_unmap_vars() with 'do_copyfrom' set to true.
>         * plugin/plugin-nvptx.c (struct ptx_event): Add 'int val' field.
>         (event_gc): Adjust event handling loop, collect PTX_EVT_ASYNC_CLEANUP
>         events and call GOMP_PLUGIN_async_unmap_vars() for each of them.
>         (event_add): Add int parameter, initialize 'val' field when
>         adding new ptx_event struct.
>         (nvptx_evec): Adjust event_add() call arguments.
>         (nvptx_host2dev): Likewise.
>         (nvptx_dev2host): Likewise.
>         (nvptx_wait_async): Likewise.
>         (nvptx_wait_all_async): Likewise.
>         (GOMP_OFFLOAD_openacc_register_async_cleanup): Add async parameter,
>         pass to event_add() call.
>         * oacc-host.c (host_openacc_register_async_cleanup): Add 'int async'
>         parameter.
>         * oacc-mem.c (gomp_acc_remove_pointer): Adjust async case to
>         call openacc.register_async_cleanup_func() hook.
>         * oacc-parallel.c (GOACC_parallel_keyed): Likewise.
>         * target.c (gomp_copy_from_async): Delete function.
>         (gomp_map_vars): Remove async_refcount.
>         (gomp_unmap_vars): Likewise.
>         (gomp_load_image_to_device): Likewise.
>         (omp_target_associate_ptr): Likewise.
>         * libgomp.h (struct splay_tree_key_s): Remove async_refcount.
>         (acc_dispatch_t.register_async_cleanup_func): Add int parameter.
>         (gomp_copy_from_async): Remove.
> 

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH, libgomp] Rewire OpenACC async
  2015-12-22  8:59                     ` Chung-Lin Tang
@ 2016-03-29 10:15                       ` Chung-Lin Tang
  2016-04-08 11:03                         ` Chung-Lin Tang
  2016-05-12 10:03                         ` [PATCH, libgomp] Rewire OpenACC async Jakub Jelinek
  0 siblings, 2 replies; 49+ messages in thread
From: Chung-Lin Tang @ 2016-03-29 10:15 UTC (permalink / raw)
  To: gcc-patches
  Cc: Jakub Jelinek, Thomas Schwinge, Cesar Philippidis, Julian Brown

[-- Attachment #1: Type: text/plain, Size: 3185 bytes --]

I've updated this patch for trunk (as attached), and re-tested without
regressions. This patch is still a fix for libgomp.oacc-c-c++-common/asyncwait-1.c,
which FAILs right now.

ChangeLog is still as before. Is this okay for trunk?

Thanks,
Chung-Lin

On 2015/12/22 4:58 PM, Chung-Lin Tang wrote:
> Ping.
> 
> On 2015/11/24 6:27 PM, Chung-Lin Tang wrote:
>> Hi, this patch reworks some of the way that asynchronous copyouts are
>> implemented for OpenACC in libgomp.
>>
>> Before this patch, we had a somewhat confusing way of implementing this
>> by having two refcounts for each mapping: refcount and async_refcount,
>> which I never got working again after the last wave of async regressions
>> showed up.
>>
>> So this patch implements what I believe to be a simplification: async_refcount
>> is removed, and instead of trying to queue the async copyouts during unmapping
>> we actually do that during the plugin event handling. This requires a addition
>> of the async stream integer as an argument to the register_async_cleanup
>> plugin hook, but overall I think this should be more elegant than before.
>>
>> This patch fixes the libgomp.oacc-c-c++-common/asyncwait-1.c regression.
>> It also fixed data-[23].c regressions before, but some other recent check-in
>> happened to already fixed those.
>>
>> Tested without regressions, is this okay for trunk?
>>
>> Thanks,
>> Chung-Lin
>>
>> 2015-11-24  Chung-Lin Tang  <cltang@codesourcery.com>
>>
>>         * oacc-plugin.h (GOMP_PLUGIN_async_unmap_vars): Add int parameter.
>>         * oacc-plugin.c (GOMP_PLUGIN_async_unmap_vars): Add 'int async'
>>         parameter, use to set async stream around call to gomp_unmap_vars,
>>         call gomp_unmap_vars() with 'do_copyfrom' set to true.
>>         * plugin/plugin-nvptx.c (struct ptx_event): Add 'int val' field.
>>         (event_gc): Adjust event handling loop, collect PTX_EVT_ASYNC_CLEANUP
>>         events and call GOMP_PLUGIN_async_unmap_vars() for each of them.
>>         (event_add): Add int parameter, initialize 'val' field when
>>         adding new ptx_event struct.
>>         (nvptx_evec): Adjust event_add() call arguments.
>>         (nvptx_host2dev): Likewise.
>>         (nvptx_dev2host): Likewise.
>>         (nvptx_wait_async): Likewise.
>>         (nvptx_wait_all_async): Likewise.
>>         (GOMP_OFFLOAD_openacc_register_async_cleanup): Add async parameter,
>>         pass to event_add() call.
>>         * oacc-host.c (host_openacc_register_async_cleanup): Add 'int async'
>>         parameter.
>>         * oacc-mem.c (gomp_acc_remove_pointer): Adjust async case to
>>         call openacc.register_async_cleanup_func() hook.
>>         * oacc-parallel.c (GOACC_parallel_keyed): Likewise.
>>         * target.c (gomp_copy_from_async): Delete function.
>>         (gomp_map_vars): Remove async_refcount.
>>         (gomp_unmap_vars): Likewise.
>>         (gomp_load_image_to_device): Likewise.
>>         (omp_target_associate_ptr): Likewise.
>>         * libgomp.h (struct splay_tree_key_s): Remove async_refcount.
>>         (acc_dispatch_t.register_async_cleanup_func): Add int parameter.
>>         (gomp_copy_from_async): Remove.
>>
> 


[-- Attachment #2: openacc-async-rewire-20160329.patch --]
[-- Type: text/x-patch, Size: 11112 bytes --]

Index: oacc-host.c
===================================================================
--- oacc-host.c	(revision 234516)
+++ oacc-host.c	(working copy)
@@ -144,7 +144,8 @@ host_openacc_exec (void (*fn) (void *),
 }
 
 static void
-host_openacc_register_async_cleanup (void *targ_mem_desc __attribute__ ((unused)))
+host_openacc_register_async_cleanup (void *targ_mem_desc __attribute__ ((unused)),
+				     int async __attribute__ ((unused)))
 {
 }
 
Index: oacc-mem.c
===================================================================
--- oacc-mem.c	(revision 234516)
+++ oacc-mem.c	(working copy)
@@ -661,10 +661,7 @@ gomp_acc_remove_pointer (void *h, bool force_copyf
   if (async < acc_async_noval)
     gomp_unmap_vars (t, true);
   else
-    {
-      gomp_copy_from_async (t);
-      acc_dev->openacc.register_async_cleanup_func (t);
-    }
+    t->device_descr->openacc.register_async_cleanup_func (t, async);
 
   gomp_debug (0, "  %s: mappings restored\n", __FUNCTION__);
 }
Index: oacc-parallel.c
===================================================================
--- oacc-parallel.c	(revision 234516)
+++ oacc-parallel.c	(working copy)
@@ -186,10 +186,7 @@ GOACC_parallel_keyed (int device, void (*fn) (void
   if (async < acc_async_noval)
     gomp_unmap_vars (tgt, true);
   else
-    {
-      gomp_copy_from_async (tgt);
-      acc_dev->openacc.register_async_cleanup_func (tgt);
-    }
+    tgt->device_descr->openacc.register_async_cleanup_func (tgt, async);
 
   acc_dev->openacc.async_set_async_func (acc_async_sync);
 }
Index: target.c
===================================================================
--- target.c	(revision 234516)
+++ target.c	(working copy)
@@ -663,7 +663,6 @@ gomp_map_vars (struct gomp_device_descr *devicep,
 		tgt->list[i].offset = 0;
 		tgt->list[i].length = k->host_end - k->host_start;
 		k->refcount = 1;
-		k->async_refcount = 0;
 		tgt->refcount++;
 		array->left = NULL;
 		array->right = NULL;
@@ -813,40 +812,6 @@ gomp_unmap_tgt (struct target_mem_desc *tgt)
   free (tgt);
 }
 
-/* Decrease the refcount for a set of mapped variables, and queue asychronous
-   copies from the device back to the host after any work that has been issued.
-   Because the regions are still "live", increment an asynchronous reference
-   count to indicate that they should not be unmapped from host-side data
-   structures until the asynchronous copy has completed.  */
-
-attribute_hidden void
-gomp_copy_from_async (struct target_mem_desc *tgt)
-{
-  struct gomp_device_descr *devicep = tgt->device_descr;
-  size_t i;
-
-  gomp_mutex_lock (&devicep->lock);
-
-  for (i = 0; i < tgt->list_count; i++)
-    if (tgt->list[i].key == NULL)
-      ;
-    else if (tgt->list[i].key->refcount > 1)
-      {
-	tgt->list[i].key->refcount--;
-	tgt->list[i].key->async_refcount++;
-      }
-    else
-      {
-	splay_tree_key k = tgt->list[i].key;
-	if (tgt->list[i].copy_from)
-	  devicep->dev2host_func (devicep->target_id, (void *) k->host_start,
-				  (void *) (k->tgt->tgt_start + k->tgt_offset),
-				  k->host_end - k->host_start);
-      }
-
-  gomp_mutex_unlock (&devicep->lock);
-}
-
 /* Unmap variables described by TGT.  If DO_COPYFROM is true, copy relevant
    variables back from device to host: if it is false, it is assumed that this
    has been done already, i.e. by gomp_copy_from_async above.  */
@@ -883,13 +848,8 @@ gomp_unmap_vars (struct target_mem_desc *tgt, bool
 	k->refcount--;
       else if (k->refcount == 1)
 	{
-	  if (k->async_refcount > 0)
-	    k->async_refcount--;
-	  else
-	    {
-	      k->refcount--;
-	      do_unmap = true;
-	    }
+	  k->refcount--;
+	  do_unmap = true;
 	}
 
       if ((do_unmap && do_copyfrom && tgt->list[i].copy_from)
@@ -1040,7 +1000,6 @@ gomp_load_image_to_device (struct gomp_device_desc
       k->tgt = tgt;
       k->tgt_offset = target_table[i].start;
       k->refcount = REFCOUNT_INFINITY;
-      k->async_refcount = 0;
       k->link_key = NULL;
       array->left = NULL;
       array->right = NULL;
@@ -1073,7 +1032,6 @@ gomp_load_image_to_device (struct gomp_device_desc
       k->tgt = tgt;
       k->tgt_offset = target_var->start;
       k->refcount = target_size & link_bit ? REFCOUNT_LINK : REFCOUNT_INFINITY;
-      k->async_refcount = 0;
       k->link_key = NULL;
       array->left = NULL;
       array->right = NULL;
@@ -2299,7 +2257,6 @@ omp_target_associate_ptr (void *host_ptr, void *de
       k->tgt = tgt;
       k->tgt_offset = (uintptr_t) device_ptr + device_offset;
       k->refcount = REFCOUNT_INFINITY;
-      k->async_refcount = 0;
       array->left = NULL;
       array->right = NULL;
       splay_tree_insert (&devicep->mem_map, array);
Index: libgomp.h
===================================================================
--- libgomp.h	(revision 234516)
+++ libgomp.h	(working copy)
@@ -837,8 +837,6 @@ struct splay_tree_key_s {
   uintptr_t tgt_offset;
   /* Reference count.  */
   uintptr_t refcount;
-  /* Asynchronous reference count.  */
-  uintptr_t async_refcount;
   /* Pointer to the original mapping of "omp declare target link" object.  */
   splay_tree_key link_key;
 };
@@ -874,7 +872,7 @@ typedef struct acc_dispatch_t
 		     unsigned *, void *);
 
   /* Async cleanup callback registration.  */
-  void (*register_async_cleanup_func) (void *);
+  void (*register_async_cleanup_func) (void *, int);
 
   /* Asynchronous routines.  */
   int (*async_test_func) (int);
@@ -979,7 +977,6 @@ extern struct target_mem_desc *gomp_map_vars (stru
 					      size_t, void **, void **,
 					      size_t *, void *, bool,
 					      enum gomp_map_vars_kind);
-extern void gomp_copy_from_async (struct target_mem_desc *);
 extern void gomp_unmap_vars (struct target_mem_desc *, bool);
 extern void gomp_init_device (struct gomp_device_descr *);
 extern void gomp_free_memmap (struct splay_tree_s *);
Index: oacc-plugin.c
===================================================================
--- oacc-plugin.c	(revision 234516)
+++ oacc-plugin.c	(working copy)
@@ -31,11 +31,14 @@
 #include "oacc-int.h"
 
 void
-GOMP_PLUGIN_async_unmap_vars (void *ptr)
+GOMP_PLUGIN_async_unmap_vars (void *ptr, int async)
 {
   struct target_mem_desc *tgt = ptr;
+  struct gomp_device_descr *devicep = tgt->device_descr;
 
-  gomp_unmap_vars (tgt, false);
+  devicep->openacc.async_set_async_func (async);
+  gomp_unmap_vars (tgt, true);
+  devicep->openacc.async_set_async_func (acc_async_sync);
 }
 
 /* Return the target-specific part of the TLS data for the current thread.  */
Index: oacc-plugin.h
===================================================================
--- oacc-plugin.h	(revision 234516)
+++ oacc-plugin.h	(working copy)
@@ -27,7 +27,7 @@
 #ifndef OACC_PLUGIN_H
 #define OACC_PLUGIN_H 1
 
-extern void GOMP_PLUGIN_async_unmap_vars (void *);
+extern void GOMP_PLUGIN_async_unmap_vars (void *, int);
 extern void *GOMP_PLUGIN_acc_thread (void);
 
 #endif
Index: plugin/plugin-nvptx.c
===================================================================
--- plugin/plugin-nvptx.c	(revision 234516)
+++ plugin/plugin-nvptx.c	(working copy)
@@ -310,6 +310,7 @@ struct ptx_event
   int type;
   void *addr;
   int ord;
+  int val;
 
   struct ptx_event *next;
 };
@@ -783,6 +784,7 @@ static void
 event_gc (bool memmap_lockable)
 {
   struct ptx_event *ptx_event = ptx_events;
+  struct ptx_event *async_cleanups = NULL;
   struct nvptx_thread *nvthd = nvptx_thread ();
 
   pthread_mutex_lock (&ptx_event_lock);
@@ -800,6 +802,7 @@ event_gc (bool memmap_lockable)
       r = cuEventQuery (*e->evt);
       if (r == CUDA_SUCCESS)
 	{
+	  bool append_async = false;
 	  CUevent *te;
 
 	  te = e->evt;
@@ -824,7 +827,7 @@ event_gc (bool memmap_lockable)
 		if (!memmap_lockable)
 		  continue;
 
-		GOMP_PLUGIN_async_unmap_vars (e->addr);
+		append_async = true;
 	      }
 	      break;
 	    }
@@ -832,6 +835,7 @@ event_gc (bool memmap_lockable)
 	  cuEventDestroy (*te);
 	  free ((void *)te);
 
+	  /* Unlink 'e' from ptx_events list.  */
 	  if (ptx_events == e)
 	    ptx_events = ptx_events->next;
 	  else
@@ -842,15 +846,31 @@ event_gc (bool memmap_lockable)
 	      e_->next = e_->next->next;
 	    }
 
-	  free (e);
+	  if (append_async)
+	    {
+	      e->next = async_cleanups;
+	      async_cleanups = e;
+	    }
+	  else
+	    free (e);
 	}
     }
 
   pthread_mutex_unlock (&ptx_event_lock);
+
+  /* We have to do these here, after ptx_event_lock is released.  */
+  while (async_cleanups)
+    {
+      struct ptx_event *e = async_cleanups;
+      async_cleanups = async_cleanups->next;
+
+      GOMP_PLUGIN_async_unmap_vars (e->addr, e->val);
+      free (e);
+    }
 }
 
 static void
-event_add (enum ptx_event_type type, CUevent *e, void *h)
+event_add (enum ptx_event_type type, CUevent *e, void *h, int val)
 {
   struct ptx_event *ptx_event;
   struct nvptx_thread *nvthd = nvptx_thread ();
@@ -863,6 +883,7 @@ static void
   ptx_event->evt = e;
   ptx_event->addr = h;
   ptx_event->ord = nvthd->ptx_dev->ord;
+  ptx_event->val = val;
 
   pthread_mutex_lock (&ptx_event_lock);
 
@@ -975,7 +996,7 @@ nvptx_exec (void (*fn), size_t mapnum, void **host
       if (r != CUDA_SUCCESS)
         GOMP_PLUGIN_fatal ("cuEventRecord error: %s", cuda_error (r));
 
-      event_add (PTX_EVT_KNL, e, (void *)dev_str);
+      event_add (PTX_EVT_KNL, e, (void *)dev_str, 0);
     }
 #else
   r = cuCtxSynchronize ();
@@ -1082,7 +1103,7 @@ nvptx_host2dev (void *d, const void *h, size_t s)
       if (r != CUDA_SUCCESS)
         GOMP_PLUGIN_fatal ("cuEventRecord error: %s", cuda_error (r));
 
-      event_add (PTX_EVT_MEM, e, (void *)h);
+      event_add (PTX_EVT_MEM, e, (void *)h, 0);
     }
   else
 #endif
@@ -1147,7 +1168,7 @@ nvptx_dev2host (void *h, const void *d, size_t s)
       if (r != CUDA_SUCCESS)
         GOMP_PLUGIN_fatal ("cuEventRecord error: %s", cuda_error (r));
 
-      event_add (PTX_EVT_MEM, e, (void *)h);
+      event_add (PTX_EVT_MEM, e, (void *)h, 0);
     }
   else
 #endif
@@ -1273,7 +1294,7 @@ nvptx_wait_async (int async1, int async2)
   if (r != CUDA_SUCCESS)
     GOMP_PLUGIN_fatal ("cuEventRecord error: %s", cuda_error (r));
 
-  event_add (PTX_EVT_SYNC, e, NULL);
+  event_add (PTX_EVT_SYNC, e, NULL, 0);
 
   r = cuStreamWaitEvent (s2->stream, *e, 0);
   if (r != CUDA_SUCCESS)
@@ -1355,7 +1376,7 @@ nvptx_wait_all_async (int async)
       if (r != CUDA_SUCCESS)
 	GOMP_PLUGIN_fatal ("cuEventRecord error: %s", cuda_error (r));
 
-      event_add (PTX_EVT_SYNC, e, NULL);
+      event_add (PTX_EVT_SYNC, e, NULL, 0);
 
       r = cuStreamWaitEvent (waiting_stream->stream, *e, 0);
       if (r != CUDA_SUCCESS)
@@ -1667,7 +1688,7 @@ GOMP_OFFLOAD_openacc_parallel (void (*fn) (void *)
 }
 
 void
-GOMP_OFFLOAD_openacc_register_async_cleanup (void *targ_mem_desc)
+GOMP_OFFLOAD_openacc_register_async_cleanup (void *targ_mem_desc, int async)
 {
   CUevent *e;
   CUresult r;
@@ -1683,7 +1704,7 @@ void
   if (r != CUDA_SUCCESS)
     GOMP_PLUGIN_fatal ("cuEventRecord error: %s", cuda_error (r));
 
-  event_add (PTX_EVT_ASYNC_CLEANUP, e, targ_mem_desc);
+  event_add (PTX_EVT_ASYNC_CLEANUP, e, targ_mem_desc, async);
 }
 
 int

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [gomp4] OpenACC async clause regressions
  2015-11-18 15:18                     ` [gomp4] OpenACC async clause regressions Tom de Vries
@ 2016-03-30 16:32                       ` Thomas Schwinge
  0 siblings, 0 replies; 49+ messages in thread
From: Thomas Schwinge @ 2016-03-30 16:32 UTC (permalink / raw)
  To: Tom de Vries, Chung-Lin Tang; +Cc: gcc-patches, Jakub Jelinek

[-- Attachment #1: Type: text/plain, Size: 1149 bytes --]

Hi!

On Wed, 18 Nov 2015 16:17:39 +0100, Tom de Vries <Tom_deVries@mentor.com> wrote:
> On 22/10/15 20:27, Thomas Schwinge wrote:
> > diff --cc libgomp/testsuite/libgomp.oacc-c-c++-common/asyncwait-1.c
> > index d478ce2,22cef6d..f3b490a
> > --- libgomp/testsuite/libgomp.oacc-c-c++-common/asyncwait-1.c
> > +++ libgomp/testsuite/libgomp.oacc-c-c++-common/asyncwait-1.c
> > @@@ -1,4 -1,4 +1,6 @@@
> >    /* { dg-do run { target openacc_nvidia_accel_selected } } */
> > ++/*<http://news.gmane.org/find-root.php?message_id=%3C87pp0aaksc.fsf%40kepler.schwinge.homeip.net%3E>.
> > ++   { dg-xfail-run-if "TODO" { *-*-* } } */
> >    /* { dg-additional-options "-lcuda" } */
> >
> >    #include <openacc.h>
> 
> This failure shows up on trunk. Should it also be xfailed there?

I added the XFAIL as part of my recent r234575 "Update OpenACC test
cases" commit,
<http://news.gmane.org/find-root.php?message_id=%3C878u109ew4.fsf%40hertz.schwinge.homeip.net%3E>.
Chung-Lin, for avoidance of doubt, please remove that XFAIL once you get
to commit your fix for this issue (currently waiting for Jakub's
approval).


Grüße
 Thomas

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 472 bytes --]

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH, libgomp] Rewire OpenACC async
  2016-03-29 10:15                       ` Chung-Lin Tang
@ 2016-04-08 11:03                         ` Chung-Lin Tang
  2016-04-16  7:41                           ` Chung-Lin Tang
  2016-05-12 10:03                         ` [PATCH, libgomp] Rewire OpenACC async Jakub Jelinek
  1 sibling, 1 reply; 49+ messages in thread
From: Chung-Lin Tang @ 2016-04-08 11:03 UTC (permalink / raw)
  To: gcc-patches
  Cc: Jakub Jelinek, Thomas Schwinge, Cesar Philippidis, Julian Brown

Ping.

On 2016/3/29 5:48 PM, Chung-Lin Tang wrote:
> I've updated this patch for trunk (as attached), and re-tested without
> regressions. This patch is still a fix for libgomp.oacc-c-c++-common/asyncwait-1.c,
> which FAILs right now.
> 
> ChangeLog is still as before. Is this okay for trunk?
> 
> Thanks,
> Chung-Lin
> 
> On 2015/12/22 4:58 PM, Chung-Lin Tang wrote:
>> Ping.
>>
>> On 2015/11/24 6:27 PM, Chung-Lin Tang wrote:
>>> Hi, this patch reworks some of the way that asynchronous copyouts are
>>> implemented for OpenACC in libgomp.
>>>
>>> Before this patch, we had a somewhat confusing way of implementing this
>>> by having two refcounts for each mapping: refcount and async_refcount,
>>> which I never got working again after the last wave of async regressions
>>> showed up.
>>>
>>> So this patch implements what I believe to be a simplification: async_refcount
>>> is removed, and instead of trying to queue the async copyouts during unmapping
>>> we actually do that during the plugin event handling. This requires a addition
>>> of the async stream integer as an argument to the register_async_cleanup
>>> plugin hook, but overall I think this should be more elegant than before.
>>>
>>> This patch fixes the libgomp.oacc-c-c++-common/asyncwait-1.c regression.
>>> It also fixed data-[23].c regressions before, but some other recent check-in
>>> happened to already fixed those.
>>>
>>> Tested without regressions, is this okay for trunk?
>>>
>>> Thanks,
>>> Chung-Lin
>>>
>>> 2015-11-24  Chung-Lin Tang  <cltang@codesourcery.com>
>>>
>>>         * oacc-plugin.h (GOMP_PLUGIN_async_unmap_vars): Add int parameter.
>>>         * oacc-plugin.c (GOMP_PLUGIN_async_unmap_vars): Add 'int async'
>>>         parameter, use to set async stream around call to gomp_unmap_vars,
>>>         call gomp_unmap_vars() with 'do_copyfrom' set to true.
>>>         * plugin/plugin-nvptx.c (struct ptx_event): Add 'int val' field.
>>>         (event_gc): Adjust event handling loop, collect PTX_EVT_ASYNC_CLEANUP
>>>         events and call GOMP_PLUGIN_async_unmap_vars() for each of them.
>>>         (event_add): Add int parameter, initialize 'val' field when
>>>         adding new ptx_event struct.
>>>         (nvptx_evec): Adjust event_add() call arguments.
>>>         (nvptx_host2dev): Likewise.
>>>         (nvptx_dev2host): Likewise.
>>>         (nvptx_wait_async): Likewise.
>>>         (nvptx_wait_all_async): Likewise.
>>>         (GOMP_OFFLOAD_openacc_register_async_cleanup): Add async parameter,
>>>         pass to event_add() call.
>>>         * oacc-host.c (host_openacc_register_async_cleanup): Add 'int async'
>>>         parameter.
>>>         * oacc-mem.c (gomp_acc_remove_pointer): Adjust async case to
>>>         call openacc.register_async_cleanup_func() hook.
>>>         * oacc-parallel.c (GOACC_parallel_keyed): Likewise.
>>>         * target.c (gomp_copy_from_async): Delete function.
>>>         (gomp_map_vars): Remove async_refcount.
>>>         (gomp_unmap_vars): Likewise.
>>>         (gomp_load_image_to_device): Likewise.
>>>         (omp_target_associate_ptr): Likewise.
>>>         * libgomp.h (struct splay_tree_key_s): Remove async_refcount.
>>>         (acc_dispatch_t.register_async_cleanup_func): Add int parameter.
>>>         (gomp_copy_from_async): Remove.
>>>
>>
> 

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH, libgomp] Rewire OpenACC async
  2016-04-08 11:03                         ` Chung-Lin Tang
@ 2016-04-16  7:41                           ` Chung-Lin Tang
  2016-05-11  6:47                             ` [PATCH, libgomp] Rewire OpenACC async (Ping x3) Chung-Lin Tang
  0 siblings, 1 reply; 49+ messages in thread
From: Chung-Lin Tang @ 2016-04-16  7:41 UTC (permalink / raw)
  To: gcc-patches
  Cc: Jakub Jelinek, Thomas Schwinge, Cesar Philippidis, Julian Brown

Ping.

On 2016/4/8 07:02 PM, Chung-Lin Tang wrote:
> Ping.
> 
> On 2016/3/29 5:48 PM, Chung-Lin Tang wrote:
>> I've updated this patch for trunk (as attached), and re-tested without
>> regressions. This patch is still a fix for libgomp.oacc-c-c++-common/asyncwait-1.c,
>> which FAILs right now.
>>
>> ChangeLog is still as before. Is this okay for trunk?
>>
>> Thanks,
>> Chung-Lin
>>
>> On 2015/12/22 4:58 PM, Chung-Lin Tang wrote:
>>> Ping.
>>>
>>> On 2015/11/24 6:27 PM, Chung-Lin Tang wrote:
>>>> Hi, this patch reworks some of the way that asynchronous copyouts are
>>>> implemented for OpenACC in libgomp.
>>>>
>>>> Before this patch, we had a somewhat confusing way of implementing this
>>>> by having two refcounts for each mapping: refcount and async_refcount,
>>>> which I never got working again after the last wave of async regressions
>>>> showed up.
>>>>
>>>> So this patch implements what I believe to be a simplification: async_refcount
>>>> is removed, and instead of trying to queue the async copyouts during unmapping
>>>> we actually do that during the plugin event handling. This requires a addition
>>>> of the async stream integer as an argument to the register_async_cleanup
>>>> plugin hook, but overall I think this should be more elegant than before.
>>>>
>>>> This patch fixes the libgomp.oacc-c-c++-common/asyncwait-1.c regression.
>>>> It also fixed data-[23].c regressions before, but some other recent check-in
>>>> happened to already fixed those.
>>>>
>>>> Tested without regressions, is this okay for trunk?
>>>>
>>>> Thanks,
>>>> Chung-Lin
>>>>
>>>> 2015-11-24  Chung-Lin Tang  <cltang@codesourcery.com>
>>>>
>>>>         * oacc-plugin.h (GOMP_PLUGIN_async_unmap_vars): Add int parameter.
>>>>         * oacc-plugin.c (GOMP_PLUGIN_async_unmap_vars): Add 'int async'
>>>>         parameter, use to set async stream around call to gomp_unmap_vars,
>>>>         call gomp_unmap_vars() with 'do_copyfrom' set to true.
>>>>         * plugin/plugin-nvptx.c (struct ptx_event): Add 'int val' field.
>>>>         (event_gc): Adjust event handling loop, collect PTX_EVT_ASYNC_CLEANUP
>>>>         events and call GOMP_PLUGIN_async_unmap_vars() for each of them.
>>>>         (event_add): Add int parameter, initialize 'val' field when
>>>>         adding new ptx_event struct.
>>>>         (nvptx_evec): Adjust event_add() call arguments.
>>>>         (nvptx_host2dev): Likewise.
>>>>         (nvptx_dev2host): Likewise.
>>>>         (nvptx_wait_async): Likewise.
>>>>         (nvptx_wait_all_async): Likewise.
>>>>         (GOMP_OFFLOAD_openacc_register_async_cleanup): Add async parameter,
>>>>         pass to event_add() call.
>>>>         * oacc-host.c (host_openacc_register_async_cleanup): Add 'int async'
>>>>         parameter.
>>>>         * oacc-mem.c (gomp_acc_remove_pointer): Adjust async case to
>>>>         call openacc.register_async_cleanup_func() hook.
>>>>         * oacc-parallel.c (GOACC_parallel_keyed): Likewise.
>>>>         * target.c (gomp_copy_from_async): Delete function.
>>>>         (gomp_map_vars): Remove async_refcount.
>>>>         (gomp_unmap_vars): Likewise.
>>>>         (gomp_load_image_to_device): Likewise.
>>>>         (omp_target_associate_ptr): Likewise.
>>>>         * libgomp.h (struct splay_tree_key_s): Remove async_refcount.
>>>>         (acc_dispatch_t.register_async_cleanup_func): Add int parameter.
>>>>         (gomp_copy_from_async): Remove.
>>>>
>>>
>>
> 

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH, libgomp] Rewire OpenACC async (Ping x3)
  2016-04-16  7:41                           ` Chung-Lin Tang
@ 2016-05-11  6:47                             ` Chung-Lin Tang
  0 siblings, 0 replies; 49+ messages in thread
From: Chung-Lin Tang @ 2016-05-11  6:47 UTC (permalink / raw)
  To: gcc-patches
  Cc: Jakub Jelinek, Thomas Schwinge, Cesar Philippidis, Julian Brown

Ping x3

On 2016/4/16 3:40 PM, Chung-Lin Tang wrote:
> Ping.
> 
> On 2016/4/8 07:02 PM, Chung-Lin Tang wrote:
>> Ping.
>>
>> On 2016/3/29 5:48 PM, Chung-Lin Tang wrote:
>>> I've updated this patch for trunk (as attached), and re-tested without
>>> regressions. This patch is still a fix for libgomp.oacc-c-c++-common/asyncwait-1.c,
>>> which FAILs right now.
>>>
>>> ChangeLog is still as before. Is this okay for trunk?
>>>
>>> Thanks,
>>> Chung-Lin
>>>
>>> On 2015/12/22 4:58 PM, Chung-Lin Tang wrote:
>>>> Ping.
>>>>
>>>> On 2015/11/24 6:27 PM, Chung-Lin Tang wrote:
>>>>> Hi, this patch reworks some of the way that asynchronous copyouts are
>>>>> implemented for OpenACC in libgomp.
>>>>>
>>>>> Before this patch, we had a somewhat confusing way of implementing this
>>>>> by having two refcounts for each mapping: refcount and async_refcount,
>>>>> which I never got working again after the last wave of async regressions
>>>>> showed up.
>>>>>
>>>>> So this patch implements what I believe to be a simplification: async_refcount
>>>>> is removed, and instead of trying to queue the async copyouts during unmapping
>>>>> we actually do that during the plugin event handling. This requires a addition
>>>>> of the async stream integer as an argument to the register_async_cleanup
>>>>> plugin hook, but overall I think this should be more elegant than before.
>>>>>
>>>>> This patch fixes the libgomp.oacc-c-c++-common/asyncwait-1.c regression.
>>>>> It also fixed data-[23].c regressions before, but some other recent check-in
>>>>> happened to already fixed those.
>>>>>
>>>>> Tested without regressions, is this okay for trunk?
>>>>>
>>>>> Thanks,
>>>>> Chung-Lin
>>>>>
>>>>> 2015-11-24  Chung-Lin Tang  <cltang@codesourcery.com>
>>>>>
>>>>>         * oacc-plugin.h (GOMP_PLUGIN_async_unmap_vars): Add int parameter.
>>>>>         * oacc-plugin.c (GOMP_PLUGIN_async_unmap_vars): Add 'int async'
>>>>>         parameter, use to set async stream around call to gomp_unmap_vars,
>>>>>         call gomp_unmap_vars() with 'do_copyfrom' set to true.
>>>>>         * plugin/plugin-nvptx.c (struct ptx_event): Add 'int val' field.
>>>>>         (event_gc): Adjust event handling loop, collect PTX_EVT_ASYNC_CLEANUP
>>>>>         events and call GOMP_PLUGIN_async_unmap_vars() for each of them.
>>>>>         (event_add): Add int parameter, initialize 'val' field when
>>>>>         adding new ptx_event struct.
>>>>>         (nvptx_evec): Adjust event_add() call arguments.
>>>>>         (nvptx_host2dev): Likewise.
>>>>>         (nvptx_dev2host): Likewise.
>>>>>         (nvptx_wait_async): Likewise.
>>>>>         (nvptx_wait_all_async): Likewise.
>>>>>         (GOMP_OFFLOAD_openacc_register_async_cleanup): Add async parameter,
>>>>>         pass to event_add() call.
>>>>>         * oacc-host.c (host_openacc_register_async_cleanup): Add 'int async'
>>>>>         parameter.
>>>>>         * oacc-mem.c (gomp_acc_remove_pointer): Adjust async case to
>>>>>         call openacc.register_async_cleanup_func() hook.
>>>>>         * oacc-parallel.c (GOACC_parallel_keyed): Likewise.
>>>>>         * target.c (gomp_copy_from_async): Delete function.
>>>>>         (gomp_map_vars): Remove async_refcount.
>>>>>         (gomp_unmap_vars): Likewise.
>>>>>         (gomp_load_image_to_device): Likewise.
>>>>>         (omp_target_associate_ptr): Likewise.
>>>>>         * libgomp.h (struct splay_tree_key_s): Remove async_refcount.
>>>>>         (acc_dispatch_t.register_async_cleanup_func): Add int parameter.
>>>>>         (gomp_copy_from_async): Remove.
>>>>>
>>>>
>>>
>>
> 

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH, libgomp] Rewire OpenACC async
  2016-03-29 10:15                       ` Chung-Lin Tang
  2016-04-08 11:03                         ` Chung-Lin Tang
@ 2016-05-12 10:03                         ` Jakub Jelinek
  2016-05-12 10:47                           ` Thomas Schwinge
  1 sibling, 1 reply; 49+ messages in thread
From: Jakub Jelinek @ 2016-05-12 10:03 UTC (permalink / raw)
  To: Chung-Lin Tang
  Cc: gcc-patches, Thomas Schwinge, Cesar Philippidis, Julian Brown

On Tue, Mar 29, 2016 at 05:48:25PM +0800, Chung-Lin Tang wrote:
> I've updated this patch for trunk (as attached), and re-tested without
> regressions. This patch is still a fix for libgomp.oacc-c-c++-common/asyncwait-1.c,
> which FAILs right now.
> 
> ChangeLog is still as before. Is this okay for trunk?

Mostly ok for trunk, but as it is an ABI incompatible change for the plugin
interface (affecting OpenACC capable plugins only), I think you just should
rename the plugin callback you add the argument to, so that
          || !DLSYM_OPT (openacc.register_async_cleanup,
                         openacc_register_async_cleanup)
would fail when trying to load GCC 6.x nvptx plugin from GCC 7.x libgomp
or vice versa.

	Jakub

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH, libgomp] Rewire OpenACC async
  2016-05-12 10:03                         ` [PATCH, libgomp] Rewire OpenACC async Jakub Jelinek
@ 2016-05-12 10:47                           ` Thomas Schwinge
  2016-05-12 10:56                             ` Jakub Jelinek
  0 siblings, 1 reply; 49+ messages in thread
From: Thomas Schwinge @ 2016-05-12 10:47 UTC (permalink / raw)
  To: Jakub Jelinek, Chung-Lin Tang
  Cc: gcc-patches, Cesar Philippidis, Julian Brown

Hi!

On Thu, 12 May 2016 12:02:58 +0200, Jakub Jelinek <jakub@redhat.com> wrote:
> ABI incompatible change for the plugin
> interface (affecting OpenACC capable plugins only), I think you just should
> rename the plugin callback you add the argument to, so that
>           || !DLSYM_OPT (openacc.register_async_cleanup,
>                          openacc_register_async_cleanup)
> would fail when trying to load GCC 6.x nvptx plugin from GCC 7.x libgomp
> or vice versa.

Hmm, as far as I remember, we had previously agreed that libgomp plugin
ABI changes are not of any concern, given that libgomp and its plugins
will always be built from the same sources, at the same time, and so
their ABIs will always correspond?  Discussed before in
<http://news.gmane.org/find-root.php?message_id=%3C56449E23.3050400%40acm.org%3E>,
for example.


Grüße
 Thomas

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH, libgomp] Rewire OpenACC async
  2016-05-12 10:47                           ` Thomas Schwinge
@ 2016-05-12 10:56                             ` Jakub Jelinek
  2016-05-17  8:57                               ` Chung-Lin Tang
  0 siblings, 1 reply; 49+ messages in thread
From: Jakub Jelinek @ 2016-05-12 10:56 UTC (permalink / raw)
  To: Thomas Schwinge
  Cc: Chung-Lin Tang, gcc-patches, Cesar Philippidis, Julian Brown

On Thu, May 12, 2016 at 12:47:18PM +0200, Thomas Schwinge wrote:
> Hi!
> 
> On Thu, 12 May 2016 12:02:58 +0200, Jakub Jelinek <jakub@redhat.com> wrote:
> > ABI incompatible change for the plugin
> > interface (affecting OpenACC capable plugins only), I think you just should
> > rename the plugin callback you add the argument to, so that
> >           || !DLSYM_OPT (openacc.register_async_cleanup,
> >                          openacc_register_async_cleanup)
> > would fail when trying to load GCC 6.x nvptx plugin from GCC 7.x libgomp
> > or vice versa.
> 
> Hmm, as far as I remember, we had previously agreed that libgomp plugin
> ABI changes are not of any concern, given that libgomp and its plugins
> will always be built from the same sources, at the same time, and so
> their ABIs will always correspond?  Discussed before in
> <http://news.gmane.org/find-root.php?message_id=%3C56449E23.3050400%40acm.org%3E>,
> for example.

I thought the agreement was that it is ok not to support mixing of
different libgomp and plugin versions, but we should make sure that we
refuse to load the plugin in case of mismatch, instead of silently crashing.
Of course, changes in unreleased compiler versions are fine.

So, I'm not asking for compatibility in that 6.x nvptx plugin should still
work with 7.x libgomp and vice versa, but that it would be ignored or
diagnosed if somebody mixes it.

	Jakub

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH, libgomp] Rewire OpenACC async
  2016-05-12 10:56                             ` Jakub Jelinek
@ 2016-05-17  8:57                               ` Chung-Lin Tang
  2016-05-17  9:01                                 ` Jakub Jelinek
  0 siblings, 1 reply; 49+ messages in thread
From: Chung-Lin Tang @ 2016-05-17  8:57 UTC (permalink / raw)
  To: Jakub Jelinek, Thomas Schwinge
  Cc: gcc-patches, Cesar Philippidis, Julian Brown

On 2016/5/12 6:56 PM, Jakub Jelinek wrote:
> On Thu, May 12, 2016 at 12:47:18PM +0200, Thomas Schwinge wrote:
>> Hi!
>>
>> On Thu, 12 May 2016 12:02:58 +0200, Jakub Jelinek <jakub@redhat.com> wrote:
>>> ABI incompatible change for the plugin
>>> interface (affecting OpenACC capable plugins only), I think you just should
>>> rename the plugin callback you add the argument to, so that
>>>           || !DLSYM_OPT (openacc.register_async_cleanup,
>>>                          openacc_register_async_cleanup)
>>> would fail when trying to load GCC 6.x nvptx plugin from GCC 7.x libgomp
>>> or vice versa.
>>
>> Hmm, as far as I remember, we had previously agreed that libgomp plugin
>> ABI changes are not of any concern, given that libgomp and its plugins
>> will always be built from the same sources, at the same time, and so
>> their ABIs will always correspond?  Discussed before in
>> <http://news.gmane.org/find-root.php?message_id=%3C56449E23.3050400%40acm.org%3E>,
>> for example.
> 
> I thought the agreement was that it is ok not to support mixing of
> different libgomp and plugin versions, but we should make sure that we
> refuse to load the plugin in case of mismatch, instead of silently crashing.
> Of course, changes in unreleased compiler versions are fine.
> 
> So, I'm not asking for compatibility in that 6.x nvptx plugin should still
> work with 7.x libgomp and vice versa, but that it would be ignored or
> diagnosed if somebody mixes it.
> 
> 	Jakub
> 

I assume that bumping GOMP_VERSION in include/gomp-constants.h would be enough?

Renaming is just really ugly, probably okay for the async patch which touches just one plugin hook,
but the device lock deadlock fix audits several of them in a minor way; doesn't look like
the right solution to rename them all.

Chung-Lin

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH, libgomp] Rewire OpenACC async
  2016-05-17  8:57                               ` Chung-Lin Tang
@ 2016-05-17  9:01                                 ` Jakub Jelinek
  2016-05-18  9:40                                   ` Chung-Lin Tang
  0 siblings, 1 reply; 49+ messages in thread
From: Jakub Jelinek @ 2016-05-17  9:01 UTC (permalink / raw)
  To: Chung-Lin Tang
  Cc: Thomas Schwinge, gcc-patches, Cesar Philippidis, Julian Brown

On Tue, May 17, 2016 at 04:56:42PM +0800, Chung-Lin Tang wrote:
> I assume that bumping GOMP_VERSION in include/gomp-constants.h would be enough?

I think so.

	Jakub

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH, libgomp] Rewire OpenACC async
  2016-05-17  9:01                                 ` Jakub Jelinek
@ 2016-05-18  9:40                                   ` Chung-Lin Tang
  2016-05-18  9:50                                     ` Jakub Jelinek
  0 siblings, 1 reply; 49+ messages in thread
From: Chung-Lin Tang @ 2016-05-18  9:40 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: Thomas Schwinge, gcc-patches, Cesar Philippidis, Julian Brown

[-- Attachment #1: Type: text/plain, Size: 680 bytes --]

On 2016/5/17 5:00 PM, Jakub Jelinek wrote:
> On Tue, May 17, 2016 at 04:56:42PM +0800, Chung-Lin Tang wrote:
>> I assume that bumping GOMP_VERSION in include/gomp-constants.h would be enough?
> 
> I think so.
> 
> 	Jakub
> 

How is this patch? I have added a comment to remind to adjust the version number
when plugin interface changes are made.

And, if this approach to solve the plugin interface problem is okay, can I regard the
device lock deadlock patches and async patch approved for trunk?

Thanks,
Chung-Lin

	include/
	* gomp-constants.h (GOMP_VERSION): Increment to 1, add comment describe the
	need for increment this macro whenever the plugin interface is modified.


[-- Attachment #2: a.diff --]
[-- Type: text/plain, Size: 690 bytes --]

Index: gomp-constants.h
===================================================================
--- gomp-constants.h	(revision 236363)
+++ gomp-constants.h	(working copy)
@@ -196,8 +196,10 @@ enum gomp_map_kind
 /* Internal to libgomp.  */
 #define GOMP_TARGET_FLAG_UPDATE		(1U << 31)
 
-/* Versions of libgomp and device-specific plugins.  */
-#define GOMP_VERSION	0
+/* Versions of libgomp and device-specific plugins.  GOMP_VERSION
+   should be incremented whenever an ABI-incompatible change is introduced
+   to the plugin interface defined in libgomp/libgomp.h.  */
+#define GOMP_VERSION	1
 #define GOMP_VERSION_NVIDIA_PTX 1
 #define GOMP_VERSION_INTEL_MIC 0
 #define GOMP_VERSION_HSA 0

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH, libgomp] Rewire OpenACC async
  2016-05-18  9:40                                   ` Chung-Lin Tang
@ 2016-05-18  9:50                                     ` Jakub Jelinek
  0 siblings, 0 replies; 49+ messages in thread
From: Jakub Jelinek @ 2016-05-18  9:50 UTC (permalink / raw)
  To: Chung-Lin Tang
  Cc: Thomas Schwinge, gcc-patches, Cesar Philippidis, Julian Brown

On Wed, May 18, 2016 at 05:40:33PM +0800, Chung-Lin Tang wrote:
> On 2016/5/17 5:00 PM, Jakub Jelinek wrote:
> > On Tue, May 17, 2016 at 04:56:42PM +0800, Chung-Lin Tang wrote:
> >> I assume that bumping GOMP_VERSION in include/gomp-constants.h would be enough?
> > 
> > I think so.
> > 
> > 	Jakub
> > 
> 
> How is this patch? I have added a comment to remind to adjust the version number
> when plugin interface changes are made.

Ok.

> And, if this approach to solve the plugin interface problem is okay, can I regard the
> device lock deadlock patches and async patch approved for trunk?

Yes.

> 	include/
> 	* gomp-constants.h (GOMP_VERSION): Increment to 1, add comment describe the
> 	need for increment this macro whenever the plugin interface is modified.
> 

> Index: gomp-constants.h
> ===================================================================
> --- gomp-constants.h	(revision 236363)
> +++ gomp-constants.h	(working copy)
> @@ -196,8 +196,10 @@ enum gomp_map_kind
>  /* Internal to libgomp.  */
>  #define GOMP_TARGET_FLAG_UPDATE		(1U << 31)
>  
> -/* Versions of libgomp and device-specific plugins.  */
> -#define GOMP_VERSION	0
> +/* Versions of libgomp and device-specific plugins.  GOMP_VERSION
> +   should be incremented whenever an ABI-incompatible change is introduced
> +   to the plugin interface defined in libgomp/libgomp.h.  */
> +#define GOMP_VERSION	1
>  #define GOMP_VERSION_NVIDIA_PTX 1
>  #define GOMP_VERSION_INTEL_MIC 0
>  #define GOMP_VERSION_HSA 0


	Jakub

^ permalink raw reply	[flat|nested] 49+ messages in thread

end of thread, other threads:[~2016-05-18  9:50 UTC | newest]

Thread overview: 49+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-06-15 12:23 [gomp4.1] Add new versions of GOMP_target{,_data,_update} and GOMP_target_enter_exit_data Ilya Verbin
2015-06-15 14:30 ` Jakub Jelinek
2015-06-15 16:22   ` Ilya Verbin
2015-06-15 16:30     ` Jakub Jelinek
2015-06-15 19:54       ` Ilya Verbin
2015-06-15 19:58         ` Jakub Jelinek
2015-06-19 22:36           ` Ilya Verbin
2015-06-23 11:51             ` Ilya Verbin
2015-06-23 12:10               ` Jakub Jelinek
2015-06-23 14:55                 ` Ilya Verbin
2015-06-23 15:04                   ` Jakub Jelinek
2015-10-19 16:33                 ` OpenACC async clause regressions (was: [gomp4.1] Add new versions of GOMP_target{,_data,_update} and GOMP_target_enter_exit_data) Thomas Schwinge
2015-10-19 16:48                   ` Ilya Verbin
2015-10-20  7:37                     ` Jakub Jelinek
2015-10-22 18:46                   ` [gomp4] " Thomas Schwinge
2015-11-18 15:18                     ` [gomp4] OpenACC async clause regressions Tom de Vries
2016-03-30 16:32                       ` Thomas Schwinge
2015-11-24 10:32                   ` [PATCH, libgomp] Rewire OpenACC async Chung-Lin Tang
2015-11-23 13:21                     ` [PATCH, C++] Wrap OpenACC wait in EXPR_STMT Chung-Lin Tang
     [not found]                       ` <56628C72.9040802@codesourcery.com>
2015-11-23 11:13                         ` [gomp4] Adjust Fortran OACC async lib test Chung-Lin Tang
2015-12-02  9:21                           ` Chung-Lin Tang
2015-12-08 11:46                           ` Thomas Schwinge
2015-12-02  9:22                       ` [PATCH, C++] Wrap OpenACC wait in EXPR_STMT Chung-Lin Tang
2015-12-03  8:51                       ` Thomas Schwinge
2015-12-03  8:59                         ` Thomas Schwinge
2015-12-03 10:05                           ` Chung-Lin Tang
2015-12-03 10:11                             ` Jakub Jelinek
2015-12-03 10:33                               ` Chung-Lin Tang
2015-12-05  9:57                                 ` Chung-Lin Tang
2015-12-07  5:00                       ` Jason Merrill
2015-12-01 12:01                     ` [PATCH, libgomp] Rewire OpenACC async Julian Brown
2015-12-05  9:23                       ` Chung-Lin Tang
2015-12-22  8:59                     ` Chung-Lin Tang
2016-03-29 10:15                       ` Chung-Lin Tang
2016-04-08 11:03                         ` Chung-Lin Tang
2016-04-16  7:41                           ` Chung-Lin Tang
2016-05-11  6:47                             ` [PATCH, libgomp] Rewire OpenACC async (Ping x3) Chung-Lin Tang
2016-05-12 10:03                         ` [PATCH, libgomp] Rewire OpenACC async Jakub Jelinek
2016-05-12 10:47                           ` Thomas Schwinge
2016-05-12 10:56                             ` Jakub Jelinek
2016-05-17  8:57                               ` Chung-Lin Tang
2016-05-17  9:01                                 ` Jakub Jelinek
2016-05-18  9:40                                   ` Chung-Lin Tang
2016-05-18  9:50                                     ` Jakub Jelinek
2015-06-24 11:43               ` [gomp4.1] Add new versions of GOMP_target{,_data,_update} and GOMP_target_enter_exit_data Jakub Jelinek
2015-06-24 20:14                 ` Ilya Verbin
2015-06-24 20:21                   ` Jakub Jelinek
2015-10-13 14:50         ` Ilya Verbin
2015-10-13 19:27           ` Jakub Jelinek

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).