public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* Openacc launch API
@ 2015-08-25 13:40 Nathan Sidwell
  2015-08-28 17:30 ` Nathan Sidwell
                   ` (3 more replies)
  0 siblings, 4 replies; 34+ messages in thread
From: Nathan Sidwell @ 2015-08-25 13:40 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: GCC Patches

[-- Attachment #1: Type: text/plain, Size: 2266 bytes --]

Jakub,

This patch changes the launch API for openacc parallels.  The current scheme 
passes the launch dimensions as 3 separate parameters to the GOACC_parallel 
function.  This is problematic for a couple of reasons:

1) these must be validated in the host compiler

2) they provide no extension to support a variety of different offload devices 
with different geometry requirements.

This patch changes things so that the function tables emitted by (ptx) 
mkoffloads includes the geometry triplet for each function.  This allows them to 
be validated and/or manipulated in the offload compiler.  However, this only 
works for compile-time known dimensions -- which is a common case.  To deal with 
runtime-computed dimensions we have to retain the host-side compiler's 
calculation and pass that into the GOACC_parallel function.  We change 
GOACC_parallel to take a variadic list of keyed operands ending with a sentinel 
marker.  These keyed operands have a slot for expansion to support multiple 
different offload devices.

We also extend the functionality of the 'oacc function' internal attribute. 
Rather than being a simple marker, it now has a value, which is a TREE_LIST of 
the geometry required.  The geometry is held as INTEGER_CSTs on the TREE_VALUE 
slots.  Runtime-calculated values are represented by an INTEGER_CST of zero. 
We'll also use this representation for  'routines', where the TREE_PURPOSE slot 
will be used to indicate the levels at which a routine might spawn a partitioned 
loop.  Again, to allow future expansion supporting a number of different offload 
devices, this can become a list-of-lists, keyed by and offload device 
identifier.  The offload  compiler can manipulate this data, and a later patch 
will do this within a new oacc-xform pass.

I  did rename the GOACC_parallel entry point to GOACC_parallel_keyed and provide 
a forwarding function. However, as the mkoffload data is incompatible, this is 
probably overkill.  I've had to increment the (just committed) version number to 
detect the change in data representation.  So any attempt to run an old binary 
with a new libgomp will fail at the loading point.  We could simply keep the 
same 'GOACC_parallel' name and not need any new symbols.  WDYT?

ok?

nathan

[-- Attachment #2: trunk-launch.patch --]
[-- Type: text/x-patch, Size: 42511 bytes --]

2015-08-25  Nathan Sidwell  <nathan@codesourcery.com>

	inlude/
	* gomp-constants.h (GOMP_VERSION_NVIDIA_PTX): Increment.
	(GOMP_DIM_GANG, GOMP_DIM_WORKER, GOMP_DIM_VECTOR, GOMP_DIM_MAX,
	GOMP_DIM_MASK): New.
	(GOMP_LAUNCH_END, GOMP_LAUNCH_DIM, GOMP_LAUNCH_ASYNC,
	GOMP_LAUNCH_WAIT): New.
	(GOMP_LAUNCH_CODE_SHIFT, GOMP_LAUNCH_DEVICE_SHIFT,
	GOMP_LAUNCH_OP_SHIFT): New.
	(GOMP_LAUNCH_PACK, GOMP_LAUNCH_CODE, GOMP_LAUNCH_DEVICE,
	GOMP_LAUNCH_OP): New.
	(GOMP_LAUNCH_OP_MAX): New.

	libgomp/
	* libgomp.h (acc_dispatch_t): Replace separate geometry args with
	array.
	* libgomp.map (GOACC_parallel_keyed): New.
	* oacc-parallel.c (goacc_wait): Take pointer to va_list.  Adjust
	all callers.
	(GOACC_parallel_keyed): New interface.  Lose geometry arguments
	and take keyed varargs list.  Adjust call to exec_func.
	(GOACC_parallel): Forward to GACC_parallel_keyed.
	* libgomp_g.h (GOACC_parallel): Remove.
	(GOACC_parallel_keyed): Declare.
	* plugin/plugin-nvptx.c (struct targ_fn_launch): New struct.
	(stuct targ_gn_descriptor): Replace name field with launch field.
	(nvptx_exec): Lose separate geometry args, take array.  Process
	dynamic dimensions and adjust.
	(struct nvptx_tdata): Replace fn_names field with fn_descs.
	(GOMP_OFFLOAD_load_image): Adjust for change in function table
	data.
	(GOMP_OFFLOAD_openacc_parallel): Adjust for change in dimension
	passing.
	* oacc-host.c (host_openacc_exec): Adjust for change in dimension
	passing.

	gcc/
	* config/nvptx/nvptx.c: Include omp-low.h and gomp-constants.h.
	(nvptx_record_offload_symbol): Record function execution geometry.
	* config/nvptx/mkoffload.c (process): Include launch geometry in
	function data.
	* omp-low.c (oacc_launch_pack): New.
	(replace_oacc_fn_attrib): New.
	(set_oacc_fn_attrib): New.
	(get_oacc_fn_attrib): New.
	(expand_omp_target): Create keyed varargs for GOACC_parallel call
	generation.
	* omp-low.h (get_oacc_fn_attrib): Declare.
	* builtin-types.def (DEF_FUNCTION_TyPE_VAR_6): New.
	(DEF_FUNCTION_TYPE_VAR_11): Delete.
	* tree.h (OMP_CLAUSE_EXPR): New.
	* omp-builtins.def (BUILT_IN_GOACC_PARALLEL): Change target fn name.

	gcc/lto/
	* lto-lang.c (DEF_FUNCTION_TYPE_VAR_6): New.
	(DEF_FUNCTION_TYPE_VAR_11): Delete.

	gcc/c-family/
	* c-common.c (DEF_FUNCTION_TYPE_VAR_6): New.
	(DEF_FUNCTION_TYPE_VAR_11): Delete.

	gcc/fortran/
	* f95-lang.c (DEF_FUNCTION_TYPE_VAR_6): New.
	(DEF_FUNCTION_TYPE_VAR_11): Delete.
	* types.def (DEF_FUNCTION_TYPE_VAR_6): New.
	(DEF_FUNCTION_TYPE_VAR_11): Delete.

Index: include/gomp-constants.h
===================================================================
--- include/gomp-constants.h	(revision 227137)
+++ include/gomp-constants.h	(working copy)
@@ -115,11 +115,34 @@ enum gomp_map_kind
 
 /* Versions of libgomp and device-specific plugins.  */
 #define GOMP_VERSION	0
-#define GOMP_VERSION_NVIDIA_PTX 0
+#define GOMP_VERSION_NVIDIA_PTX 1
 #define GOMP_VERSION_INTEL_MIC 0
 
 #define GOMP_VERSION_PACK(LIB, DEV) (((LIB) << 16) | (DEV))
 #define GOMP_VERSION_LIB(PACK) (((PACK) >> 16) & 0xffff)
 #define GOMP_VERSION_DEV(PACK) ((PACK) & 0xffff)
 
+#define GOMP_DIM_GANG	0
+#define GOMP_DIM_WORKER	1
+#define GOMP_DIM_VECTOR	2
+#define GOMP_DIM_MAX	3
+#define GOMP_DIM_MASK(X) (1u << (X))
+
+/* Varadic launch arguments.  */
+#define GOMP_LAUNCH_END 	0  /* End of args, no dev or op */
+#define GOMP_LAUNCH_DIM		1  /* Launch dimensions, op = mask */
+#define GOMP_LAUNCH_ASYNC	2  /* Async, op = cst val if not MAX  */
+#define GOMP_LAUNCH_WAIT	3  /* Waits, op = num waits.  */
+#define GOMP_LAUNCH_CODE_SHIFT	28
+#define GOMP_LAUNCH_DEVICE_SHIFT 16
+#define GOMP_LAUNCH_OP_SHIFT 0
+#define GOMP_LAUNCH_PACK(CODE,DEVICE,OP)	\
+  (((CODE) << GOMP_LAUNCH_CODE_SHIFT)		\
+   | ((DEVICE) << GOMP_LAUNCH_DEVICE_SHIFT)	\
+   | ((OP) << GOMP_LAUNCH_OP_SHIFT))
+#define GOMP_LAUNCH_CODE(X) (((X) >> GOMP_LAUNCH_CODE_SHIFT) & 0xf)
+#define GOMP_LAUNCH_DEVICE(X) (((X) >> GOMP_LAUNCH_DEVICE_SHIFT) & 0xfff)
+#define GOMP_LAUNCH_OP(X) (((X) >> GOMP_LAUNCH_OP_SHIFT) & 0xffff)
+#define GOMP_LAUNCH_OP_MAX 0xffff
+
 #endif
Index: libgomp/libgomp.h
===================================================================
--- libgomp/libgomp.h	(revision 227137)
+++ libgomp/libgomp.h	(working copy)
@@ -693,7 +693,7 @@ typedef struct acc_dispatch_t
 
   /* Execute.  */
   void (*exec_func) (void (*) (void *), size_t, void **, void **, size_t *,
-		     unsigned short *, int, int, int, int, void *);
+		     unsigned short *, int, unsigned *, void *);
 
   /* Async cleanup callback registration.  */
   void (*register_async_cleanup_func) (void *);
Index: libgomp/oacc-parallel.c
===================================================================
--- libgomp/oacc-parallel.c	(revision 227137)
+++ libgomp/oacc-parallel.c	(working copy)
@@ -49,14 +49,12 @@ find_pset (int pos, size_t mapnum, unsig
   return kind == GOMP_MAP_TO_PSET;
 }
 
-static void goacc_wait (int async, int num_waits, va_list ap);
+static void goacc_wait (int async, int num_waits, va_list *ap);
 
 void
-GOACC_parallel (int device, void (*fn) (void *),
-		size_t mapnum, void **hostaddrs, size_t *sizes,
-		unsigned short *kinds,
-		int num_gangs, int num_workers, int vector_length,
-		int async, int num_waits, ...)
+GOACC_parallel_keyed (int device, void (*fn) (void *),
+		      size_t mapnum, void **hostaddrs, size_t *sizes,
+		      unsigned short *kinds, ...)
 {
   bool host_fallback = device == GOMP_DEVICE_HOST_FALLBACK;
   va_list ap;
@@ -68,22 +66,16 @@ GOACC_parallel (int device, void (*fn) (
   struct splay_tree_key_s k;
   splay_tree_key tgt_fn_key;
   void (*tgt_fn);
-
-  if (num_gangs != 1)
-    gomp_fatal ("num_gangs (%d) different from one is not yet supported",
-		num_gangs);
-  if (num_workers != 1)
-    gomp_fatal ("num_workers (%d) different from one is not yet supported",
-		num_workers);
+  int async = GOMP_ASYNC_SYNC;
+  unsigned dims[GOMP_DIM_MAX];
+  unsigned tag;
 
 #ifdef HAVE_INTTYPES_H
-  gomp_debug (0, "%s: mapnum=%"PRIu64", hostaddrs=%p, size=%p, kinds=%p, "
-		 "async = %d\n",
-	      __FUNCTION__, (uint64_t) mapnum, hostaddrs, sizes, kinds, async);
+  gomp_debug (0, "%s: mapnum=%"PRIu64", hostaddrs=%p, size=%p, kinds=%p\n",
+	      __FUNCTION__, (uint64_t) mapnum, hostaddrs, sizes, kinds);
 #else
-  gomp_debug (0, "%s: mapnum=%lu, hostaddrs=%p, sizes=%p, kinds=%p, async=%d\n",
-	      __FUNCTION__, (unsigned long) mapnum, hostaddrs, sizes, kinds,
-	      async);
+  gomp_debug (0, "%s: mapnum=%lu, hostaddrs=%p, sizes=%p, kinds=%p\n",
+	      __FUNCTION__, (unsigned long) mapnum, hostaddrs, sizes, kinds);
 #endif
   goacc_lazy_initialize ();
 
@@ -105,12 +97,45 @@ GOACC_parallel (int device, void (*fn) (
       return;
     }
 
-  if (num_waits)
+  va_start (ap, kinds);
+  /* TODO: This will need amending when device_type is implemented.  */
+  while (GOMP_LAUNCH_PACK (GOMP_LAUNCH_END, 0, 0)
+	 != (tag = va_arg (ap, unsigned)))
     {
-      va_start (ap, num_waits);
-      goacc_wait (async, num_waits, ap);
-      va_end (ap);
+      assert (!GOMP_LAUNCH_DEVICE (tag));
+      switch (GOMP_LAUNCH_CODE (tag))
+	{
+	case GOMP_LAUNCH_DIM:
+	  {
+	    unsigned mask = GOMP_LAUNCH_OP (tag);
+
+	    for (i = 0; i != GOMP_DIM_MAX; i++)
+	      if (mask & GOMP_DIM_MASK (i))
+		dims[i] = va_arg (ap, unsigned);
+	  }
+	  break;
+
+	case GOMP_LAUNCH_ASYNC:
+	  {
+	    /* Small constant values are encoded in the operand.  */
+	    async = GOMP_LAUNCH_OP (tag);
+
+	    if (async == GOMP_LAUNCH_OP_MAX)
+	      async = va_arg (ap, unsigned);
+	    break;
+	  }
+
+	case GOMP_LAUNCH_WAIT:
+	  {
+	    unsigned num_waits = GOMP_LAUNCH_OP (tag);
+
+	    if (num_waits)
+	      goacc_wait (async, num_waits, &ap);
+	    break;
+	  }
+	}
     }
+  va_end (ap);
   
   acc_dev->openacc.async_set_async_func (async);
 
@@ -138,9 +163,8 @@ GOACC_parallel (int device, void (*fn) (
     devaddrs[i] = (void *) (tgt->list[i]->tgt->tgt_start
 			    + tgt->list[i]->tgt_offset);
 
-  acc_dev->openacc.exec_func (tgt_fn, mapnum, hostaddrs, devaddrs, sizes, kinds,
-			      num_gangs, num_workers, vector_length, async,
-			      tgt);
+  acc_dev->openacc.exec_func (tgt_fn, mapnum, hostaddrs, devaddrs, sizes,
+			      kinds, async, dims, tgt);
 
   /* If running synchronously, unmap immediately.  */
   if (async < acc_async_noval)
@@ -154,6 +178,38 @@ GOACC_parallel (int device, void (*fn) (
   acc_dev->openacc.async_set_async_func (acc_async_sync);
 }
 
+/* Legacy entry point.  */
+void
+GOACC_parallel (int device, void (*fn) (void *),
+		size_t mapnum, void **hostaddrs, size_t *sizes,
+		unsigned short *kinds,
+		int num_gangs, int num_workers, int vector_length,
+		int async, int num_waits, ...)
+{
+  int waits[9];
+  unsigned ix;
+  va_list ap;
+
+  if (num_waits > 8)
+    gomp_fatal ("Too many waits for legacy interface");
+  
+  va_start (ap, num_waits);
+  for (ix = 0; ix != num_waits; ix++)
+    waits[ix] = va_arg (ap, int);
+  waits[ix] = GOMP_LAUNCH_PACK (GOMP_LAUNCH_END, 0, 0);
+  va_end (ap);
+
+  GOACC_parallel_keyed (device, fn, mapnum, hostaddrs, sizes, kinds,
+			GOMP_LAUNCH_PACK (GOMP_LAUNCH_DIM, 0,
+					  GOMP_DIM_MASK (GOMP_DIM_MAX) - 1),
+			num_gangs, num_workers, vector_length,
+			GOMP_LAUNCH_PACK (GOMP_LAUNCH_ASYNC, 0,
+					  GOMP_LAUNCH_OP_MAX), async,
+			GOMP_LAUNCH_PACK (GOMP_LAUNCH_WAIT, 0, num_waits),
+			async, waits[0], waits[1], waits[2], waits[3],
+			waits[4], waits[5], waits[6], waits[7], waits[8]);
+}
+
 void
 GOACC_data_start (int device, size_t mapnum,
 		  void **hostaddrs, size_t *sizes, unsigned short *kinds)
@@ -230,7 +286,7 @@ GOACC_enter_exit_data (int device, size_
       va_list ap;
 
       va_start (ap, num_waits);
-      goacc_wait (async, num_waits, ap);
+      goacc_wait (async, num_waits, &ap);
       va_end (ap);
     }
 
@@ -344,15 +400,15 @@ GOACC_enter_exit_data (int device, size_
 }
 
 static void
-goacc_wait (int async, int num_waits, va_list ap)
+goacc_wait (int async, int num_waits, va_list *ap)
 {
   struct goacc_thread *thr = goacc_thread ();
   struct gomp_device_descr *acc_dev = thr->dev;
 
   while (num_waits--)
     {
-      int qid = va_arg (ap, int);
-
+      int qid = va_arg (*ap, int);
+      
       if (acc_async_test (qid))
 	continue;
 
@@ -389,7 +445,7 @@ GOACC_update (int device, size_t mapnum,
       va_list ap;
 
       va_start (ap, num_waits);
-      goacc_wait (async, num_waits, ap);
+      goacc_wait (async, num_waits, &ap);
       va_end (ap);
     }
 
@@ -430,7 +486,7 @@ GOACC_wait (int async, int num_waits, ..
       va_list ap;
 
       va_start (ap, num_waits);
-      goacc_wait (async, num_waits, ap);
+      goacc_wait (async, num_waits, &ap);
       va_end (ap);
     }
   else if (async == acc_async_sync)
Index: libgomp/libgomp_g.h
===================================================================
--- libgomp/libgomp_g.h	(revision 227137)
+++ libgomp/libgomp_g.h	(working copy)
@@ -222,9 +222,8 @@ extern void GOACC_data_start (int, size_
 extern void GOACC_data_end (void);
 extern void GOACC_enter_exit_data (int, size_t, void **,
 				   size_t *, unsigned short *, int, int, ...);
-extern void GOACC_parallel (int, void (*) (void *), size_t,
-			    void **, size_t *, unsigned short *, int, int, int,
-			    int, int, ...);
+extern void GOACC_parallel_keyed (int, void (*) (void *), size_t,
+			      void **, size_t *, unsigned short *, ...);
 extern void GOACC_update (int, size_t, void **, size_t *,
 			  unsigned short *, int, int, ...);
 extern void GOACC_wait (int, int, ...);
Index: libgomp/libgomp.map
===================================================================
--- libgomp/libgomp.map	(revision 227137)
+++ libgomp/libgomp.map	(working copy)
@@ -332,6 +332,11 @@ GOACC_2.0 {
 	GOACC_get_num_threads;
 };
 
+GOACC_2.0,1 {
+  global:
+	GOACC_parallel_keyed;
+} GOACC_2.0;
+
 GOMP_PLUGIN_1.0 {
   global:
 	GOMP_PLUGIN_malloc;
Index: libgomp/plugin/plugin-nvptx.c
===================================================================
--- libgomp/plugin/plugin-nvptx.c	(revision 227137)
+++ libgomp/plugin/plugin-nvptx.c	(working copy)
@@ -282,12 +282,20 @@ map_push (struct ptx_stream *s, int asyn
   return;
 }
 
+/* Target data function launch information.  */
+
+struct targ_fn_launch
+{
+  const char *fn;
+  unsigned short dim[GOMP_DIM_MAX];
+};
+
 /* Descriptor of a loaded function.  */
 
 struct targ_fn_descriptor
 {
   CUfunction fn;
-  const char *name;
+  const struct targ_fn_launch *launch;
 };
 
 /* A loaded PTX image.  */
@@ -929,8 +937,8 @@ event_add (enum ptx_event_type type, CUe
 
 void
 nvptx_exec (void (*fn), size_t mapnum, void **hostaddrs, void **devaddrs,
-	  size_t *sizes, unsigned short *kinds, int num_gangs, int num_workers,
-	  int vector_length, int async, void *targ_mem_desc)
+	    size_t *sizes, unsigned short *kinds, int async, unsigned *dims,
+	    void *targ_mem_desc)
 {
   struct targ_fn_descriptor *targ_fn = (struct targ_fn_descriptor *) fn;
   CUfunction function;
@@ -939,7 +947,6 @@ nvptx_exec (void (*fn), size_t mapnum, v
   struct ptx_stream *dev_str;
   void *kargs[1];
   void *hp, *dp;
-  unsigned int nthreads_in_block;
   struct nvptx_thread *nvthd = nvptx_thread ();
   const char *maybe_abort_msg = "(perhaps abort was called)";
 
@@ -948,6 +955,20 @@ nvptx_exec (void (*fn), size_t mapnum, v
   dev_str = select_stream_for_async (async, pthread_self (), false, NULL);
   assert (dev_str == nvthd->current_stream);
 
+  /* Initialize the launch dimensions.  Typically this is constant,
+     provided by the device compiler, but we must permit runtime
+     values.  */
+  for (i = 0; i != GOMP_DIM_MAX; i++)
+    if (targ_fn->launch->dim[i])
+      dims[i] = targ_fn->launch->dim[i];
+
+  if (dims[GOMP_DIM_GANG] != 1)
+    GOMP_PLUGIN_fatal ("non-unity num_gangs (%d) not supported",
+		       dims[GOMP_DIM_GANG]);
+  if (dims[GOMP_DIM_WORKER] != 1)
+    GOMP_PLUGIN_fatal ("non-unity num_workers (%d) not supported",
+		       dims[GOMP_DIM_WORKER]);
+
   /* This reserves a chunk of a pre-allocated page of memory mapped on both
      the host and the device. HP is a host pointer to the new chunk, and DP is
      the corresponding device pointer.  */
@@ -965,35 +986,21 @@ nvptx_exec (void (*fn), size_t mapnum, v
   if (r != CUDA_SUCCESS)
     GOMP_PLUGIN_fatal ("cuMemcpy failed: %s", cuda_error (r));
 
-  GOMP_PLUGIN_debug (0, "  %s: kernel %s: launch\n", __FUNCTION__, targ_fn->name);
+  GOMP_PLUGIN_debug (0, "  %s: kernel %s: launch"
+		     " gangs=%u, workers=%u, vectors=%u\n",
+		     __FUNCTION__, targ_fn->launch->fn,
+		     dims[GOMP_DIM_GANG], dims[GOMP_DIM_WORKER], dims[GOMP_DIM_VECTOR]);
 
   // OpenACC		CUDA
   //
-  // num_gangs		blocks
-  // num_workers	warps (where a warp is equivalent to 32 threads)
-  // vector length	threads
-  //
-
-  /* The openacc vector_length clause 'determines the vector length to use for
-     vector or SIMD operations'.  The question is how to map this to CUDA.
-
-     In CUDA, the warp size is the vector length of a CUDA device.  However, the
-     CUDA interface abstracts away from that, and only shows us warp size
-     indirectly in maximum number of threads per block, which is a product of
-     warp size and the number of hyperthreads of a multiprocessor.
-
-     We choose to map openacc vector_length directly onto the number of threads
-     in a block, in the x dimension.  This is reflected in gcc code generation
-     that uses ThreadIdx.x to access vector elements.
-
-     Attempting to use an openacc vector_length of more than the maximum number
-     of threads per block will result in a cuda error.  */
-  nthreads_in_block = vector_length;
+  // num_gangs		nctaid.x
+  // num_workers	ntid.y
+  // vector length	ntid.x
 
   kargs[0] = &dp;
   r = cuLaunchKernel (function,
-		      num_gangs, 1, 1,
-		      nthreads_in_block, 1, 1,
+		      dims[GOMP_DIM_GANG], 1, 1,
+		      dims[GOMP_DIM_VECTOR], dims[GOMP_DIM_WORKER], 1,
 		      0, dev_str->stream, kargs, 0);
   if (r != CUDA_SUCCESS)
     GOMP_PLUGIN_fatal ("cuLaunchKernel error: %s", cuda_error (r));
@@ -1039,7 +1046,7 @@ nvptx_exec (void (*fn), size_t mapnum, v
 #endif
 
   GOMP_PLUGIN_debug (0, "  %s: kernel %s: finished\n", __FUNCTION__,
-		     targ_fn->name);
+		     targ_fn->launch->fn);
 
 #ifndef DISABLE_ASYNC
   if (async < acc_async_noval)
@@ -1567,7 +1574,7 @@ typedef struct nvptx_tdata
   const char *const *var_names;
   size_t var_num;
 
-  const char *const *fn_names;
+  const struct targ_fn_launch *fn_descs;
   size_t fn_num;
 } nvptx_tdata_t;
 
@@ -1588,7 +1595,8 @@ GOMP_OFFLOAD_load_image (int ord, unsign
 			 struct addr_pair **target_table)
 {
   CUmodule module;
-  const char *const *fn_names, *const *var_names;
+  const char *const *var_names;
+  const struct targ_fn_launch *fn_descs;
   unsigned int fn_entries, var_entries, i, j;
   CUresult r;
   struct targ_fn_descriptor *targ_fns;
@@ -1617,7 +1625,7 @@ GOMP_OFFLOAD_load_image (int ord, unsign
   var_entries = img_header->var_num;
   var_names = img_header->var_names;
   fn_entries = img_header->fn_num;
-  fn_names = img_header->fn_names;
+  fn_descs = img_header->fn_descs;
 
   targ_tbl = GOMP_PLUGIN_malloc (sizeof (struct addr_pair)
 				 * (fn_entries + var_entries));
@@ -1640,12 +1648,12 @@ GOMP_OFFLOAD_load_image (int ord, unsign
     {
       CUfunction function;
 
-      r = cuModuleGetFunction (&function, module, fn_names[i]);
+      r = cuModuleGetFunction (&function, module, fn_descs[i].fn);
       if (r != CUDA_SUCCESS)
 	GOMP_PLUGIN_fatal ("cuModuleGetFunction error: %s", cuda_error (r));
 
       targ_fns->fn = function;
-      targ_fns->name = (const char *) fn_names[i];
+      targ_fns->launch = &fn_descs[i];
 
       targ_tbl->start = (uintptr_t) targ_fns;
       targ_tbl->end = targ_tbl->start + 1;
@@ -1724,13 +1732,12 @@ void (*device_run) (int n, void *fn_ptr,
 
 void
 GOMP_OFFLOAD_openacc_parallel (void (*fn) (void *), size_t mapnum,
-			       void **hostaddrs, void **devaddrs, size_t *sizes,
-			       unsigned short *kinds, int num_gangs,
-			       int num_workers, int vector_length, int async,
-			       void *targ_mem_desc)
+			       void **hostaddrs, void **devaddrs,
+			       size_t *sizes, unsigned short *kinds,
+			       int async, unsigned *dims, void *targ_mem_desc)
 {
-  nvptx_exec (fn, mapnum, hostaddrs, devaddrs, sizes, kinds, num_gangs,
-	    num_workers, vector_length, async, targ_mem_desc);
+  nvptx_exec (fn, mapnum, hostaddrs, devaddrs, sizes, kinds,
+	      async, dims, targ_mem_desc);
 }
 
 void
Index: libgomp/oacc-host.c
===================================================================
--- libgomp/oacc-host.c	(revision 227137)
+++ libgomp/oacc-host.c	(working copy)
@@ -137,10 +137,8 @@ host_openacc_exec (void (*fn) (void *),
 		   void **devaddrs __attribute__ ((unused)),
 		   size_t *sizes __attribute__ ((unused)),
 		   unsigned short *kinds __attribute__ ((unused)),
-		   int num_gangs __attribute__ ((unused)),
-		   int num_workers __attribute__ ((unused)),
-		   int vector_length __attribute__ ((unused)),
 		   int async __attribute__ ((unused)),
+		   unsigned *dims __attribute ((unused)),
 		   void *targ_mem_desc __attribute__ ((unused)))
 {
   fn (hostaddrs);
Index: gcc/config/nvptx/nvptx.c
===================================================================
--- gcc/config/nvptx/nvptx.c	(revision 227137)
+++ gcc/config/nvptx/nvptx.c	(working copy)
@@ -56,6 +56,8 @@
 #include "cfgrtl.h"
 #include "stor-layout.h"
 #include "builtins.h"
+#include "omp-low.h"
+#include "gomp-constants.h"
 
 /* This file should be included last.  */
 #include "target-def.h"
@@ -2064,9 +2066,51 @@ nvptx_vector_alignment (const_tree type)
 static void
 nvptx_record_offload_symbol (tree decl)
 {
-  fprintf (asm_out_file, "//:%s_MAP %s\n",
-	   TREE_CODE (decl) == VAR_DECL ? "VAR" : "FUNC",
-	   IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl)));
+  switch (TREE_CODE (decl))
+    {
+    case VAR_DECL:
+      fprintf (asm_out_file, "//:VAR_MAP \"%s\"\n",
+	       IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl)));
+      break;
+
+    case FUNCTION_DECL:
+      {
+	tree attr = get_oacc_fn_attrib (decl);
+	tree dims = NULL_TREE;
+	unsigned ix;
+
+	if (attr)
+	  dims = TREE_VALUE (attr);
+	fprintf (asm_out_file, "//:FUNC_MAP \"%s\"",
+		 IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl)));
+
+	for (ix = 0; ix != GOMP_DIM_MAX; ix++)
+	  {
+	    int size = 1;
+
+	    /* TODO: This check can go away once the dimension default
+	       machinery is merged to trunk.  */
+	    if (dims)
+	      {
+		tree dim = TREE_VALUE (dims);
+
+		if (dim)
+		  size = TREE_INT_CST_LOW (dim);
+
+		gcc_assert (!TREE_PURPOSE (dims));
+		dims = TREE_CHAIN (dims);
+	      }
+	    
+	    fprintf (asm_out_file, ", %#x", size);
+	  }
+	
+	fprintf (asm_out_file, "\n");
+      }
+      break;
+  
+    default:
+      gcc_unreachable ();
+    }
 }
 
 /* Implement TARGET_ASM_FILE_START.  Write the kinds of things ptxas expects
Index: gcc/config/nvptx/mkoffload.c
===================================================================
--- gcc/config/nvptx/mkoffload.c	(revision 227137)
+++ gcc/config/nvptx/mkoffload.c	(working copy)
@@ -842,6 +842,8 @@ process (FILE *in, FILE *out)
 {
   const char *input = read_file (in);
   Token *tok = tokenize (input);
+  const char *comma;
+  id_map const *id;
 
   do
     tok = parse_file (tok);
@@ -853,21 +855,25 @@ process (FILE *in, FILE *out)
   write_stmts (out, rev_stmts (fns));
   fprintf (out, ";\n\n");
 
-  fprintf (out, "static const char *const var_mappings[] = {\n");
-  for (id_map *id = var_ids; id; id = id->next)
-    fprintf (out, "\t\"%s\"%s\n", id->ptx_name, id->next ? "," : "");
-  fprintf (out, "};\n\n");
-  fprintf (out, "static const char *const func_mappings[] = {\n");
-  for (id_map *id = func_ids; id; id = id->next)
-    fprintf (out, "\t\"%s\"%s\n", id->ptx_name, id->next ? "," : "");
-  fprintf (out, "};\n\n");
+  fprintf (out, "static const char *const var_mappings[] = {");
+  for (comma = "", id = var_ids; id; comma = ",", id = id->next)
+    fprintf (out, "%s\n\t%s", comma, id->ptx_name);
+  fprintf (out, "\n};\n\n");
+
+  fprintf (out, "static const struct nvptx_fn {\n"
+	   "  const char *name;\n"
+	   "  unsigned short dim[3];\n"
+	   "} func_mappings[] = {\n");
+  for (comma = "", id = func_ids; id; comma = ",", id = id->next)
+    fprintf (out, "%s\n\t{%s}", comma, id->ptx_name);
+  fprintf (out, "\n};\n\n");
 
   fprintf (out,
 	   "static const struct nvptx_tdata {\n"
 	   "  const char *ptx_src;\n"
 	   "  const char *const *var_names;\n"
 	   "  __SIZE_TYPE__ var_num;\n"
-	   "  const char *const *fn_names;\n"
+	   "  const struct nvptx_fn *fn_names;\n"
 	   "  __SIZE_TYPE__ fn_num;\n"
 	   "} target_data = {\n"
 	   "  ptx_code,\n"
Index: gcc/omp-low.c
===================================================================
--- gcc/omp-low.c	(revision 227137)
+++ gcc/omp-low.c	(working copy)
@@ -8795,6 +8794,102 @@ expand_omp_atomic (struct omp_region *re
 }
 
 
+/* Encode an oacc launc argument.  This matches the GOMP_LAUNCH_PACK
+   macro on gomp-constants.h.  We do not check for overflow.  */
+
+static tree
+oacc_launch_pack (unsigned code, tree device, unsigned op)
+{
+  tree res;
+  
+  res = build_int_cst (unsigned_type_node, GOMP_LAUNCH_PACK (code, 0, op));
+  if (device)
+    {
+      device = fold_build2 (LSHIFT_EXPR, unsigned_type_node,
+			    device, build_int_cst (unsigned_type_node,
+						   GOMP_LAUNCH_DEVICE_SHIFT));
+      res = fold_build2 (BIT_IOR_EXPR, unsigned_type_node, res, device);
+    }
+  return res;
+}
+
+/* Look for compute grid dimension clauses and convert to an attribute
+   attached to FN.  This permits the target-side code to (a) massage
+   the dimensions, (b) emit that data and (c) optimize.  Non-constant
+   dimensions are pushed onto ARGS.
+
+   The attribute value is a TREE_LIST.  A set of dimensions is
+   represented as a list of INTEGER_CST.  Those that are runtime
+   expres are represented as an INTEGER_CST of zero.
+
+   TOOO. Normally the attribute will just contain a single such list.  If
+   however it contains a list of lists, this will represent the use of
+   device_type.  Each member of the outer list is an assoc list of
+   dimensions, keyed by the device type.  The first entry will be the
+   default.  Well, that's the plan.  */
+
+#define OACC_FN_ATTRIB "oacc function"
+
+/* Replace any existing oacc fn attribute with updated dimensions.  */
+
+void
+replace_oacc_fn_attrib (tree fn, tree dims)
+{
+  tree ident = get_identifier (OACC_FN_ATTRIB);
+  tree attribs = DECL_ATTRIBUTES (fn);
+
+  /* If we happen to be present as the first attrib, drop it.  */
+  if (attribs && TREE_PURPOSE (attribs) == ident)
+    attribs = TREE_CHAIN (attribs);
+  DECL_ATTRIBUTES (fn) = tree_cons (ident, dims, attribs);
+}
+
+static void
+set_oacc_fn_attrib (tree fn, tree clauses, vec<tree> *args)
+{
+  /* Must match GOMP_DIM ordering.  */
+  static const omp_clause_code ids[] = 
+    {OMP_CLAUSE_NUM_GANGS, OMP_CLAUSE_NUM_WORKERS, OMP_CLAUSE_VECTOR_LENGTH};
+  unsigned ix;
+  tree dims[GOMP_DIM_MAX];
+  tree attr = NULL_TREE;
+  unsigned non_const = 0;
+
+  for (ix = GOMP_DIM_MAX; ix--;)
+    {
+      tree clause = find_omp_clause (clauses, ids[ix]);
+      tree dim = NULL_TREE;
+
+      if (clause)
+	dim = OMP_CLAUSE_EXPR (clause, ids[ix]);
+      dims[ix] = dim;
+      if (dim && TREE_CODE (dim) != INTEGER_CST)
+	{
+	  dim = integer_zero_node;
+	  non_const |= GOMP_DIM_MASK (ix);
+	}
+      attr = tree_cons (NULL_TREE, dim, attr);
+    }
+
+  replace_oacc_fn_attrib (fn, attr);
+
+  if (non_const)
+    {
+      /* Push a dynamic argument set.  */
+      args->safe_push (oacc_launch_pack (GOMP_LAUNCH_DIM,
+					 NULL_TREE, non_const));
+      for (unsigned ix = 0; ix != GOMP_DIM_MAX; ix++)
+	if (non_const & GOMP_DIM_MASK (ix))
+	  args->safe_push (dims[ix]);
+    }
+}
+
+tree
+get_oacc_fn_attrib (tree fn)
+{
+  return lookup_attribute (OACC_FN_ATTRIB, DECL_ATTRIBUTES (fn));
+}
+
 /* Expand the GIMPLE_OMP_TARGET starting at REGION.  */
 
 static void
@@ -9150,6 +9245,7 @@ expand_omp_target (struct omp_region *re
     }
 
   gimple g;
+  bool tagging = false;
   /* The maximum number used by any start_ix, without varargs.  */
   auto_vec<tree, 11> args;
   args.quick_push (device);
@@ -9185,88 +9281,86 @@ expand_omp_target (struct omp_region *re
       break;
     case BUILT_IN_GOACC_PARALLEL:
       {
-	tree t_num_gangs, t_num_workers, t_vector_length;
-
-	/* Default values for num_gangs, num_workers, and vector_length.  */
-	t_num_gangs = t_num_workers = t_vector_length
-	  = fold_convert_loc (gimple_location (entry_stmt),
-			      integer_type_node, integer_one_node);
-	/* ..., but if present, use the value specified by the respective
-	   clause, making sure that are of the correct type.  */
-	c = find_omp_clause (clauses, OMP_CLAUSE_NUM_GANGS);
-	if (c)
-	  t_num_gangs = fold_convert_loc (OMP_CLAUSE_LOCATION (c),
-					  integer_type_node,
-					  OMP_CLAUSE_NUM_GANGS_EXPR (c));
-	c = find_omp_clause (clauses, OMP_CLAUSE_NUM_WORKERS);
-	if (c)
-	  t_num_workers = fold_convert_loc (OMP_CLAUSE_LOCATION (c),
-					    integer_type_node,
-					    OMP_CLAUSE_NUM_WORKERS_EXPR (c));
-	c = find_omp_clause (clauses, OMP_CLAUSE_VECTOR_LENGTH);
-	if (c)
-	  t_vector_length = fold_convert_loc (OMP_CLAUSE_LOCATION (c),
-					      integer_type_node,
-					      OMP_CLAUSE_VECTOR_LENGTH_EXPR (c));
-	args.quick_push (t_num_gangs);
-	args.quick_push (t_num_workers);
-	args.quick_push (t_vector_length);
+	set_oacc_fn_attrib (child_fn, clauses, &args);
+	tagging = true;
       }
       /* FALLTHRU */
     case BUILT_IN_GOACC_ENTER_EXIT_DATA:
     case BUILT_IN_GOACC_UPDATE:
       {
-	tree t_async;
-	int t_wait_idx;
+	tree t_async = NULL_TREE;
 
-	/* Default values for t_async.  */
-	t_async = fold_convert_loc (gimple_location (entry_stmt),
-				    integer_type_node,
-				    build_int_cst (integer_type_node,
-						   GOMP_ASYNC_SYNC));
-	/* ..., but if present, use the value specified by the respective
+	/* If present, use the value specified by the respective
 	   clause, making sure that is of the correct type.  */
 	c = find_omp_clause (clauses, OMP_CLAUSE_ASYNC);
 	if (c)
 	  t_async = fold_convert_loc (OMP_CLAUSE_LOCATION (c),
 				      integer_type_node,
 				      OMP_CLAUSE_ASYNC_EXPR (c));
-
-	args.quick_push (t_async);
-	/* Save the index, and... */
-	t_wait_idx = args.length ();
-	/* ... push a default value.  */
-	args.quick_push (fold_convert_loc (gimple_location (entry_stmt),
-					   integer_type_node,
-					   integer_zero_node));
-	c = find_omp_clause (clauses, OMP_CLAUSE_WAIT);
-	if (c)
+	else if (!tagging)
+	  /* Default values for t_async.  */
+	  t_async = fold_convert_loc (gimple_location (entry_stmt),
+				      integer_type_node,
+				      build_int_cst (integer_type_node,
+						     GOMP_ASYNC_SYNC));
+	if (tagging && t_async)
 	  {
-	    int n = 0;
+	    unsigned HOST_WIDE_INT i_async;
 
-	    for (; c; c = OMP_CLAUSE_CHAIN (c))
+	    if (TREE_CODE (t_async) == INTEGER_CST)
 	      {
-		if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_WAIT)
-		  {
-		    args.safe_push (fold_convert_loc (OMP_CLAUSE_LOCATION (c),
-						      integer_type_node,
-						      OMP_CLAUSE_WAIT_EXPR (c)));
-		    n++;
-		  }
+		/* See if we can pack the async arg in to the tag's
+		   operand.  */
+		i_async = TREE_INT_CST_LOW (t_async);
+
+		if (i_async < GOMP_LAUNCH_OP_MAX)
+		  t_async = NULL_TREE;
 	      }
+	    if (t_async)
+	      i_async = GOMP_LAUNCH_OP_MAX;
+	    args.safe_push (oacc_launch_pack
+			    (GOMP_LAUNCH_ASYNC, NULL_TREE, i_async));
+	  }
+	if (t_async)
+	  args.safe_push (t_async);
 
-	    /* Now that we know the number, replace the default value.  */
-	    args.ordered_remove (t_wait_idx);
-	    args.quick_insert (t_wait_idx,
-			       fold_convert_loc (gimple_location (entry_stmt),
-						 integer_type_node,
-						 build_int_cst (integer_type_node, n)));
+	/* Save the argument index, and ... */
+	unsigned t_wait_idx = args.length ();
+	unsigned num_waits = 0;
+	c = find_omp_clause (clauses, OMP_CLAUSE_WAIT);
+	if (!tagging || c)
+	  /* ... push a placeholder.  */
+	  args.safe_push (integer_zero_node);
+
+	for (; c; c = OMP_CLAUSE_CHAIN (c))
+	  if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_WAIT)
+	    {
+	      args.safe_push (fold_convert_loc (OMP_CLAUSE_LOCATION (c),
+						integer_type_node,
+						OMP_CLAUSE_WAIT_EXPR (c)));
+	      num_waits++;
+	    }
+
+	if (!tagging || num_waits)
+	  {
+	    tree len;
+
+	    /* Now that we know the number, update the placeholder.  */
+	    if (tagging)
+	      len = oacc_launch_pack (GOMP_LAUNCH_WAIT, NULL_TREE, num_waits);
+	    else
+	      len = build_int_cst (integer_type_node, num_waits);
+	    len = fold_convert_loc (gimple_location (entry_stmt),
+				    unsigned_type_node, len);
+	    args[t_wait_idx] = len;
 	  }
       }
       break;
     default:
       gcc_unreachable ();
     }
+  if (tagging)
+    args.safe_push (oacc_launch_pack (GOMP_LAUNCH_END, NULL_TREE, 0));
 
   g = gimple_build_call_vec (builtin_decl_explicit (start_ix), args);
   gimple_set_location (g, gimple_location (entry_stmt));
Index: gcc/omp-low.h
===================================================================
--- gcc/omp-low.h	(revision 227137)
+++ gcc/omp-low.h	(working copy)
@@ -28,6 +28,7 @@ extern void free_omp_regions (void);
 extern tree omp_reduction_init (tree, tree);
 extern bool make_gimple_omp_edges (basic_block, struct omp_region **, int *);
 extern void omp_finish_file (void);
+extern tree get_oacc_fn_attrib (tree);
 
 extern GTY(()) vec<tree, va_gc> *offload_funcs;
 extern GTY(()) vec<tree, va_gc> *offload_vars;
Index: gcc/builtin-types.def
===================================================================
--- gcc/builtin-types.def	(revision 227137)
+++ gcc/builtin-types.def	(working copy)
@@ -590,15 +590,14 @@ DEF_FUNCTION_TYPE_VAR_5 (BT_FN_INT_STRIN
 DEF_FUNCTION_TYPE_VAR_5 (BT_FN_INT_INT_INT_INT_INT_INT_VAR,
 			 BT_INT, BT_INT, BT_INT, BT_INT, BT_INT, BT_INT)
 
+DEF_FUNCTION_TYPE_VAR_6 (BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_VAR,
+			 BT_VOID, BT_INT, BT_PTR_FN_VOID_PTR, BT_SIZE,
+			 BT_PTR, BT_PTR, BT_PTR)
+
 DEF_FUNCTION_TYPE_VAR_7 (BT_FN_VOID_INT_SIZE_PTR_PTR_PTR_INT_INT_VAR,
 			 BT_VOID, BT_INT, BT_SIZE, BT_PTR, BT_PTR,
 			 BT_PTR, BT_INT, BT_INT)
 
-DEF_FUNCTION_TYPE_VAR_11 (BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_INT_INT_INT_INT_INT_VAR,
-			  BT_VOID, BT_INT, BT_PTR_FN_VOID_PTR, BT_SIZE,
-			  BT_PTR, BT_PTR, BT_PTR, BT_INT, BT_INT, BT_INT,
-			  BT_INT, BT_INT)
-
 DEF_POINTER_TYPE (BT_PTR_FN_VOID_VAR, BT_FN_VOID_VAR)
 DEF_FUNCTION_TYPE_3 (BT_FN_PTR_PTR_FN_VOID_VAR_PTR_SIZE,
 		     BT_PTR, BT_PTR_FN_VOID_VAR, BT_PTR, BT_SIZE)
Index: gcc/tree.h
===================================================================
--- gcc/tree.h	(revision 227137)
+++ gcc/tree.h	(working copy)
@@ -1,3 +1,4 @@
+
 /* Definitions for the ubiquitous 'tree' type for GNU compilers.
    Copyright (C) 1989-2015 Free Software Foundation, Inc.
 
@@ -1369,6 +1370,8 @@ extern void protected_set_expr_location
   OMP_CLAUSE_OPERAND (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_SCHEDULE), 0)
 
 /* OpenACC clause expressions  */
+#define OMP_CLAUSE_EXPR(NODE, CLAUSE) \
+  OMP_CLAUSE_OPERAND (OMP_CLAUSE_SUBCODE_CHECK (NODE, CLAUSE), 0)
 #define OMP_CLAUSE_GANG_EXPR(NODE) \
   OMP_CLAUSE_OPERAND ( \
     OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_GANG), 0)
Index: gcc/omp-builtins.def
===================================================================
--- gcc/omp-builtins.def	(revision 227137)
+++ gcc/omp-builtins.def	(working copy)
@@ -38,8 +38,8 @@ DEF_GOACC_BUILTIN (BUILT_IN_GOACC_DATA_E
 DEF_GOACC_BUILTIN (BUILT_IN_GOACC_ENTER_EXIT_DATA, "GOACC_enter_exit_data",
 		   BT_FN_VOID_INT_SIZE_PTR_PTR_PTR_INT_INT_VAR,
 		   ATTR_NOTHROW_LIST)
-DEF_GOACC_BUILTIN (BUILT_IN_GOACC_PARALLEL, "GOACC_parallel",
-		   BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_INT_INT_INT_INT_INT_VAR,
+DEF_GOACC_BUILTIN (BUILT_IN_GOACC_PARALLEL, "GOACC_parallel_keyed",
+		   BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_VAR,
 		   ATTR_NOTHROW_LIST)
 DEF_GOACC_BUILTIN (BUILT_IN_GOACC_UPDATE, "GOACC_update",
 		   BT_FN_VOID_INT_SIZE_PTR_PTR_PTR_INT_INT_VAR,
Index: gcc/lto/lto-lang.c
===================================================================
--- gcc/lto/lto-lang.c	(revision 227137)
+++ gcc/lto/lto-lang.c	(working copy)
@@ -160,10 +160,10 @@ enum lto_builtin_type
 #define DEF_FUNCTION_TYPE_VAR_4(NAME, RETURN, ARG1, ARG2, ARG3, ARG4) NAME,
 #define DEF_FUNCTION_TYPE_VAR_5(NAME, RETURN, ARG1, ARG2, ARG3, ARG4, ARG6) \
 				NAME,
+#define DEF_FUNCTION_TYPE_VAR_6(NAME, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
+				 ARG6) NAME,
 #define DEF_FUNCTION_TYPE_VAR_7(NAME, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
 				ARG6, ARG7) NAME,
-#define DEF_FUNCTION_TYPE_VAR_11(NAME, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
-				 ARG6, ARG7, ARG8, ARG9, ARG10, ARG11) NAME,
 #define DEF_POINTER_TYPE(NAME, TYPE) NAME,
 #include "builtin-types.def"
 #undef DEF_PRIMITIVE_TYPE
@@ -182,8 +182,8 @@ enum lto_builtin_type
 #undef DEF_FUNCTION_TYPE_VAR_3
 #undef DEF_FUNCTION_TYPE_VAR_4
 #undef DEF_FUNCTION_TYPE_VAR_5
+#undef DEF_FUNCTION_TYPE_VAR_6
 #undef DEF_FUNCTION_TYPE_VAR_7
-#undef DEF_FUNCTION_TYPE_VAR_11
 #undef DEF_POINTER_TYPE
   BT_LAST
 };
@@ -668,13 +668,12 @@ lto_define_builtins (tree va_list_ref_ty
   def_fn_type (ENUM, RETURN, 1, 4, ARG1, ARG2, ARG3, ARG4);
 #define DEF_FUNCTION_TYPE_VAR_5(ENUM, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5) \
   def_fn_type (ENUM, RETURN, 1, 5, ARG1, ARG2, ARG3, ARG4, ARG5);
+#define DEF_FUNCTION_TYPE_VAR_6(ENUM, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
+				 ARG6)	\
+  def_fn_type (ENUM, RETURN, 1, 6, ARG1, ARG2, ARG3, ARG4, ARG5, ARG6);
 #define DEF_FUNCTION_TYPE_VAR_7(ENUM, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
 				ARG6, ARG7)				\
   def_fn_type (ENUM, RETURN, 1, 7, ARG1, ARG2, ARG3, ARG4, ARG5, ARG6, ARG7);
-#define DEF_FUNCTION_TYPE_VAR_11(ENUM, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
-				 ARG6, ARG7, ARG8, ARG9, ARG10, ARG11)	\
-  def_fn_type (ENUM, RETURN, 1, 11, ARG1, ARG2, ARG3, ARG4, ARG5, ARG6,	\
-	       ARG7, ARG8, ARG9, ARG10, ARG11);
 #define DEF_POINTER_TYPE(ENUM, TYPE) \
   builtin_types[(int) ENUM] = build_pointer_type (builtin_types[(int) TYPE]);
 
@@ -696,8 +695,8 @@ lto_define_builtins (tree va_list_ref_ty
 #undef DEF_FUNCTION_TYPE_VAR_3
 #undef DEF_FUNCTION_TYPE_VAR_4
 #undef DEF_FUNCTION_TYPE_VAR_5
+#undef DEF_FUNCTION_TYPE_VAR_6
 #undef DEF_FUNCTION_TYPE_VAR_7
-#undef DEF_FUNCTION_TYPE_VAR_11
 #undef DEF_POINTER_TYPE
   builtin_types[(int) BT_LAST] = NULL_TREE;
 
Index: gcc/c-family/c-common.c
===================================================================
--- gcc/c-family/c-common.c	(revision 227137)
+++ gcc/c-family/c-common.c	(working copy)
@@ -5545,10 +5545,10 @@ enum c_builtin_type
 #define DEF_FUNCTION_TYPE_VAR_4(NAME, RETURN, ARG1, ARG2, ARG3, ARG4) NAME,
 #define DEF_FUNCTION_TYPE_VAR_5(NAME, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5) \
 				NAME,
+#define DEF_FUNCTION_TYPE_VAR_6(NAME, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
+				ARG6) NAME,
 #define DEF_FUNCTION_TYPE_VAR_7(NAME, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
 				ARG6, ARG7) NAME,
-#define DEF_FUNCTION_TYPE_VAR_11(NAME, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
-				 ARG6, ARG7, ARG8, ARG9, ARG10, ARG11) NAME,
 #define DEF_POINTER_TYPE(NAME, TYPE) NAME,
 #include "builtin-types.def"
 #undef DEF_PRIMITIVE_TYPE
@@ -5567,8 +5567,8 @@ enum c_builtin_type
 #undef DEF_FUNCTION_TYPE_VAR_3
 #undef DEF_FUNCTION_TYPE_VAR_4
 #undef DEF_FUNCTION_TYPE_VAR_5
+#undef DEF_FUNCTION_TYPE_VAR_6
 #undef DEF_FUNCTION_TYPE_VAR_7
-#undef DEF_FUNCTION_TYPE_VAR_11
 #undef DEF_POINTER_TYPE
   BT_LAST
 };
@@ -5661,13 +5661,12 @@ c_define_builtins (tree va_list_ref_type
   def_fn_type (ENUM, RETURN, 1, 4, ARG1, ARG2, ARG3, ARG4);
 #define DEF_FUNCTION_TYPE_VAR_5(ENUM, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5) \
   def_fn_type (ENUM, RETURN, 1, 5, ARG1, ARG2, ARG3, ARG4, ARG5);
+#define DEF_FUNCTION_TYPE_VAR_6(ENUM, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
+				ARG6) \
+  def_fn_type (ENUM, RETURN, 1, 6, ARG1, ARG2, ARG3, ARG4, ARG5, ARG6);
 #define DEF_FUNCTION_TYPE_VAR_7(ENUM, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
 				ARG6, ARG7)				\
   def_fn_type (ENUM, RETURN, 1, 7, ARG1, ARG2, ARG3, ARG4, ARG5, ARG6, ARG7);
-#define DEF_FUNCTION_TYPE_VAR_11(ENUM, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
-				 ARG6, ARG7, ARG8, ARG9, ARG10, ARG11) \
-  def_fn_type (ENUM, RETURN, 1, 11, ARG1, ARG2, ARG3, ARG4, ARG5, ARG6,      \
-	       ARG7, ARG8, ARG9, ARG10, ARG11);
 #define DEF_POINTER_TYPE(ENUM, TYPE) \
   builtin_types[(int) ENUM] = build_pointer_type (builtin_types[(int) TYPE]);
 
@@ -5689,8 +5688,8 @@ c_define_builtins (tree va_list_ref_type
 #undef DEF_FUNCTION_TYPE_VAR_3
 #undef DEF_FUNCTION_TYPE_VAR_4
 #undef DEF_FUNCTION_TYPE_VAR_5
+#undef DEF_FUNCTION_TYPE_VAR_6
 #undef DEF_FUNCTION_TYPE_VAR_7
-#undef DEF_FUNCTION_TYPE_VAR_11
 #undef DEF_POINTER_TYPE
   builtin_types[(int) BT_LAST] = NULL_TREE;
 
Index: gcc/fortran/f95-lang.c
===================================================================
--- gcc/fortran/f95-lang.c	(revision 227137)
+++ gcc/fortran/f95-lang.c	(working copy)
@@ -635,10 +635,10 @@ gfc_init_builtin_functions (void)
 			    ARG6, ARG7, ARG8) NAME,
 #define DEF_FUNCTION_TYPE_VAR_0(NAME, RETURN) NAME,
 #define DEF_FUNCTION_TYPE_VAR_2(NAME, RETURN, ARG1, ARG2) NAME,
+#define DEF_FUNCTION_TYPE_VAR_6(NAME, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
+				 ARG6) NAME,
 #define DEF_FUNCTION_TYPE_VAR_7(NAME, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
 				ARG6, ARG7) NAME,
-#define DEF_FUNCTION_TYPE_VAR_11(NAME, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
-				 ARG6, ARG7, ARG8, ARG9, ARG10, ARG11) NAME,
 #define DEF_POINTER_TYPE(NAME, TYPE) NAME,
 #include "types.def"
 #undef DEF_PRIMITIVE_TYPE
@@ -653,8 +653,8 @@ gfc_init_builtin_functions (void)
 #undef DEF_FUNCTION_TYPE_8
 #undef DEF_FUNCTION_TYPE_VAR_0
 #undef DEF_FUNCTION_TYPE_VAR_2
+#undef DEF_FUNCTION_TYPE_VAR_6
 #undef DEF_FUNCTION_TYPE_VAR_7
-#undef DEF_FUNCTION_TYPE_VAR_11
 #undef DEF_POINTER_TYPE
     BT_LAST
   };
@@ -1096,8 +1096,8 @@ gfc_init_builtin_functions (void)
 					builtin_types[(int) ARG1],     	\
 					builtin_types[(int) ARG2],     	\
 					NULL_TREE);
-#define DEF_FUNCTION_TYPE_VAR_7(ENUM, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
-				ARG6, ARG7)				\
+#define DEF_FUNCTION_TYPE_VAR_6(ENUM, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
+				ARG6)	\
   builtin_types[(int) ENUM]						\
     = build_varargs_function_type_list (builtin_types[(int) RETURN],   	\
 					builtin_types[(int) ARG1],     	\
@@ -1106,10 +1106,9 @@ gfc_init_builtin_functions (void)
 					builtin_types[(int) ARG4],	\
 					builtin_types[(int) ARG5],	\
 					builtin_types[(int) ARG6],	\
-					builtin_types[(int) ARG7],	\
 					NULL_TREE);
-#define DEF_FUNCTION_TYPE_VAR_11(ENUM, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
-				 ARG6, ARG7, ARG8, ARG9, ARG10, ARG11)	\
+#define DEF_FUNCTION_TYPE_VAR_7(ENUM, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
+				ARG6, ARG7)				\
   builtin_types[(int) ENUM]						\
     = build_varargs_function_type_list (builtin_types[(int) RETURN],   	\
 					builtin_types[(int) ARG1],     	\
@@ -1119,10 +1118,6 @@ gfc_init_builtin_functions (void)
 					builtin_types[(int) ARG5],	\
 					builtin_types[(int) ARG6],	\
 					builtin_types[(int) ARG7],	\
-					builtin_types[(int) ARG8],	\
-					builtin_types[(int) ARG9],	\
-					builtin_types[(int) ARG10],	\
-					builtin_types[(int) ARG11],	\
 					NULL_TREE);
 #define DEF_POINTER_TYPE(ENUM, TYPE)			\
   builtin_types[(int) ENUM]				\
@@ -1140,8 +1135,8 @@ gfc_init_builtin_functions (void)
 #undef DEF_FUNCTION_TYPE_8
 #undef DEF_FUNCTION_TYPE_VAR_0
 #undef DEF_FUNCTION_TYPE_VAR_2
+#undef DEF_FUNCTION_TYPE_VAR_6
 #undef DEF_FUNCTION_TYPE_VAR_7
-#undef DEF_FUNCTION_TYPE_VAR_11
 #undef DEF_POINTER_TYPE
   builtin_types[(int) BT_LAST] = NULL_TREE;
 
Index: gcc/fortran/types.def
===================================================================
--- gcc/fortran/types.def	(revision 227137)
+++ gcc/fortran/types.def	(working copy)
@@ -219,7 +219,6 @@ DEF_FUNCTION_TYPE_VAR_7 (BT_FN_VOID_INT_
 			 BT_VOID, BT_INT, BT_SIZE, BT_PTR, BT_PTR,
 			 BT_PTR, BT_INT, BT_INT)
 
-DEF_FUNCTION_TYPE_VAR_11 (BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_INT_INT_INT_INT_INT_VAR,
+DEF_FUNCTION_TYPE_VAR_6 (BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_VAR,
 			  BT_VOID, BT_INT, BT_PTR_FN_VOID_PTR, BT_SIZE,
-			  BT_PTR, BT_PTR, BT_PTR, BT_INT, BT_INT, BT_INT,
-			  BT_INT, BT_INT)
+			  BT_PTR, BT_PTR, BT_PTR)

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Openacc launch API
  2015-08-25 13:40 Openacc launch API Nathan Sidwell
@ 2015-08-28 17:30 ` Nathan Sidwell
  2015-08-28 18:07   ` Jakub Jelinek
  2015-09-07 13:09 ` Nathan Sidwell
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 34+ messages in thread
From: Nathan Sidwell @ 2015-08-28 17:30 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: GCC Patches

On 08/25/15 09:29, Nathan Sidwell wrote:

> I  did rename the GOACC_parallel entry point to GOACC_parallel_keyed and provide
> a forwarding function. However, as the mkoffload data is incompatible, this is
> probably overkill.  I've had to increment the (just committed) version number to
> detect the change in data representation.  So any attempt to run an old binary
> with a new libgomp will fail at the loading point.  We could simply keep the
> same 'GOACC_parallel' name and not need any new symbols.  WDYT?

I'm coming to the conclusion that just keeping the original 'GOACC_parallel' 
name is the way to go.  As I said above, we cannot support backwards 
compatibility on the offload data, so the only remaining case is someone 
building an openacc program for running on the host.  As I said at the cauldron, 
I think the set of users that cared enough about openacc to try gcc 5 but don't 
care enough to recompile their programs  for gcc 6 is the empty set.

Jakub?

nathan

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Openacc launch API
  2015-08-28 17:30 ` Nathan Sidwell
@ 2015-08-28 18:07   ` Jakub Jelinek
  2015-08-28 19:50     ` Nathan Sidwell
  0 siblings, 1 reply; 34+ messages in thread
From: Jakub Jelinek @ 2015-08-28 18:07 UTC (permalink / raw)
  To: Nathan Sidwell; +Cc: GCC Patches

On Fri, Aug 28, 2015 at 01:29:51PM -0400, Nathan Sidwell wrote:
> On 08/25/15 09:29, Nathan Sidwell wrote:
> 
> >I  did rename the GOACC_parallel entry point to GOACC_parallel_keyed and provide
> >a forwarding function. However, as the mkoffload data is incompatible, this is
> >probably overkill.  I've had to increment the (just committed) version number to
> >detect the change in data representation.  So any attempt to run an old binary
> >with a new libgomp will fail at the loading point.  We could simply keep the
> >same 'GOACC_parallel' name and not need any new symbols.  WDYT?
> 
> I'm coming to the conclusion that just keeping the original 'GOACC_parallel'
> name is the way to go.  As I said above, we cannot support backwards
> compatibility on the offload data, so the only remaining case is someone
> building an openacc program for running on the host.  As I said at the
> cauldron, I think the set of users that cared enough about openacc to try
> gcc 5 but don't care enough to recompile their programs  for gcc 6 is the
> empty set.

It is ok if for the GCC 5 compiled programs we always fallback to host,
but IMNSHO we really should keep at least that host fallback working.

We'll have new names for the OpenMP target entry points too.

	Jakub

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Openacc launch API
  2015-08-28 18:07   ` Jakub Jelinek
@ 2015-08-28 19:50     ` Nathan Sidwell
  0 siblings, 0 replies; 34+ messages in thread
From: Nathan Sidwell @ 2015-08-28 19:50 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: GCC Patches

On 08/28/15 13:36, Jakub Jelinek wrote:
> On Fri, Aug 28, 2015 at 01:29:51PM -0400, Nathan Sidwell wrote:
>> On 08/25/15 09:29, Nathan Sidwell wrote:
>>
>>> I  did rename the GOACC_parallel entry point to GOACC_parallel_keyed and provide
>>> a forwarding function. However, as the mkoffload data is incompatible, this is
>>> probably overkill.  I've had to increment the (just committed) version number to
>>> detect the change in data representation.  So any attempt to run an old binary
>>> with a new libgomp will fail at the loading point.  We could simply keep the
>>> same 'GOACC_parallel' name and not need any new symbols.  WDYT?
>>
>> I'm coming to the conclusion that just keeping the original 'GOACC_parallel'
>> name is the way to go.  As I said above, we cannot support backwards
>> compatibility on the offload data, so the only remaining case is someone
>> building an openacc program for running on the host.  As I said at the
>> cauldron, I think the set of users that cared enough about openacc to try
>> gcc 5 but don't care enough to recompile their programs  for gcc 6 is the
>> empty set.
>
> It is ok if for the GCC 5 compiled programs we always fallback to host,
> but IMNSHO we really should keep at least that host fallback working.

Is that approval for the patch as I posted it?

nathan

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Openacc launch API
  2015-08-25 13:40 Openacc launch API Nathan Sidwell
  2015-08-28 17:30 ` Nathan Sidwell
@ 2015-09-07 13:09 ` Nathan Sidwell
  2015-09-11 15:59   ` Nathan Sidwell
  2015-09-17  9:46 ` Bernd Schmidt
  2015-09-30 12:42 ` Matthias Klose
  3 siblings, 1 reply; 34+ messages in thread
From: Nathan Sidwell @ 2015-09-07 13:09 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: GCC Patches

On 08/25/15 09:29, Nathan Sidwell wrote:
> Jakub,
>
> This patch changes the launch API for openacc parallels.  The current scheme
> passes the launch dimensions as 3 separate parameters to the GOACC_parallel
> function.  This is problematic for a couple of reasons:
>
> 1) these must be validated in the host compiler
>
> 2) they provide no extension to support a variety of different offload devices
> with different geometry requirements.
>
> This patch changes things so that the function tables emitted by (ptx)
> mkoffloads includes the geometry triplet for each function.  This allows them to
> be validated and/or manipulated in the offload compiler.  However, this only
> works for compile-time known dimensions -- which is a common case.  To deal with
> runtime-computed dimensions we have to retain the host-side compiler's
> calculation and pass that into the GOACC_parallel function.  We change
> GOACC_parallel to take a variadic list of keyed operands ending with a sentinel
> marker.  These keyed operands have a slot for expansion to support multiple
> different offload devices.
>
> We also extend the functionality of the 'oacc function' internal attribute.
> Rather than being a simple marker, it now has a value, which is a TREE_LIST of
> the geometry required.  The geometry is held as INTEGER_CSTs on the TREE_VALUE
> slots.  Runtime-calculated values are represented by an INTEGER_CST of zero.
> We'll also use this representation for  'routines', where the TREE_PURPOSE slot
> will be used to indicate the levels at which a routine might spawn a partitioned
> loop.  Again, to allow future expansion supporting a number of different offload
> devices, this can become a list-of-lists, keyed by and offload device
> identifier.  The offload  compiler can manipulate this data, and a later patch
> will do this within a new oacc-xform pass.
>
> I  did rename the GOACC_parallel entry point to GOACC_parallel_keyed and provide
> a forwarding function. However, as the mkoffload data is incompatible, this is
> probably overkill.  I've had to increment the (just committed) version number to
> detect the change in data representation.  So any attempt to run an old binary
> with a new libgomp will fail at the loading point.  We could simply keep the
> same 'GOACC_parallel' name and not need any new symbols.  WDYT?

Ping?

https://gcc.gnu.org/ml/gcc-patches/2015-08/msg01498.html

nathan


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Openacc launch API
  2015-09-07 13:09 ` Nathan Sidwell
@ 2015-09-11 15:59   ` Nathan Sidwell
  2015-09-16 20:59     ` Nathan Sidwell
  0 siblings, 1 reply; 34+ messages in thread
From: Nathan Sidwell @ 2015-09-11 15:59 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: GCC Patches

Ping?

https://gcc.gnu.org/ml/gcc-patches/2015-08/msg01498.html



On 09/07/15 08:48, Nathan Sidwell wrote:
> On 08/25/15 09:29, Nathan Sidwell wrote:
>> Jakub,
>>
>> This patch changes the launch API for openacc parallels.  The current scheme
>> passes the launch dimensions as 3 separate parameters to the GOACC_parallel
>> function.  This is problematic for a couple of reasons:
>>
>> 1) these must be validated in the host compiler
>>
>> 2) they provide no extension to support a variety of different offload devices
>> with different geometry requirements.
>>
>> This patch changes things so that the function tables emitted by (ptx)
>> mkoffloads includes the geometry triplet for each function.  This allows them to
>> be validated and/or manipulated in the offload compiler.  However, this only
>> works for compile-time known dimensions -- which is a common case.  To deal with
>> runtime-computed dimensions we have to retain the host-side compiler's
>> calculation and pass that into the GOACC_parallel function.  We change
>> GOACC_parallel to take a variadic list of keyed operands ending with a sentinel
>> marker.  These keyed operands have a slot for expansion to support multiple
>> different offload devices.
>>
>> We also extend the functionality of the 'oacc function' internal attribute.
>> Rather than being a simple marker, it now has a value, which is a TREE_LIST of
>> the geometry required.  The geometry is held as INTEGER_CSTs on the TREE_VALUE
>> slots.  Runtime-calculated values are represented by an INTEGER_CST of zero.
>> We'll also use this representation for  'routines', where the TREE_PURPOSE slot
>> will be used to indicate the levels at which a routine might spawn a partitioned
>> loop.  Again, to allow future expansion supporting a number of different offload
>> devices, this can become a list-of-lists, keyed by and offload device
>> identifier.  The offload  compiler can manipulate this data, and a later patch
>> will do this within a new oacc-xform pass.
>>
>> I  did rename the GOACC_parallel entry point to GOACC_parallel_keyed and provide
>> a forwarding function. However, as the mkoffload data is incompatible, this is
>> probably overkill.  I've had to increment the (just committed) version number to
>> detect the change in data representation.  So any attempt to run an old binary
>> with a new libgomp will fail at the loading point.  We could simply keep the
>> same 'GOACC_parallel' name and not need any new symbols.  WDYT?
>
> Ping?
>
> https://gcc.gnu.org/ml/gcc-patches/2015-08/msg01498.html
>
> nathan
>
>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Openacc launch API
  2015-09-11 15:59   ` Nathan Sidwell
@ 2015-09-16 20:59     ` Nathan Sidwell
  0 siblings, 0 replies; 34+ messages in thread
From: Nathan Sidwell @ 2015-09-16 20:59 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: GCC Patches

Ping?

On 09/11/15 11:50, Nathan Sidwell wrote:
> Ping?
>
> https://gcc.gnu.org/ml/gcc-patches/2015-08/msg01498.html
>
>
>
> On 09/07/15 08:48, Nathan Sidwell wrote:
>> On 08/25/15 09:29, Nathan Sidwell wrote:
>>> Jakub,
>>>
>>> This patch changes the launch API for openacc parallels.  The current scheme
>>> passes the launch dimensions as 3 separate parameters to the GOACC_parallel
>>> function.  This is problematic for a couple of reasons:
>>>
>>> 1) these must be validated in the host compiler
>>>
>>> 2) they provide no extension to support a variety of different offload devices
>>> with different geometry requirements.
>>>
>>> This patch changes things so that the function tables emitted by (ptx)
>>> mkoffloads includes the geometry triplet for each function.  This allows them to
>>> be validated and/or manipulated in the offload compiler.  However, this only
>>> works for compile-time known dimensions -- which is a common case.  To deal with
>>> runtime-computed dimensions we have to retain the host-side compiler's
>>> calculation and pass that into the GOACC_parallel function.  We change
>>> GOACC_parallel to take a variadic list of keyed operands ending with a sentinel
>>> marker.  These keyed operands have a slot for expansion to support multiple
>>> different offload devices.
>>>
>>> We also extend the functionality of the 'oacc function' internal attribute.
>>> Rather than being a simple marker, it now has a value, which is a TREE_LIST of
>>> the geometry required.  The geometry is held as INTEGER_CSTs on the TREE_VALUE
>>> slots.  Runtime-calculated values are represented by an INTEGER_CST of zero.
>>> We'll also use this representation for  'routines', where the TREE_PURPOSE slot
>>> will be used to indicate the levels at which a routine might spawn a partitioned
>>> loop.  Again, to allow future expansion supporting a number of different offload
>>> devices, this can become a list-of-lists, keyed by and offload device
>>> identifier.  The offload  compiler can manipulate this data, and a later patch
>>> will do this within a new oacc-xform pass.
>>>
>>> I  did rename the GOACC_parallel entry point to GOACC_parallel_keyed and provide
>>> a forwarding function. However, as the mkoffload data is incompatible, this is
>>> probably overkill.  I've had to increment the (just committed) version number to
>>> detect the change in data representation.  So any attempt to run an old binary
>>> with a new libgomp will fail at the loading point.  We could simply keep the
>>> same 'GOACC_parallel' name and not need any new symbols.  WDYT?
>>
>> Ping?
>>
>> https://gcc.gnu.org/ml/gcc-patches/2015-08/msg01498.html
>>
>> nathan
>>
>>
>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Openacc launch API
  2015-08-25 13:40 Openacc launch API Nathan Sidwell
  2015-08-28 17:30 ` Nathan Sidwell
  2015-09-07 13:09 ` Nathan Sidwell
@ 2015-09-17  9:46 ` Bernd Schmidt
  2015-09-17 13:24   ` Nathan Sidwell
  2015-09-17 14:43   ` Nathan Sidwell
  2015-09-30 12:42 ` Matthias Klose
  3 siblings, 2 replies; 34+ messages in thread
From: Bernd Schmidt @ 2015-09-17  9:46 UTC (permalink / raw)
  To: Nathan Sidwell, Jakub Jelinek; +Cc: GCC Patches

Since Jakub appears to be busy, I'll give my 2 cents.

On 08/25/2015 03:29 PM, Nathan Sidwell wrote:
> I  did rename the GOACC_parallel entry point to GOACC_parallel_keyed and
> provide a forwarding function. However, as the mkoffload data is
> incompatible, this is probably overkill.  I've had to increment the
> (just committed) version number to detect the change in data
> representation.  So any attempt to run an old binary with a new libgomp
> will fail at the loading point.

Fail how? Jakub has requested that it works but falls back to 
unaccelerated execution, can you confirm this is what you expect to 
happen with this patch?

> +/* Varadic launch arguments.  */
> +#define GOMP_LAUNCH_END 	0  /* End of args, no dev or op */
> +#define GOMP_LAUNCH_DIM		1  /* Launch dimensions, op = mask */
> +#define GOMP_LAUNCH_ASYNC	2  /* Async, op = cst val if not MAX  */
> +#define GOMP_LAUNCH_WAIT	3  /* Waits, op = num waits.  */
> +#define GOMP_LAUNCH_CODE_SHIFT	28
> +#define GOMP_LAUNCH_DEVICE_SHIFT 16
> +#define GOMP_LAUNCH_OP_SHIFT 0
> +#define GOMP_LAUNCH_PACK(CODE,DEVICE,OP)	\
> +  (((CODE) << GOMP_LAUNCH_CODE_SHIFT)		\
> +   | ((DEVICE) << GOMP_LAUNCH_DEVICE_SHIFT)	\
> +   | ((OP) << GOMP_LAUNCH_OP_SHIFT))
> +#define GOMP_LAUNCH_CODE(X) (((X) >> GOMP_LAUNCH_CODE_SHIFT) & 0xf)
> +#define GOMP_LAUNCH_DEVICE(X) (((X) >> GOMP_LAUNCH_DEVICE_SHIFT) & 0xfff)
> +#define GOMP_LAUNCH_OP(X) (((X) >> GOMP_LAUNCH_OP_SHIFT) & 0xffff)
> +#define GOMP_LAUNCH_OP_MAX 0xffff

I probably would have used something simpler, like a code/device/op 
argument triplet, but I guess this ok.

> -  if (num_waits)
> +  va_start (ap, kinds);
> +  /* TODO: This will need amending when device_type is implemented.  */

I'd expect that this will check whether the device type in the argument 
is either zero (or whatever indicates all devices) or matches the 
current device. Is that what you intend?

> +  while (GOMP_LAUNCH_PACK (GOMP_LAUNCH_END, 0, 0)
> +	 != (tag = va_arg (ap, unsigned)))

That's a somewhat non-idiomatic way to write this, with the constant 
first and not obviously a constant. I'd initialize a variable with the 
constant before the loop.

> +      assert (!GOMP_LAUNCH_DEVICE (tag));

Uh, that seems unfriendly, and not exactly forwards compatible. Can that 
fail a bit more gracefully? (Alternatively, implement the device_type 
stuff now so that we don't have TODOs in the code and don't have to 
worry about compatibility issues.)

> +  if (num_waits > 8)
> +    gomp_fatal ("Too many waits for legacy interface");

How did you arrive at this number?

>
> +GOACC_2.0,1 {
> +  global:
> +	GOACC_parallel_keyed;
> +} GOACC_2.0;

Did you mean to use a comma?

> +  if (dims[GOMP_DIM_GANG] != 1)
> +    GOMP_PLUGIN_fatal ("non-unity num_gangs (%d) not supported",
> +		       dims[GOMP_DIM_GANG]);
> +  if (dims[GOMP_DIM_WORKER] != 1)
> +    GOMP_PLUGIN_fatal ("non-unity num_workers (%d) not supported",
> +		       dims[GOMP_DIM_WORKER]);

I see that this is just moved here (which is good), but is this still a 
limitation? Or is that on trunk only?

> +  for (comma = "", id = var_ids; id; comma = ",", id = id->next)
> +    fprintf (out, "%s\n\t%s", comma, id->ptx_name);

The comma trick is new to me, I'll have to remember this one.

> +static void
> +set_oacc_fn_attrib (tree fn, tree clauses, vec<tree> *args)
> +tree
> +get_oacc_fn_attrib (tree fn)

These need function comments.

> +{
> +  /* Must match GOMP_DIM ordering.  */
> +  static const omp_clause_code ids[] =
> +    {OMP_CLAUSE_NUM_GANGS, OMP_CLAUSE_NUM_WORKERS, OMP_CLAUSE_VECTOR_LENGTH};

Formatting. No = at the end of a line, and whitespace around braces.

> @@ -9150,6 +9245,7 @@ expand_omp_target (struct omp_region *re
>       }
>
>     gimple g;
> +  bool tagging = false;
>     /* The maximum number used by any start_ix, without varargs.  */

That looks misindented, but may be an email client thing.

> +	else if (!tagging)

Oh... so tagging controls two different methods for constructing 
argument lists, one for GOACC_parallel and the other for whatever OMP 
uses? That's a bit unfortunate, I'll need to think about it for a bit or 
defer to Jakub.

Looks reasonable otherwise.


Bernd

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Openacc launch API
  2015-09-17  9:46 ` Bernd Schmidt
@ 2015-09-17 13:24   ` Nathan Sidwell
  2015-09-17 14:43   ` Nathan Sidwell
  1 sibling, 0 replies; 34+ messages in thread
From: Nathan Sidwell @ 2015-09-17 13:24 UTC (permalink / raw)
  To: Bernd Schmidt, Jakub Jelinek; +Cc: GCC Patches

On 09/17/15 05:36, Bernd Schmidt wrote:

> Fail how? Jakub has requested that it works but falls back to unaccelerated
> execution, can you confirm this is what you expect to happen with this patch?

Yes, that is the failure mode.

>> -  if (num_waits)
>> +  va_start (ap, kinds);
>> +  /* TODO: This will need amending when device_type is implemented.  */
>
> I'd expect that this will check whether the device type in the argument is
> either zero (or whatever indicates all devices) or matches the current device.
> Is that what you intend?

correct.

>> +  while (GOMP_LAUNCH_PACK (GOMP_LAUNCH_END, 0, 0)
>> +     != (tag = va_arg (ap, unsigned)))
>
> That's a somewhat non-idiomatic way to write this, with the constant first and
> not obviously a constant. I'd initialize a variable with the constant before the
> loop.

Hm, yeah, that is a little unpleasant.  The alternative is IIRC a mid-loop break.

(If only this was C++  then we could write:

   while (int tag = va_arg (ap, unsigned)) { ... }

relying on GOMP_LAUNCH_PACK (GOMP_LAUNCH_END, 0, 0) being zero.  Maybe the 
C-ified version of that would be clearer?)

> +      assert (!GOMP_LAUNCH_DEVICE (tag));
>
> Uh, that seems unfriendly, and not exactly forwards compatible. Can that fail a

I suppose.  I was thinking of this as an internal interface from the compiler, 
but I guess down the road some one  could try  running a device_type implemented 
binary from a future compiler with an old libgomp.

>> +  if (num_waits > 8)
>> +    gomp_fatal ("Too many waits for legacy interface");
>
> How did you arrive at this number?

The voice in my head.  I've only seen code that had up to 2 waits, so I figured 
8 was way plenty.

>> +GOACC_2.0,1 {
>> +  global:
>> +    GOACC_parallel_keyed;
>> +} GOACC_2.0;
>
> Did you mean to use a comma?

Probably.

>
>> +  if (dims[GOMP_DIM_GANG] != 1)
>> +    GOMP_PLUGIN_fatal ("non-unity num_gangs (%d) not supported",
>> +               dims[GOMP_DIM_GANG]);
>> +  if (dims[GOMP_DIM_WORKER] != 1)
>> +    GOMP_PLUGIN_fatal ("non-unity num_workers (%d) not supported",
>> +               dims[GOMP_DIM_WORKER]);
>
> I see that this is just moved here (which is good), but is this still a
> limitation? Or is that on trunk only?

Trunk only.  It'll go away shortly when more patches get merged.


>> +static void
>> +set_oacc_fn_attrib (tree fn, tree clauses, vec<tree> *args)
>> +tree
>> +get_oacc_fn_attrib (tree fn)
>
> These need function comments.

oops.

>> +  /* Must match GOMP_DIM ordering.  */
>> +  static const omp_clause_code ids[] =
>> +    {OMP_CLAUSE_NUM_GANGS, OMP_CLAUSE_NUM_WORKERS, OMP_CLAUSE_VECTOR_LENGTH};
>
> Formatting. No = at the end of a line, and whitespace around braces.

oh, I thought intialization placed the = where I had -- didn't  know that nor 
the space-brace rule.

>
>> @@ -9150,6 +9245,7 @@ expand_omp_target (struct omp_region *re
>>       }
>>
>>     gimple g;
>> +  bool tagging = false;
>>     /* The maximum number used by any start_ix, without varargs.  */
>
> That looks misindented, but may be an email client thing.

It does, doesn't it.  Appears to be email artifact or something.

>
>> +    else if (!tagging)
>
> Oh... so tagging controls two different methods for constructing argument lists,
> one for GOACC_parallel and the other for whatever OMP uses? That's a bit
> unfortunate, I'll need to think about it for a bit or defer to Jakub.

It's the (new)  difference between how the following 3 OpenACC builtins handle 
asyn & wait args.
    case BUILT_IN_GOACC_PARALLEL:
       {
	set_oacc_fn_attrib (child_fn, clauses, &args);
	tagging = true;
       }
       /* FALLTHRU */
     case BUILT_IN_GOACC_ENTER_EXIT_DATA:
     case BUILT_IN_GOACC_UPDATE:

All 3 pass info about memory copies etc, async and optional waits.  For  E_E_D 
and UPDATE the async is always passed (with a special value for 'synchronous'), 
followed by a count and then variadic wait ints.

An alternarive I suppse would be to break out the meminfo arg pushing to a 
helper function and have 2 separate code paths, or something like that.

> Looks reasonable otherwise.

thanks for your review.  I'll repost shortly

nathan

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Openacc launch API
  2015-09-17  9:46 ` Bernd Schmidt
  2015-09-17 13:24   ` Nathan Sidwell
@ 2015-09-17 14:43   ` Nathan Sidwell
  2015-09-18  9:13     ` Bernd Schmidt
  2015-09-21 16:30     ` Openacc launch API Nathan Sidwell
  1 sibling, 2 replies; 34+ messages in thread
From: Nathan Sidwell @ 2015-09-17 14:43 UTC (permalink / raw)
  To: Bernd Schmidt, Jakub Jelinek; +Cc: GCC Patches

[-- Attachment #1: Type: text/plain, Size: 1844 bytes --]

Updated patch addressing your points.  Some  further comments though ...

>> +  while (GOMP_LAUNCH_PACK (GOMP_LAUNCH_END, 0, 0)
>> +     != (tag = va_arg (ap, unsigned)))
>
> That's a somewhat non-idiomatic way to write this, with the constant first and
> not obviously a constant. I'd initialize a variable with the constant before the
> loop.

I went with
   while ((tag = va_arg (...)) != 0) ...

and killed GOMP_LAUNCH_END throughout, using explicit '0'.

>> +      assert (!GOMP_LAUNCH_DEVICE (tag));
>
> Uh, that seems unfriendly, and not exactly forwards compatible. Can that fail a
> bit more gracefully? (Alternatively, implement the device_type stuff now so that
> we don't have TODOs in the code and don't have to worry about compatibility
> issues.)

Added call to gomp_fatal, indicating libgomp is out of date. Also added a 
default to the switch following with the same effect.  The trouble  with 
implementing handling of device_type here now, is difficulty in testing its 
correctness.  If it were  buggy we'd be in a worse position than not having it.

>> +GOACC_2.0,1 {
>> +  global:
>> +    GOACC_parallel_keyed;
>> +} GOACC_2.0;
>
> Did you mean to use a comma?

I misunderstood your comment as 'did you mean to use a comma where you used 
something else', not 'is that comma a typo?'  well spotted!

>> +    else if (!tagging)
>
> Oh... so tagging controls two different methods for constructing argument lists,
> one for GOACC_parallel and the other for whatever OMP uses? That's a bit
> unfortunate, I'll need to think about it for a bit or defer to Jakub.

My earlier description was lacking.  The memory arguments have already been 
pushed before that switch.  This is just dealing with async & wait args.  I 
found it easier to modify the existing code path and have a tagging flag, rather 
than duplicate it.

nathan

[-- Attachment #2: trunk-launch-0917.patch --]
[-- Type: text/x-patch, Size: 43875 bytes --]

o2015-09-17  Nathan Sidwell  <nathan@codesourcery.com>

	inlude/
	* gomp-constants.h (GOMP_VERSION_NVIDIA_PTX): Increment.
	(GOMP_DIM_GANG, GOMP_DIM_WORKER, GOMP_DIM_VECTOR, GOMP_DIM_MAX,
	GOMP_DIM_MASK): New.
	(GOMP_LAUNCH_DIM, GOMP_LAUNCH_ASYNC, GOMP_LAUNCH_WAIT): New.
	(GOMP_LAUNCH_CODE_SHIFT, GOMP_LAUNCH_DEVICE_SHIFT,
	GOMP_LAUNCH_OP_SHIFT): New.
	(GOMP_LAUNCH_PACK, GOMP_LAUNCH_CODE, GOMP_LAUNCH_DEVICE,
	GOMP_LAUNCH_OP): New.
	(GOMP_LAUNCH_OP_MAX): New.

	libgomp/
	* libgomp.h (acc_dispatch_t): Replace separate geometry args with
	array.
	* libgomp.map (GOACC_parallel_keyed): New.
	* oacc-parallel.c (goacc_wait): Take pointer to va_list.  Adjust
	all callers.
	(GOACC_parallel_keyed): New interface.  Lose geometry arguments
	and take keyed varargs list.  Adjust call to exec_func.
	(GOACC_parallel): Forward to GACC_parallel_keyed.
	* libgomp_g.h (GOACC_parallel): Remove.
	(GOACC_parallel_keyed): Declare.
	* plugin/plugin-nvptx.c (struct targ_fn_launch): New struct.
	(stuct targ_gn_descriptor): Replace name field with launch field.
	(nvptx_exec): Lose separate geometry args, take array.  Process
	dynamic dimensions and adjust.
	(struct nvptx_tdata): Replace fn_names field with fn_descs.
	(GOMP_OFFLOAD_load_image): Adjust for change in function table
	data.
	(GOMP_OFFLOAD_openacc_parallel): Adjust for change in dimension
	passing.
	* oacc-host.c (host_openacc_exec): Adjust for change in dimension
	passing.

	gcc/
	* config/nvptx/nvptx.c: Include omp-low.h and gomp-constants.h.
	(nvptx_record_offload_symbol): Record function execution geometry.
	* config/nvptx/mkoffload.c (process): Include launch geometry in
	function data.
	* omp-low.c (oacc_launch_pack): New.
	(replace_oacc_fn_attrib): New.
	(set_oacc_fn_attrib): New.
	(get_oacc_fn_attrib): New.
	(expand_omp_target): Create keyed varargs for GOACC_parallel call
	generation.
	* omp-low.h (get_oacc_fn_attrib): Declare.
	* builtin-types.def (DEF_FUNCTION_TyPE_VAR_6): New.
	(DEF_FUNCTION_TYPE_VAR_11): Delete.
	* tree.h (OMP_CLAUSE_EXPR): New.
	* omp-builtins.def (BUILT_IN_GOACC_PARALLEL): Change target fn name.

	gcc/lto/
	* lto-lang.c (DEF_FUNCTION_TYPE_VAR_6): New.
	(DEF_FUNCTION_TYPE_VAR_11): Delete.

	gcc/c-family/
	* c-common.c (DEF_FUNCTION_TYPE_VAR_6): New.
	(DEF_FUNCTION_TYPE_VAR_11): Delete.

	gcc/fortran/
	* f95-lang.c (DEF_FUNCTION_TYPE_VAR_6): New.
	(DEF_FUNCTION_TYPE_VAR_11): Delete.
	* types.def (DEF_FUNCTION_TYPE_VAR_6): New.
	(DEF_FUNCTION_TYPE_VAR_11): Delete.

Index: include/gomp-constants.h
===================================================================
--- include/gomp-constants.h	(revision 227862)
+++ include/gomp-constants.h	(working copy)
@@ -115,11 +115,33 @@ enum gomp_map_kind
 
 /* Versions of libgomp and device-specific plugins.  */
 #define GOMP_VERSION	0
-#define GOMP_VERSION_NVIDIA_PTX 0
+#define GOMP_VERSION_NVIDIA_PTX 1
 #define GOMP_VERSION_INTEL_MIC 0
 
 #define GOMP_VERSION_PACK(LIB, DEV) (((LIB) << 16) | (DEV))
 #define GOMP_VERSION_LIB(PACK) (((PACK) >> 16) & 0xffff)
 #define GOMP_VERSION_DEV(PACK) ((PACK) & 0xffff)
 
+#define GOMP_DIM_GANG	0
+#define GOMP_DIM_WORKER	1
+#define GOMP_DIM_VECTOR	2
+#define GOMP_DIM_MAX	3
+#define GOMP_DIM_MASK(X) (1u << (X))
+
+/* Varadic launch arguments.  End of list is marked by a zero.  */
+#define GOMP_LAUNCH_DIM		1  /* Launch dimensions, op = mask */
+#define GOMP_LAUNCH_ASYNC	2  /* Async, op = cst val if not MAX  */
+#define GOMP_LAUNCH_WAIT	3  /* Waits, op = num waits.  */
+#define GOMP_LAUNCH_CODE_SHIFT	28
+#define GOMP_LAUNCH_DEVICE_SHIFT 16
+#define GOMP_LAUNCH_OP_SHIFT 0
+#define GOMP_LAUNCH_PACK(CODE,DEVICE,OP)	\
+  (((CODE) << GOMP_LAUNCH_CODE_SHIFT)		\
+   | ((DEVICE) << GOMP_LAUNCH_DEVICE_SHIFT)	\
+   | ((OP) << GOMP_LAUNCH_OP_SHIFT))
+#define GOMP_LAUNCH_CODE(X) (((X) >> GOMP_LAUNCH_CODE_SHIFT) & 0xf)
+#define GOMP_LAUNCH_DEVICE(X) (((X) >> GOMP_LAUNCH_DEVICE_SHIFT) & 0xfff)
+#define GOMP_LAUNCH_OP(X) (((X) >> GOMP_LAUNCH_OP_SHIFT) & 0xffff)
+#define GOMP_LAUNCH_OP_MAX 0xffff
+
 #endif
Index: libgomp/libgomp.h
===================================================================
--- libgomp/libgomp.h	(revision 227862)
+++ libgomp/libgomp.h	(working copy)
@@ -695,7 +695,7 @@ typedef struct acc_dispatch_t
 
   /* Execute.  */
   void (*exec_func) (void (*) (void *), size_t, void **, void **, size_t *,
-		     unsigned short *, int, int, int, int, void *);
+		     unsigned short *, int, unsigned *, void *);
 
   /* Async cleanup callback registration.  */
   void (*register_async_cleanup_func) (void *);
Index: libgomp/oacc-parallel.c
===================================================================
--- libgomp/oacc-parallel.c	(revision 227862)
+++ libgomp/oacc-parallel.c	(working copy)
@@ -49,14 +49,18 @@ find_pset (int pos, size_t mapnum, unsig
   return kind == GOMP_MAP_TO_PSET;
 }
 
-static void goacc_wait (int async, int num_waits, va_list ap);
+static void goacc_wait (int async, int num_waits, va_list *ap);
+
+
+/* Launch a possibly offloaded function on DEVICE.  FN is the host fn
+   address.  MAPNUM, HOSTADDRS, SIZES & KINDS  describe the memory
+   blocks to be copied to/from the device.  Varadic arguments are
+   keyed optional parameters terminated with a zero.  */
 
 void
-GOACC_parallel (int device, void (*fn) (void *),
-		size_t mapnum, void **hostaddrs, size_t *sizes,
-		unsigned short *kinds,
-		int num_gangs, int num_workers, int vector_length,
-		int async, int num_waits, ...)
+GOACC_parallel_keyed (int device, void (*fn) (void *),
+		      size_t mapnum, void **hostaddrs, size_t *sizes,
+		      unsigned short *kinds, ...)
 {
   bool host_fallback = device == GOMP_DEVICE_HOST_FALLBACK;
   va_list ap;
@@ -68,22 +72,16 @@ GOACC_parallel (int device, void (*fn) (
   struct splay_tree_key_s k;
   splay_tree_key tgt_fn_key;
   void (*tgt_fn);
-
-  if (num_gangs != 1)
-    gomp_fatal ("num_gangs (%d) different from one is not yet supported",
-		num_gangs);
-  if (num_workers != 1)
-    gomp_fatal ("num_workers (%d) different from one is not yet supported",
-		num_workers);
+  int async = GOMP_ASYNC_SYNC;
+  unsigned dims[GOMP_DIM_MAX];
+  unsigned tag;
 
 #ifdef HAVE_INTTYPES_H
-  gomp_debug (0, "%s: mapnum=%"PRIu64", hostaddrs=%p, size=%p, kinds=%p, "
-		 "async = %d\n",
-	      __FUNCTION__, (uint64_t) mapnum, hostaddrs, sizes, kinds, async);
+  gomp_debug (0, "%s: mapnum=%"PRIu64", hostaddrs=%p, size=%p, kinds=%p\n",
+	      __FUNCTION__, (uint64_t) mapnum, hostaddrs, sizes, kinds);
 #else
-  gomp_debug (0, "%s: mapnum=%lu, hostaddrs=%p, sizes=%p, kinds=%p, async=%d\n",
-	      __FUNCTION__, (unsigned long) mapnum, hostaddrs, sizes, kinds,
-	      async);
+  gomp_debug (0, "%s: mapnum=%lu, hostaddrs=%p, sizes=%p, kinds=%p\n",
+	      __FUNCTION__, (unsigned long) mapnum, hostaddrs, sizes, kinds);
 #endif
   goacc_lazy_initialize ();
 
@@ -105,12 +103,51 @@ GOACC_parallel (int device, void (*fn) (
       return;
     }
 
-  if (num_waits)
-    {
-      va_start (ap, num_waits);
-      goacc_wait (async, num_waits, ap);
-      va_end (ap);
+  va_start (ap, kinds);
+  /* TODO: This will need amending when device_type is implemented.  */
+  while ((tag = va_arg (ap, unsigned)) != 0)
+    {
+      if (GOMP_LAUNCH_DEVICE (tag))
+	gomp_fatal ("device_type '%d' offload parameters, libgomp is too old",
+		    GOMP_LAUNCH_DEVICE (tag));
+
+      switch (GOMP_LAUNCH_CODE (tag))
+	{
+	case GOMP_LAUNCH_DIM:
+	  {
+	    unsigned mask = GOMP_LAUNCH_OP (tag);
+
+	    for (i = 0; i != GOMP_DIM_MAX; i++)
+	      if (mask & GOMP_DIM_MASK (i))
+		dims[i] = va_arg (ap, unsigned);
+	  }
+	  break;
+
+	case GOMP_LAUNCH_ASYNC:
+	  {
+	    /* Small constant values are encoded in the operand.  */
+	    async = GOMP_LAUNCH_OP (tag);
+
+	    if (async == GOMP_LAUNCH_OP_MAX)
+	      async = va_arg (ap, unsigned);
+	    break;
+	  }
+
+	case GOMP_LAUNCH_WAIT:
+	  {
+	    unsigned num_waits = GOMP_LAUNCH_OP (tag);
+
+	    if (num_waits)
+	      goacc_wait (async, num_waits, &ap);
+	    break;
+	  }
+
+	default:
+	  gomp_fatal ("unrecognized offload code '%d',"
+		      " libgomp is too old", GOMP_LAUNCH_CODE (tag));
+	}
     }
+  va_end (ap);
   
   acc_dev->openacc.async_set_async_func (async);
 
@@ -138,9 +175,8 @@ GOACC_parallel (int device, void (*fn) (
     devaddrs[i] = (void *) (tgt->list[i]->tgt->tgt_start
 			    + tgt->list[i]->tgt_offset);
 
-  acc_dev->openacc.exec_func (tgt_fn, mapnum, hostaddrs, devaddrs, sizes, kinds,
-			      num_gangs, num_workers, vector_length, async,
-			      tgt);
+  acc_dev->openacc.exec_func (tgt_fn, mapnum, hostaddrs, devaddrs, sizes,
+			      kinds, async, dims, tgt);
 
   /* If running synchronously, unmap immediately.  */
   if (async < acc_async_noval)
@@ -154,6 +190,38 @@ GOACC_parallel (int device, void (*fn) (
   acc_dev->openacc.async_set_async_func (acc_async_sync);
 }
 
+/* Legacy entry point.  */
+void
+GOACC_parallel (int device, void (*fn) (void *),
+		size_t mapnum, void **hostaddrs, size_t *sizes,
+		unsigned short *kinds,
+		int num_gangs, int num_workers, int vector_length,
+		int async, int num_waits, ...)
+{
+  int waits[9];
+  unsigned ix;
+  va_list ap;
+
+  if (num_waits > 8)
+    gomp_fatal ("too many waits for legacy interface");
+  
+  va_start (ap, num_waits);
+  for (ix = 0; ix != num_waits; ix++)
+    waits[ix] = va_arg (ap, int);
+  waits[ix] = 0;
+  va_end (ap);
+
+  GOACC_parallel_keyed (device, fn, mapnum, hostaddrs, sizes, kinds,
+			GOMP_LAUNCH_PACK (GOMP_LAUNCH_DIM, 0,
+					  GOMP_DIM_MASK (GOMP_DIM_MAX) - 1),
+			num_gangs, num_workers, vector_length,
+			GOMP_LAUNCH_PACK (GOMP_LAUNCH_ASYNC, 0,
+					  GOMP_LAUNCH_OP_MAX), async,
+			GOMP_LAUNCH_PACK (GOMP_LAUNCH_WAIT, 0, num_waits),
+			async, waits[0], waits[1], waits[2], waits[3],
+			waits[4], waits[5], waits[6], waits[7], waits[8]);
+}
+
 void
 GOACC_data_start (int device, size_t mapnum,
 		  void **hostaddrs, size_t *sizes, unsigned short *kinds)
@@ -230,7 +298,7 @@ GOACC_enter_exit_data (int device, size_
       va_list ap;
 
       va_start (ap, num_waits);
-      goacc_wait (async, num_waits, ap);
+      goacc_wait (async, num_waits, &ap);
       va_end (ap);
     }
 
@@ -344,15 +412,15 @@ GOACC_enter_exit_data (int device, size_
 }
 
 static void
-goacc_wait (int async, int num_waits, va_list ap)
+goacc_wait (int async, int num_waits, va_list *ap)
 {
   struct goacc_thread *thr = goacc_thread ();
   struct gomp_device_descr *acc_dev = thr->dev;
 
   while (num_waits--)
     {
-      int qid = va_arg (ap, int);
-
+      int qid = va_arg (*ap, int);
+      
       if (acc_async_test (qid))
 	continue;
 
@@ -389,7 +457,7 @@ GOACC_update (int device, size_t mapnum,
       va_list ap;
 
       va_start (ap, num_waits);
-      goacc_wait (async, num_waits, ap);
+      goacc_wait (async, num_waits, &ap);
       va_end (ap);
     }
 
@@ -430,7 +498,7 @@ GOACC_wait (int async, int num_waits, ..
       va_list ap;
 
       va_start (ap, num_waits);
-      goacc_wait (async, num_waits, ap);
+      goacc_wait (async, num_waits, &ap);
       va_end (ap);
     }
   else if (async == acc_async_sync)
Index: libgomp/libgomp_g.h
===================================================================
--- libgomp/libgomp_g.h	(revision 227862)
+++ libgomp/libgomp_g.h	(working copy)
@@ -222,9 +222,8 @@ extern void GOACC_data_start (int, size_
 extern void GOACC_data_end (void);
 extern void GOACC_enter_exit_data (int, size_t, void **,
 				   size_t *, unsigned short *, int, int, ...);
-extern void GOACC_parallel (int, void (*) (void *), size_t,
-			    void **, size_t *, unsigned short *, int, int, int,
-			    int, int, ...);
+extern void GOACC_parallel_2 (int, void (*) (void *), size_t,
+			      void **, size_t *, unsigned short *, ...);
 extern void GOACC_update (int, size_t, void **, size_t *,
 			  unsigned short *, int, int, ...);
 extern void GOACC_wait (int, int, ...);
Index: libgomp/libgomp.map
===================================================================
--- libgomp/libgomp.map	(revision 227862)
+++ libgomp/libgomp.map	(working copy)
@@ -332,6 +332,11 @@ GOACC_2.0 {
 	GOACC_get_num_threads;
 };
 
+GOACC_2.0.1 {
+  global:
+	GOACC_parallel_keyed;
+} GOACC_2.0;
+
 GOMP_PLUGIN_1.0 {
   global:
 	GOMP_PLUGIN_malloc;
Index: libgomp/plugin/plugin-nvptx.c
===================================================================
--- libgomp/plugin/plugin-nvptx.c	(revision 227862)
+++ libgomp/plugin/plugin-nvptx.c	(working copy)
@@ -282,12 +282,20 @@ map_push (struct ptx_stream *s, int asyn
   return;
 }
 
+/* Target data function launch information.  */
+
+struct targ_fn_launch
+{
+  const char *fn;
+  unsigned short dim[3];
+};
+
 /* Descriptor of a loaded function.  */
 
 struct targ_fn_descriptor
 {
   CUfunction fn;
-  const char *name;
+  const struct targ_fn_launch *launch;
 };
 
 /* A loaded PTX image.  */
@@ -929,8 +937,8 @@ event_add (enum ptx_event_type type, CUe
 
 void
 nvptx_exec (void (*fn), size_t mapnum, void **hostaddrs, void **devaddrs,
-	  size_t *sizes, unsigned short *kinds, int num_gangs, int num_workers,
-	  int vector_length, int async, void *targ_mem_desc)
+	    size_t *sizes, unsigned short *kinds, int async, unsigned *dims,
+	    void *targ_mem_desc)
 {
   struct targ_fn_descriptor *targ_fn = (struct targ_fn_descriptor *) fn;
   CUfunction function;
@@ -939,7 +947,6 @@ nvptx_exec (void (*fn), size_t mapnum, v
   struct ptx_stream *dev_str;
   void *kargs[1];
   void *hp, *dp;
-  unsigned int nthreads_in_block;
   struct nvptx_thread *nvthd = nvptx_thread ();
   const char *maybe_abort_msg = "(perhaps abort was called)";
 
@@ -948,6 +955,20 @@ nvptx_exec (void (*fn), size_t mapnum, v
   dev_str = select_stream_for_async (async, pthread_self (), false, NULL);
   assert (dev_str == nvthd->current_stream);
 
+  /* Initialize the launch dimensions.  Typically this is constant,
+     provided by the device compiler, but we must permit runtime
+     values.  */
+  for (i = 0; i != 3; i++)
+    if (targ_fn->launch->dim[i])
+      dims[i] = targ_fn->launch->dim[i];
+
+  if (dims[GOMP_DIM_GANG] != 1)
+    GOMP_PLUGIN_fatal ("non-unity num_gangs (%d) not supported",
+		       dims[GOMP_DIM_GANG]);
+  if (dims[GOMP_DIM_WORKER] != 1)
+    GOMP_PLUGIN_fatal ("non-unity num_workers (%d) not supported",
+		       dims[GOMP_DIM_WORKER]);
+
   /* This reserves a chunk of a pre-allocated page of memory mapped on both
      the host and the device. HP is a host pointer to the new chunk, and DP is
      the corresponding device pointer.  */
@@ -965,35 +986,21 @@ nvptx_exec (void (*fn), size_t mapnum, v
   if (r != CUDA_SUCCESS)
     GOMP_PLUGIN_fatal ("cuMemcpy failed: %s", cuda_error (r));
 
-  GOMP_PLUGIN_debug (0, "  %s: kernel %s: launch\n", __FUNCTION__, targ_fn->name);
+  GOMP_PLUGIN_debug (0, "  %s: kernel %s: launch"
+		     " gangs=%u, workers=%u, vectors=%u\n",
+		     __FUNCTION__, targ_fn->launch->fn,
+		     dims[0], dims[1], dims[2]);
 
   // OpenACC		CUDA
   //
-  // num_gangs		blocks
-  // num_workers	warps (where a warp is equivalent to 32 threads)
-  // vector length	threads
-  //
-
-  /* The openacc vector_length clause 'determines the vector length to use for
-     vector or SIMD operations'.  The question is how to map this to CUDA.
-
-     In CUDA, the warp size is the vector length of a CUDA device.  However, the
-     CUDA interface abstracts away from that, and only shows us warp size
-     indirectly in maximum number of threads per block, which is a product of
-     warp size and the number of hyperthreads of a multiprocessor.
-
-     We choose to map openacc vector_length directly onto the number of threads
-     in a block, in the x dimension.  This is reflected in gcc code generation
-     that uses ThreadIdx.x to access vector elements.
-
-     Attempting to use an openacc vector_length of more than the maximum number
-     of threads per block will result in a cuda error.  */
-  nthreads_in_block = vector_length;
+  // num_gangs		nctaid.x
+  // num_workers	ntid.y
+  // vector length	ntid.x
 
   kargs[0] = &dp;
   r = cuLaunchKernel (function,
-		      num_gangs, 1, 1,
-		      nthreads_in_block, 1, 1,
+		      dims[GOMP_DIM_GANG], 1, 1,
+		      dims[GOMP_DIM_VECTOR], dims[GOMP_DIM_WORKER], 1,
 		      0, dev_str->stream, kargs, 0);
   if (r != CUDA_SUCCESS)
     GOMP_PLUGIN_fatal ("cuLaunchKernel error: %s", cuda_error (r));
@@ -1039,7 +1046,7 @@ nvptx_exec (void (*fn), size_t mapnum, v
 #endif
 
   GOMP_PLUGIN_debug (0, "  %s: kernel %s: finished\n", __FUNCTION__,
-		     targ_fn->name);
+		     targ_fn->launch->fn);
 
 #ifndef DISABLE_ASYNC
   if (async < acc_async_noval)
@@ -1567,7 +1574,7 @@ typedef struct nvptx_tdata
   const char *const *var_names;
   size_t var_num;
 
-  const char *const *fn_names;
+  const struct targ_fn_launch *fn_descs;
   size_t fn_num;
 } nvptx_tdata_t;
 
@@ -1588,7 +1595,8 @@ GOMP_OFFLOAD_load_image (int ord, unsign
 			 struct addr_pair **target_table)
 {
   CUmodule module;
-  const char *const *fn_names, *const *var_names;
+  const char *const *var_names;
+  const struct targ_fn_launch *fn_descs;
   unsigned int fn_entries, var_entries, i, j;
   CUresult r;
   struct targ_fn_descriptor *targ_fns;
@@ -1617,7 +1625,7 @@ GOMP_OFFLOAD_load_image (int ord, unsign
   var_entries = img_header->var_num;
   var_names = img_header->var_names;
   fn_entries = img_header->fn_num;
-  fn_names = img_header->fn_names;
+  fn_descs = img_header->fn_descs;
 
   targ_tbl = GOMP_PLUGIN_malloc (sizeof (struct addr_pair)
 				 * (fn_entries + var_entries));
@@ -1640,12 +1648,12 @@ GOMP_OFFLOAD_load_image (int ord, unsign
     {
       CUfunction function;
 
-      r = cuModuleGetFunction (&function, module, fn_names[i]);
+      r = cuModuleGetFunction (&function, module, fn_descs[i].fn);
       if (r != CUDA_SUCCESS)
 	GOMP_PLUGIN_fatal ("cuModuleGetFunction error: %s", cuda_error (r));
 
       targ_fns->fn = function;
-      targ_fns->name = (const char *) fn_names[i];
+      targ_fns->launch = &fn_descs[i];
 
       targ_tbl->start = (uintptr_t) targ_fns;
       targ_tbl->end = targ_tbl->start + 1;
@@ -1724,13 +1732,12 @@ void (*device_run) (int n, void *fn_ptr,
 
 void
 GOMP_OFFLOAD_openacc_parallel (void (*fn) (void *), size_t mapnum,
-			       void **hostaddrs, void **devaddrs, size_t *sizes,
-			       unsigned short *kinds, int num_gangs,
-			       int num_workers, int vector_length, int async,
-			       void *targ_mem_desc)
+			       void **hostaddrs, void **devaddrs,
+			       size_t *sizes, unsigned short *kinds,
+			       int async, unsigned *dims, void *targ_mem_desc)
 {
-  nvptx_exec (fn, mapnum, hostaddrs, devaddrs, sizes, kinds, num_gangs,
-	    num_workers, vector_length, async, targ_mem_desc);
+  nvptx_exec (fn, mapnum, hostaddrs, devaddrs, sizes, kinds,
+	      async, dims, targ_mem_desc);
 }
 
 void
Index: libgomp/oacc-host.c
===================================================================
--- libgomp/oacc-host.c	(revision 227862)
+++ libgomp/oacc-host.c	(working copy)
@@ -137,10 +137,8 @@ host_openacc_exec (void (*fn) (void *),
 		   void **devaddrs __attribute__ ((unused)),
 		   size_t *sizes __attribute__ ((unused)),
 		   unsigned short *kinds __attribute__ ((unused)),
-		   int num_gangs __attribute__ ((unused)),
-		   int num_workers __attribute__ ((unused)),
-		   int vector_length __attribute__ ((unused)),
 		   int async __attribute__ ((unused)),
+		   unsigned *dims __attribute ((unused)),
 		   void *targ_mem_desc __attribute__ ((unused)))
 {
   fn (hostaddrs);
Index: gcc/config/nvptx/nvptx.c
===================================================================
--- gcc/config/nvptx/nvptx.c	(revision 227862)
+++ gcc/config/nvptx/nvptx.c	(working copy)
@@ -56,6 +56,8 @@
 #include "cfgrtl.h"
 #include "stor-layout.h"
 #include "builtins.h"
+#include "omp-low.h"
+#include "gomp-constants.h"
 
 /* This file should be included last.  */
 #include "target-def.h"
@@ -2066,9 +2068,51 @@ nvptx_vector_alignment (const_tree type)
 static void
 nvptx_record_offload_symbol (tree decl)
 {
-  fprintf (asm_out_file, "//:%s_MAP %s\n",
-	   TREE_CODE (decl) == VAR_DECL ? "VAR" : "FUNC",
-	   IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl)));
+  switch (TREE_CODE (decl))
+    {
+    case VAR_DECL:
+      fprintf (asm_out_file, "//:VAR_MAP \"%s\"\n",
+	       IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl)));
+      break;
+
+    case FUNCTION_DECL:
+      {
+	tree attr = get_oacc_fn_attrib (decl);
+	tree dims = NULL_TREE;
+	unsigned ix;
+
+	if (attr)
+	  dims = TREE_VALUE (attr);
+	fprintf (asm_out_file, "//:FUNC_MAP \"%s\"",
+		 IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl)));
+
+	for (ix = 0; ix != GOMP_DIM_MAX; ix++)
+	  {
+	    int size = 1;
+
+	    /* TODO: This check can go away once the dimension default
+	       machinery is merged to trunk.  */
+	    if (dims)
+	      {
+		tree dim = TREE_VALUE (dims);
+
+		if (dim)
+		  size = TREE_INT_CST_LOW (dim);
+
+		gcc_assert (!TREE_PURPOSE (dims));
+		dims = TREE_CHAIN (dims);
+	      }
+	    
+	    fprintf (asm_out_file, ", %#x", size);
+	  }
+	
+	fprintf (asm_out_file, "\n");
+      }
+      break;
+  
+    default:
+      gcc_unreachable ();
+    }
 }
 
 /* Implement TARGET_ASM_FILE_START.  Write the kinds of things ptxas expects
Index: gcc/config/nvptx/mkoffload.c
===================================================================
--- gcc/config/nvptx/mkoffload.c	(revision 227862)
+++ gcc/config/nvptx/mkoffload.c	(working copy)
@@ -842,6 +842,8 @@ process (FILE *in, FILE *out)
 {
   const char *input = read_file (in);
   Token *tok = tokenize (input);
+  const char *comma;
+  id_map const *id;
 
   do
     tok = parse_file (tok);
@@ -853,21 +855,25 @@ process (FILE *in, FILE *out)
   write_stmts (out, rev_stmts (fns));
   fprintf (out, ";\n\n");
 
-  fprintf (out, "static const char *const var_mappings[] = {\n");
-  for (id_map *id = var_ids; id; id = id->next)
-    fprintf (out, "\t\"%s\"%s\n", id->ptx_name, id->next ? "," : "");
-  fprintf (out, "};\n\n");
-  fprintf (out, "static const char *const func_mappings[] = {\n");
-  for (id_map *id = func_ids; id; id = id->next)
-    fprintf (out, "\t\"%s\"%s\n", id->ptx_name, id->next ? "," : "");
-  fprintf (out, "};\n\n");
+  fprintf (out, "static const char *const var_mappings[] = {");
+  for (comma = "", id = var_ids; id; comma = ",", id = id->next)
+    fprintf (out, "%s\n\t%s", comma, id->ptx_name);
+  fprintf (out, "\n};\n\n");
+
+  fprintf (out, "static const struct nvptx_fn {\n"
+	   "  const char *name;\n"
+	   "  unsigned short dim[3];\n"
+	   "} func_mappings[] = {\n");
+  for (comma = "", id = func_ids; id; comma = ",", id = id->next)
+    fprintf (out, "%s\n\t{%s}", comma, id->ptx_name);
+  fprintf (out, "\n};\n\n");
 
   fprintf (out,
 	   "static const struct nvptx_tdata {\n"
 	   "  const char *ptx_src;\n"
 	   "  const char *const *var_names;\n"
 	   "  __SIZE_TYPE__ var_num;\n"
-	   "  const char *const *fn_names;\n"
+	   "  const struct nvptx_fn *fn_names;\n"
 	   "  __SIZE_TYPE__ fn_num;\n"
 	   "} target_data = {\n"
 	   "  ptx_code,\n"
Index: gcc/omp-low.c
===================================================================
--- gcc/omp-low.c	(revision 227862)
+++ gcc/omp-low.c	(working copy)
@@ -82,7 +82,6 @@ along with GCC; see the file COPYING3.
 #include "lto-section-names.h"
 #include "gomp-constants.h"
 
-
 /* Lowering of OMP parallel and workshare constructs proceeds in two
    phases.  The first phase scans the function looking for OMP statements
    and then for variables that must be replaced to satisfy data sharing
@@ -8861,6 +8860,110 @@ expand_omp_atomic (struct omp_region *re
 }
 
 
+/* Encode an oacc launc argument.  This matches the GOMP_LAUNCH_PACK
+   macro on gomp-constants.h.  We do not check for overflow.  */
+
+static tree
+oacc_launch_pack (unsigned code, tree device, unsigned op)
+{
+  tree res;
+  
+  res = build_int_cst (unsigned_type_node, GOMP_LAUNCH_PACK (code, 0, op));
+  if (device)
+    {
+      device = fold_build2 (LSHIFT_EXPR, unsigned_type_node,
+			    device, build_int_cst (unsigned_type_node,
+						   GOMP_LAUNCH_DEVICE_SHIFT));
+      res = fold_build2 (BIT_IOR_EXPR, unsigned_type_node, res, device);
+    }
+  return res;
+}
+
+/* Look for compute grid dimension clauses and convert to an attribute
+   attached to FN.  This permits the target-side code to (a) massage
+   the dimensions, (b) emit that data and (c) optimize.  Non-constant
+   dimensions are pushed onto ARGS.
+
+   The attribute value is a TREE_LIST.  A set of dimensions is
+   represented as a list of INTEGER_CST.  Those that are runtime
+   expres are represented as an INTEGER_CST of zero.
+
+   TOOO. Normally the attribute will just contain a single such list.  If
+   however it contains a list of lists, this will represent the use of
+   device_type.  Each member of the outer list is an assoc list of
+   dimensions, keyed by the device type.  The first entry will be the
+   default.  Well, that's the plan.  */
+
+#define OACC_FN_ATTRIB "oacc function"
+
+/* Replace any existing oacc fn attribute with updated dimensions.  */
+
+void
+replace_oacc_fn_attrib (tree fn, tree dims)
+{
+  tree ident = get_identifier (OACC_FN_ATTRIB);
+  tree attribs = DECL_ATTRIBUTES (fn);
+
+  /* If we happen to be present as the first attrib, drop it.  */
+  if (attribs && TREE_PURPOSE (attribs) == ident)
+    attribs = TREE_CHAIN (attribs);
+  DECL_ATTRIBUTES (fn) = tree_cons (ident, dims, attribs);
+}
+
+/* Scan CLAUSES for launch dimensions and attach them to the oacc
+   function attribute.  Push any that are non-constant onto the ARGS
+   list, along with an appropriate GOMP_LAUNCH_DIM tag.  */
+
+static void
+set_oacc_fn_attrib (tree fn, tree clauses, vec<tree> *args)
+{
+  /* Must match GOMP_DIM ordering.  */
+  static const omp_clause_code ids[]
+    = { OMP_CLAUSE_NUM_GANGS, OMP_CLAUSE_NUM_WORKERS,
+	OMP_CLAUSE_VECTOR_LENGTH };
+  unsigned ix;
+  tree dims[GOMP_DIM_MAX];
+  tree attr = NULL_TREE;
+  unsigned non_const = 0;
+
+  for (ix = GOMP_DIM_MAX; ix--;)
+    {
+      tree clause = find_omp_clause (clauses, ids[ix]);
+      tree dim = NULL_TREE;
+
+      if (clause)
+	dim = OMP_CLAUSE_EXPR (clause, ids[ix]);
+      dims[ix] = dim;
+      if (dim && TREE_CODE (dim) != INTEGER_CST)
+	{
+	  dim = integer_zero_node;
+	  non_const |= GOMP_DIM_MASK (ix);
+	}
+      attr = tree_cons (NULL_TREE, dim, attr);
+    }
+
+  replace_oacc_fn_attrib (fn, attr);
+
+  if (non_const)
+    {
+      /* Push a dynamic argument set.  */
+      args->safe_push (oacc_launch_pack (GOMP_LAUNCH_DIM,
+					 NULL_TREE, non_const));
+      for (unsigned ix = 0; ix != GOMP_DIM_MAX; ix++)
+	if (non_const & GOMP_DIM_MASK (ix))
+	  args->safe_push (dims[ix]);
+    }
+}
+
+/* Retrieve the oacc function attrib and return it.  Non-oacc
+   functions will return NULL.  */
+
+tree
+get_oacc_fn_attrib (tree fn)
+{
+  return lookup_attribute (OACC_FN_ATTRIB, DECL_ATTRIBUTES (fn));
+}
+
 /* Expand the GIMPLE_OMP_TARGET starting at REGION.  */
 
 static void
@@ -8881,10 +8984,10 @@ expand_omp_target (struct omp_region *re
   offloaded = is_gimple_omp_offloaded (entry_stmt);
   switch (gimple_omp_target_kind (entry_stmt))
     {
-    case GF_OMP_TARGET_KIND_REGION:
-    case GF_OMP_TARGET_KIND_UPDATE:
     case GF_OMP_TARGET_KIND_OACC_PARALLEL:
     case GF_OMP_TARGET_KIND_OACC_KERNELS:
+    case GF_OMP_TARGET_KIND_REGION:
+    case GF_OMP_TARGET_KIND_UPDATE:
     case GF_OMP_TARGET_KIND_OACC_UPDATE:
     case GF_OMP_TARGET_KIND_OACC_ENTER_EXIT_DATA:
       data_region = false;
@@ -9216,6 +9319,7 @@ expand_omp_target (struct omp_region *re
     }
 
   gimple g;
+  bool tagging = false;
   /* The maximum number used by any start_ix, without varargs.  */
   auto_vec<tree, 11> args;
   args.quick_push (device);
@@ -9251,88 +9355,87 @@ expand_omp_target (struct omp_region *re
       break;
     case BUILT_IN_GOACC_PARALLEL:
       {
-	tree t_num_gangs, t_num_workers, t_vector_length;
-
-	/* Default values for num_gangs, num_workers, and vector_length.  */
-	t_num_gangs = t_num_workers = t_vector_length
-	  = fold_convert_loc (gimple_location (entry_stmt),
-			      integer_type_node, integer_one_node);
-	/* ..., but if present, use the value specified by the respective
-	   clause, making sure that are of the correct type.  */
-	c = find_omp_clause (clauses, OMP_CLAUSE_NUM_GANGS);
-	if (c)
-	  t_num_gangs = fold_convert_loc (OMP_CLAUSE_LOCATION (c),
-					  integer_type_node,
-					  OMP_CLAUSE_NUM_GANGS_EXPR (c));
-	c = find_omp_clause (clauses, OMP_CLAUSE_NUM_WORKERS);
-	if (c)
-	  t_num_workers = fold_convert_loc (OMP_CLAUSE_LOCATION (c),
-					    integer_type_node,
-					    OMP_CLAUSE_NUM_WORKERS_EXPR (c));
-	c = find_omp_clause (clauses, OMP_CLAUSE_VECTOR_LENGTH);
-	if (c)
-	  t_vector_length = fold_convert_loc (OMP_CLAUSE_LOCATION (c),
-					      integer_type_node,
-					      OMP_CLAUSE_VECTOR_LENGTH_EXPR (c));
-	args.quick_push (t_num_gangs);
-	args.quick_push (t_num_workers);
-	args.quick_push (t_vector_length);
+	set_oacc_fn_attrib (child_fn, clauses, &args);
+	tagging = true;
       }
       /* FALLTHRU */
     case BUILT_IN_GOACC_ENTER_EXIT_DATA:
     case BUILT_IN_GOACC_UPDATE:
       {
-	tree t_async;
-	int t_wait_idx;
+	tree t_async = NULL_TREE;
 
-	/* Default values for t_async.  */
-	t_async = fold_convert_loc (gimple_location (entry_stmt),
-				    integer_type_node,
-				    build_int_cst (integer_type_node,
-						   GOMP_ASYNC_SYNC));
-	/* ..., but if present, use the value specified by the respective
+	/* If present, use the value specified by the respective
 	   clause, making sure that is of the correct type.  */
 	c = find_omp_clause (clauses, OMP_CLAUSE_ASYNC);
 	if (c)
 	  t_async = fold_convert_loc (OMP_CLAUSE_LOCATION (c),
 				      integer_type_node,
 				      OMP_CLAUSE_ASYNC_EXPR (c));
-
-	args.quick_push (t_async);
-	/* Save the index, and... */
-	t_wait_idx = args.length ();
-	/* ... push a default value.  */
-	args.quick_push (fold_convert_loc (gimple_location (entry_stmt),
-					   integer_type_node,
-					   integer_zero_node));
-	c = find_omp_clause (clauses, OMP_CLAUSE_WAIT);
-	if (c)
+	else if (!tagging)
+	  /* Default values for t_async.  */
+	  t_async = fold_convert_loc (gimple_location (entry_stmt),
+				      integer_type_node,
+				      build_int_cst (integer_type_node,
+						     GOMP_ASYNC_SYNC));
+	if (tagging && t_async)
 	  {
-	    int n = 0;
+	    unsigned HOST_WIDE_INT i_async;
 
-	    for (; c; c = OMP_CLAUSE_CHAIN (c))
+	    if (TREE_CODE (t_async) == INTEGER_CST)
 	      {
-		if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_WAIT)
-		  {
-		    args.safe_push (fold_convert_loc (OMP_CLAUSE_LOCATION (c),
-						      integer_type_node,
-						      OMP_CLAUSE_WAIT_EXPR (c)));
-		    n++;
-		  }
+		/* See if we can pack the async arg in to the tag's
+		   operand.  */
+		i_async = TREE_INT_CST_LOW (t_async);
+
+		if (i_async < GOMP_LAUNCH_OP_MAX)
+		  t_async = NULL_TREE;
 	      }
+	    if (t_async)
+	      i_async = GOMP_LAUNCH_OP_MAX;
+	    args.safe_push (oacc_launch_pack
+			    (GOMP_LAUNCH_ASYNC, NULL_TREE, i_async));
+	  }
+	if (t_async)
+	  args.safe_push (t_async);
 
-	    /* Now that we know the number, replace the default value.  */
-	    args.ordered_remove (t_wait_idx);
-	    args.quick_insert (t_wait_idx,
-			       fold_convert_loc (gimple_location (entry_stmt),
-						 integer_type_node,
-						 build_int_cst (integer_type_node, n)));
+	/* Save the argument index, and ... */
+	unsigned t_wait_idx = args.length ();
+	unsigned num_waits = 0;
+	c = find_omp_clause (clauses, OMP_CLAUSE_WAIT);
+	if (!tagging || c)
+	  /* ... push a placeholder.  */
+	  args.safe_push (integer_zero_node);
+
+	for (; c; c = OMP_CLAUSE_CHAIN (c))
+	  if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_WAIT)
+	    {
+	      args.safe_push (fold_convert_loc (OMP_CLAUSE_LOCATION (c),
+						integer_type_node,
+						OMP_CLAUSE_WAIT_EXPR (c)));
+	      num_waits++;
+	    }
+
+	if (!tagging || num_waits)
+	  {
+	    tree len;
+
+	    /* Now that we know the number, update the placeholder.  */
+	    if (tagging)
+	      len = oacc_launch_pack (GOMP_LAUNCH_WAIT, NULL_TREE, num_waits);
+	    else
+	      len = build_int_cst (integer_type_node, num_waits);
+	    len = fold_convert_loc (gimple_location (entry_stmt),
+				    unsigned_type_node, len);
+	    args[t_wait_idx] = len;
 	  }
       }
       break;
     default:
       gcc_unreachable ();
     }
+  if (tagging)
+    /*  Push terminal marker - zero.  */
+    args.safe_push (oacc_launch_pack (0, NULL_TREE, 0));
 
   g = gimple_build_call_vec (builtin_decl_explicit (start_ix), args);
   gimple_set_location (g, gimple_location (entry_stmt));
Index: gcc/omp-low.h
===================================================================
--- gcc/omp-low.h	(revision 227862)
+++ gcc/omp-low.h	(working copy)
@@ -28,6 +28,7 @@ extern void free_omp_regions (void);
 extern tree omp_reduction_init (tree, tree);
 extern bool make_gimple_omp_edges (basic_block, struct omp_region **, int *);
 extern void omp_finish_file (void);
+extern tree get_oacc_fn_attrib (tree);
 
 extern GTY(()) vec<tree, va_gc> *offload_funcs;
 extern GTY(()) vec<tree, va_gc> *offload_vars;
Index: gcc/builtin-types.def
===================================================================
--- gcc/builtin-types.def	(revision 227862)
+++ gcc/builtin-types.def	(working copy)
@@ -590,15 +590,14 @@ DEF_FUNCTION_TYPE_VAR_5 (BT_FN_INT_STRIN
 DEF_FUNCTION_TYPE_VAR_5 (BT_FN_INT_INT_INT_INT_INT_INT_VAR,
 			 BT_INT, BT_INT, BT_INT, BT_INT, BT_INT, BT_INT)
 
+DEF_FUNCTION_TYPE_VAR_6 (BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_VAR,
+			 BT_VOID, BT_INT, BT_PTR_FN_VOID_PTR, BT_SIZE,
+			 BT_PTR, BT_PTR, BT_PTR)
+
 DEF_FUNCTION_TYPE_VAR_7 (BT_FN_VOID_INT_SIZE_PTR_PTR_PTR_INT_INT_VAR,
 			 BT_VOID, BT_INT, BT_SIZE, BT_PTR, BT_PTR,
 			 BT_PTR, BT_INT, BT_INT)
 
-DEF_FUNCTION_TYPE_VAR_11 (BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_INT_INT_INT_INT_INT_VAR,
-			  BT_VOID, BT_INT, BT_PTR_FN_VOID_PTR, BT_SIZE,
-			  BT_PTR, BT_PTR, BT_PTR, BT_INT, BT_INT, BT_INT,
-			  BT_INT, BT_INT)
-
 DEF_POINTER_TYPE (BT_PTR_FN_VOID_VAR, BT_FN_VOID_VAR)
 DEF_FUNCTION_TYPE_3 (BT_FN_PTR_PTR_FN_VOID_VAR_PTR_SIZE,
 		     BT_PTR, BT_PTR_FN_VOID_VAR, BT_PTR, BT_SIZE)
Index: gcc/tree.h
===================================================================
--- gcc/tree.h	(revision 227862)
+++ gcc/tree.h	(working copy)
@@ -1369,6 +1370,8 @@ extern void protected_set_expr_location
   OMP_CLAUSE_OPERAND (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_SCHEDULE), 0)
 
 /* OpenACC clause expressions  */
+#define OMP_CLAUSE_EXPR(NODE, CLAUSE) \
+  OMP_CLAUSE_OPERAND (OMP_CLAUSE_SUBCODE_CHECK (NODE, CLAUSE), 0)
 #define OMP_CLAUSE_GANG_EXPR(NODE) \
   OMP_CLAUSE_OPERAND ( \
     OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_GANG), 0)
Index: gcc/omp-builtins.def
===================================================================
--- gcc/omp-builtins.def	(revision 227862)
+++ gcc/omp-builtins.def	(working copy)
@@ -38,8 +38,8 @@ DEF_GOACC_BUILTIN (BUILT_IN_GOACC_DATA_E
 DEF_GOACC_BUILTIN (BUILT_IN_GOACC_ENTER_EXIT_DATA, "GOACC_enter_exit_data",
 		   BT_FN_VOID_INT_SIZE_PTR_PTR_PTR_INT_INT_VAR,
 		   ATTR_NOTHROW_LIST)
-DEF_GOACC_BUILTIN (BUILT_IN_GOACC_PARALLEL, "GOACC_parallel",
-		   BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_INT_INT_INT_INT_INT_VAR,
+DEF_GOACC_BUILTIN (BUILT_IN_GOACC_PARALLEL, "GOACC_parallel_keyed",
+		   BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_VAR,
 		   ATTR_NOTHROW_LIST)
 DEF_GOACC_BUILTIN (BUILT_IN_GOACC_UPDATE, "GOACC_update",
 		   BT_FN_VOID_INT_SIZE_PTR_PTR_PTR_INT_INT_VAR,
Index: gcc/lto/lto-lang.c
===================================================================
--- gcc/lto/lto-lang.c	(revision 227862)
+++ gcc/lto/lto-lang.c	(working copy)
@@ -160,10 +160,10 @@ enum lto_builtin_type
 #define DEF_FUNCTION_TYPE_VAR_4(NAME, RETURN, ARG1, ARG2, ARG3, ARG4) NAME,
 #define DEF_FUNCTION_TYPE_VAR_5(NAME, RETURN, ARG1, ARG2, ARG3, ARG4, ARG6) \
 				NAME,
+#define DEF_FUNCTION_TYPE_VAR_6(NAME, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
+				 ARG6) NAME,
 #define DEF_FUNCTION_TYPE_VAR_7(NAME, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
 				ARG6, ARG7) NAME,
-#define DEF_FUNCTION_TYPE_VAR_11(NAME, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
-				 ARG6, ARG7, ARG8, ARG9, ARG10, ARG11) NAME,
 #define DEF_POINTER_TYPE(NAME, TYPE) NAME,
 #include "builtin-types.def"
 #undef DEF_PRIMITIVE_TYPE
@@ -182,8 +182,8 @@ enum lto_builtin_type
 #undef DEF_FUNCTION_TYPE_VAR_3
 #undef DEF_FUNCTION_TYPE_VAR_4
 #undef DEF_FUNCTION_TYPE_VAR_5
+#undef DEF_FUNCTION_TYPE_VAR_6
 #undef DEF_FUNCTION_TYPE_VAR_7
-#undef DEF_FUNCTION_TYPE_VAR_11
 #undef DEF_POINTER_TYPE
   BT_LAST
 };
@@ -668,13 +668,12 @@ lto_define_builtins (tree va_list_ref_ty
   def_fn_type (ENUM, RETURN, 1, 4, ARG1, ARG2, ARG3, ARG4);
 #define DEF_FUNCTION_TYPE_VAR_5(ENUM, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5) \
   def_fn_type (ENUM, RETURN, 1, 5, ARG1, ARG2, ARG3, ARG4, ARG5);
+#define DEF_FUNCTION_TYPE_VAR_6(ENUM, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
+				 ARG6)	\
+  def_fn_type (ENUM, RETURN, 1, 6, ARG1, ARG2, ARG3, ARG4, ARG5, ARG6);
 #define DEF_FUNCTION_TYPE_VAR_7(ENUM, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
 				ARG6, ARG7)				\
   def_fn_type (ENUM, RETURN, 1, 7, ARG1, ARG2, ARG3, ARG4, ARG5, ARG6, ARG7);
-#define DEF_FUNCTION_TYPE_VAR_11(ENUM, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
-				 ARG6, ARG7, ARG8, ARG9, ARG10, ARG11)	\
-  def_fn_type (ENUM, RETURN, 1, 11, ARG1, ARG2, ARG3, ARG4, ARG5, ARG6,	\
-	       ARG7, ARG8, ARG9, ARG10, ARG11);
 #define DEF_POINTER_TYPE(ENUM, TYPE) \
   builtin_types[(int) ENUM] = build_pointer_type (builtin_types[(int) TYPE]);
 
@@ -696,8 +695,8 @@ lto_define_builtins (tree va_list_ref_ty
 #undef DEF_FUNCTION_TYPE_VAR_3
 #undef DEF_FUNCTION_TYPE_VAR_4
 #undef DEF_FUNCTION_TYPE_VAR_5
+#undef DEF_FUNCTION_TYPE_VAR_6
 #undef DEF_FUNCTION_TYPE_VAR_7
-#undef DEF_FUNCTION_TYPE_VAR_11
 #undef DEF_POINTER_TYPE
   builtin_types[(int) BT_LAST] = NULL_TREE;
 
Index: gcc/c-family/c-common.c
===================================================================
--- gcc/c-family/c-common.c	(revision 227862)
+++ gcc/c-family/c-common.c	(working copy)
@@ -5545,10 +5545,10 @@ enum c_builtin_type
 #define DEF_FUNCTION_TYPE_VAR_4(NAME, RETURN, ARG1, ARG2, ARG3, ARG4) NAME,
 #define DEF_FUNCTION_TYPE_VAR_5(NAME, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5) \
 				NAME,
+#define DEF_FUNCTION_TYPE_VAR_6(NAME, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
+				ARG6) NAME,
 #define DEF_FUNCTION_TYPE_VAR_7(NAME, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
 				ARG6, ARG7) NAME,
-#define DEF_FUNCTION_TYPE_VAR_11(NAME, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
-				 ARG6, ARG7, ARG8, ARG9, ARG10, ARG11) NAME,
 #define DEF_POINTER_TYPE(NAME, TYPE) NAME,
 #include "builtin-types.def"
 #undef DEF_PRIMITIVE_TYPE
@@ -5567,8 +5567,8 @@ enum c_builtin_type
 #undef DEF_FUNCTION_TYPE_VAR_3
 #undef DEF_FUNCTION_TYPE_VAR_4
 #undef DEF_FUNCTION_TYPE_VAR_5
+#undef DEF_FUNCTION_TYPE_VAR_6
 #undef DEF_FUNCTION_TYPE_VAR_7
-#undef DEF_FUNCTION_TYPE_VAR_11
 #undef DEF_POINTER_TYPE
   BT_LAST
 };
@@ -5661,13 +5661,12 @@ c_define_builtins (tree va_list_ref_type
   def_fn_type (ENUM, RETURN, 1, 4, ARG1, ARG2, ARG3, ARG4);
 #define DEF_FUNCTION_TYPE_VAR_5(ENUM, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5) \
   def_fn_type (ENUM, RETURN, 1, 5, ARG1, ARG2, ARG3, ARG4, ARG5);
+#define DEF_FUNCTION_TYPE_VAR_6(ENUM, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
+				ARG6) \
+  def_fn_type (ENUM, RETURN, 1, 6, ARG1, ARG2, ARG3, ARG4, ARG5, ARG6);
 #define DEF_FUNCTION_TYPE_VAR_7(ENUM, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
 				ARG6, ARG7)				\
   def_fn_type (ENUM, RETURN, 1, 7, ARG1, ARG2, ARG3, ARG4, ARG5, ARG6, ARG7);
-#define DEF_FUNCTION_TYPE_VAR_11(ENUM, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
-				 ARG6, ARG7, ARG8, ARG9, ARG10, ARG11) \
-  def_fn_type (ENUM, RETURN, 1, 11, ARG1, ARG2, ARG3, ARG4, ARG5, ARG6,      \
-	       ARG7, ARG8, ARG9, ARG10, ARG11);
 #define DEF_POINTER_TYPE(ENUM, TYPE) \
   builtin_types[(int) ENUM] = build_pointer_type (builtin_types[(int) TYPE]);
 
@@ -5689,8 +5688,8 @@ c_define_builtins (tree va_list_ref_type
 #undef DEF_FUNCTION_TYPE_VAR_3
 #undef DEF_FUNCTION_TYPE_VAR_4
 #undef DEF_FUNCTION_TYPE_VAR_5
+#undef DEF_FUNCTION_TYPE_VAR_6
 #undef DEF_FUNCTION_TYPE_VAR_7
-#undef DEF_FUNCTION_TYPE_VAR_11
 #undef DEF_POINTER_TYPE
   builtin_types[(int) BT_LAST] = NULL_TREE;
 
Index: gcc/fortran/f95-lang.c
===================================================================
--- gcc/fortran/f95-lang.c	(revision 227862)
+++ gcc/fortran/f95-lang.c	(working copy)
@@ -635,10 +635,10 @@ gfc_init_builtin_functions (void)
 			    ARG6, ARG7, ARG8) NAME,
 #define DEF_FUNCTION_TYPE_VAR_0(NAME, RETURN) NAME,
 #define DEF_FUNCTION_TYPE_VAR_2(NAME, RETURN, ARG1, ARG2) NAME,
+#define DEF_FUNCTION_TYPE_VAR_6(NAME, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
+				 ARG6) NAME,
 #define DEF_FUNCTION_TYPE_VAR_7(NAME, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
 				ARG6, ARG7) NAME,
-#define DEF_FUNCTION_TYPE_VAR_11(NAME, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
-				 ARG6, ARG7, ARG8, ARG9, ARG10, ARG11) NAME,
 #define DEF_POINTER_TYPE(NAME, TYPE) NAME,
 #include "types.def"
 #undef DEF_PRIMITIVE_TYPE
@@ -653,8 +653,8 @@ gfc_init_builtin_functions (void)
 #undef DEF_FUNCTION_TYPE_8
 #undef DEF_FUNCTION_TYPE_VAR_0
 #undef DEF_FUNCTION_TYPE_VAR_2
+#undef DEF_FUNCTION_TYPE_VAR_6
 #undef DEF_FUNCTION_TYPE_VAR_7
-#undef DEF_FUNCTION_TYPE_VAR_11
 #undef DEF_POINTER_TYPE
     BT_LAST
   };
@@ -1096,8 +1096,8 @@ gfc_init_builtin_functions (void)
 					builtin_types[(int) ARG1],     	\
 					builtin_types[(int) ARG2],     	\
 					NULL_TREE);
-#define DEF_FUNCTION_TYPE_VAR_7(ENUM, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
-				ARG6, ARG7)				\
+#define DEF_FUNCTION_TYPE_VAR_6(ENUM, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
+				ARG6)	\
   builtin_types[(int) ENUM]						\
     = build_varargs_function_type_list (builtin_types[(int) RETURN],   	\
 					builtin_types[(int) ARG1],     	\
@@ -1106,10 +1106,9 @@ gfc_init_builtin_functions (void)
 					builtin_types[(int) ARG4],	\
 					builtin_types[(int) ARG5],	\
 					builtin_types[(int) ARG6],	\
-					builtin_types[(int) ARG7],	\
 					NULL_TREE);
-#define DEF_FUNCTION_TYPE_VAR_11(ENUM, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
-				 ARG6, ARG7, ARG8, ARG9, ARG10, ARG11)	\
+#define DEF_FUNCTION_TYPE_VAR_7(ENUM, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
+				ARG6, ARG7)				\
   builtin_types[(int) ENUM]						\
     = build_varargs_function_type_list (builtin_types[(int) RETURN],   	\
 					builtin_types[(int) ARG1],     	\
@@ -1119,10 +1118,6 @@ gfc_init_builtin_functions (void)
 					builtin_types[(int) ARG5],	\
 					builtin_types[(int) ARG6],	\
 					builtin_types[(int) ARG7],	\
-					builtin_types[(int) ARG8],	\
-					builtin_types[(int) ARG9],	\
-					builtin_types[(int) ARG10],	\
-					builtin_types[(int) ARG11],	\
 					NULL_TREE);
 #define DEF_POINTER_TYPE(ENUM, TYPE)			\
   builtin_types[(int) ENUM]				\
@@ -1140,8 +1135,8 @@ gfc_init_builtin_functions (void)
 #undef DEF_FUNCTION_TYPE_8
 #undef DEF_FUNCTION_TYPE_VAR_0
 #undef DEF_FUNCTION_TYPE_VAR_2
+#undef DEF_FUNCTION_TYPE_VAR_6
 #undef DEF_FUNCTION_TYPE_VAR_7
-#undef DEF_FUNCTION_TYPE_VAR_11
 #undef DEF_POINTER_TYPE
   builtin_types[(int) BT_LAST] = NULL_TREE;
 
Index: gcc/fortran/types.def
===================================================================
--- gcc/fortran/types.def	(revision 227862)
+++ gcc/fortran/types.def	(working copy)
@@ -219,7 +219,6 @@ DEF_FUNCTION_TYPE_VAR_7 (BT_FN_VOID_INT_
 			 BT_VOID, BT_INT, BT_SIZE, BT_PTR, BT_PTR,
 			 BT_PTR, BT_INT, BT_INT)
 
-DEF_FUNCTION_TYPE_VAR_11 (BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_INT_INT_INT_INT_INT_VAR,
+DEF_FUNCTION_TYPE_VAR_6 (BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_VAR,
 			  BT_VOID, BT_INT, BT_PTR_FN_VOID_PTR, BT_SIZE,
-			  BT_PTR, BT_PTR, BT_PTR, BT_INT, BT_INT, BT_INT,
-			  BT_INT, BT_INT)
+			  BT_PTR, BT_PTR, BT_PTR)

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Openacc launch API
  2015-09-17 14:43   ` Nathan Sidwell
@ 2015-09-18  9:13     ` Bernd Schmidt
  2015-09-18 18:56       ` Nathan Sidwell
  2015-09-24  9:28       ` Jakub Jelinek
  2015-09-21 16:30     ` Openacc launch API Nathan Sidwell
  1 sibling, 2 replies; 34+ messages in thread
From: Bernd Schmidt @ 2015-09-18  9:13 UTC (permalink / raw)
  To: Nathan Sidwell, Jakub Jelinek; +Cc: GCC Patches

On 09/17/2015 04:40 PM, Nathan Sidwell wrote:

> Added call to gomp_fatal, indicating libgomp is out of date. Also added
> a default to the switch following with the same effect.  The trouble
> with implementing handling of device_type here now, is difficulty in
> testing its correctness.  If it were  buggy we'd be in a worse position
> than not having it.

Is that so difficult though? See if nvptx ignores (let's say) intelmic 
arguments in favour of the default and accepts nvptx ones.

> +  if (num_waits > 8)
> +    gomp_fatal ("too many waits for legacy interface");
> +
> +  va_start (ap, num_waits);
> +  for (ix = 0; ix != num_waits; ix++)
> +    waits[ix] = va_arg (ap, int);
> +  waits[ix] = 0;
> +  va_end (ap);

I still don't like this. I think there are at least two better 
alternatives: add a new GOMP_LAUNCH_key which makes GOACC_parallel read 
a number of waits from a va_list * pointer passed after it, or just 
admit that the legacy function always does host fallback and just 
truncate the current version after

   if (host_fallback)
     {
       goacc_save_and_set_bind (acc_device_host);
       fn (hostaddrs);
       goacc_restore_bind ();
       return;
     }

(which incidentally ignores all the wait arguments).

Other than that the patch is fine with me, but Jakub should have the 
last word.


Bernd

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Openacc launch API
  2015-09-18  9:13     ` Bernd Schmidt
@ 2015-09-18 18:56       ` Nathan Sidwell
  2015-09-24  9:28       ` Jakub Jelinek
  1 sibling, 0 replies; 34+ messages in thread
From: Nathan Sidwell @ 2015-09-18 18:56 UTC (permalink / raw)
  To: Bernd Schmidt, Jakub Jelinek; +Cc: GCC Patches

On 09/18/15 05:13, Bernd Schmidt wrote:

> Is that so difficult though? See if nvptx ignores (let's say) intelmic arguments
> in favour of the default and accepts nvptx ones.

I'm sorry, I think it is unreasonable to require support in this patch for 
something that's not yet implemented in the rest of the toolchain.  The 
likelihood of implementing it correctly without  all the other bits filled in is 
low -- there are bound to be unanticipated subtleties.

> I still don't like this. I think there are at least two better alternatives: add
> a new GOMP_LAUNCH_key which makes GOACC_parallel read a number of waits from a
> va_list * pointer passed after it, or just admit that the legacy function always
> does host fallback and just truncate the current version after

I think you're worrying about  something that will not happen in practice.  As I 
said, the most waits I've seen on code has been 2, so a limit of 8 seems 
perfectly adequate.

nathan

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Openacc launch API
  2015-09-17 14:43   ` Nathan Sidwell
  2015-09-18  9:13     ` Bernd Schmidt
@ 2015-09-21 16:30     ` Nathan Sidwell
  1 sibling, 0 replies; 34+ messages in thread
From: Nathan Sidwell @ 2015-09-21 16:30 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Bernd Schmidt, GCC Patches

Jakub?

https://gcc.gnu.org/ml/gcc-patches/2015-09/msg01287.html


On 09/17/15 10:40, Nathan Sidwell wrote:
> Updated patch addressing your points.  Some  further comments though ...
>
>>> +  while (GOMP_LAUNCH_PACK (GOMP_LAUNCH_END, 0, 0)
>>> +     != (tag = va_arg (ap, unsigned)))
>>
>> That's a somewhat non-idiomatic way to write this, with the constant first and
>> not obviously a constant. I'd initialize a variable with the constant before the
>> loop.
>
> I went with
>    while ((tag = va_arg (...)) != 0) ...
>
> and killed GOMP_LAUNCH_END throughout, using explicit '0'.
>
>>> +      assert (!GOMP_LAUNCH_DEVICE (tag));
>>
>> Uh, that seems unfriendly, and not exactly forwards compatible. Can that fail a
>> bit more gracefully? (Alternatively, implement the device_type stuff now so that
>> we don't have TODOs in the code and don't have to worry about compatibility
>> issues.)
>
> Added call to gomp_fatal, indicating libgomp is out of date. Also added a
> default to the switch following with the same effect.  The trouble  with
> implementing handling of device_type here now, is difficulty in testing its
> correctness.  If it were  buggy we'd be in a worse position than not having it.
>
>>> +GOACC_2.0,1 {
>>> +  global:
>>> +    GOACC_parallel_keyed;
>>> +} GOACC_2.0;
>>
>> Did you mean to use a comma?
>
> I misunderstood your comment as 'did you mean to use a comma where you used
> something else', not 'is that comma a typo?'  well spotted!
>
>>> +    else if (!tagging)
>>
>> Oh... so tagging controls two different methods for constructing argument lists,
>> one for GOACC_parallel and the other for whatever OMP uses? That's a bit
>> unfortunate, I'll need to think about it for a bit or defer to Jakub.
>
> My earlier description was lacking.  The memory arguments have already been
> pushed before that switch.  This is just dealing with async & wait args.  I
> found it easier to modify the existing code path and have a tagging flag, rather
> than duplicate it.
>
> nathan

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Openacc launch API
  2015-09-18  9:13     ` Bernd Schmidt
  2015-09-18 18:56       ` Nathan Sidwell
@ 2015-09-24  9:28       ` Jakub Jelinek
  2015-09-24  9:58         ` Bernd Schmidt
  2015-09-28 21:20         ` Nathan Sidwell
  1 sibling, 2 replies; 34+ messages in thread
From: Jakub Jelinek @ 2015-09-24  9:28 UTC (permalink / raw)
  To: Bernd Schmidt; +Cc: Nathan Sidwell, GCC Patches

On Fri, Sep 18, 2015 at 11:13:03AM +0200, Bernd Schmidt wrote:
> On 09/17/2015 04:40 PM, Nathan Sidwell wrote:
> 
> >Added call to gomp_fatal, indicating libgomp is out of date. Also added
> >a default to the switch following with the same effect.  The trouble
> >with implementing handling of device_type here now, is difficulty in
> >testing its correctness.  If it were  buggy we'd be in a worse position
> >than not having it.
> 
> Is that so difficult though? See if nvptx ignores (let's say) intelmic
> arguments in favour of the default and accepts nvptx ones.
> 
> >+  if (num_waits > 8)
> >+    gomp_fatal ("too many waits for legacy interface");
> >+
> >+  va_start (ap, num_waits);
> >+  for (ix = 0; ix != num_waits; ix++)
> >+    waits[ix] = va_arg (ap, int);
> >+  waits[ix] = 0;
> >+  va_end (ap);
> 
> I still don't like this. I think there are at least two better alternatives:
> add a new GOMP_LAUNCH_key which makes GOACC_parallel read a number of waits
> from a va_list * pointer passed after it, or just admit that the legacy
> function always does host fallback and just truncate the current version
> after
> 
>   if (host_fallback)
>     {
>       goacc_save_and_set_bind (acc_device_host);
>       fn (hostaddrs);
>       goacc_restore_bind ();
>       return;
>     }
> 
> (which incidentally ignores all the wait arguments).

Iff GCC 5 compiled offloaded OpenACC/PTX code will always do host fallback
anyway because of the incompatible PTX version, then why don't you just
do
  goacc_save_and_set_bind (acc_device_host);
  fn (hostaddrs);
  goacc_restore_bind ();
and nothing else in GOACC_parallel?  If it doesn't always do host fallback,
then I wonder if e.g. the waits wouldn't be better represented as an array
of ints, GOMP_LAUNCH_WAIT op would then encode num_waits and be followed
by a va_arg (ap, int *) with num_waits entries in it.  No need to pass
va_list around, instead just the pointer, and the compatibility entry point
would alloca an array, stuff the waits in it and pass GOMP_LAUNCH_WAIT with
the allocated array.

Other than that, I think Bernd has covered all the issues I had.

	Jakub

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Openacc launch API
  2015-09-24  9:28       ` Jakub Jelinek
@ 2015-09-24  9:58         ` Bernd Schmidt
  2015-09-24 10:28           ` Jakub Jelinek
  2015-09-28 21:20         ` Nathan Sidwell
  1 sibling, 1 reply; 34+ messages in thread
From: Bernd Schmidt @ 2015-09-24  9:58 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Nathan Sidwell, GCC Patches

On 09/24/2015 10:40 AM, Jakub Jelinek wrote:
> Iff GCC 5 compiled offloaded OpenACC/PTX code will always do host fallback
> anyway because of the incompatible PTX version, then why don't you just
> do
>    goacc_save_and_set_bind (acc_device_host);
>    fn (hostaddrs);
>    goacc_restore_bind ();
> and nothing else in GOACC_parallel?

That was essentially my suggestion.

> Other than that, I think Bernd has covered all the issues I had.

What is your opinion on the forward compatibility issue? Is it something 
we care about?


Bernd

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Openacc launch API
  2015-09-24  9:58         ` Bernd Schmidt
@ 2015-09-24 10:28           ` Jakub Jelinek
  2015-09-24 10:41             ` Bernd Schmidt
  0 siblings, 1 reply; 34+ messages in thread
From: Jakub Jelinek @ 2015-09-24 10:28 UTC (permalink / raw)
  To: Bernd Schmidt; +Cc: Nathan Sidwell, GCC Patches

On Thu, Sep 24, 2015 at 11:50:56AM +0200, Bernd Schmidt wrote:
> On 09/24/2015 10:40 AM, Jakub Jelinek wrote:
> >Iff GCC 5 compiled offloaded OpenACC/PTX code will always do host fallback
> >anyway because of the incompatible PTX version, then why don't you just
> >do
> >   goacc_save_and_set_bind (acc_device_host);
> >   fn (hostaddrs);
> >   goacc_restore_bind ();
> >and nothing else in GOACC_parallel?
> 
> That was essentially my suggestion.
> 
> >Other than that, I think Bernd has covered all the issues I had.
> 
> What is your opinion on the forward compatibility issue? Is it something we
> care about?

For the (unlikely) case of using a newer GCC compiled binaries or libraries
with older libgomp, I'd prefer something other than silent crash.
Often it will not just start at all, because the binary needs newer symbols
from the library, if that is not the case, then supposedly ignoring some
newer features is fine, gomp_fatal is acceptable too though, but assert
failure is not.

	Jakub

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Openacc launch API
  2015-09-24 10:28           ` Jakub Jelinek
@ 2015-09-24 10:41             ` Bernd Schmidt
  0 siblings, 0 replies; 34+ messages in thread
From: Bernd Schmidt @ 2015-09-24 10:41 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Nathan Sidwell, GCC Patches

On 09/24/2015 11:56 AM, Jakub Jelinek wrote:
> On Thu, Sep 24, 2015 at 11:50:56AM +0200, Bernd Schmidt wrote:
>> On 09/24/2015 10:40 AM, Jakub Jelinek wrote:
>>> Iff GCC 5 compiled offloaded OpenACC/PTX code will always do host fallback
>>> anyway because of the incompatible PTX version, then why don't you just
>>> do
>>>    goacc_save_and_set_bind (acc_device_host);
>>>    fn (hostaddrs);
>>>    goacc_restore_bind ();
>>> and nothing else in GOACC_parallel?
>>
>> That was essentially my suggestion.
>>
>>> Other than that, I think Bernd has covered all the issues I had.
>>
>> What is your opinion on the forward compatibility issue? Is it something we
>> care about?
>
> For the (unlikely) case of using a newer GCC compiled binaries or libraries
> with older libgomp, I'd prefer something other than silent crash.
> Often it will not just start at all, because the binary needs newer symbols
> from the library, if that is not the case, then supposedly ignoring some
> newer features is fine, gomp_fatal is acceptable too though, but assert
> failure is not.

In that case the patch is OK with the change suggested above.


Bernd

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Openacc launch API
  2015-09-24  9:28       ` Jakub Jelinek
  2015-09-24  9:58         ` Bernd Schmidt
@ 2015-09-28 21:20         ` Nathan Sidwell
  2016-04-12 13:17           ` libgomp external ABI prototypes maintenance (was: Openacc launch API) Thomas Schwinge
  2016-04-20 11:35           ` libgomp: Make GCC 5 OpenACC offloading executables work " Thomas Schwinge
  1 sibling, 2 replies; 34+ messages in thread
From: Nathan Sidwell @ 2015-09-28 21:20 UTC (permalink / raw)
  To: Jakub Jelinek, Bernd Schmidt; +Cc: GCC Patches, Tom de Vries

[-- Attachment #1: Type: text/plain, Size: 353 bytes --]

On 09/24/15 04:40, Jakub Jelinek wrote:

> Iff GCC 5 compiled offloaded OpenACC/PTX code will always do host fallback
> anyway because of the incompatible PTX version, then why don't you just
> do
>    goacc_save_and_set_bind (acc_device_host);
>    fn (hostaddrs);
>    goacc_restore_bind ();

Committed the  attached.  Thanks for the review.

nathan


[-- Attachment #2: trunk-launch-0928.patch --]
[-- Type: text/x-patch, Size: 43431 bytes --]

2015-09-28  Nathan Sidwell  <nathan@codesourcery.com>

	inlude/
	* gomp-constants.h (GOMP_VERSION_NVIDIA_PTX): Increment.
	(GOMP_DIM_GANG, GOMP_DIM_WORKER, GOMP_DIM_VECTOR, GOMP_DIM_MAX,
	GOMP_DIM_MASK): New.
	(GOMP_LAUNCH_DIM, GOMP_LAUNCH_ASYNC, GOMP_LAUNCH_WAIT): New.
	(GOMP_LAUNCH_CODE_SHIFT, GOMP_LAUNCH_DEVICE_SHIFT,
	GOMP_LAUNCH_OP_SHIFT): New.
	(GOMP_LAUNCH_PACK, GOMP_LAUNCH_CODE, GOMP_LAUNCH_DEVICE,
	GOMP_LAUNCH_OP): New.
	(GOMP_LAUNCH_OP_MAX): New.

	libgomp/
	* libgomp.h (acc_dispatch_t): Replace separate geometry args with
	array.
	* libgomp.map (GOACC_parallel_keyed): New.
	* oacc-parallel.c (goacc_wait): Take pointer to va_list.  Adjust
	all callers.
	(GOACC_parallel_keyed): New interface.  Lose geometry arguments
	and take keyed varargs list.  Adjust call to exec_func.
	(GOACC_parallel): Force host fallback.
	* libgomp_g.h (GOACC_parallel): Remove.
	(GOACC_parallel_keyed): Declare.
	* plugin/plugin-nvptx.c (struct targ_fn_launch): New struct.
	(stuct targ_gn_descriptor): Replace name field with launch field.
	(nvptx_exec): Lose separate geometry args, take array.  Process
	dynamic dimensions and adjust.
	(struct nvptx_tdata): Replace fn_names field with fn_descs.
	(GOMP_OFFLOAD_load_image): Adjust for change in function table
	data.
	(GOMP_OFFLOAD_openacc_parallel): Adjust for change in dimension
	passing.
	* oacc-host.c (host_openacc_exec): Adjust for change in dimension
	passing.

	gcc/
	* config/nvptx/nvptx.c: Include omp-low.h and gomp-constants.h.
	(nvptx_record_offload_symbol): Record function execution geometry.
	* config/nvptx/mkoffload.c (process): Include launch geometry in
	function data.
	* omp-low.c (oacc_launch_pack): New.
	(replace_oacc_fn_attrib): New.
	(set_oacc_fn_attrib): New.
	(get_oacc_fn_attrib): New.
	(expand_omp_target): Create keyed varargs for GOACC_parallel call
	generation.
	* omp-low.h (get_oacc_fn_attrib): Declare.
	* builtin-types.def (DEF_FUNCTION_TyPE_VAR_6): New.
	(DEF_FUNCTION_TYPE_VAR_11): Delete.
	* tree.h (OMP_CLAUSE_EXPR): New.
	* omp-builtins.def (BUILT_IN_GOACC_PARALLEL): Change target fn name.

	gcc/lto/
	* lto-lang.c (DEF_FUNCTION_TYPE_VAR_6): New.
	(DEF_FUNCTION_TYPE_VAR_11): Delete.

	gcc/c-family/
	* c-common.c (DEF_FUNCTION_TYPE_VAR_6): New.
	(DEF_FUNCTION_TYPE_VAR_11): Delete.

	gcc/fortran/
	* f95-lang.c (DEF_FUNCTION_TYPE_VAR_6): New.
	(DEF_FUNCTION_TYPE_VAR_11): Delete.
	* types.def (DEF_FUNCTION_TYPE_VAR_6): New.
	(DEF_FUNCTION_TYPE_VAR_11): Delete.

Index: include/gomp-constants.h
===================================================================
--- include/gomp-constants.h	(revision 228086)
+++ include/gomp-constants.h	(working copy)
@@ -115,11 +115,33 @@ enum gomp_map_kind
 
 /* Versions of libgomp and device-specific plugins.  */
 #define GOMP_VERSION	0
-#define GOMP_VERSION_NVIDIA_PTX 0
+#define GOMP_VERSION_NVIDIA_PTX 1
 #define GOMP_VERSION_INTEL_MIC 0
 
 #define GOMP_VERSION_PACK(LIB, DEV) (((LIB) << 16) | (DEV))
 #define GOMP_VERSION_LIB(PACK) (((PACK) >> 16) & 0xffff)
 #define GOMP_VERSION_DEV(PACK) ((PACK) & 0xffff)
 
+#define GOMP_DIM_GANG	0
+#define GOMP_DIM_WORKER	1
+#define GOMP_DIM_VECTOR	2
+#define GOMP_DIM_MAX	3
+#define GOMP_DIM_MASK(X) (1u << (X))
+
+/* Varadic launch arguments.  End of list is marked by a zero.  */
+#define GOMP_LAUNCH_DIM		1  /* Launch dimensions, op = mask */
+#define GOMP_LAUNCH_ASYNC	2  /* Async, op = cst val if not MAX  */
+#define GOMP_LAUNCH_WAIT	3  /* Waits, op = num waits.  */
+#define GOMP_LAUNCH_CODE_SHIFT	28
+#define GOMP_LAUNCH_DEVICE_SHIFT 16
+#define GOMP_LAUNCH_OP_SHIFT 0
+#define GOMP_LAUNCH_PACK(CODE,DEVICE,OP)	\
+  (((CODE) << GOMP_LAUNCH_CODE_SHIFT)		\
+   | ((DEVICE) << GOMP_LAUNCH_DEVICE_SHIFT)	\
+   | ((OP) << GOMP_LAUNCH_OP_SHIFT))
+#define GOMP_LAUNCH_CODE(X) (((X) >> GOMP_LAUNCH_CODE_SHIFT) & 0xf)
+#define GOMP_LAUNCH_DEVICE(X) (((X) >> GOMP_LAUNCH_DEVICE_SHIFT) & 0xfff)
+#define GOMP_LAUNCH_OP(X) (((X) >> GOMP_LAUNCH_OP_SHIFT) & 0xffff)
+#define GOMP_LAUNCH_OP_MAX 0xffff
+
 #endif
Index: gcc/lto/lto-lang.c
===================================================================
--- gcc/lto/lto-lang.c	(revision 228086)
+++ gcc/lto/lto-lang.c	(working copy)
@@ -160,10 +160,10 @@ enum lto_builtin_type
 #define DEF_FUNCTION_TYPE_VAR_4(NAME, RETURN, ARG1, ARG2, ARG3, ARG4) NAME,
 #define DEF_FUNCTION_TYPE_VAR_5(NAME, RETURN, ARG1, ARG2, ARG3, ARG4, ARG6) \
 				NAME,
+#define DEF_FUNCTION_TYPE_VAR_6(NAME, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
+				 ARG6) NAME,
 #define DEF_FUNCTION_TYPE_VAR_7(NAME, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
 				ARG6, ARG7) NAME,
-#define DEF_FUNCTION_TYPE_VAR_11(NAME, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
-				 ARG6, ARG7, ARG8, ARG9, ARG10, ARG11) NAME,
 #define DEF_POINTER_TYPE(NAME, TYPE) NAME,
 #include "builtin-types.def"
 #undef DEF_PRIMITIVE_TYPE
@@ -182,8 +182,8 @@ enum lto_builtin_type
 #undef DEF_FUNCTION_TYPE_VAR_3
 #undef DEF_FUNCTION_TYPE_VAR_4
 #undef DEF_FUNCTION_TYPE_VAR_5
+#undef DEF_FUNCTION_TYPE_VAR_6
 #undef DEF_FUNCTION_TYPE_VAR_7
-#undef DEF_FUNCTION_TYPE_VAR_11
 #undef DEF_POINTER_TYPE
   BT_LAST
 };
@@ -668,13 +668,12 @@ lto_define_builtins (tree va_list_ref_ty
   def_fn_type (ENUM, RETURN, 1, 4, ARG1, ARG2, ARG3, ARG4);
 #define DEF_FUNCTION_TYPE_VAR_5(ENUM, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5) \
   def_fn_type (ENUM, RETURN, 1, 5, ARG1, ARG2, ARG3, ARG4, ARG5);
+#define DEF_FUNCTION_TYPE_VAR_6(ENUM, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
+				 ARG6)	\
+  def_fn_type (ENUM, RETURN, 1, 6, ARG1, ARG2, ARG3, ARG4, ARG5, ARG6);
 #define DEF_FUNCTION_TYPE_VAR_7(ENUM, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
 				ARG6, ARG7)				\
   def_fn_type (ENUM, RETURN, 1, 7, ARG1, ARG2, ARG3, ARG4, ARG5, ARG6, ARG7);
-#define DEF_FUNCTION_TYPE_VAR_11(ENUM, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
-				 ARG6, ARG7, ARG8, ARG9, ARG10, ARG11)	\
-  def_fn_type (ENUM, RETURN, 1, 11, ARG1, ARG2, ARG3, ARG4, ARG5, ARG6,	\
-	       ARG7, ARG8, ARG9, ARG10, ARG11);
 #define DEF_POINTER_TYPE(ENUM, TYPE) \
   builtin_types[(int) ENUM] = build_pointer_type (builtin_types[(int) TYPE]);
 
@@ -696,8 +695,8 @@ lto_define_builtins (tree va_list_ref_ty
 #undef DEF_FUNCTION_TYPE_VAR_3
 #undef DEF_FUNCTION_TYPE_VAR_4
 #undef DEF_FUNCTION_TYPE_VAR_5
+#undef DEF_FUNCTION_TYPE_VAR_6
 #undef DEF_FUNCTION_TYPE_VAR_7
-#undef DEF_FUNCTION_TYPE_VAR_11
 #undef DEF_POINTER_TYPE
   builtin_types[(int) BT_LAST] = NULL_TREE;
 
Index: gcc/fortran/types.def
===================================================================
--- gcc/fortran/types.def	(revision 228086)
+++ gcc/fortran/types.def	(working copy)
@@ -219,7 +219,6 @@ DEF_FUNCTION_TYPE_VAR_7 (BT_FN_VOID_INT_
 			 BT_VOID, BT_INT, BT_SIZE, BT_PTR, BT_PTR,
 			 BT_PTR, BT_INT, BT_INT)
 
-DEF_FUNCTION_TYPE_VAR_11 (BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_INT_INT_INT_INT_INT_VAR,
+DEF_FUNCTION_TYPE_VAR_6 (BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_VAR,
 			  BT_VOID, BT_INT, BT_PTR_FN_VOID_PTR, BT_SIZE,
-			  BT_PTR, BT_PTR, BT_PTR, BT_INT, BT_INT, BT_INT,
-			  BT_INT, BT_INT)
+			  BT_PTR, BT_PTR, BT_PTR)
Index: gcc/fortran/f95-lang.c
===================================================================
--- gcc/fortran/f95-lang.c	(revision 228086)
+++ gcc/fortran/f95-lang.c	(working copy)
@@ -635,10 +635,10 @@ gfc_init_builtin_functions (void)
 			    ARG6, ARG7, ARG8) NAME,
 #define DEF_FUNCTION_TYPE_VAR_0(NAME, RETURN) NAME,
 #define DEF_FUNCTION_TYPE_VAR_2(NAME, RETURN, ARG1, ARG2) NAME,
+#define DEF_FUNCTION_TYPE_VAR_6(NAME, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
+				 ARG6) NAME,
 #define DEF_FUNCTION_TYPE_VAR_7(NAME, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
 				ARG6, ARG7) NAME,
-#define DEF_FUNCTION_TYPE_VAR_11(NAME, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
-				 ARG6, ARG7, ARG8, ARG9, ARG10, ARG11) NAME,
 #define DEF_POINTER_TYPE(NAME, TYPE) NAME,
 #include "types.def"
 #undef DEF_PRIMITIVE_TYPE
@@ -653,8 +653,8 @@ gfc_init_builtin_functions (void)
 #undef DEF_FUNCTION_TYPE_8
 #undef DEF_FUNCTION_TYPE_VAR_0
 #undef DEF_FUNCTION_TYPE_VAR_2
+#undef DEF_FUNCTION_TYPE_VAR_6
 #undef DEF_FUNCTION_TYPE_VAR_7
-#undef DEF_FUNCTION_TYPE_VAR_11
 #undef DEF_POINTER_TYPE
     BT_LAST
   };
@@ -1096,8 +1096,8 @@ gfc_init_builtin_functions (void)
 					builtin_types[(int) ARG1],     	\
 					builtin_types[(int) ARG2],     	\
 					NULL_TREE);
-#define DEF_FUNCTION_TYPE_VAR_7(ENUM, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
-				ARG6, ARG7)				\
+#define DEF_FUNCTION_TYPE_VAR_6(ENUM, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
+				ARG6)	\
   builtin_types[(int) ENUM]						\
     = build_varargs_function_type_list (builtin_types[(int) RETURN],   	\
 					builtin_types[(int) ARG1],     	\
@@ -1106,10 +1106,9 @@ gfc_init_builtin_functions (void)
 					builtin_types[(int) ARG4],	\
 					builtin_types[(int) ARG5],	\
 					builtin_types[(int) ARG6],	\
-					builtin_types[(int) ARG7],	\
 					NULL_TREE);
-#define DEF_FUNCTION_TYPE_VAR_11(ENUM, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
-				 ARG6, ARG7, ARG8, ARG9, ARG10, ARG11)	\
+#define DEF_FUNCTION_TYPE_VAR_7(ENUM, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
+				ARG6, ARG7)				\
   builtin_types[(int) ENUM]						\
     = build_varargs_function_type_list (builtin_types[(int) RETURN],   	\
 					builtin_types[(int) ARG1],     	\
@@ -1119,10 +1118,6 @@ gfc_init_builtin_functions (void)
 					builtin_types[(int) ARG5],	\
 					builtin_types[(int) ARG6],	\
 					builtin_types[(int) ARG7],	\
-					builtin_types[(int) ARG8],	\
-					builtin_types[(int) ARG9],	\
-					builtin_types[(int) ARG10],	\
-					builtin_types[(int) ARG11],	\
 					NULL_TREE);
 #define DEF_POINTER_TYPE(ENUM, TYPE)			\
   builtin_types[(int) ENUM]				\
@@ -1140,8 +1135,8 @@ gfc_init_builtin_functions (void)
 #undef DEF_FUNCTION_TYPE_8
 #undef DEF_FUNCTION_TYPE_VAR_0
 #undef DEF_FUNCTION_TYPE_VAR_2
+#undef DEF_FUNCTION_TYPE_VAR_6
 #undef DEF_FUNCTION_TYPE_VAR_7
-#undef DEF_FUNCTION_TYPE_VAR_11
 #undef DEF_POINTER_TYPE
   builtin_types[(int) BT_LAST] = NULL_TREE;
 
Index: gcc/tree.h
===================================================================
--- gcc/tree.h	(revision 228086)
+++ gcc/tree.h	(working copy)
@@ -1,3 +1,4 @@
+
 /* Definitions for the ubiquitous 'tree' type for GNU compilers.
    Copyright (C) 1989-2015 Free Software Foundation, Inc.
 
@@ -1369,6 +1370,8 @@ extern void protected_set_expr_location
   OMP_CLAUSE_OPERAND (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_SCHEDULE), 0)
 
 /* OpenACC clause expressions  */
+#define OMP_CLAUSE_EXPR(NODE, CLAUSE) \
+  OMP_CLAUSE_OPERAND (OMP_CLAUSE_SUBCODE_CHECK (NODE, CLAUSE), 0)
 #define OMP_CLAUSE_GANG_EXPR(NODE) \
   OMP_CLAUSE_OPERAND ( \
     OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_GANG), 0)
Index: gcc/c-family/c-common.c
===================================================================
--- gcc/c-family/c-common.c	(revision 228086)
+++ gcc/c-family/c-common.c	(working copy)
@@ -5548,10 +5548,10 @@ enum c_builtin_type
 #define DEF_FUNCTION_TYPE_VAR_4(NAME, RETURN, ARG1, ARG2, ARG3, ARG4) NAME,
 #define DEF_FUNCTION_TYPE_VAR_5(NAME, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5) \
 				NAME,
+#define DEF_FUNCTION_TYPE_VAR_6(NAME, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
+				ARG6) NAME,
 #define DEF_FUNCTION_TYPE_VAR_7(NAME, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
 				ARG6, ARG7) NAME,
-#define DEF_FUNCTION_TYPE_VAR_11(NAME, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
-				 ARG6, ARG7, ARG8, ARG9, ARG10, ARG11) NAME,
 #define DEF_POINTER_TYPE(NAME, TYPE) NAME,
 #include "builtin-types.def"
 #undef DEF_PRIMITIVE_TYPE
@@ -5570,8 +5570,8 @@ enum c_builtin_type
 #undef DEF_FUNCTION_TYPE_VAR_3
 #undef DEF_FUNCTION_TYPE_VAR_4
 #undef DEF_FUNCTION_TYPE_VAR_5
+#undef DEF_FUNCTION_TYPE_VAR_6
 #undef DEF_FUNCTION_TYPE_VAR_7
-#undef DEF_FUNCTION_TYPE_VAR_11
 #undef DEF_POINTER_TYPE
   BT_LAST
 };
@@ -5664,13 +5664,12 @@ c_define_builtins (tree va_list_ref_type
   def_fn_type (ENUM, RETURN, 1, 4, ARG1, ARG2, ARG3, ARG4);
 #define DEF_FUNCTION_TYPE_VAR_5(ENUM, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5) \
   def_fn_type (ENUM, RETURN, 1, 5, ARG1, ARG2, ARG3, ARG4, ARG5);
+#define DEF_FUNCTION_TYPE_VAR_6(ENUM, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
+				ARG6) \
+  def_fn_type (ENUM, RETURN, 1, 6, ARG1, ARG2, ARG3, ARG4, ARG5, ARG6);
 #define DEF_FUNCTION_TYPE_VAR_7(ENUM, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
 				ARG6, ARG7)				\
   def_fn_type (ENUM, RETURN, 1, 7, ARG1, ARG2, ARG3, ARG4, ARG5, ARG6, ARG7);
-#define DEF_FUNCTION_TYPE_VAR_11(ENUM, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
-				 ARG6, ARG7, ARG8, ARG9, ARG10, ARG11) \
-  def_fn_type (ENUM, RETURN, 1, 11, ARG1, ARG2, ARG3, ARG4, ARG5, ARG6,      \
-	       ARG7, ARG8, ARG9, ARG10, ARG11);
 #define DEF_POINTER_TYPE(ENUM, TYPE) \
   builtin_types[(int) ENUM] = build_pointer_type (builtin_types[(int) TYPE]);
 
@@ -5692,8 +5691,8 @@ c_define_builtins (tree va_list_ref_type
 #undef DEF_FUNCTION_TYPE_VAR_3
 #undef DEF_FUNCTION_TYPE_VAR_4
 #undef DEF_FUNCTION_TYPE_VAR_5
+#undef DEF_FUNCTION_TYPE_VAR_6
 #undef DEF_FUNCTION_TYPE_VAR_7
-#undef DEF_FUNCTION_TYPE_VAR_11
 #undef DEF_POINTER_TYPE
   builtin_types[(int) BT_LAST] = NULL_TREE;
 
Index: gcc/omp-builtins.def
===================================================================
--- gcc/omp-builtins.def	(revision 228086)
+++ gcc/omp-builtins.def	(working copy)
@@ -38,8 +38,8 @@ DEF_GOACC_BUILTIN (BUILT_IN_GOACC_DATA_E
 DEF_GOACC_BUILTIN (BUILT_IN_GOACC_ENTER_EXIT_DATA, "GOACC_enter_exit_data",
 		   BT_FN_VOID_INT_SIZE_PTR_PTR_PTR_INT_INT_VAR,
 		   ATTR_NOTHROW_LIST)
-DEF_GOACC_BUILTIN (BUILT_IN_GOACC_PARALLEL, "GOACC_parallel",
-		   BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_INT_INT_INT_INT_INT_VAR,
+DEF_GOACC_BUILTIN (BUILT_IN_GOACC_PARALLEL, "GOACC_parallel_keyed",
+		   BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_VAR,
 		   ATTR_NOTHROW_LIST)
 DEF_GOACC_BUILTIN (BUILT_IN_GOACC_UPDATE, "GOACC_update",
 		   BT_FN_VOID_INT_SIZE_PTR_PTR_PTR_INT_INT_VAR,
Index: gcc/builtin-types.def
===================================================================
--- gcc/builtin-types.def	(revision 228086)
+++ gcc/builtin-types.def	(working copy)
@@ -590,15 +590,14 @@ DEF_FUNCTION_TYPE_VAR_5 (BT_FN_INT_STRIN
 DEF_FUNCTION_TYPE_VAR_5 (BT_FN_INT_INT_INT_INT_INT_INT_VAR,
 			 BT_INT, BT_INT, BT_INT, BT_INT, BT_INT, BT_INT)
 
+DEF_FUNCTION_TYPE_VAR_6 (BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_VAR,
+			 BT_VOID, BT_INT, BT_PTR_FN_VOID_PTR, BT_SIZE,
+			 BT_PTR, BT_PTR, BT_PTR)
+
 DEF_FUNCTION_TYPE_VAR_7 (BT_FN_VOID_INT_SIZE_PTR_PTR_PTR_INT_INT_VAR,
 			 BT_VOID, BT_INT, BT_SIZE, BT_PTR, BT_PTR,
 			 BT_PTR, BT_INT, BT_INT)
 
-DEF_FUNCTION_TYPE_VAR_11 (BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_INT_INT_INT_INT_INT_VAR,
-			  BT_VOID, BT_INT, BT_PTR_FN_VOID_PTR, BT_SIZE,
-			  BT_PTR, BT_PTR, BT_PTR, BT_INT, BT_INT, BT_INT,
-			  BT_INT, BT_INT)
-
 DEF_POINTER_TYPE (BT_PTR_FN_VOID_VAR, BT_FN_VOID_VAR)
 DEF_FUNCTION_TYPE_3 (BT_FN_PTR_PTR_FN_VOID_VAR_PTR_SIZE,
 		     BT_PTR, BT_PTR_FN_VOID_VAR, BT_PTR, BT_SIZE)
Index: gcc/config/nvptx/nvptx.c
===================================================================
--- gcc/config/nvptx/nvptx.c	(revision 228086)
+++ gcc/config/nvptx/nvptx.c	(working copy)
@@ -56,6 +56,8 @@
 #include "cfgrtl.h"
 #include "stor-layout.h"
 #include "builtins.h"
+#include "omp-low.h"
+#include "gomp-constants.h"
 
 /* This file should be included last.  */
 #include "target-def.h"
@@ -2066,9 +2068,51 @@ nvptx_vector_alignment (const_tree type)
 static void
 nvptx_record_offload_symbol (tree decl)
 {
-  fprintf (asm_out_file, "//:%s_MAP %s\n",
-	   TREE_CODE (decl) == VAR_DECL ? "VAR" : "FUNC",
-	   IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl)));
+  switch (TREE_CODE (decl))
+    {
+    case VAR_DECL:
+      fprintf (asm_out_file, "//:VAR_MAP \"%s\"\n",
+	       IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl)));
+      break;
+
+    case FUNCTION_DECL:
+      {
+	tree attr = get_oacc_fn_attrib (decl);
+	tree dims = NULL_TREE;
+	unsigned ix;
+
+	if (attr)
+	  dims = TREE_VALUE (attr);
+	fprintf (asm_out_file, "//:FUNC_MAP \"%s\"",
+		 IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl)));
+
+	for (ix = 0; ix != GOMP_DIM_MAX; ix++)
+	  {
+	    int size = 1;
+
+	    /* TODO: This check can go away once the dimension default
+	       machinery is merged to trunk.  */
+	    if (dims)
+	      {
+		tree dim = TREE_VALUE (dims);
+
+		if (dim)
+		  size = TREE_INT_CST_LOW (dim);
+
+		gcc_assert (!TREE_PURPOSE (dims));
+		dims = TREE_CHAIN (dims);
+	      }
+	    
+	    fprintf (asm_out_file, ", %#x", size);
+	  }
+	
+	fprintf (asm_out_file, "\n");
+      }
+      break;
+  
+    default:
+      gcc_unreachable ();
+    }
 }
 
 /* Implement TARGET_ASM_FILE_START.  Write the kinds of things ptxas expects
Index: gcc/config/nvptx/mkoffload.c
===================================================================
--- gcc/config/nvptx/mkoffload.c	(revision 228086)
+++ gcc/config/nvptx/mkoffload.c	(working copy)
@@ -842,6 +842,8 @@ process (FILE *in, FILE *out)
 {
   const char *input = read_file (in);
   Token *tok = tokenize (input);
+  const char *comma;
+  id_map const *id;
 
   do
     tok = parse_file (tok);
@@ -853,21 +855,25 @@ process (FILE *in, FILE *out)
   write_stmts (out, rev_stmts (fns));
   fprintf (out, ";\n\n");
 
-  fprintf (out, "static const char *const var_mappings[] = {\n");
-  for (id_map *id = var_ids; id; id = id->next)
-    fprintf (out, "\t\"%s\"%s\n", id->ptx_name, id->next ? "," : "");
-  fprintf (out, "};\n\n");
-  fprintf (out, "static const char *const func_mappings[] = {\n");
-  for (id_map *id = func_ids; id; id = id->next)
-    fprintf (out, "\t\"%s\"%s\n", id->ptx_name, id->next ? "," : "");
-  fprintf (out, "};\n\n");
+  fprintf (out, "static const char *const var_mappings[] = {");
+  for (comma = "", id = var_ids; id; comma = ",", id = id->next)
+    fprintf (out, "%s\n\t%s", comma, id->ptx_name);
+  fprintf (out, "\n};\n\n");
+
+  fprintf (out, "static const struct nvptx_fn {\n"
+	   "  const char *name;\n"
+	   "  unsigned short dim[3];\n"
+	   "} func_mappings[] = {\n");
+  for (comma = "", id = func_ids; id; comma = ",", id = id->next)
+    fprintf (out, "%s\n\t{%s}", comma, id->ptx_name);
+  fprintf (out, "\n};\n\n");
 
   fprintf (out,
 	   "static const struct nvptx_tdata {\n"
 	   "  const char *ptx_src;\n"
 	   "  const char *const *var_names;\n"
 	   "  __SIZE_TYPE__ var_num;\n"
-	   "  const char *const *fn_names;\n"
+	   "  const struct nvptx_fn *fn_names;\n"
 	   "  __SIZE_TYPE__ fn_num;\n"
 	   "} target_data = {\n"
 	   "  ptx_code,\n"
Index: gcc/omp-low.c
===================================================================
--- gcc/omp-low.c	(revision 228086)
+++ gcc/omp-low.c	(working copy)
@@ -82,7 +82,6 @@ along with GCC; see the file COPYING3.
 #include "lto-section-names.h"
 #include "gomp-constants.h"
 
-
 /* Lowering of OMP parallel and workshare constructs proceeds in two
    phases.  The first phase scans the function looking for OMP statements
    and then for variables that must be replaced to satisfy data sharing
@@ -8869,6 +8868,110 @@ expand_omp_atomic (struct omp_region *re
 }
 
 
+/* Encode an oacc launc argument.  This matches the GOMP_LAUNCH_PACK
+   macro on gomp-constants.h.  We do not check for overflow.  */
+
+static tree
+oacc_launch_pack (unsigned code, tree device, unsigned op)
+{
+  tree res;
+  
+  res = build_int_cst (unsigned_type_node, GOMP_LAUNCH_PACK (code, 0, op));
+  if (device)
+    {
+      device = fold_build2 (LSHIFT_EXPR, unsigned_type_node,
+			    device, build_int_cst (unsigned_type_node,
+						   GOMP_LAUNCH_DEVICE_SHIFT));
+      res = fold_build2 (BIT_IOR_EXPR, unsigned_type_node, res, device);
+    }
+  return res;
+}
+
+/* Look for compute grid dimension clauses and convert to an attribute
+   attached to FN.  This permits the target-side code to (a) massage
+   the dimensions, (b) emit that data and (c) optimize.  Non-constant
+   dimensions are pushed onto ARGS.
+
+   The attribute value is a TREE_LIST.  A set of dimensions is
+   represented as a list of INTEGER_CST.  Those that are runtime
+   expres are represented as an INTEGER_CST of zero.
+
+   TOOO. Normally the attribute will just contain a single such list.  If
+   however it contains a list of lists, this will represent the use of
+   device_type.  Each member of the outer list is an assoc list of
+   dimensions, keyed by the device type.  The first entry will be the
+   default.  Well, that's the plan.  */
+
+#define OACC_FN_ATTRIB "oacc function"
+
+/* Replace any existing oacc fn attribute with updated dimensions.  */
+
+void
+replace_oacc_fn_attrib (tree fn, tree dims)
+{
+  tree ident = get_identifier (OACC_FN_ATTRIB);
+  tree attribs = DECL_ATTRIBUTES (fn);
+
+  /* If we happen to be present as the first attrib, drop it.  */
+  if (attribs && TREE_PURPOSE (attribs) == ident)
+    attribs = TREE_CHAIN (attribs);
+  DECL_ATTRIBUTES (fn) = tree_cons (ident, dims, attribs);
+}
+
+/* Scan CLAUSES for launch dimensions and attach them to the oacc
+   function attribute.  Push any that are non-constant onto the ARGS
+   list, along with an appropriate GOMP_LAUNCH_DIM tag.  */
+
+static void
+set_oacc_fn_attrib (tree fn, tree clauses, vec<tree> *args)
+{
+  /* Must match GOMP_DIM ordering.  */
+  static const omp_clause_code ids[]
+    = { OMP_CLAUSE_NUM_GANGS, OMP_CLAUSE_NUM_WORKERS,
+	OMP_CLAUSE_VECTOR_LENGTH };
+  unsigned ix;
+  tree dims[GOMP_DIM_MAX];
+  tree attr = NULL_TREE;
+  unsigned non_const = 0;
+
+  for (ix = GOMP_DIM_MAX; ix--;)
+    {
+      tree clause = find_omp_clause (clauses, ids[ix]);
+      tree dim = NULL_TREE;
+
+      if (clause)
+	dim = OMP_CLAUSE_EXPR (clause, ids[ix]);
+      dims[ix] = dim;
+      if (dim && TREE_CODE (dim) != INTEGER_CST)
+	{
+	  dim = integer_zero_node;
+	  non_const |= GOMP_DIM_MASK (ix);
+	}
+      attr = tree_cons (NULL_TREE, dim, attr);
+    }
+
+  replace_oacc_fn_attrib (fn, attr);
+
+  if (non_const)
+    {
+      /* Push a dynamic argument set.  */
+      args->safe_push (oacc_launch_pack (GOMP_LAUNCH_DIM,
+					 NULL_TREE, non_const));
+      for (unsigned ix = 0; ix != GOMP_DIM_MAX; ix++)
+	if (non_const & GOMP_DIM_MASK (ix))
+	  args->safe_push (dims[ix]);
+    }
+}
+
+/* Retrieve the oacc function attrib and return it.  Non-oacc
+   functions will return NULL.  */
+
+tree
+get_oacc_fn_attrib (tree fn)
+{
+  return lookup_attribute (OACC_FN_ATTRIB, DECL_ATTRIBUTES (fn));
+}
+
 /* Expand the GIMPLE_OMP_TARGET starting at REGION.  */
 
 static void
@@ -8889,10 +8992,10 @@ expand_omp_target (struct omp_region *re
   offloaded = is_gimple_omp_offloaded (entry_stmt);
   switch (gimple_omp_target_kind (entry_stmt))
     {
-    case GF_OMP_TARGET_KIND_REGION:
-    case GF_OMP_TARGET_KIND_UPDATE:
     case GF_OMP_TARGET_KIND_OACC_PARALLEL:
     case GF_OMP_TARGET_KIND_OACC_KERNELS:
+    case GF_OMP_TARGET_KIND_REGION:
+    case GF_OMP_TARGET_KIND_UPDATE:
     case GF_OMP_TARGET_KIND_OACC_UPDATE:
     case GF_OMP_TARGET_KIND_OACC_ENTER_EXIT_DATA:
       data_region = false;
@@ -9224,6 +9327,7 @@ expand_omp_target (struct omp_region *re
     }
 
   gimple *g;
+  bool tagging = false;
   /* The maximum number used by any start_ix, without varargs.  */
   auto_vec<tree, 11> args;
   args.quick_push (device);
@@ -9259,88 +9363,87 @@ expand_omp_target (struct omp_region *re
       break;
     case BUILT_IN_GOACC_PARALLEL:
       {
-	tree t_num_gangs, t_num_workers, t_vector_length;
-
-	/* Default values for num_gangs, num_workers, and vector_length.  */
-	t_num_gangs = t_num_workers = t_vector_length
-	  = fold_convert_loc (gimple_location (entry_stmt),
-			      integer_type_node, integer_one_node);
-	/* ..., but if present, use the value specified by the respective
-	   clause, making sure that are of the correct type.  */
-	c = find_omp_clause (clauses, OMP_CLAUSE_NUM_GANGS);
-	if (c)
-	  t_num_gangs = fold_convert_loc (OMP_CLAUSE_LOCATION (c),
-					  integer_type_node,
-					  OMP_CLAUSE_NUM_GANGS_EXPR (c));
-	c = find_omp_clause (clauses, OMP_CLAUSE_NUM_WORKERS);
-	if (c)
-	  t_num_workers = fold_convert_loc (OMP_CLAUSE_LOCATION (c),
-					    integer_type_node,
-					    OMP_CLAUSE_NUM_WORKERS_EXPR (c));
-	c = find_omp_clause (clauses, OMP_CLAUSE_VECTOR_LENGTH);
-	if (c)
-	  t_vector_length = fold_convert_loc (OMP_CLAUSE_LOCATION (c),
-					      integer_type_node,
-					      OMP_CLAUSE_VECTOR_LENGTH_EXPR (c));
-	args.quick_push (t_num_gangs);
-	args.quick_push (t_num_workers);
-	args.quick_push (t_vector_length);
+	set_oacc_fn_attrib (child_fn, clauses, &args);
+	tagging = true;
       }
       /* FALLTHRU */
     case BUILT_IN_GOACC_ENTER_EXIT_DATA:
     case BUILT_IN_GOACC_UPDATE:
       {
-	tree t_async;
-	int t_wait_idx;
+	tree t_async = NULL_TREE;
 
-	/* Default values for t_async.  */
-	t_async = fold_convert_loc (gimple_location (entry_stmt),
-				    integer_type_node,
-				    build_int_cst (integer_type_node,
-						   GOMP_ASYNC_SYNC));
-	/* ..., but if present, use the value specified by the respective
+	/* If present, use the value specified by the respective
 	   clause, making sure that is of the correct type.  */
 	c = find_omp_clause (clauses, OMP_CLAUSE_ASYNC);
 	if (c)
 	  t_async = fold_convert_loc (OMP_CLAUSE_LOCATION (c),
 				      integer_type_node,
 				      OMP_CLAUSE_ASYNC_EXPR (c));
-
-	args.quick_push (t_async);
-	/* Save the index, and... */
-	t_wait_idx = args.length ();
-	/* ... push a default value.  */
-	args.quick_push (fold_convert_loc (gimple_location (entry_stmt),
-					   integer_type_node,
-					   integer_zero_node));
-	c = find_omp_clause (clauses, OMP_CLAUSE_WAIT);
-	if (c)
+	else if (!tagging)
+	  /* Default values for t_async.  */
+	  t_async = fold_convert_loc (gimple_location (entry_stmt),
+				      integer_type_node,
+				      build_int_cst (integer_type_node,
+						     GOMP_ASYNC_SYNC));
+	if (tagging && t_async)
 	  {
-	    int n = 0;
+	    unsigned HOST_WIDE_INT i_async;
 
-	    for (; c; c = OMP_CLAUSE_CHAIN (c))
+	    if (TREE_CODE (t_async) == INTEGER_CST)
 	      {
-		if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_WAIT)
-		  {
-		    args.safe_push (fold_convert_loc (OMP_CLAUSE_LOCATION (c),
-						      integer_type_node,
-						      OMP_CLAUSE_WAIT_EXPR (c)));
-		    n++;
-		  }
+		/* See if we can pack the async arg in to the tag's
+		   operand.  */
+		i_async = TREE_INT_CST_LOW (t_async);
+
+		if (i_async < GOMP_LAUNCH_OP_MAX)
+		  t_async = NULL_TREE;
 	      }
+	    if (t_async)
+	      i_async = GOMP_LAUNCH_OP_MAX;
+	    args.safe_push (oacc_launch_pack
+			    (GOMP_LAUNCH_ASYNC, NULL_TREE, i_async));
+	  }
+	if (t_async)
+	  args.safe_push (t_async);
 
-	    /* Now that we know the number, replace the default value.  */
-	    args.ordered_remove (t_wait_idx);
-	    args.quick_insert (t_wait_idx,
-			       fold_convert_loc (gimple_location (entry_stmt),
-						 integer_type_node,
-						 build_int_cst (integer_type_node, n)));
+	/* Save the argument index, and ... */
+	unsigned t_wait_idx = args.length ();
+	unsigned num_waits = 0;
+	c = find_omp_clause (clauses, OMP_CLAUSE_WAIT);
+	if (!tagging || c)
+	  /* ... push a placeholder.  */
+	  args.safe_push (integer_zero_node);
+
+	for (; c; c = OMP_CLAUSE_CHAIN (c))
+	  if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_WAIT)
+	    {
+	      args.safe_push (fold_convert_loc (OMP_CLAUSE_LOCATION (c),
+						integer_type_node,
+						OMP_CLAUSE_WAIT_EXPR (c)));
+	      num_waits++;
+	    }
+
+	if (!tagging || num_waits)
+	  {
+	    tree len;
+
+	    /* Now that we know the number, update the placeholder.  */
+	    if (tagging)
+	      len = oacc_launch_pack (GOMP_LAUNCH_WAIT, NULL_TREE, num_waits);
+	    else
+	      len = build_int_cst (integer_type_node, num_waits);
+	    len = fold_convert_loc (gimple_location (entry_stmt),
+				    unsigned_type_node, len);
+	    args[t_wait_idx] = len;
 	  }
       }
       break;
     default:
       gcc_unreachable ();
     }
+  if (tagging)
+    /*  Push terminal marker - zero.  */
+    args.safe_push (oacc_launch_pack (0, NULL_TREE, 0));
 
   g = gimple_build_call_vec (builtin_decl_explicit (start_ix), args);
   gimple_set_location (g, gimple_location (entry_stmt));
Index: gcc/omp-low.h
===================================================================
--- gcc/omp-low.h	(revision 228086)
+++ gcc/omp-low.h	(working copy)
@@ -29,6 +29,7 @@ extern tree omp_reduction_init_op (locat
 extern tree omp_reduction_init (tree, tree);
 extern bool make_gimple_omp_edges (basic_block, struct omp_region **, int *);
 extern void omp_finish_file (void);
+extern tree get_oacc_fn_attrib (tree);
 
 extern GTY(()) vec<tree, va_gc> *offload_funcs;
 extern GTY(()) vec<tree, va_gc> *offload_vars;
Index: libgomp/libgomp_g.h
===================================================================
--- libgomp/libgomp_g.h	(revision 228086)
+++ libgomp/libgomp_g.h	(working copy)
@@ -222,9 +222,8 @@ extern void GOACC_data_start (int, size_
 extern void GOACC_data_end (void);
 extern void GOACC_enter_exit_data (int, size_t, void **,
 				   size_t *, unsigned short *, int, int, ...);
-extern void GOACC_parallel (int, void (*) (void *), size_t,
-			    void **, size_t *, unsigned short *, int, int, int,
-			    int, int, ...);
+extern void GOACC_parallel_keyd (int, void (*) (void *), size_t,
+			          void **, size_t *, unsigned short *, ...);
 extern void GOACC_update (int, size_t, void **, size_t *,
 			  unsigned short *, int, int, ...);
 extern void GOACC_wait (int, int, ...);
Index: libgomp/plugin/plugin-nvptx.c
===================================================================
--- libgomp/plugin/plugin-nvptx.c	(revision 228086)
+++ libgomp/plugin/plugin-nvptx.c	(working copy)
@@ -282,12 +282,20 @@ map_push (struct ptx_stream *s, int asyn
   return;
 }
 
+/* Target data function launch information.  */
+
+struct targ_fn_launch
+{
+  const char *fn;
+  unsigned short dim[3];
+};
+
 /* Descriptor of a loaded function.  */
 
 struct targ_fn_descriptor
 {
   CUfunction fn;
-  const char *name;
+  const struct targ_fn_launch *launch;
 };
 
 /* A loaded PTX image.  */
@@ -929,8 +937,8 @@ event_add (enum ptx_event_type type, CUe
 
 void
 nvptx_exec (void (*fn), size_t mapnum, void **hostaddrs, void **devaddrs,
-	  size_t *sizes, unsigned short *kinds, int num_gangs, int num_workers,
-	  int vector_length, int async, void *targ_mem_desc)
+	    size_t *sizes, unsigned short *kinds, int async, unsigned *dims,
+	    void *targ_mem_desc)
 {
   struct targ_fn_descriptor *targ_fn = (struct targ_fn_descriptor *) fn;
   CUfunction function;
@@ -939,7 +947,6 @@ nvptx_exec (void (*fn), size_t mapnum, v
   struct ptx_stream *dev_str;
   void *kargs[1];
   void *hp, *dp;
-  unsigned int nthreads_in_block;
   struct nvptx_thread *nvthd = nvptx_thread ();
   const char *maybe_abort_msg = "(perhaps abort was called)";
 
@@ -948,6 +955,20 @@ nvptx_exec (void (*fn), size_t mapnum, v
   dev_str = select_stream_for_async (async, pthread_self (), false, NULL);
   assert (dev_str == nvthd->current_stream);
 
+  /* Initialize the launch dimensions.  Typically this is constant,
+     provided by the device compiler, but we must permit runtime
+     values.  */
+  for (i = 0; i != 3; i++)
+    if (targ_fn->launch->dim[i])
+      dims[i] = targ_fn->launch->dim[i];
+
+  if (dims[GOMP_DIM_GANG] != 1)
+    GOMP_PLUGIN_fatal ("non-unity num_gangs (%d) not supported",
+		       dims[GOMP_DIM_GANG]);
+  if (dims[GOMP_DIM_WORKER] != 1)
+    GOMP_PLUGIN_fatal ("non-unity num_workers (%d) not supported",
+		       dims[GOMP_DIM_WORKER]);
+
   /* This reserves a chunk of a pre-allocated page of memory mapped on both
      the host and the device. HP is a host pointer to the new chunk, and DP is
      the corresponding device pointer.  */
@@ -965,35 +986,21 @@ nvptx_exec (void (*fn), size_t mapnum, v
   if (r != CUDA_SUCCESS)
     GOMP_PLUGIN_fatal ("cuMemcpy failed: %s", cuda_error (r));
 
-  GOMP_PLUGIN_debug (0, "  %s: kernel %s: launch\n", __FUNCTION__, targ_fn->name);
+  GOMP_PLUGIN_debug (0, "  %s: kernel %s: launch"
+		     " gangs=%u, workers=%u, vectors=%u\n",
+		     __FUNCTION__, targ_fn->launch->fn,
+		     dims[0], dims[1], dims[2]);
 
   // OpenACC		CUDA
   //
-  // num_gangs		blocks
-  // num_workers	warps (where a warp is equivalent to 32 threads)
-  // vector length	threads
-  //
-
-  /* The openacc vector_length clause 'determines the vector length to use for
-     vector or SIMD operations'.  The question is how to map this to CUDA.
-
-     In CUDA, the warp size is the vector length of a CUDA device.  However, the
-     CUDA interface abstracts away from that, and only shows us warp size
-     indirectly in maximum number of threads per block, which is a product of
-     warp size and the number of hyperthreads of a multiprocessor.
-
-     We choose to map openacc vector_length directly onto the number of threads
-     in a block, in the x dimension.  This is reflected in gcc code generation
-     that uses ThreadIdx.x to access vector elements.
-
-     Attempting to use an openacc vector_length of more than the maximum number
-     of threads per block will result in a cuda error.  */
-  nthreads_in_block = vector_length;
+  // num_gangs		nctaid.x
+  // num_workers	ntid.y
+  // vector length	ntid.x
 
   kargs[0] = &dp;
   r = cuLaunchKernel (function,
-		      num_gangs, 1, 1,
-		      nthreads_in_block, 1, 1,
+		      dims[GOMP_DIM_GANG], 1, 1,
+		      dims[GOMP_DIM_VECTOR], dims[GOMP_DIM_WORKER], 1,
 		      0, dev_str->stream, kargs, 0);
   if (r != CUDA_SUCCESS)
     GOMP_PLUGIN_fatal ("cuLaunchKernel error: %s", cuda_error (r));
@@ -1039,7 +1046,7 @@ nvptx_exec (void (*fn), size_t mapnum, v
 #endif
 
   GOMP_PLUGIN_debug (0, "  %s: kernel %s: finished\n", __FUNCTION__,
-		     targ_fn->name);
+		     targ_fn->launch->fn);
 
 #ifndef DISABLE_ASYNC
   if (async < acc_async_noval)
@@ -1567,7 +1574,7 @@ typedef struct nvptx_tdata
   const char *const *var_names;
   size_t var_num;
 
-  const char *const *fn_names;
+  const struct targ_fn_launch *fn_descs;
   size_t fn_num;
 } nvptx_tdata_t;
 
@@ -1588,7 +1595,8 @@ GOMP_OFFLOAD_load_image (int ord, unsign
 			 struct addr_pair **target_table)
 {
   CUmodule module;
-  const char *const *fn_names, *const *var_names;
+  const char *const *var_names;
+  const struct targ_fn_launch *fn_descs;
   unsigned int fn_entries, var_entries, i, j;
   CUresult r;
   struct targ_fn_descriptor *targ_fns;
@@ -1617,7 +1625,7 @@ GOMP_OFFLOAD_load_image (int ord, unsign
   var_entries = img_header->var_num;
   var_names = img_header->var_names;
   fn_entries = img_header->fn_num;
-  fn_names = img_header->fn_names;
+  fn_descs = img_header->fn_descs;
 
   targ_tbl = GOMP_PLUGIN_malloc (sizeof (struct addr_pair)
 				 * (fn_entries + var_entries));
@@ -1640,12 +1648,12 @@ GOMP_OFFLOAD_load_image (int ord, unsign
     {
       CUfunction function;
 
-      r = cuModuleGetFunction (&function, module, fn_names[i]);
+      r = cuModuleGetFunction (&function, module, fn_descs[i].fn);
       if (r != CUDA_SUCCESS)
 	GOMP_PLUGIN_fatal ("cuModuleGetFunction error: %s", cuda_error (r));
 
       targ_fns->fn = function;
-      targ_fns->name = (const char *) fn_names[i];
+      targ_fns->launch = &fn_descs[i];
 
       targ_tbl->start = (uintptr_t) targ_fns;
       targ_tbl->end = targ_tbl->start + 1;
@@ -1724,13 +1732,12 @@ void (*device_run) (int n, void *fn_ptr,
 
 void
 GOMP_OFFLOAD_openacc_parallel (void (*fn) (void *), size_t mapnum,
-			       void **hostaddrs, void **devaddrs, size_t *sizes,
-			       unsigned short *kinds, int num_gangs,
-			       int num_workers, int vector_length, int async,
-			       void *targ_mem_desc)
+			       void **hostaddrs, void **devaddrs,
+			       size_t *sizes, unsigned short *kinds,
+			       int async, unsigned *dims, void *targ_mem_desc)
 {
-  nvptx_exec (fn, mapnum, hostaddrs, devaddrs, sizes, kinds, num_gangs,
-	    num_workers, vector_length, async, targ_mem_desc);
+  nvptx_exec (fn, mapnum, hostaddrs, devaddrs, sizes, kinds,
+	      async, dims, targ_mem_desc);
 }
 
 void
Index: libgomp/libgomp.map
===================================================================
--- libgomp/libgomp.map	(revision 228086)
+++ libgomp/libgomp.map	(working copy)
@@ -332,6 +332,11 @@ GOACC_2.0 {
 	GOACC_get_num_threads;
 };
 
+GOACC_2.0.1 {
+  global:
+	GOACC_parallel_keyed;
+} GOACC_2.0;
+
 GOMP_PLUGIN_1.0 {
   global:
 	GOMP_PLUGIN_malloc;
Index: libgomp/libgomp.h
===================================================================
--- libgomp/libgomp.h	(revision 228086)
+++ libgomp/libgomp.h	(working copy)
@@ -695,7 +695,7 @@ typedef struct acc_dispatch_t
 
   /* Execute.  */
   void (*exec_func) (void (*) (void *), size_t, void **, void **, size_t *,
-		     unsigned short *, int, int, int, int, void *);
+		     unsigned short *, int, unsigned *, void *);
 
   /* Async cleanup callback registration.  */
   void (*register_async_cleanup_func) (void *);
Index: libgomp/oacc-host.c
===================================================================
--- libgomp/oacc-host.c	(revision 228086)
+++ libgomp/oacc-host.c	(working copy)
@@ -137,10 +137,8 @@ host_openacc_exec (void (*fn) (void *),
 		   void **devaddrs __attribute__ ((unused)),
 		   size_t *sizes __attribute__ ((unused)),
 		   unsigned short *kinds __attribute__ ((unused)),
-		   int num_gangs __attribute__ ((unused)),
-		   int num_workers __attribute__ ((unused)),
-		   int vector_length __attribute__ ((unused)),
 		   int async __attribute__ ((unused)),
+		   unsigned *dims __attribute ((unused)),
 		   void *targ_mem_desc __attribute__ ((unused)))
 {
   fn (hostaddrs);
Index: libgomp/oacc-parallel.c
===================================================================
--- libgomp/oacc-parallel.c	(revision 228086)
+++ libgomp/oacc-parallel.c	(working copy)
@@ -49,14 +49,18 @@ find_pset (int pos, size_t mapnum, unsig
   return kind == GOMP_MAP_TO_PSET;
 }
 
-static void goacc_wait (int async, int num_waits, va_list ap);
+static void goacc_wait (int async, int num_waits, va_list *ap);
+
+
+/* Launch a possibly offloaded function on DEVICE.  FN is the host fn
+   address.  MAPNUM, HOSTADDRS, SIZES & KINDS  describe the memory
+   blocks to be copied to/from the device.  Varadic arguments are
+   keyed optional parameters terminated with a zero.  */
 
 void
-GOACC_parallel (int device, void (*fn) (void *),
-		size_t mapnum, void **hostaddrs, size_t *sizes,
-		unsigned short *kinds,
-		int num_gangs, int num_workers, int vector_length,
-		int async, int num_waits, ...)
+GOACC_parallel_keyed (int device, void (*fn) (void *),
+		      size_t mapnum, void **hostaddrs, size_t *sizes,
+		      unsigned short *kinds, ...)
 {
   bool host_fallback = device == GOMP_DEVICE_HOST_FALLBACK;
   va_list ap;
@@ -68,22 +72,16 @@ GOACC_parallel (int device, void (*fn) (
   struct splay_tree_key_s k;
   splay_tree_key tgt_fn_key;
   void (*tgt_fn);
-
-  if (num_gangs != 1)
-    gomp_fatal ("num_gangs (%d) different from one is not yet supported",
-		num_gangs);
-  if (num_workers != 1)
-    gomp_fatal ("num_workers (%d) different from one is not yet supported",
-		num_workers);
+  int async = GOMP_ASYNC_SYNC;
+  unsigned dims[GOMP_DIM_MAX];
+  unsigned tag;
 
 #ifdef HAVE_INTTYPES_H
-  gomp_debug (0, "%s: mapnum=%"PRIu64", hostaddrs=%p, size=%p, kinds=%p, "
-		 "async = %d\n",
-	      __FUNCTION__, (uint64_t) mapnum, hostaddrs, sizes, kinds, async);
+  gomp_debug (0, "%s: mapnum=%"PRIu64", hostaddrs=%p, size=%p, kinds=%p\n",
+	      __FUNCTION__, (uint64_t) mapnum, hostaddrs, sizes, kinds);
 #else
-  gomp_debug (0, "%s: mapnum=%lu, hostaddrs=%p, sizes=%p, kinds=%p, async=%d\n",
-	      __FUNCTION__, (unsigned long) mapnum, hostaddrs, sizes, kinds,
-	      async);
+  gomp_debug (0, "%s: mapnum=%lu, hostaddrs=%p, sizes=%p, kinds=%p\n",
+	      __FUNCTION__, (unsigned long) mapnum, hostaddrs, sizes, kinds);
 #endif
   goacc_lazy_initialize ();
 
@@ -105,12 +103,51 @@ GOACC_parallel (int device, void (*fn) (
       return;
     }
 
-  if (num_waits)
-    {
-      va_start (ap, num_waits);
-      goacc_wait (async, num_waits, ap);
-      va_end (ap);
+  va_start (ap, kinds);
+  /* TODO: This will need amending when device_type is implemented.  */
+  while ((tag = va_arg (ap, unsigned)) != 0)
+    {
+      if (GOMP_LAUNCH_DEVICE (tag))
+	gomp_fatal ("device_type '%d' offload parameters, libgomp is too old",
+		    GOMP_LAUNCH_DEVICE (tag));
+
+      switch (GOMP_LAUNCH_CODE (tag))
+	{
+	case GOMP_LAUNCH_DIM:
+	  {
+	    unsigned mask = GOMP_LAUNCH_OP (tag);
+
+	    for (i = 0; i != GOMP_DIM_MAX; i++)
+	      if (mask & GOMP_DIM_MASK (i))
+		dims[i] = va_arg (ap, unsigned);
+	  }
+	  break;
+
+	case GOMP_LAUNCH_ASYNC:
+	  {
+	    /* Small constant values are encoded in the operand.  */
+	    async = GOMP_LAUNCH_OP (tag);
+
+	    if (async == GOMP_LAUNCH_OP_MAX)
+	      async = va_arg (ap, unsigned);
+	    break;
+	  }
+
+	case GOMP_LAUNCH_WAIT:
+	  {
+	    unsigned num_waits = GOMP_LAUNCH_OP (tag);
+
+	    if (num_waits)
+	      goacc_wait (async, num_waits, &ap);
+	    break;
+	  }
+
+	default:
+	  gomp_fatal ("unrecognized offload code '%d',"
+		      " libgomp is too old", GOMP_LAUNCH_CODE (tag));
+	}
     }
+  va_end (ap);
   
   acc_dev->openacc.async_set_async_func (async);
 
@@ -138,9 +175,8 @@ GOACC_parallel (int device, void (*fn) (
     devaddrs[i] = (void *) (tgt->list[i]->tgt->tgt_start
 			    + tgt->list[i]->tgt_offset);
 
-  acc_dev->openacc.exec_func (tgt_fn, mapnum, hostaddrs, devaddrs, sizes, kinds,
-			      num_gangs, num_workers, vector_length, async,
-			      tgt);
+  acc_dev->openacc.exec_func (tgt_fn, mapnum, hostaddrs, devaddrs, sizes,
+			      kinds, async, dims, tgt);
 
   /* If running synchronously, unmap immediately.  */
   if (async < acc_async_noval)
@@ -154,6 +190,20 @@ GOACC_parallel (int device, void (*fn) (
   acc_dev->openacc.async_set_async_func (acc_async_sync);
 }
 
+/* Legacy entry point, only provide host execution.  */
+
+void
+GOACC_parallel (int device, void (*fn) (void *),
+		size_t mapnum, void **hostaddrs, size_t *sizes,
+		unsigned short *kinds,
+		int num_gangs, int num_workers, int vector_length,
+		int async, int num_waits, ...)
+{
+  goacc_save_and_set_bind (acc_device_host);
+  fn (hostaddrs);
+  goacc_restore_bind ();
+}
+
 void
 GOACC_data_start (int device, size_t mapnum,
 		  void **hostaddrs, size_t *sizes, unsigned short *kinds)
@@ -230,7 +280,7 @@ GOACC_enter_exit_data (int device, size_
       va_list ap;
 
       va_start (ap, num_waits);
-      goacc_wait (async, num_waits, ap);
+      goacc_wait (async, num_waits, &ap);
       va_end (ap);
     }
 
@@ -344,15 +394,15 @@ GOACC_enter_exit_data (int device, size_
 }
 
 static void
-goacc_wait (int async, int num_waits, va_list ap)
+goacc_wait (int async, int num_waits, va_list *ap)
 {
   struct goacc_thread *thr = goacc_thread ();
   struct gomp_device_descr *acc_dev = thr->dev;
 
   while (num_waits--)
     {
-      int qid = va_arg (ap, int);
-
+      int qid = va_arg (*ap, int);
+      
       if (acc_async_test (qid))
 	continue;
 
@@ -389,7 +439,7 @@ GOACC_update (int device, size_t mapnum,
       va_list ap;
 
       va_start (ap, num_waits);
-      goacc_wait (async, num_waits, ap);
+      goacc_wait (async, num_waits, &ap);
       va_end (ap);
     }
 
@@ -430,7 +480,7 @@ GOACC_wait (int async, int num_waits, ..
       va_list ap;
 
       va_start (ap, num_waits);
-      goacc_wait (async, num_waits, ap);
+      goacc_wait (async, num_waits, &ap);
       va_end (ap);
     }
   else if (async == acc_async_sync)

[-- Attachment #3: vries-ada.patch --]
[-- Type: text/x-patch, Size: 1944 bytes --]

2015-09-28  Tom de Vries  <tom@codesourcery.com>

	* gcc-interface/utils.c (DEF_FUNCTION_TYPE_VAR_6): Define

Index: ada/gcc-interface/utils.c
===================================================================
--- ada/gcc-interface/utils.c	(revision 228219)
+++ ada/gcc-interface/utils.c	(working copy)
@@ -5376,6 +5376,8 @@ enum c_builtin_type
 #define DEF_FUNCTION_TYPE_VAR_4(NAME, RETURN, ARG1, ARG2, ARG3, ARG4) NAME,
 #define DEF_FUNCTION_TYPE_VAR_5(NAME, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5) \
 				NAME,
+#define DEF_FUNCTION_TYPE_VAR_6(NAME, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
+				ARG6) NAME,
 #define DEF_FUNCTION_TYPE_VAR_7(NAME, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
 				ARG6, ARG7) NAME,
 #define DEF_FUNCTION_TYPE_VAR_11(NAME, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
@@ -5398,6 +5400,7 @@ enum c_builtin_type
 #undef DEF_FUNCTION_TYPE_VAR_3
 #undef DEF_FUNCTION_TYPE_VAR_4
 #undef DEF_FUNCTION_TYPE_VAR_5
+#undef DEF_FUNCTION_TYPE_VAR_6
 #undef DEF_FUNCTION_TYPE_VAR_7
 #undef DEF_FUNCTION_TYPE_VAR_11
 #undef DEF_POINTER_TYPE
@@ -5505,6 +5508,9 @@ install_builtin_function_types (void)
   def_fn_type (ENUM, RETURN, 1, 4, ARG1, ARG2, ARG3, ARG4);
 #define DEF_FUNCTION_TYPE_VAR_5(ENUM, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5) \
   def_fn_type (ENUM, RETURN, 1, 5, ARG1, ARG2, ARG3, ARG4, ARG5);
+#define DEF_FUNCTION_TYPE_VAR_6(ENUM, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
+				ARG6)				\
+  def_fn_type (ENUM, RETURN, 1, 6, ARG1, ARG2, ARG3, ARG4, ARG5, ARG6);
 #define DEF_FUNCTION_TYPE_VAR_7(ENUM, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
 				ARG6, ARG7)				\
   def_fn_type (ENUM, RETURN, 1, 7, ARG1, ARG2, ARG3, ARG4, ARG5, ARG6, ARG7);
@@ -5533,6 +5539,7 @@ install_builtin_function_types (void)
 #undef DEF_FUNCTION_TYPE_VAR_3
 #undef DEF_FUNCTION_TYPE_VAR_4
 #undef DEF_FUNCTION_TYPE_VAR_5
+#undef DEF_FUNCTION_TYPE_VAR_6
 #undef DEF_FUNCTION_TYPE_VAR_7
 #undef DEF_FUNCTION_TYPE_VAR_11
 #undef DEF_POINTER_TYPE

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Openacc launch API
  2015-08-25 13:40 Openacc launch API Nathan Sidwell
                   ` (2 preceding siblings ...)
  2015-09-17  9:46 ` Bernd Schmidt
@ 2015-09-30 12:42 ` Matthias Klose
  2015-09-30 12:45   ` Bernd Schmidt
  2015-09-30 13:05   ` Nathan Sidwell
  3 siblings, 2 replies; 34+ messages in thread
From: Matthias Klose @ 2015-09-30 12:42 UTC (permalink / raw)
  To: Nathan Sidwell, Jakub Jelinek; +Cc: GCC Patches, David Malcolm

[-- Attachment #1: Type: text/plain, Size: 504 bytes --]

On 25.08.2015 15:29, Nathan Sidwell wrote:
> Jakub,
>
> This patch changes the launch API for openacc parallels.

this broke the jit build.

The following patch fixes the build for me. Ok to commit?

   Matthias

2015-09-30  Matthias Klose  <doko@ubuntu.com>

         * jit-builtins.h Define DEF_FUNCTION_TYPE_VAR_6,
         remove DEF_FUNCTION_TYPE_VAR_11.
         * jit-builtins.c (builtins_manager::make_type): Define and handle
         DEF_FUNCTION_TYPE_VAR_6, remove DEF_FUNCTION_TYPE_VAR_11.



[-- Attachment #2: jit.diff --]
[-- Type: text/plain, Size: 2456 bytes --]

Index: gcc/jit/jit-builtins.c
===================================================================
--- gcc/jit/jit-builtins.c	(revision 228287)
+++ gcc/jit/jit-builtins.c	(working copy)
@@ -320,15 +320,14 @@
 #define DEF_FUNCTION_TYPE_VAR_5(ENUM, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5) \
       case ENUM: return make_fn_type (ENUM, RETURN, 1, 5, ARG1, ARG2, ARG3, \
 				      ARG4, ARG5);
+#define DEF_FUNCTION_TYPE_VAR_6(ENUM, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
+				ARG6)					\
+      case ENUM: return make_fn_type (ENUM, RETURN, 1, 6, ARG1, ARG2, ARG3, \
+				      ARG4, ARG5, ARG6);
 #define DEF_FUNCTION_TYPE_VAR_7(ENUM, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
 				ARG6, ARG7)				\
       case ENUM: return make_fn_type (ENUM, RETURN, 1, 7, ARG1, ARG2, ARG3, \
 				      ARG4, ARG5, ARG6, ARG7);
-#define DEF_FUNCTION_TYPE_VAR_11(ENUM, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
-				 ARG6, ARG7, ARG8, ARG9, ARG10, ARG11) \
-      case ENUM: return make_fn_type (ENUM, RETURN, 1, 11, ARG1, ARG2, ARG3, \
-				      ARG4, ARG5, ARG6, ARG7, ARG8, ARG9, \
-				      ARG10, ARG11);
 #define DEF_POINTER_TYPE(ENUM, TYPE) \
       case ENUM: return make_ptr_type (ENUM, TYPE);
 
@@ -350,8 +349,8 @@
 #undef DEF_FUNCTION_TYPE_VAR_3
 #undef DEF_FUNCTION_TYPE_VAR_4
 #undef DEF_FUNCTION_TYPE_VAR_5
+#undef DEF_FUNCTION_TYPE_VAR_6
 #undef DEF_FUNCTION_TYPE_VAR_7
-#undef DEF_FUNCTION_TYPE_VAR_11
 #undef DEF_POINTER_TYPE
 
     default:
Index: gcc/jit/jit-builtins.h
===================================================================
--- gcc/jit/jit-builtins.h	(revision 228287)
+++ gcc/jit/jit-builtins.h	(working copy)
@@ -50,10 +50,10 @@
 #define DEF_FUNCTION_TYPE_VAR_4(NAME, RETURN, ARG1, ARG2, ARG3, ARG4) NAME,
 #define DEF_FUNCTION_TYPE_VAR_5(NAME, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5) \
 				NAME,
+#define DEF_FUNCTION_TYPE_VAR_6(NAME, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
+				ARG6) NAME,
 #define DEF_FUNCTION_TYPE_VAR_7(NAME, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
 				ARG6, ARG7) NAME,
-#define DEF_FUNCTION_TYPE_VAR_11(NAME, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
-				 ARG6, ARG7, ARG8, ARG9, ARG10, ARG11) NAME,
 #define DEF_POINTER_TYPE(NAME, TYPE) NAME,
 #include "builtin-types.def"
 #undef DEF_PRIMITIVE_TYPE
@@ -73,7 +73,6 @@
 #undef DEF_FUNCTION_TYPE_VAR_4
 #undef DEF_FUNCTION_TYPE_VAR_5
 #undef DEF_FUNCTION_TYPE_VAR_7
-#undef DEF_FUNCTION_TYPE_VAR_11
 #undef DEF_POINTER_TYPE
   BT_LAST
 }; /* enum jit_builtin_type */

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Openacc launch API
  2015-09-30 12:42 ` Matthias Klose
@ 2015-09-30 12:45   ` Bernd Schmidt
  2015-09-30 12:48     ` Matthias Klose
  2015-09-30 13:05   ` Nathan Sidwell
  1 sibling, 1 reply; 34+ messages in thread
From: Bernd Schmidt @ 2015-09-30 12:45 UTC (permalink / raw)
  To: Matthias Klose, Nathan Sidwell, Jakub Jelinek; +Cc: GCC Patches, David Malcolm

On 09/30/2015 02:37 PM, Matthias Klose wrote:
>
> this broke the jit build.
>
> The following patch fixes the build for me. Ok to commit?
>
>    Matthias
>
> 2015-09-30  Matthias Klose  <doko@ubuntu.com>
>
>          * jit-builtins.h Define DEF_FUNCTION_TYPE_VAR_6,
>          remove DEF_FUNCTION_TYPE_VAR_11.
>          * jit-builtins.c (builtins_manager::make_type): Define and handle
>          DEF_FUNCTION_TYPE_VAR_6, remove DEF_FUNCTION_TYPE_VAR_11.

Yeah, I think that qualifies as obvious.


Bernd

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Openacc launch API
  2015-09-30 12:45   ` Bernd Schmidt
@ 2015-09-30 12:48     ` Matthias Klose
  0 siblings, 0 replies; 34+ messages in thread
From: Matthias Klose @ 2015-09-30 12:48 UTC (permalink / raw)
  To: Bernd Schmidt, Matthias Klose, Nathan Sidwell, Jakub Jelinek
  Cc: GCC Patches, David Malcolm

On 30.09.2015 14:40, Bernd Schmidt wrote:
> On 09/30/2015 02:37 PM, Matthias Klose wrote:
>>
>> this broke the jit build.
>>
>> The following patch fixes the build for me. Ok to commit?
>>
>>    Matthias
>>
>> 2015-09-30  Matthias Klose  <doko@ubuntu.com>
>>
>>          * jit-builtins.h Define DEF_FUNCTION_TYPE_VAR_6,
>>          remove DEF_FUNCTION_TYPE_VAR_11.
>>          * jit-builtins.c (builtins_manager::make_type): Define and handle
>>          DEF_FUNCTION_TYPE_VAR_6, remove DEF_FUNCTION_TYPE_VAR_11.
>
> Yeah, I think that qualifies as obvious.

Ok, committed.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Openacc launch API
  2015-09-30 12:42 ` Matthias Klose
  2015-09-30 12:45   ` Bernd Schmidt
@ 2015-09-30 13:05   ` Nathan Sidwell
  1 sibling, 0 replies; 34+ messages in thread
From: Nathan Sidwell @ 2015-09-30 13:05 UTC (permalink / raw)
  To: Matthias Klose, Jakub Jelinek; +Cc: GCC Patches, David Malcolm

On 09/30/15 08:37, Matthias Klose wrote:
> On 25.08.2015 15:29, Nathan Sidwell wrote:
>> Jakub,
>>
>> This patch changes the launch API for openacc parallels.
>
> this broke the jit build.
>
> The following patch fixes the build for me. Ok to commit?
>
>    Matthias
>
> 2015-09-30  Matthias Klose  <doko@ubuntu.com>
>
>          * jit-builtins.h Define DEF_FUNCTION_TYPE_VAR_6,
>          remove DEF_FUNCTION_TYPE_VAR_11.
>          * jit-builtins.c (builtins_manager::make_type): Define and handle
>          DEF_FUNCTION_TYPE_VAR_6, remove DEF_FUNCTION_TYPE_VAR_11.

Looks obvious to me.  Sorry for the breakage.


nathan

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: JIT breakage after last builtin-types change
  2015-09-30 13:27 JIT breakage after last builtin-types change Ulrich Drepper
@ 2015-09-30 13:27 ` Jakub Jelinek
  2015-09-30 14:48   ` Ulrich Drepper
  0 siblings, 1 reply; 34+ messages in thread
From: Jakub Jelinek @ 2015-09-30 13:27 UTC (permalink / raw)
  To: Ulrich Drepper; +Cc: gcc-patches

On Wed, Sep 30, 2015 at 09:05:45AM -0400, Ulrich Drepper wrote:
> After some recent additions to builtin-types.def the jit user of the
> definitions hasn't been updated.  OK to apply?
> 
> 
> 2015-09-30  Ulrich Drepper  <drepper@gmail.com>
> 
> 	* jit-builtins.c: Provide definition of DEF_FUNCTION_TYPE_VAR_6.
> 	* jit-builtins.h: Likewise.

https://gcc.gnu.org/viewcvs?rev=228289&root=gcc&view=rev should fix this
already.

	Jakub

^ permalink raw reply	[flat|nested] 34+ messages in thread

* JIT breakage after last builtin-types change
@ 2015-09-30 13:27 Ulrich Drepper
  2015-09-30 13:27 ` Jakub Jelinek
  0 siblings, 1 reply; 34+ messages in thread
From: Ulrich Drepper @ 2015-09-30 13:27 UTC (permalink / raw)
  To: gcc-patches

After some recent additions to builtin-types.def the jit user of the
definitions hasn't been updated.  OK to apply?


2015-09-30  Ulrich Drepper  <drepper@gmail.com>

	* jit-builtins.c: Provide definition of DEF_FUNCTION_TYPE_VAR_6.
	* jit-builtins.h: Likewise.


 jit-builtins.c |    5 +++++
 jit-builtins.h |    3 +++
 2 files changed, 8 insertions(+)

diff --git a/gcc/jit/jit-builtins.c b/gcc/jit/jit-builtins.c
index a29f446..8a89915 100644
--- a/gcc/jit/jit-builtins.c
+++ b/gcc/jit/jit-builtins.c
@@ -320,6 +320,10 @@ builtins_manager::make_type (enum jit_builtin_type type_id)
 #define DEF_FUNCTION_TYPE_VAR_5(ENUM, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5) \
       case ENUM: return make_fn_type (ENUM, RETURN, 1, 5, ARG1, ARG2, ARG3, \
 				      ARG4, ARG5);
+#define DEF_FUNCTION_TYPE_VAR_6(ENUM, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
+				ARG6) \
+      case ENUM: return make_fn_type (ENUM, RETURN, 1, 6, ARG1, ARG2, ARG3, \
+				      ARG4, ARG5, ARG6);
 #define DEF_FUNCTION_TYPE_VAR_7(ENUM, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
 				ARG6, ARG7)				\
       case ENUM: return make_fn_type (ENUM, RETURN, 1, 7, ARG1, ARG2, ARG3, \
@@ -350,6 +354,7 @@ builtins_manager::make_type (enum jit_builtin_type type_id)
 #undef DEF_FUNCTION_TYPE_VAR_3
 #undef DEF_FUNCTION_TYPE_VAR_4
 #undef DEF_FUNCTION_TYPE_VAR_5
+#undef DEF_FUNCTION_TYPE_VAR_6
 #undef DEF_FUNCTION_TYPE_VAR_7
 #undef DEF_FUNCTION_TYPE_VAR_11
 #undef DEF_POINTER_TYPE
diff --git a/gcc/jit/jit-builtins.h b/gcc/jit/jit-builtins.h
index fdf1323..8854326 100644
--- a/gcc/jit/jit-builtins.h
+++ b/gcc/jit/jit-builtins.h
@@ -50,6 +50,8 @@ enum jit_builtin_type
 #define DEF_FUNCTION_TYPE_VAR_4(NAME, RETURN, ARG1, ARG2, ARG3, ARG4) NAME,
 #define DEF_FUNCTION_TYPE_VAR_5(NAME, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5) \
 				NAME,
+#define DEF_FUNCTION_TYPE_VAR_6(NAME, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
+				ARG6) NAME,
 #define DEF_FUNCTION_TYPE_VAR_7(NAME, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
 				ARG6, ARG7) NAME,
 #define DEF_FUNCTION_TYPE_VAR_11(NAME, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
@@ -72,6 +74,7 @@ enum jit_builtin_type
 #undef DEF_FUNCTION_TYPE_VAR_3
 #undef DEF_FUNCTION_TYPE_VAR_4
 #undef DEF_FUNCTION_TYPE_VAR_5
+#undef DEF_FUNCTION_TYPE_VAR_6
 #undef DEF_FUNCTION_TYPE_VAR_7
 #undef DEF_FUNCTION_TYPE_VAR_11
 #undef DEF_POINTER_TYPE

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: JIT breakage after last builtin-types change
  2015-09-30 13:27 ` Jakub Jelinek
@ 2015-09-30 14:48   ` Ulrich Drepper
  2015-09-30 15:50     ` Thomas Schwinge
  0 siblings, 1 reply; 34+ messages in thread
From: Ulrich Drepper @ 2015-09-30 14:48 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: GCC Patches

Except that this is missing:

diff --git a/gcc/jit/jit-builtins.h b/gcc/jit/jit-builtins.h
index 0b6f974..3d76247 100644
--- a/gcc/jit/jit-builtins.h
+++ b/gcc/jit/jit-builtins.h
@@ -72,6 +72,7 @@ enum jit_builtin_type
 #undef DEF_FUNCTION_TYPE_VAR_3
 #undef DEF_FUNCTION_TYPE_VAR_4
 #undef DEF_FUNCTION_TYPE_VAR_5
+#undef DEF_FUNCTION_TYPE_VAR_6
 #undef DEF_FUNCTION_TYPE_VAR_7
 #undef DEF_POINTER_TYPE
   BT_LAST


On Wed, Sep 30, 2015 at 9:09 AM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Wed, Sep 30, 2015 at 09:05:45AM -0400, Ulrich Drepper wrote:
>> After some recent additions to builtin-types.def the jit user of the
>> definitions hasn't been updated.  OK to apply?
>>
>>
>> 2015-09-30  Ulrich Drepper  <drepper@gmail.com>
>>
>>       * jit-builtins.c: Provide definition of DEF_FUNCTION_TYPE_VAR_6.
>>       * jit-builtins.h: Likewise.
>
> https://gcc.gnu.org/viewcvs?rev=228289&root=gcc&view=rev should fix this
> already.
>
>         Jakub

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: JIT breakage after last builtin-types change
  2015-09-30 14:48   ` Ulrich Drepper
@ 2015-09-30 15:50     ` Thomas Schwinge
  0 siblings, 0 replies; 34+ messages in thread
From: Thomas Schwinge @ 2015-09-30 15:50 UTC (permalink / raw)
  To: Ulrich Drepper, GCC Patches
  Cc: Jakub Jelinek, Nathan Sidwell, Bernd Schmidt, Tom de Vries

[-- Attachment #1: Type: text/plain, Size: 3871 bytes --]

Hi!

On Wed, 30 Sep 2015 09:45:25 -0400, Ulrich Drepper <drepper@gmail.com> wrote:
> Except that this is missing: [...]

Yes, I also had already prepared such a patch; now checked in (as
obvious), include some further cleanup; r228298:

commit 5ab4aeae1dba9c2a04dd213474d8c33b6b69e2e3
Author: tschwinge <tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4>
Date:   Wed Sep 30 15:14:29 2015 +0000

    More DEF_FUNCTION_TYPE_VAR_6/DEF_FUNCTION_TYPE_VAR_11 cleanup
    
    	gcc/ada/
    	* gcc-interface/utils.c (DEF_FUNCTION_TYPE_VAR_11): Don't define.
    	gcc/jit/
    	* jit-builtins.h: Undefine DEF_FUNCTION_TYPE_VAR_6 after use.
    
    git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@228298 138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/ada/ChangeLog             | 4 ++++
 gcc/ada/gcc-interface/utils.c | 8 --------
 gcc/jit/ChangeLog             | 5 +++++
 gcc/jit/jit-builtins.h        | 1 +
 4 files changed, 10 insertions(+), 8 deletions(-)

diff --git gcc/ada/ChangeLog gcc/ada/ChangeLog
index efb2d3a..d4c505c 100644
--- gcc/ada/ChangeLog
+++ gcc/ada/ChangeLog
@@ -1,3 +1,7 @@
+2015-09-30  Thomas Schwinge  <thomas@codesourcery.com>
+
+	* gcc-interface/utils.c (DEF_FUNCTION_TYPE_VAR_11): Don't define.
+
 2015-09-28  Tom de Vries  <tom@codesourcery.com>
 
 	* gcc-interface/utils.c (DEF_FUNCTION_TYPE_VAR_6): Define
diff --git gcc/ada/gcc-interface/utils.c gcc/ada/gcc-interface/utils.c
index 672b910..0f3087d 100644
--- gcc/ada/gcc-interface/utils.c
+++ gcc/ada/gcc-interface/utils.c
@@ -5380,8 +5380,6 @@ enum c_builtin_type
 				ARG6) NAME,
 #define DEF_FUNCTION_TYPE_VAR_7(NAME, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
 				ARG6, ARG7) NAME,
-#define DEF_FUNCTION_TYPE_VAR_11(NAME, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
-				 ARG6, ARG7, ARG8, ARG9, ARG10, ARG11) NAME,
 #define DEF_POINTER_TYPE(NAME, TYPE) NAME,
 #include "builtin-types.def"
 #undef DEF_PRIMITIVE_TYPE
@@ -5402,7 +5400,6 @@ enum c_builtin_type
 #undef DEF_FUNCTION_TYPE_VAR_5
 #undef DEF_FUNCTION_TYPE_VAR_6
 #undef DEF_FUNCTION_TYPE_VAR_7
-#undef DEF_FUNCTION_TYPE_VAR_11
 #undef DEF_POINTER_TYPE
   BT_LAST
 };
@@ -5514,10 +5511,6 @@ install_builtin_function_types (void)
 #define DEF_FUNCTION_TYPE_VAR_7(ENUM, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
 				ARG6, ARG7)				\
   def_fn_type (ENUM, RETURN, 1, 7, ARG1, ARG2, ARG3, ARG4, ARG5, ARG6, ARG7);
-#define DEF_FUNCTION_TYPE_VAR_11(ENUM, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
-				 ARG6, ARG7, ARG8, ARG9, ARG10, ARG11) \
-  def_fn_type (ENUM, RETURN, 1, 11, ARG1, ARG2, ARG3, ARG4, ARG5, ARG6,	\
-	       ARG7, ARG8, ARG9, ARG10, ARG11);
 #define DEF_POINTER_TYPE(ENUM, TYPE) \
   builtin_types[(int) ENUM] = build_pointer_type (builtin_types[(int) TYPE]);
 
@@ -5541,7 +5534,6 @@ install_builtin_function_types (void)
 #undef DEF_FUNCTION_TYPE_VAR_5
 #undef DEF_FUNCTION_TYPE_VAR_6
 #undef DEF_FUNCTION_TYPE_VAR_7
-#undef DEF_FUNCTION_TYPE_VAR_11
 #undef DEF_POINTER_TYPE
   builtin_types[(int) BT_LAST] = NULL_TREE;
 }
diff --git gcc/jit/ChangeLog gcc/jit/ChangeLog
index faa8545..05fa9e6 100644
--- gcc/jit/ChangeLog
+++ gcc/jit/ChangeLog
@@ -1,3 +1,8 @@
+2015-09-30  Thomas Schwinge  <thomas@codesourcery.com>
+	    Ulrich Drepper  <drepper@gmail.com>
+
+	* jit-builtins.h: Undefine DEF_FUNCTION_TYPE_VAR_6 after use.
+
 2015-09-30  Matthias Klose  <doko@ubuntu.com>
 
 	* jit-builtins.h Define DEF_FUNCTION_TYPE_VAR_6,
diff --git gcc/jit/jit-builtins.h gcc/jit/jit-builtins.h
index 0b6f974..3d76247 100644
--- gcc/jit/jit-builtins.h
+++ gcc/jit/jit-builtins.h
@@ -72,6 +72,7 @@ enum jit_builtin_type
 #undef DEF_FUNCTION_TYPE_VAR_3
 #undef DEF_FUNCTION_TYPE_VAR_4
 #undef DEF_FUNCTION_TYPE_VAR_5
+#undef DEF_FUNCTION_TYPE_VAR_6
 #undef DEF_FUNCTION_TYPE_VAR_7
 #undef DEF_POINTER_TYPE
   BT_LAST


Grüße,
 Thomas

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 472 bytes --]

^ permalink raw reply	[flat|nested] 34+ messages in thread

* libgomp external ABI prototypes maintenance (was: Openacc launch API)
  2015-09-28 21:20         ` Nathan Sidwell
@ 2016-04-12 13:17           ` Thomas Schwinge
  2016-04-20 11:35           ` libgomp: Make GCC 5 OpenACC offloading executables work " Thomas Schwinge
  1 sibling, 0 replies; 34+ messages in thread
From: Thomas Schwinge @ 2016-04-12 13:17 UTC (permalink / raw)
  To: GCC Patches; +Cc: Nathan Sidwell, Jakub Jelinek

[-- Attachment #1: Type: text/plain, Size: 3261 bytes --]

Hi!

On Mon, 28 Sep 2015 15:38:57 -0400, Nathan Sidwell <nathan@acm.org> wrote:
> --- libgomp/libgomp_g.h	(revision 228086)
> +++ libgomp/libgomp_g.h	(working copy)
> @@ -222,9 +222,8 @@ extern void GOACC_data_start (int, size_
>  extern void GOACC_data_end (void);
>  extern void GOACC_enter_exit_data (int, size_t, void **,
>  				   size_t *, unsigned short *, int, int, ...);
> -extern void GOACC_parallel (int, void (*) (void *), size_t,
> -			    void **, size_t *, unsigned short *, int, int, int,
> -			    int, int, ...);

Even if no longer a GCC builtin, we should still keep that one: it's
still part of the libgomp ABI.

> +extern void GOACC_parallel_keyd (int, void (*) (void *), size_t,
> +			          void **, size_t *, unsigned short *, ...);

Typo: should be GOACC_parallel_keyed.

Additionally, I found GOACC_declare missing.

Committed in r234901, as obvious:

commit 1e312f40ebbad3c47d5ad5ec8375b717eed40dc7
Author: tschwinge <tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4>
Date:   Tue Apr 12 13:13:53 2016 +0000

    libgomp external ABI prototypes maintenance
    
    	libgomp/
    	* libgomp_g.h: Rename GOACC_parallel_keyd prototype to
    	GOACC_parallel_keyed, restore GOACC_parallel prototype, new
    	GOACC_declare prototype.
    
    git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@234901 138bc75d-0d04-0410-961f-82ee72b054a4
---
 libgomp/ChangeLog   | 4 ++++
 libgomp/libgomp_g.h | 7 +++++--
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git libgomp/ChangeLog libgomp/ChangeLog
index 1716ba0..7628c93 100644
--- libgomp/ChangeLog
+++ libgomp/ChangeLog
@@ -1,5 +1,9 @@
 2016-04-12  Thomas Schwinge  <thomas@codesourcery.com>
 
+	* libgomp_g.h: Rename GOACC_parallel_keyd prototype to
+	GOACC_parallel_keyed, restore GOACC_parallel prototype, new
+	GOACC_declare prototype.
+
 	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-gang-np-1.c:
 	Merge this file, and...
 	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-gv-np-1.c:
diff --git libgomp/libgomp_g.h libgomp/libgomp_g.h
index 24eebb6..20454e6 100644
--- libgomp/libgomp_g.h
+++ libgomp/libgomp_g.h
@@ -295,17 +295,20 @@ extern void GOMP_teams (unsigned int, unsigned int);
 
 /* oacc-parallel.c */
 
+extern void GOACC_parallel_keyed (int, void (*) (void *), size_t,
+				  void **, size_t *, unsigned short *, ...);
+extern void GOACC_parallel (int, void (*) (void *), size_t, void **, size_t *,
+			    unsigned short *, int, int, int, int, int, ...);
 extern void GOACC_data_start (int, size_t, void **, size_t *,
 			      unsigned short *);
 extern void GOACC_data_end (void);
 extern void GOACC_enter_exit_data (int, size_t, void **,
 				   size_t *, unsigned short *, int, int, ...);
-extern void GOACC_parallel_keyd (int, void (*) (void *), size_t,
-			          void **, size_t *, unsigned short *, ...);
 extern void GOACC_update (int, size_t, void **, size_t *,
 			  unsigned short *, int, int, ...);
 extern void GOACC_wait (int, int, ...);
 extern int GOACC_get_num_threads (void);
 extern int GOACC_get_thread_num (void);
+extern void GOACC_declare (int, size_t, void **, size_t *, unsigned short *);
 
 #endif /* LIBGOMP_G_H */


Grüße
 Thomas

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 472 bytes --]

^ permalink raw reply	[flat|nested] 34+ messages in thread

* libgomp: Make GCC 5 OpenACC offloading executables work (was: Openacc launch API)
  2015-09-28 21:20         ` Nathan Sidwell
  2016-04-12 13:17           ` libgomp external ABI prototypes maintenance (was: Openacc launch API) Thomas Schwinge
@ 2016-04-20 11:35           ` Thomas Schwinge
  2016-05-11 13:46             ` libgomp: Make GCC 5 OpenACC offloading executables work Thomas Schwinge
  1 sibling, 1 reply; 34+ messages in thread
From: Thomas Schwinge @ 2016-04-20 11:35 UTC (permalink / raw)
  To: Jakub Jelinek, gcc-patches; +Cc: Nathan Sidwell

[-- Attachment #1: Type: text/plain, Size: 8601 bytes --]

Hi!

On Mon, 28 Sep 2015 15:38:57 -0400, Nathan Sidwell <nathan@acm.org> wrote:
> On 09/24/15 04:40, Jakub Jelinek wrote:
> > Iff GCC 5 compiled offloaded OpenACC/PTX code will always do host fallback
> > anyway because of the incompatible PTX version

I do agree that it's reasonable to require users to re-compile their code
when switching between major GCC releases, to retain the offloading
feature, or otherwise resort to host fallback execution.  I'll propose
some text along these lines for the GCC 6 release notes.

> > why don't you just
> > do
> >    goacc_save_and_set_bind (acc_device_host);
> >    fn (hostaddrs);
> >    goacc_restore_bind ();
> 
> Committed the  attached.  Thanks for the review.

What we now got, doesn't work, for several reasons.  GCC 5 OpenACC
offloading executables will just run into SIGSEGV.  Here is a patch
(which depends on
<http://news.gmane.org/find-root.php?message_id=%3C87a8ko3ea0.fsf%40hertz.schwinge.homeip.net%3E>).
Unfortunately, we have to jump through some hoops: because GCC 5
compiler-generated OpenACC reductions code emits calls to
acc_get_device_type, and because we'll (have to) always resort to host
fallback execution for GCC 5 executables, we also have to enforce these
acc_get_device_type calls to return acc_device_host; otherwise reductions
will give bogus results.  (I hope I'm correctly implementing/using the
symbol versioning "magic".)  OK for gcc-6-branch and trunk?  Assuming we
want this fixed on gcc-6-branch, should it be part of 6.1 (to avoid 6.1
users running into the SIGSEGV), or delay for 6.2?

We don't have an easy way to add test cases to make sure we don't break
such legacy interfaces, do we?  (So, I just manually checked a few test
cases.)

commit c68c6b8e79176f5dc21684efe2517cbfb83a182e
Author: Thomas Schwinge <thomas@codesourcery.com>
Date:   Wed Apr 20 13:08:57 2016 +0200

    libgomp: Make GCC 5 OpenACC offloading executables work
    
    	* libgomp.h: Include "openacc.h".
    	(goacc_get_device_type_201, goacc_get_device_type_20): New
    	prototypes.
    	(oacc_20_201_symver, goacc_get_device_type_201): New macros.
    	* libgomp.map: Add acc_get_device_type with OACC_2.0.1 symbol
    	version.
    	* oacc-init.c (acc_get_device_type): Rename to
    	goacc_get_device_type_201.
    	(goacc_get_device_type_20): New function.
    	* oacc-parallel.c (GOACC_parallel): Call goacc_lazy_initialize.
    	* plugin/plugin-nvptx.c (GOMP_OFFLOAD_load_image): Refuse version
    	0 offload images.
    	* target.c (gomp_load_image_to_device): Gracefully handle the case
    	that a plugin refuses to load offload images.
---
 libgomp/libgomp.h             | 10 ++++++++++
 libgomp/libgomp.map           | 10 ++++++++++
 libgomp/oacc-init.c           | 18 +++++++++++++++++-
 libgomp/oacc-parallel.c       | 11 +++++++++++
 libgomp/plugin/plugin-nvptx.c | 10 +++++++++-
 libgomp/target.c              |  6 +++++-
 6 files changed, 62 insertions(+), 3 deletions(-)

diff --git libgomp/libgomp.h libgomp/libgomp.h
index 6a05bbc..9fa1cb1 100644
--- libgomp/libgomp.h
+++ libgomp/libgomp.h
@@ -1011,6 +1011,8 @@ gomp_work_share_init_done (void)
 /* Now that we're back to default visibility, include the globals.  */
 #include "libgomp_g.h"
 
+#include "openacc.h"
+
 /* Include omp.h by parts.  */
 #include "omp-lock.h"
 #define _LIBGOMP_OMP_LOCK_DEFINED 1
@@ -1047,11 +1049,17 @@ extern void gomp_set_nest_lock_25 (omp_nest_lock_25_t *) __GOMP_NOTHROW;
 extern void gomp_unset_nest_lock_25 (omp_nest_lock_25_t *) __GOMP_NOTHROW;
 extern int gomp_test_nest_lock_25 (omp_nest_lock_25_t *) __GOMP_NOTHROW;
 
+extern acc_device_t goacc_get_device_type_201 (void) __GOACC_NOTHROW;
+extern acc_device_t goacc_get_device_type_20 (void) __GOACC_NOTHROW;
+
 # define strong_alias(fn, al) \
   extern __typeof (fn) al __attribute__ ((alias (#fn)));
 # define omp_lock_symver(fn) \
   __asm (".symver g" #fn "_30, " #fn "@@OMP_3.0"); \
   __asm (".symver g" #fn "_25, " #fn "@OMP_1.0");
+# define oacc_20_201_symver(fn) \
+  __asm (".symver go" #fn "_201, " #fn "@@OACC_2.0.1"); \
+  __asm (".symver go" #fn "_20, " #fn "@OACC_2.0");
 #else
 # define gomp_init_lock_30 omp_init_lock
 # define gomp_destroy_lock_30 omp_destroy_lock
@@ -1063,6 +1071,8 @@ extern int gomp_test_nest_lock_25 (omp_nest_lock_25_t *) __GOMP_NOTHROW;
 # define gomp_set_nest_lock_30 omp_set_nest_lock
 # define gomp_unset_nest_lock_30 omp_unset_nest_lock
 # define gomp_test_nest_lock_30 omp_test_nest_lock
+
+# define goacc_get_device_type_201 acc_get_device_type
 #endif
 
 #ifdef HAVE_ATTRIBUTE_VISIBILITY
diff --git libgomp/libgomp.map libgomp/libgomp.map
index 4d42c42..4803aab 100644
--- libgomp/libgomp.map
+++ libgomp/libgomp.map
@@ -304,7 +304,12 @@ OACC_2.0 {
 	acc_get_num_devices_h_;
 	acc_set_device_type;
 	acc_set_device_type_h_;
+#ifdef HAVE_SYMVER_SYMBOL_RENAMING_RUNTIME_SUPPORT
+	# If the assembler used lacks the .symver directive or the linker
+	# doesn't support GNU symbol versioning, we have the same symbol in
+	# two versions, which Sun ld chokes on.
 	acc_get_device_type;
+#endif
 	acc_get_device_type_h_;
 	acc_set_device_num;
 	acc_set_device_num_h_;
@@ -378,6 +383,11 @@ OACC_2.0 {
 	acc_set_cuda_stream;
 };
 
+OACC_2.0.1 {
+  global:
+	acc_get_device_type;
+} OACC_2.0;
+
 GOACC_2.0 {
   global:
 	GOACC_data_end;
diff --git libgomp/oacc-init.c libgomp/oacc-init.c
index 42d005d..a7a2243 100644
--- libgomp/oacc-init.c
+++ libgomp/oacc-init.c
@@ -528,7 +528,7 @@ acc_set_device_type (acc_device_t d)
 ialias (acc_set_device_type)
 
 acc_device_t
-acc_get_device_type (void)
+goacc_get_device_type_201 (void)
 {
   acc_device_t res = acc_device_none;
   struct gomp_device_descr *dev;
@@ -552,8 +552,24 @@ acc_get_device_type (void)
   return res;
 }
 
+#ifdef LIBGOMP_GNU_SYMBOL_VERSIONING
+
+/* Legacy entry point (GCC 5).  Only provide host fallback execution.  */
+
+acc_device_t
+goacc_get_device_type_20 (void)
+{
+  return acc_device_host;
+}
+
+oacc_20_201_symver (acc_get_device_type)
+
+#else /* LIBGOMP_GNU_SYMBOL_VERSIONING */
+
 ialias (acc_get_device_type)
 
+#endif /* LIBGOMP_GNU_SYMBOL_VERSIONING */
+
 int
 acc_get_device_num (acc_device_t d)
 {
diff --git libgomp/oacc-parallel.c libgomp/oacc-parallel.c
index 9fe5020..321fd66 100644
--- libgomp/oacc-parallel.c
+++ libgomp/oacc-parallel.c
@@ -203,6 +203,17 @@ GOACC_parallel (int device, void (*fn) (void *),
 		int num_gangs, int num_workers, int vector_length,
 		int async, int num_waits, ...)
 {
+#ifdef HAVE_INTTYPES_H
+  gomp_debug (0, "%s: mapnum=%"PRIu64", hostaddrs=%p, sizes=%p, kinds=%p, "
+		 "async = %d\n",
+	      __FUNCTION__, (uint64_t) mapnum, hostaddrs, sizes, kinds, async);
+#else
+  gomp_debug (0, "%s: mapnum=%lu, hostaddrs=%p, sizes=%p, kinds=%p, async=%d\n",
+	      __FUNCTION__, (unsigned long) mapnum, hostaddrs, sizes, kinds,
+	      async);
+#endif
+  goacc_lazy_initialize ();
+
   goacc_save_and_set_bind (acc_device_host);
   fn (hostaddrs);
   goacc_restore_bind ();
diff --git libgomp/plugin/plugin-nvptx.c libgomp/plugin/plugin-nvptx.c
index fc5f298..56e6fae 100644
--- libgomp/plugin/plugin-nvptx.c
+++ libgomp/plugin/plugin-nvptx.c
@@ -1537,7 +1537,15 @@ GOMP_OFFLOAD_load_image (int ord, unsigned version, const void *target_data,
     GOMP_PLUGIN_fatal ("Offload data incompatible with PTX plugin"
 		       " (expected %u, received %u)",
 		       GOMP_VERSION_NVIDIA_PTX, GOMP_VERSION_DEV (version));
-  
+  if (GOMP_VERSION_DEV (version) == 0)
+    {
+      /* We're no longer support offload data generated by version 0 mkoffload;
+	 it won't be used in the legacy GOMP_parallel entry point.  */
+      GOMP_PLUGIN_debug (0, "Offload data not loaded (version %u)\n",
+			 GOMP_VERSION_DEV (version));
+      return -1;
+    }
+
   GOMP_OFFLOAD_init_device (ord);
 
   dev = ptx_devices[ord];
diff --git libgomp/target.c libgomp/target.c
index dd6f74d..2fbfa6e 100644
--- libgomp/target.c
+++ libgomp/target.c
@@ -1008,7 +1008,11 @@ gomp_load_image_to_device (struct gomp_device_descr *devicep, unsigned version,
   num_target_entries
     = devicep->load_image_func (devicep->target_id, version,
 				target_data, &target_table);
-
+  if (num_target_entries < 0)
+    {
+      /* The plugin refused this offload data.  */
+      return;
+    }
   if (num_target_entries != num_funcs + num_vars)
     {
       gomp_mutex_unlock (&devicep->lock);


Grüße
 Thomas

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 472 bytes --]

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: libgomp: Make GCC 5 OpenACC offloading executables work
  2016-04-20 11:35           ` libgomp: Make GCC 5 OpenACC offloading executables work " Thomas Schwinge
@ 2016-05-11 13:46             ` Thomas Schwinge
  2016-05-11 14:22               ` Bernd Schmidt
  0 siblings, 1 reply; 34+ messages in thread
From: Thomas Schwinge @ 2016-05-11 13:46 UTC (permalink / raw)
  To: Jakub Jelinek, gcc-patches; +Cc: Nathan Sidwell

Hi!

Ping.

On Wed, 20 Apr 2016 13:35:28 +0200, I wrote:
> On Mon, 28 Sep 2015 15:38:57 -0400, Nathan Sidwell <nathan@acm.org> wrote:
> > On 09/24/15 04:40, Jakub Jelinek wrote:
> > > Iff GCC 5 compiled offloaded OpenACC/PTX code will always do host fallback
> > > anyway because of the incompatible PTX version
> 
> I do agree that it's reasonable to require users to re-compile their code
> when switching between major GCC releases, to retain the offloading
> feature, or otherwise resort to host fallback execution.  I'll propose
> some text along these lines for the GCC 6 release notes.
> 
> > > why don't you just
> > > do
> > >    goacc_save_and_set_bind (acc_device_host);
> > >    fn (hostaddrs);
> > >    goacc_restore_bind ();
> > 
> > Committed the  attached.  Thanks for the review.
> 
> What we now got, doesn't work, for several reasons.  GCC 5 OpenACC
> offloading executables will just run into SIGSEGV.  Here is a patch
> (which depends on
> <http://news.gmane.org/find-root.php?message_id=%3C87a8ko3ea0.fsf%40hertz.schwinge.homeip.net%3E>).
> Unfortunately, we have to jump through some hoops: because GCC 5
> compiler-generated OpenACC reductions code emits calls to
> acc_get_device_type, and because we'll (have to) always resort to host
> fallback execution for GCC 5 executables, we also have to enforce these
> acc_get_device_type calls to return acc_device_host; otherwise reductions
> will give bogus results.  (I hope I'm correctly implementing/using the
> symbol versioning "magic".)  OK for gcc-6-branch and trunk?  Assuming we
> want this fixed on gcc-6-branch, should it be part of 6.1 (to avoid 6.1
> users running into the SIGSEGV), or delay for 6.2?
> 
> We don't have an easy way to add test cases to make sure we don't break
> such legacy interfaces, do we?  (So, I just manually checked a few test
> cases.)
> 
> commit c68c6b8e79176f5dc21684efe2517cbfb83a182e
> Author: Thomas Schwinge <thomas@codesourcery.com>
> Date:   Wed Apr 20 13:08:57 2016 +0200
> 
>     libgomp: Make GCC 5 OpenACC offloading executables work
>     
>     	* libgomp.h: Include "openacc.h".
>     	(goacc_get_device_type_201, goacc_get_device_type_20): New
>     	prototypes.
>     	(oacc_20_201_symver, goacc_get_device_type_201): New macros.
>     	* libgomp.map: Add acc_get_device_type with OACC_2.0.1 symbol
>     	version.
>     	* oacc-init.c (acc_get_device_type): Rename to
>     	goacc_get_device_type_201.
>     	(goacc_get_device_type_20): New function.
>     	* oacc-parallel.c (GOACC_parallel): Call goacc_lazy_initialize.
>     	* plugin/plugin-nvptx.c (GOMP_OFFLOAD_load_image): Refuse version
>     	0 offload images.
>     	* target.c (gomp_load_image_to_device): Gracefully handle the case
>     	that a plugin refuses to load offload images.
> ---
>  libgomp/libgomp.h             | 10 ++++++++++
>  libgomp/libgomp.map           | 10 ++++++++++
>  libgomp/oacc-init.c           | 18 +++++++++++++++++-
>  libgomp/oacc-parallel.c       | 11 +++++++++++
>  libgomp/plugin/plugin-nvptx.c | 10 +++++++++-
>  libgomp/target.c              |  6 +++++-
>  6 files changed, 62 insertions(+), 3 deletions(-)
> 
> diff --git libgomp/libgomp.h libgomp/libgomp.h
> index 6a05bbc..9fa1cb1 100644
> --- libgomp/libgomp.h
> +++ libgomp/libgomp.h
> @@ -1011,6 +1011,8 @@ gomp_work_share_init_done (void)
>  /* Now that we're back to default visibility, include the globals.  */
>  #include "libgomp_g.h"
>  
> +#include "openacc.h"
> +
>  /* Include omp.h by parts.  */
>  #include "omp-lock.h"
>  #define _LIBGOMP_OMP_LOCK_DEFINED 1
> @@ -1047,11 +1049,17 @@ extern void gomp_set_nest_lock_25 (omp_nest_lock_25_t *) __GOMP_NOTHROW;
>  extern void gomp_unset_nest_lock_25 (omp_nest_lock_25_t *) __GOMP_NOTHROW;
>  extern int gomp_test_nest_lock_25 (omp_nest_lock_25_t *) __GOMP_NOTHROW;
>  
> +extern acc_device_t goacc_get_device_type_201 (void) __GOACC_NOTHROW;
> +extern acc_device_t goacc_get_device_type_20 (void) __GOACC_NOTHROW;
> +
>  # define strong_alias(fn, al) \
>    extern __typeof (fn) al __attribute__ ((alias (#fn)));
>  # define omp_lock_symver(fn) \
>    __asm (".symver g" #fn "_30, " #fn "@@OMP_3.0"); \
>    __asm (".symver g" #fn "_25, " #fn "@OMP_1.0");
> +# define oacc_20_201_symver(fn) \
> +  __asm (".symver go" #fn "_201, " #fn "@@OACC_2.0.1"); \
> +  __asm (".symver go" #fn "_20, " #fn "@OACC_2.0");
>  #else
>  # define gomp_init_lock_30 omp_init_lock
>  # define gomp_destroy_lock_30 omp_destroy_lock
> @@ -1063,6 +1071,8 @@ extern int gomp_test_nest_lock_25 (omp_nest_lock_25_t *) __GOMP_NOTHROW;
>  # define gomp_set_nest_lock_30 omp_set_nest_lock
>  # define gomp_unset_nest_lock_30 omp_unset_nest_lock
>  # define gomp_test_nest_lock_30 omp_test_nest_lock
> +
> +# define goacc_get_device_type_201 acc_get_device_type
>  #endif
>  
>  #ifdef HAVE_ATTRIBUTE_VISIBILITY
> diff --git libgomp/libgomp.map libgomp/libgomp.map
> index 4d42c42..4803aab 100644
> --- libgomp/libgomp.map
> +++ libgomp/libgomp.map
> @@ -304,7 +304,12 @@ OACC_2.0 {
>  	acc_get_num_devices_h_;
>  	acc_set_device_type;
>  	acc_set_device_type_h_;
> +#ifdef HAVE_SYMVER_SYMBOL_RENAMING_RUNTIME_SUPPORT
> +	# If the assembler used lacks the .symver directive or the linker
> +	# doesn't support GNU symbol versioning, we have the same symbol in
> +	# two versions, which Sun ld chokes on.
>  	acc_get_device_type;
> +#endif
>  	acc_get_device_type_h_;
>  	acc_set_device_num;
>  	acc_set_device_num_h_;
> @@ -378,6 +383,11 @@ OACC_2.0 {
>  	acc_set_cuda_stream;
>  };
>  
> +OACC_2.0.1 {
> +  global:
> +	acc_get_device_type;
> +} OACC_2.0;
> +
>  GOACC_2.0 {
>    global:
>  	GOACC_data_end;
> diff --git libgomp/oacc-init.c libgomp/oacc-init.c
> index 42d005d..a7a2243 100644
> --- libgomp/oacc-init.c
> +++ libgomp/oacc-init.c
> @@ -528,7 +528,7 @@ acc_set_device_type (acc_device_t d)
>  ialias (acc_set_device_type)
>  
>  acc_device_t
> -acc_get_device_type (void)
> +goacc_get_device_type_201 (void)
>  {
>    acc_device_t res = acc_device_none;
>    struct gomp_device_descr *dev;
> @@ -552,8 +552,24 @@ acc_get_device_type (void)
>    return res;
>  }
>  
> +#ifdef LIBGOMP_GNU_SYMBOL_VERSIONING
> +
> +/* Legacy entry point (GCC 5).  Only provide host fallback execution.  */
> +
> +acc_device_t
> +goacc_get_device_type_20 (void)
> +{
> +  return acc_device_host;
> +}
> +
> +oacc_20_201_symver (acc_get_device_type)
> +
> +#else /* LIBGOMP_GNU_SYMBOL_VERSIONING */
> +
>  ialias (acc_get_device_type)
>  
> +#endif /* LIBGOMP_GNU_SYMBOL_VERSIONING */
> +
>  int
>  acc_get_device_num (acc_device_t d)
>  {
> diff --git libgomp/oacc-parallel.c libgomp/oacc-parallel.c
> index 9fe5020..321fd66 100644
> --- libgomp/oacc-parallel.c
> +++ libgomp/oacc-parallel.c
> @@ -203,6 +203,17 @@ GOACC_parallel (int device, void (*fn) (void *),
>  		int num_gangs, int num_workers, int vector_length,
>  		int async, int num_waits, ...)
>  {
> +#ifdef HAVE_INTTYPES_H
> +  gomp_debug (0, "%s: mapnum=%"PRIu64", hostaddrs=%p, sizes=%p, kinds=%p, "
> +		 "async = %d\n",
> +	      __FUNCTION__, (uint64_t) mapnum, hostaddrs, sizes, kinds, async);
> +#else
> +  gomp_debug (0, "%s: mapnum=%lu, hostaddrs=%p, sizes=%p, kinds=%p, async=%d\n",
> +	      __FUNCTION__, (unsigned long) mapnum, hostaddrs, sizes, kinds,
> +	      async);
> +#endif
> +  goacc_lazy_initialize ();
> +
>    goacc_save_and_set_bind (acc_device_host);
>    fn (hostaddrs);
>    goacc_restore_bind ();
> diff --git libgomp/plugin/plugin-nvptx.c libgomp/plugin/plugin-nvptx.c
> index fc5f298..56e6fae 100644
> --- libgomp/plugin/plugin-nvptx.c
> +++ libgomp/plugin/plugin-nvptx.c
> @@ -1537,7 +1537,15 @@ GOMP_OFFLOAD_load_image (int ord, unsigned version, const void *target_data,
>      GOMP_PLUGIN_fatal ("Offload data incompatible with PTX plugin"
>  		       " (expected %u, received %u)",
>  		       GOMP_VERSION_NVIDIA_PTX, GOMP_VERSION_DEV (version));
> -  
> +  if (GOMP_VERSION_DEV (version) == 0)
> +    {
> +      /* We're no longer support offload data generated by version 0 mkoffload;
> +	 it won't be used in the legacy GOMP_parallel entry point.  */
> +      GOMP_PLUGIN_debug (0, "Offload data not loaded (version %u)\n",
> +			 GOMP_VERSION_DEV (version));
> +      return -1;
> +    }
> +
>    GOMP_OFFLOAD_init_device (ord);
>  
>    dev = ptx_devices[ord];
> diff --git libgomp/target.c libgomp/target.c
> index dd6f74d..2fbfa6e 100644
> --- libgomp/target.c
> +++ libgomp/target.c
> @@ -1008,7 +1008,11 @@ gomp_load_image_to_device (struct gomp_device_descr *devicep, unsigned version,
>    num_target_entries
>      = devicep->load_image_func (devicep->target_id, version,
>  				target_data, &target_table);
> -
> +  if (num_target_entries < 0)
> +    {
> +      /* The plugin refused this offload data.  */
> +      return;
> +    }
>    if (num_target_entries != num_funcs + num_vars)
>      {
>        gomp_mutex_unlock (&devicep->lock);


Grüße
 Thomas

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: libgomp: Make GCC 5 OpenACC offloading executables work
  2016-05-11 13:46             ` libgomp: Make GCC 5 OpenACC offloading executables work Thomas Schwinge
@ 2016-05-11 14:22               ` Bernd Schmidt
  2016-05-11 15:38                 ` Nathan Sidwell
  0 siblings, 1 reply; 34+ messages in thread
From: Bernd Schmidt @ 2016-05-11 14:22 UTC (permalink / raw)
  To: Thomas Schwinge, Jakub Jelinek, gcc-patches; +Cc: Nathan Sidwell

On 05/11/2016 03:46 PM, Thomas Schwinge wrote:
>> What we now got, doesn't work, for several reasons.  GCC 5 OpenACC
>> offloading executables will just run into SIGSEGV.

I'm tempted to say, let's just wait until someone actually reports that 
in bugzilla. Offloading in gcc-5 was broken enough that I expect no one 
was actually using it. There's really very little point in carrying 
compatibility crud around.


Bernd

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: libgomp: Make GCC 5 OpenACC offloading executables work
  2016-05-11 14:22               ` Bernd Schmidt
@ 2016-05-11 15:38                 ` Nathan Sidwell
  2016-05-11 16:02                   ` Thomas Schwinge
  0 siblings, 1 reply; 34+ messages in thread
From: Nathan Sidwell @ 2016-05-11 15:38 UTC (permalink / raw)
  To: Bernd Schmidt, Thomas Schwinge, Jakub Jelinek, gcc-patches

On 05/11/16 10:22, Bernd Schmidt wrote:
> On 05/11/2016 03:46 PM, Thomas Schwinge wrote:
>>> What we now got, doesn't work, for several reasons.  GCC 5 OpenACC
>>> offloading executables will just run into SIGSEGV.
>
> I'm tempted to say, let's just wait until someone actually reports that in
> bugzilla. Offloading in gcc-5 was broken enough that I expect no one was
> actually using it. There's really very little point in carrying compatibility
> crud around.

I agree.  This would simply be enabling a poorly performing binary, rather than 
encouraging a shiny new one.


nathan

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: libgomp: Make GCC 5 OpenACC offloading executables work
  2016-05-11 15:38                 ` Nathan Sidwell
@ 2016-05-11 16:02                   ` Thomas Schwinge
  2016-05-12  9:21                     ` Bernd Schmidt
  2016-05-12 15:35                     ` Nathan Sidwell
  0 siblings, 2 replies; 34+ messages in thread
From: Thomas Schwinge @ 2016-05-11 16:02 UTC (permalink / raw)
  To: Nathan Sidwell, Bernd Schmidt, Jakub Jelinek, gcc-patches

Hi!

On Wed, 11 May 2016 11:38:39 -0400, Nathan Sidwell <nathan@acm.org> wrote:
> On 05/11/16 10:22, Bernd Schmidt wrote:
> > On 05/11/2016 03:46 PM, Thomas Schwinge wrote:
> >>> What we now got, doesn't work, for several reasons.  GCC 5 OpenACC
> >>> offloading executables will just run into SIGSEGV.
> >
> > I'm tempted to say, let's just wait until someone actually reports that in
> > bugzilla. Offloading in gcc-5 was broken enough that I expect no one was
> > actually using it. There's really very little point in carrying compatibility
> > crud around.
> 
> I agree.  This would simply be enabling a poorly performing binary, rather than 
> encouraging a shiny new one.

I conceptually agree to that.  (If we're serious about that, then we can
remove more code, such as the legacy libgomp entry point itself -- a
"missing symbol: [...]" is still vaguely better than a SIGSEGV.)  Yet,
what I fixed here, is just what Jakub and Nathan agreed upon in
<http://news.gmane.org/find-root.php?message_id=%3C20150924084034.GC1847%40tucnak.redhat.com%3E>:
"GCC 5 compiled offloaded OpenACC/PTX code will always do host fallback".
Currently such code will always result in a SIGSEGV, which the patch
fixes.  (And, given that we now have this patch, it seems "unfair" to
"wait until someone actually reports that in bugzilla".)  Another option,
instead of having such legacy entry pointw do host fallback, is to
instead call gomp_fatal with a message like "re-compile your code with a
newer GCC version".


Grüße
 Thomas

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: libgomp: Make GCC 5 OpenACC offloading executables work
  2016-05-11 16:02                   ` Thomas Schwinge
@ 2016-05-12  9:21                     ` Bernd Schmidt
  2016-05-12 15:35                     ` Nathan Sidwell
  1 sibling, 0 replies; 34+ messages in thread
From: Bernd Schmidt @ 2016-05-12  9:21 UTC (permalink / raw)
  To: Thomas Schwinge, Nathan Sidwell, Jakub Jelinek, gcc-patches

On 05/11/2016 06:02 PM, Thomas Schwinge wrote:
> I conceptually agree to that.  (If we're serious about that, then we can
> remove more code, such as the legacy libgomp entry point itself -- a
> "missing symbol: [...]" is still vaguely better than a SIGSEGV.)  Yet,
> what I fixed here, is just what Jakub and Nathan agreed upon in
> <http://news.gmane.org/find-root.php?message_id=%3C20150924084034.GC1847%40tucnak.redhat.com%3E>:
> "GCC 5 compiled offloaded OpenACC/PTX code will always do host fallback".
> Currently such code will always result in a SIGSEGV, which the patch
> fixes.  (And, given that we now have this patch, it seems "unfair" to
> "wait until someone actually reports that in bugzilla".)

I'll defer to Jakub. Don't want to block an existing patch if you really 
want to apply it, I just think we should go in the other direction of 
removing this fallback support eventually.


Bernd

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: libgomp: Make GCC 5 OpenACC offloading executables work
  2016-05-11 16:02                   ` Thomas Schwinge
  2016-05-12  9:21                     ` Bernd Schmidt
@ 2016-05-12 15:35                     ` Nathan Sidwell
  1 sibling, 0 replies; 34+ messages in thread
From: Nathan Sidwell @ 2016-05-12 15:35 UTC (permalink / raw)
  To: Thomas Schwinge, Bernd Schmidt, Jakub Jelinek, gcc-patches

On 05/11/16 12:02, Thomas Schwinge wrote:

> I conceptually agree to that.  (If we're serious about that, then we can
> remove more code, such as the legacy libgomp entry point itself -- a
> "missing symbol: [...]" is still vaguely better than a SIGSEGV.)  Yet,
> what I fixed here, is just what Jakub and Nathan agreed upon in
> <http://news.gmane.org/find-root.php?message_id=%3C20150924084034.GC1847%40tucnak.redhat.com%3E>:

Well, that email appears to be from september, and this patch is more complex 
than the linker versioning script I had anticipated.  If it's taken this long to 
create a patch, either it's very hard, or it's not a priority.  Given 6.1 is 
released, it also seems to have failed to catch the train.

nathan

^ permalink raw reply	[flat|nested] 34+ messages in thread

end of thread, other threads:[~2016-05-12 15:35 UTC | newest]

Thread overview: 34+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-08-25 13:40 Openacc launch API Nathan Sidwell
2015-08-28 17:30 ` Nathan Sidwell
2015-08-28 18:07   ` Jakub Jelinek
2015-08-28 19:50     ` Nathan Sidwell
2015-09-07 13:09 ` Nathan Sidwell
2015-09-11 15:59   ` Nathan Sidwell
2015-09-16 20:59     ` Nathan Sidwell
2015-09-17  9:46 ` Bernd Schmidt
2015-09-17 13:24   ` Nathan Sidwell
2015-09-17 14:43   ` Nathan Sidwell
2015-09-18  9:13     ` Bernd Schmidt
2015-09-18 18:56       ` Nathan Sidwell
2015-09-24  9:28       ` Jakub Jelinek
2015-09-24  9:58         ` Bernd Schmidt
2015-09-24 10:28           ` Jakub Jelinek
2015-09-24 10:41             ` Bernd Schmidt
2015-09-28 21:20         ` Nathan Sidwell
2016-04-12 13:17           ` libgomp external ABI prototypes maintenance (was: Openacc launch API) Thomas Schwinge
2016-04-20 11:35           ` libgomp: Make GCC 5 OpenACC offloading executables work " Thomas Schwinge
2016-05-11 13:46             ` libgomp: Make GCC 5 OpenACC offloading executables work Thomas Schwinge
2016-05-11 14:22               ` Bernd Schmidt
2016-05-11 15:38                 ` Nathan Sidwell
2016-05-11 16:02                   ` Thomas Schwinge
2016-05-12  9:21                     ` Bernd Schmidt
2016-05-12 15:35                     ` Nathan Sidwell
2015-09-21 16:30     ` Openacc launch API Nathan Sidwell
2015-09-30 12:42 ` Matthias Klose
2015-09-30 12:45   ` Bernd Schmidt
2015-09-30 12:48     ` Matthias Klose
2015-09-30 13:05   ` Nathan Sidwell
2015-09-30 13:27 JIT breakage after last builtin-types change Ulrich Drepper
2015-09-30 13:27 ` Jakub Jelinek
2015-09-30 14:48   ` Ulrich Drepper
2015-09-30 15:50     ` Thomas Schwinge

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).